UNIT-03 Numpy
• NumPy is a fundamental package for scientific computing with Python, and especially for data analysis.
• Functions for performing element-wise computations with arrays or mathematical operations between
arrays
• A mature C API to enable Python extensions and native C or C++ code to access NumPy’s data structures
and computational facilities.
• One of its primary uses in data analysis is as a container for data to be passed between algorithms and
libraries.
INTRODUCTION
• One of the reasons NumPy is so important for numerical computations in Python is that it is designed for
efficiency on large arrays of data. There are two main reasons for this:
1. NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects.
NumPy’s library of algorithms written in the C language can operate on this memory without any type
checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.
2. NumPy operations perform complex computations on entire arrays without the need for Python for loops.
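As a rough sketch of the second point, the snippet below computes the same sum of squares once with a Python for loop and once with a single array expression. The array size and the helper names are illustrative assumptions, and timings depend on the machine, so no timing figures are shown.

```python
import timeit

import numpy as np

n = 100_000  # illustrative size, chosen arbitrarily
values = list(range(n))
arr = np.arange(n, dtype=np.int64)


def loop_sum_of_squares(seq):
    """Pure-Python version: one interpreter step per element."""
    total = 0
    for v in seq:
        total += v * v
    return total


def numpy_sum_of_squares(a):
    """Array version: the multiply and the sum both run in compiled code."""
    return int((a * a).sum())


loop_result = loop_sum_of_squares(values)
numpy_result = numpy_sum_of_squares(arr)

loop_time = timeit.timeit(lambda: loop_sum_of_squares(values), number=5)
numpy_time = timeit.timeit(lambda: numpy_sum_of_squares(arr), number=5)
print(f"loop: {loop_time:.4f}s, numpy: {numpy_time:.4f}s")
```

Both versions return the same value; only the time spent differs.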
The NumPy ndarray:
A multidimensional array object
• One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large datasets in
Python. Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the equivalent
operations between scalar elements.
• To give you a flavor of how NumPy enables batch computations with similar syntax to scalar values on built-in Python objects, I first
import NumPy and generate a small array of random data:
• In [14]: data
• Out[14]:
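The original session output is not reproduced on this slide. As a minimal sketch of the idea, assuming a 2 × 3 array of normally distributed samples (the shape, the seed, and the use of `np.random.default_rng` are assumptions made here for illustration), batch arithmetic looks like this:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only for repeatability
data = rng.standard_normal((2, 3))   # 2x3 array of normally distributed samples

scaled = data * 10      # every element is multiplied by 10
doubled = data + data   # corresponding elements are added together

print(data)
print(scaled)
print(doubled)
```

The expressions `data * 10` and `data + data` read exactly like scalar arithmetic, yet they operate on all six elements at once.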
• An ndarray is a generic multidimensional container for homogeneous data; that is, all of the
elements must be the same type.
• Every array has a shape, a tuple indicating the size of each dimension, and a dtype, an
object describing the data type of the array:
• >>> import numpy as np
• >>> a = np.array([1, 2, 3])
• >>> a
• array([1, 2, 3])
• >>> type(a)
• <class 'numpy.ndarray'>
• To find the dtype of the newly created ndarray, use its dtype attribute.
• The data type is stored in a special dtype metadata object
• >>> a.dtype
• dtype('int32')
• >>> a.ndim
• 1
• >>> a.size
• 3
Creating ndarrays
• To define a new ndarray, the easiest way is to use the array() function, passing a Python list containing the elements to be included in it
as an argument.
• Example:
• In [19]: data1 = [6, 7.5, 8, 0, 1]
• In [20]: arr1 = np.array(data1)
• In [21]: arr1
• Out[21]: array([ 6. , 7.5, 8. , 0. , 1. ])
• Arrays extend naturally to several dimensions. For example, you can define a two-dimensional
2x2 array:
• >>> b = np.array([[1.3, 2.4],[0.3, 4.1]])
• >>> b.dtype
• dtype('float64')
• >>> b.ndim
• 2
• >>> b.size
• 4
• >>> b.shape
• (2, 2)
• This array has two axes, each of length 2, so its ndim (sometimes called its rank) is 2.
• In addition to np.array, there are a number of other functions for creating new arrays.
• As examples, zeros and ones create arrays of 0s or 1s, respectively, with a given length or shape.
• empty creates an array without initializing its values to any particular value.
• To create a higher dimensional array with these methods, pass a tuple for the shape:
• In [25]: np.empty((2, 3, 2))
• Out[25]:
• array([[[ 4.94065646e-324, 4.94065646e-324],
• [ 3.87491056e-297, 2.46845796e-130],
• [ 4.94065646e-324, 4.94065646e-324]],
• [[ 1.90723115e+083, 5.73293533e-053],
• [ -2.33568637e+124, -6.70608105e-012],
• [ 4.42786966e+160, 1.27100354e+025]]])
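A short sketch of the three creation functions side by side may help. The shapes below are arbitrary choices for illustration; note that the contents of an array returned by `np.empty` are whatever happened to be in memory, so only its shape and dtype are predictable.

```python
import numpy as np

z = np.zeros(10)          # 1-D array of ten 0.0 values
o = np.ones((3, 6))       # 3x6 array of 1.0 values
e = np.empty((2, 3, 2))   # uninitialized: contents are arbitrary

print(z)
print(o.shape, o.dtype)
print(e.shape)            # only the shape is predictable, not the values
```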
Data Types for ndarrays
• The data type or dtype is a special object containing the information (or metadata, data about data) the ndarray needs to
interpret a chunk of memory as a particular type of data:
• In [35]: arr1.dtype
• Out[35]: dtype('float64')
• Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some examples.
• dtypes are a source of NumPy’s flexibility for interacting with data coming from other systems.
• The numerical dtypes are named the same way: a type name, like float or int, followed by a number indicating the number
of bits per element.
• A standard double-precision floating-point value (what’s used under the hood in Python’s float object) takes up 8 bytes or
64 bits. Thus, this type is known in NumPy as float64.
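The relationship between a dtype name and its size can be checked directly. The sketch below uses the `itemsize` attribute of a dtype and the `nbytes` attribute of an array; the 1000-element array is an arbitrary example.

```python
import numpy as np

# The number in each dtype name is the bit width of one element.
for name in ("int16", "int32", "float64"):
    dt = np.dtype(name)
    print(name, dt.itemsize, "bytes =", dt.itemsize * 8, "bits")

arr = np.zeros(1000, dtype=np.float64)
print(arr.nbytes)  # total bytes used by the elements: 1000 * 8
```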
• The type of the array can also be explicitly specified at creation time:
• >>> c = np.array([[1, 2], [3, 4]], dtype=complex)
• >>> c
• array([[1.+0.j, 2.+0.j],
•        [3.+0.j, 4.+0.j]])
• You can explicitly convert or cast an array from one dtype to another using ndarray’s astype
method:
• In [37]: arr = np.array([1, 2, 3, 4, 5])
• In [38]: arr.dtype
• Out[38]: dtype('int64')
• In [39]: float_arr = arr.astype(np.float64)
• In [40]: float_arr.dtype
• Out[40]: dtype('float64')
• In this example, integers were cast to floating point. If I cast some floating-point numbers
to be of integer dtype, the decimal part will be truncated:
• In [42]: arr
• In [43]: arr.astype(np.int32)
• If you have an array of strings representing numbers, you can use astype to
convert them to numeric form:
• In [45]: numeric_strings.astype(float)
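The arrays used in the two casts above are not shown on the slide. The following sketch uses made-up values (an assumption, purely for illustration) to show both behaviors: truncation toward zero when casting floats to integers, and parsing when casting strings to floats.

```python
import numpy as np

arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])  # illustrative values
as_int = arr.astype(np.int32)  # fractional part is dropped (truncation toward zero)
print(as_int)

numeric_strings = np.array(["1.25", "-9.6", "42"])  # illustrative values
as_float = numeric_strings.astype(float)            # each string is parsed as a float
print(as_float)
```

Note that `astype` always returns a new array, even when the requested dtype matches the existing one.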
• When you print an array, NumPy displays it in a similar way to nested lists, but with the
following layout:
• the last axis is printed from left to right,
• the second-to-last is printed from top to bottom,
• the rest are also printed from top to bottom, with each slice separated from the next by
an empty line.
• One-dimensional arrays are then printed as rows, bidimensionals as matrices and
tridimensionals as lists of matrices.
• >>> a = np.arange(6)  # 1d array
• >>> print(a)
• [0 1 2 3 4 5]
• >>> b = np.arange(12).reshape(4, 3)  # 2d array
• >>> print(b)
• [[ 0  1  2]
•  [ 3  4  5]
•  [ 6  7  8]
•  [ 9 10 11]]
• >>> c = np.arange(24).reshape(2, 3, 4)  # 3d array
• >>> print(c)
• [[[ 0  1  2  3]
•   [ 4  5  6  7]
•   [ 8  9 10 11]]

•  [[12 13 14 15]
•   [16 17 18 19]
•   [20 21 22 23]]]
Arithmetic with NumPy Arrays
• Arrays are important because they enable you to express batch operations on data without writing any for loops. NumPy users
call this vectorization. Any arithmetic operation between equal-size arrays is applied element-wise:
• >>> a = np.array([20, 30, 40, 50])
• >>> b = np.arange(4)
• >>> b
• array([0, 1, 2, 3])
• >>> c = a - b
• >>> c
• array([20, 29, 38, 47])
• >>> b**2
• array([0, 1, 4, 9])
• >>> 10 * np.sin(a)
• array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])
• >>> a < 35
• array([ True,  True, False, False])
• In [58]: arr2
• Out[58]:
• array([[ 0., 4., 1.],
• [ 7., 2., 12.]])
• Out[59]:
• array([[False, True, False],
• [ True, False, True]], dtype=bool)
• Operations between differently sized arrays rely on a mechanism called broadcasting.
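The comparison that produced the boolean output above is not shown on the slide. The sketch below assumes a second array `arr` with the values 1 through 6 (an assumption that happens to reproduce that output) and then adds two small broadcasting cases: a scalar against an array, and a length-3 array against a 2 × 3 array.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])     # assumed; not shown on the slide
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

print(arr2 > arr)   # element-wise comparison of equal-shape arrays

# Broadcasting: the smaller operand is stretched to match the larger one.
print(arr * 2)                             # the scalar is applied to every element
print(arr + np.array([10., 20., 30.]))     # shape (3,) is added to each row of shape (2, 3)
```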
Basic Indexing and Slicing
• NumPy array indexing is a rich topic, as there are many ways you may want to select a subset of your data
or individual elements. One-dimensional arrays are simple:
• In [60]: arr = np.arange(10)
• In [61]: arr
• Out[61]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
• In [62]: arr[5]
• Out[62]: 5
• In [63]: arr[5:8]
• Out[63]: array([5, 6, 7])
• In [64]: arr[5:8] = 12
• In [65]: arr
• Out[65]: array([ 0, 1, 2, 3, 4, 12, 12, 12, 8, 9])
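One consequence of basic slicing worth seeing in code is that a slice is a view onto the original array, not a copy. The sketch below is illustrative (the names `view` and `independent` are chosen here) and shows how to request an independent copy with `.copy()`.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]        # basic slice: a view onto arr's memory
view[1] = 12345        # writes through to arr[6]
print(arr)

independent = arr[5:8].copy()  # an explicit copy owns its own data
independent[:] = 0             # arr is unaffected by this assignment
print(arr)
```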
Fancy Indexing
• Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.
• Suppose we had an 8 × 4 array:
• In [117]: arr = np.empty((8, 4))
• In [118]: for i in range(8):
• .....: arr[i] = i
• In [119]: arr
• Out[119]:
• array([[ 0., 0., 0., 0.],
• [ 1., 1., 1., 1.],
• [ 2., 2., 2., 2.],
• [ 3., 3., 3., 3.],
• [ 4., 4., 4., 4.],
• [ 5., 5., 5., 5.],
• [ 6., 6., 6., 6.],
• [ 7., 7., 7., 7.]])
• To select out a subset of the rows in a particular order, you can simply pass a list or
ndarray of integers specifying the desired order:
• In [120]: arr[[4, 3, 0, 6]]
• Out[120]:
• array([[ 4., 4., 4., 4.],
• [ 3., 3., 3., 3.],
• [ 0., 0., 0., 0.],
• [ 6., 6., 6., 6.]])
• Using negative indices selects rows from the end:
• In [121]: arr[[-3, -5, -7]]
• Out[121]:
• array([[ 5., 5., 5., 5.],
• [ 3., 3., 3., 3.],
• [ 1., 1., 1., 1.]])
• Passing multiple index arrays does something slightly different; it selects a one dimensional array of elements
corresponding to each tuple of indices:
• In [122]: arr = np.arange(32).reshape((8, 4))
• In [123]: arr
• Out[123]:
• array([[ 0, 1, 2, 3],
• [ 4, 5, 6, 7],
• [ 8, 9, 10, 11],
• [12, 13, 14, 15],
• [16, 17, 18, 19],
• [20, 21, 22, 23],
• [24, 25, 26, 27],
• [28, 29, 30, 31]])
• In [124]: arr[[1, 5, 7, 2], [0, 3, 1, 2]]
• Out[124]: array([ 4, 23, 29, 10])
• Here the elements (1, 0), (5, 3), (7, 1), and (2, 2) were selected: the first list gives the rows 1, 5, 7, 2 and the second
gives the matching column positions 0, 3, 1, 2. Regardless of how many dimensions the array has (here, only 2), indexing
with one integer list per axis in this way produces a one-dimensional result.
• The behavior of fancy indexing in this case is a bit different from what some users might have expected
(myself included), which is the rectangular region formed by selecting a subset of the matrix’s rows and
columns. Here is one way to get that:
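• In [125]: arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]
• Out[125]:
• array([[ 4, 7, 5, 6],
• [20, 23, 21, 22],
• [28, 31, 29, 30],
• [ 8, 11, 9, 10]])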
• Keep in mind that fancy indexing, unlike slicing, always copies the data into a new array.
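As a sketch of the rectangular selection described above, the snippet below shows two ways to pick rows 1, 5, 7, 2 and columns 0, 3, 1, 2 together: chaining a row selection with a column selection, and using `np.ix_`. The variable names are chosen here for illustration. The last lines check the copy behavior of fancy indexing.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

chained = arr[rows][:, cols]        # select the rows first, then reorder the columns
with_ix = arr[np.ix_(rows, cols)]   # np.ix_ builds an open mesh from the two lists
print(with_ix)

# Fancy indexing returns a copy, so writing to the result leaves arr untouched.
with_ix[0, 0] = -1
print(arr[1, 0])
```

Both approaches give the same 4 × 4 block; `np.ix_` does it in a single indexing step rather than creating an intermediate array.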
Transposing Arrays and Swapping Axes
• Transposing is a special form of reshaping that similarly returns a view on the underlying data without copying
anything.
• Arrays have the transpose method and also the special T attribute:
• In [126]: arr = np.arange(15).reshape((3, 5))
• In [127]: arr
• Out[127]:
• array([[ 0, 1, 2, 3, 4],
• [ 5, 6, 7, 8, 9],
• [10, 11, 12, 13, 14]])
• In [128]: arr.T
• Out[128]:
• array([[ 0, 5, 10],
• [ 1, 6, 11],
• [ 2, 7, 12],
• [ 3, 8, 13],
• [ 4, 9, 14]])
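To make the "view, not copy" point concrete, here is a small sketch using `np.shares_memory`. The names `t` and `same` are illustrative.

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))
t = arr.T
print(t.shape)                     # (5, 3)
print(np.shares_memory(arr, t))    # True: no data was copied

t[0, 1] = 99                       # writes through to arr[1, 0]
print(arr[1, 0])

same = arr.transpose()             # for a 2-D array, equivalent to arr.T
```

Because the transpose shares memory with the original, take a `.copy()` first if the two must be modified independently.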
• Simple transposing with .T is a special case of swapping axes.
• ndarray has the method swapaxes, which takes a pair of axis numbers and switches the indicated axes to rearrange the data:
• In [135]: arr
• Out[135]:
• array([[[ 0, 1, 2, 3],
• [ 4, 5, 6, 7]],
• [[ 8, 9, 10, 11],
• [12, 13, 14, 15]]])
• In [136]: arr.swapaxes(1, 2)
• Out[136]:
• array([[[ 0, 4],
• [ 1, 5],
• [ 2, 6],
• [ 3, 7]],
• [[ 8, 12],
• [ 9, 13],
• [10, 14],
• [11, 15]]])
• swapaxes similarly returns a view on the data without making a copy.
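The array shown in the swapaxes example matches `np.arange(16).reshape((2, 2, 4))`; that construction is an inference from the displayed values, since the defining line is not on the slide. Under that assumption, the sketch below shows how the shape and the element positions change.

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))   # reproduces the array displayed above
swapped = arr.swapaxes(1, 2)
print(arr.shape, "->", swapped.shape)    # (2, 2, 4) -> (2, 4, 2)

# Element [i, j, k] of the original appears at [i, k, j] after the swap.
print(arr[1, 0, 3], swapped[1, 3, 0])
print(np.shares_memory(arr, swapped))    # the result is a view
```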
Universal Functions: Fast Element-Wise Array Functions
• A universal function, or ufunc, is a function that performs element-wise operations on data in ndarrays.
• You can think of them as fast vectorized wrappers for simple functions that take one or more scalar
values and produce one or more scalar results.
• Many ufuncs are simple element-wise transformations, like sqrt or exp:
• In [137]: arr = np.arange(10)
• In [138]: arr
• Out[138]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
• In [139]: np.sqrt(arr)
• Out[139]:
• array([ 0. , 1. , 1.4142, 1.7321, 2. , 2.2361, 2.4495,
• 2.6458, 2.8284, 3. ])
• In [140]: np.exp(arr)
• Out[140]:
• array([ 1. , 2.7183, 7.3891, 20.0855, 54.5982,
• 148.4132, 403.4288, 1096.6332, 2980.958 , 8103.0839])
• These are referred to as unary ufuncs.
• Others, such as add or maximum, take two arrays (thus, binary ufuncs) and return a single array as the result:
• In [141]: x = np.random.randn(8)
• In [142]: y = np.random.randn(8)
• In [143]: x
• Out[143]:
• array([-0.0119, 1.0048, 1.3272, -0.9193, -1.5491, 0.0222, 0.7584,
• -0.6605])
• In [144]: y
• Out[144]:
• array([ 0.8626, -0.01 , 0.05 , 0.6702, 0.853 , -0.9559, -0.0235,
• -2.3042])
• In [145]: np.maximum(x, y)
• Out[145]:
• array([ 0.8626, 1.0048, 1.3272, 0.6702, 0.853 , 0.0222, 0.7584,
• -0.6605])
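Because the arrays above are random, the result is easier to verify by hand with fixed inputs. The sketch below uses small made-up arrays (illustrative values only) with two binary ufuncs.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.5, 0.0])   # illustrative values
y = np.array([0.5, 4.0, -1.0, 0.0])

print(np.maximum(x, y))   # element-wise maximum of the two arrays
print(np.add(x, y))       # same result as x + y
```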
Array-Oriented Programming with Arrays
• Using NumPy arrays enables you to express many kinds of data
processing tasks as concise array expressions that might otherwise
require writing loops.
• This practice of replacing explicit loops with array expressions is
commonly referred to as vectorization.
• In general, vectorized array operations will often be one or two (or
more) orders of magnitude faster than their pure Python equivalents,
with the biggest impact in any kind of numerical computations.
• As a simple example, suppose we wished to evaluate the function
sqrt(x^2 + y^2) across a regular grid of values.
• The np.meshgrid function takes two 1D arrays and produces two 2D
matrices corresponding to all pairs of (x, y) in the two arrays:
In [155]: points = np.arange(-5, 5, 0.01)  # 1000 equally spaced points
In [156]: xs, ys = np.meshgrid(points, points)
In [157]: ys
Out[157]:
array([[-5. , -5. , -5. , ..., -5. , -5. , -5. ],
[-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
[-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
...,
[ 4.97, 4.97, 4.97, ..., 4.97, 4.97, 4.97],
[ 4.98, 4.98, 4.98, ..., 4.98, 4.98, 4.98],
[ 4.99, 4.99, 4.99, ..., 4.99, 4.99, 4.99]])
• Now, evaluating the function is a matter of writing the same
expression you would write with two points:
In [158]: z = np.sqrt(xs ** 2 + ys ** 2)
In [159]: z
Out[159]:
array([[ 7.0711, 7.064 , 7.0569, ..., 7.0499, 7.0569, 7.064 ],
[ 7.064 , 7.0569, 7.0499, ..., 7.0428, 7.0499, 7.0569],
[ 7.0569, 7.0499, 7.0428, ..., 7.0357, 7.0428, 7.0499],
...,
[ 7.0499, 7.0428, 7.0357, ..., 7.0286, 7.0357, 7.0428],
[ 7.0569, 7.0499, 7.0428, ..., 7.0357, 7.0428, 7.0499],
[ 7.064 , 7.0569, 7.0499, ..., 7.0428, 7.0499, 7.0569]])
• In [160]: import matplotlib.pyplot as plt
• In [161]: plt.imshow(z, cmap=plt.cm.gray); plt.colorbar()
• Out[161]: <matplotlib.colorbar.Colorbar at 0x7f715e3fa630>
• In [162]: plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")
• Out[162]: <matplotlib.text.Text at 0x7f715d2de748>
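With 1000 points per axis the meshgrid output is hard to inspect, so the sketch below repeats the same computation on a tiny 3-point grid (an illustrative choice) and compares the vectorized result with an equivalent pure-Python double loop.

```python
import math

import numpy as np

points = np.array([-1.0, 0.0, 1.0])   # a tiny grid, for readability
xs, ys = np.meshgrid(points, points)
print(xs)   # each row repeats the x coordinates
print(ys)   # each column repeats the y coordinates

# Vectorized evaluation over the whole grid in one expression.
z = np.sqrt(xs ** 2 + ys ** 2)

# Equivalent pure-Python double loop, for comparison.
z_loop = [[math.sqrt(x * x + y * y) for x in points] for y in points]
print(z)
```

The array expression and the nested loop produce the same values; the array form avoids the explicit loops and scales far better as the grid grows.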