NumPy
NumPy, short for Numerical Python, has long been a cornerstone of numerical computing in
Python. It provides the data structures, algorithms, and library glue needed for most scientific
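applications involving numerical data in Python. NumPy contains, among other things, ndarray (a
fast multidimensional array object), functions that operate on entire arrays at once, linear
algebra and random number generation routines, and an API for connecting NumPy with libraries
written in C, C++, or Fortran.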
Beyond the fast array-processing capabilities that NumPy adds to Python, one of its primary
uses in data analysis is as a container for data to be passed between algorithms and libraries.
For numerical data, NumPy arrays are more efficient for storing and manipulating data than the
other built-in Python data structures. Also, libraries written in a lower-level language, such as C
or Fortran, can operate on the data stored in a NumPy array without copying data into some
other memory representation. Thus, many numerical computing tools for Python either assume
NumPy arrays as a primary data structure or else target seamless interoperability with NumPy.
Why NumPy?
One of the reasons NumPy is so important for numerical computations in Python is that it is
designed for efficiency on large arrays of data. Two reasons stand out: NumPy stores data in a
contiguous block of memory, independent of other built-in Python objects, and its operations run
over entire arrays without Python for loops. As a result, NumPy-based algorithms are generally
10 to 100 times faster (or more) than their pure Python counterparts and use significantly less
memory.
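As a rough illustration of that difference, the sketch below doubles every element once with a pure Python loop and once with a single NumPy expression. The array size and variable names are chosen only for this example, and no timings are shown because they depend on the machine; timeit can be used to measure them.

```python
import numpy as np

n = 100_000
values = list(range(n))   # plain Python list
arr = np.arange(n)        # NumPy array holding the same numbers

# Pure Python: visit each element in an explicit loop
doubled_list = [x * 2 for x in values]

# NumPy: one vectorized expression over the whole array
doubled_arr = arr * 2
```

Both approaches produce the same numbers; only the NumPy version avoids the per-element Python loop.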
To give you a flavor of how NumPy enables batch computations with similar syntax to scalar
values on built-in Python objects, we first import NumPy and generate a small array of random
data:
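A minimal sketch of this step, assuming the conventional np alias and an arbitrary 2 x 3 shape:

```python
import numpy as np

# A small 2 x 3 array of normally distributed random values
data = np.random.randn(2, 3)

# Arithmetic applies to every element at once, just as it would to a scalar
scaled = data * 10
doubled = data + data
```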
An ndarray is a generic multidimensional container for homogeneous data; that is, all of the
elements must be the same type. Every array has a shape, a tuple indicating the size of each
dimension, and a dtype, an object describing the data type of the array:
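For instance, a randomly generated 2 x 3 array (an arbitrary choice for this sketch) reports its shape and dtype like this:

```python
import numpy as np

data = np.random.randn(2, 3)

print(data.shape)  # (2, 3)
print(data.dtype)  # float64
```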
Creating ndarrays
The easiest way to create an array is to use the array function. This accepts
any sequence-like object (including other arrays) and produces a new NumPy array containing
the passed data. For example, a list is a good candidate for conversion:
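A short sketch of such a conversion; the list contents and the names data1 and arr1 are illustrative:

```python
import numpy as np

data1 = [6, 7.5, 8, 0, 1]

# Mixed ints and floats are stored as a single floating-point type
arr1 = np.array(data1)

print(arr1.shape)  # (5,)
print(arr1.dtype)  # float64
```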
Nested sequences, like a list of equal-length lists, will be converted into a multidimensional
array:
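For example, with a hypothetical list of two four-element lists:

```python
import numpy as np

data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)

print(arr2.ndim)   # 2
print(arr2.shape)  # (2, 4)
```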
Unless explicitly specified, np.array tries to infer a good data type for the array that it creates.
The data type is stored in a special dtype metadata object; for example, in the previous two
examples we have:
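The sketch below rebuilds two small arrays (the values are illustrative) and inspects the inferred types. The exact integer width is platform dependent: it is typically int64, but some platforms and older NumPy versions default to int32.

```python
import numpy as np

arr1 = np.array([6, 7.5, 8, 0, 1])                # contains a float
arr2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])     # integers only

print(arr1.dtype)  # float64
print(arr2.dtype)  # an integer dtype, typically int64

# The dtype can also be specified explicitly instead of inferred
arr3 = np.array([1, 2, 3], dtype=np.float64)
```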
Matrix functions
1. The identity matrix:
In linear algebra, the identity matrix of size n is the n × n square matrix with ones on the
main diagonal and zeros elsewhere.
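A minimal sketch for n = 3, using np.identity and the closely related np.eye:

```python
import numpy as np

ident = np.identity(3)   # 3 x 3 identity matrix of floats
same = np.eye(3)         # equivalent result for a square matrix

# Multiplying by the identity leaves a matrix unchanged
m = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
product = m @ ident
```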
4. Function ‘fromfunction’:
Construct an array by executing a function over each coordinate.
Let’s create a vector based on its indices.
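One way this might look: the function receives an array of index values and returns one output per index. The doubling rule and the length of 5 are arbitrary choices for the sketch.

```python
import numpy as np

# Each element is computed from its own index i
vec = np.fromfunction(lambda i: i * 2, (5,), dtype=int)
print(vec)  # [0 2 4 6 8]

# With a 2-D shape the function receives one index array per axis
table = np.fromfunction(lambda i, j: i + j, (2, 3), dtype=int)
```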
Function            Description
array               Convert input data (list, tuple, array, or other sequence type) to an
                    ndarray, either by inferring a dtype or by explicitly specifying one;
                    copies the input data by default
asarray             Convert input to ndarray, but do not copy if the input is already an
                    ndarray
arange              Like the built-in range but returns an ndarray instead of a list
ones, ones_like     Produce an array of all 1s with the given shape and dtype; ones_like
                    takes another array and produces a ones array of the same shape and
                    dtype
zeros, zeros_like   Like ones and ones_like but producing arrays of 0s instead
empty, empty_like   Create new arrays by allocating new memory, but do not populate them
                    with any values, unlike ones and zeros
full, full_like     Produce an array of the given shape and dtype with all values set to the
                    indicated “fill value”; full_like takes another array and produces a
                    filled array of the same shape and dtype
eye, identity       Create a square N × N identity matrix (1s on the diagonal and 0s
                    elsewhere)
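A brief sketch exercising several of these functions; the shapes and the fill value are arbitrary:

```python
import numpy as np

zeros = np.zeros((2, 3))            # 2 x 3 array of 0.0
ones_like = np.ones_like(zeros)     # same shape and dtype, filled with 1.0
filled = np.full((2, 2), 7)         # every element set to 7
steps = np.arange(5)                # 0, 1, 2, 3, 4 as an ndarray
scratch = np.empty((2, 2))          # allocated only; contents are arbitrary

# array copies its input by default, asarray does not copy an existing ndarray
copied = np.array(steps)
aliased = np.asarray(steps)
print(copied is steps)   # False
print(aliased is steps)  # True
```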
Aggregation methods:
● ndarray.sum: Return the sum of the array elements over the given axis.
● ndarray.prod: Return the product of the array elements over the given axis.
● ndarray.max: Return the maximum along the given axis.
● ndarray.min: Return the minimum along the given axis.
● ndarray.mean: Return the average of the array elements along the given axis.
● ndarray.cumsum: Return the cumulative sum of the elements along the given axis.
● ndarray.cumprod: Return the cumulative product of the elements along the given axis.
● ndarray.var: Return the variance of the array elements along the given axis.
● ndarray.std: Return the standard deviation of the array elements along the given axis.
Learnvista Pvt Ltd.
2nd Floor, 147, 5th Main Rd, Rajiv Gandhi Nagar HSR Sector 7,Near Salarpuria Serenity, Bengaluru, Karnataka 560102
Mob:- +91 779568798, Email:- [email protected]
● ndarray.argmin: Return indices of the minimum values along the given axis.
● ndarray.argmax: Return indices of the maximum values along the given axis.
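The aggregation methods listed above can be tried on a small array. The 2 x 3 values below are illustrative; calling a method without an axis aggregates over the flattened array.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.sum())         # 21
print(arr.sum(axis=0))   # [5 7 9], one sum per column
print(arr.prod())        # 720
print(arr.max(axis=1))   # [3 6], one maximum per row
print(arr.mean())        # 3.5
print(arr.argmax())      # 5, index into the flattened array
print(arr.argmin())      # 0

running_total = arr.cumsum()      # cumulative sum over the flattened array
running_product = arr.cumprod()   # cumulative product over the flattened array
spread = arr.std()                # square root of arr.var()
```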
dtypes are a source of NumPy’s flexibility for interacting with data coming from other systems. In
most cases they provide a mapping directly onto an underlying disk or memory representation,
which makes it easy to read and write binary streams of data to disk and also to connect to code
written in a low-level language like C or Fortran. The numerical dtypes are named the same
way: a type name, like float or int, followed by a number indicating the number of bits per
element. A standard double precision floating-point value (what’s used under the hood in
Python’s float object) takes up 8 bytes or 64 bits. Thus, this type is known in NumPy as float64.
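The naming scheme can be checked directly. This sketch builds a float64 array and converts it with astype; the values are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.float64)
print(arr.dtype)            # float64
print(arr.dtype.itemsize)   # 8 bytes per element, i.e. 64 bits

# astype returns a new array converted to the requested dtype
as_int = arr.astype(np.int32)
print(as_int.dtype)            # int32
print(as_int.dtype.itemsize)   # 4 bytes per element, i.e. 32 bits
```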
Note: Be cautious when using the numpy.string_ type (named numpy.bytes_ in NumPy 2.0 and
later), as string data in NumPy is fixed size and may truncate input without warning. pandas
has more intuitive out-of-the-box behavior on non-numeric data.
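A small sketch of the truncation. It uses NumPy's fixed-width Unicode strings rather than the bytes type, since the same fixed-size behavior applies; the sample strings are arbitrary.

```python
import numpy as np

# The element width is fixed at creation: here, two characters
names = np.array(["ab", "cd"])
print(names.dtype.kind)      # U (fixed-width Unicode)

# Assigning a longer string silently drops the extra characters
names[0] = "xyz"
print(names[0])              # xy
```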
If you assign a scalar value to a slice, as in arr[5:8] = 12, the value is
propagated (or broadcast, as we will say from here on) to the entire selection. An important first distinction
from Python’s built-in lists is that array slices are views on the original array. This means that the
data is not copied, and any modifications to the view will be reflected in the source array. To give
an example of this, first create a slice of arr:
Now, when we change values in arr_slice, the mutations are reflected in the original array arr:
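The whole sequence can be sketched as follows, assuming arr starts out as the integers 0 through 9:

```python
import numpy as np

arr = np.arange(10)

# A scalar assigned to a slice is broadcast to every selected element
arr[5:8] = 12

# arr_slice is a view: it shares memory with arr
arr_slice = arr[5:8]
arr_slice[1] = 12345
print(arr.tolist())  # [0, 1, 2, 3, 4, 12, 12345, 12, 8, 9]

# An explicit copy is independent of the original
independent = arr[5:8].copy()
independent[:] = 0
```

Changing independent leaves arr untouched, which is the behavior to reach for when a view is not wanted.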
With higher dimensional arrays, you have many more options. In a two-dimensional array, the
elements at each index are no longer scalars but rather one-dimensional arrays:
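For example, with a 3 x 3 array (called arr2d here for illustration), a single index returns a whole row:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row = arr2d[2]
print(row)        # [7 8 9]
print(row.ndim)   # 1
```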
Thus, individual elements can be accessed recursively. But that is a bit too much work, so you
can pass a comma-separated list of indices to select individual elements. So these are
equivalent:
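Using an illustrative 3 x 3 array named arr2d, the two forms look like this:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

recursive = arr2d[0][2]   # index the row, then the element
combined = arr2d[0, 2]    # one comma-separated index
print(recursive, combined)  # 3 3
```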
Consider the two-dimensional array from before, arr2d. Slicing this array is a bit different:
You can pass multiple slices just like you can pass multiple indexes:
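A sketch of both cases, assuming arr2d is an illustrative 3 x 3 array. A single slice selects along the first axis (rows), and a second slice restricts the columns.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# One slice: the first two rows
first_rows = arr2d[:2]
print(first_rows.tolist())   # [[1, 2, 3], [4, 5, 6]]

# Two slices: the first two rows, columns from index 1 onward
block = arr2d[:2, 1:]
print(block.tolist())        # [[2, 3], [5, 6]]
```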
When slicing like this, you always obtain array views of the same number of dimensions. By
mixing integer indexes and slices, you get lower dimensional slices. For example, we can select
the second row but only the first two columns like so:
Similarly, I can select the third column but only the first two rows like so:
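Both selections can be sketched on an illustrative 3 x 3 array named arr2d. An integer index drops its axis, while a slice keeps it.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Second row, first two columns: the result is one-dimensional
second_row = arr2d[1, :2]
print(second_row.tolist())   # [4, 5]

# Third column, first two rows: also one-dimensional
third_col = arr2d[:2, 2]
print(third_col.tolist())    # [3, 6]

# Slices on both axes keep the result two-dimensional
first_col = arr2d[:, :1]
print(first_col.shape)       # (3, 1)
```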
Boolean Indexing
Let’s consider an example where we have some data in an array and an array of names
with duplicates. Here I use the randn function in numpy.random to generate
some random normally distributed data:
The boolean array must be of the same length as the array axis it’s indexing. You can even mix
and match boolean arrays with slices or integers.
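A sketch of the setup and the selections described above. The names and the 7 x 4 shape are illustrative; each name corresponds to one row of data.

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
data = np.random.randn(7, 4)

# Comparing the names array with a string yields a boolean array
mask = names == "Bob"
print(mask.tolist())  # [True, False, False, True, False, False, False]

# The mask has length 7, matching the first axis of data
bob_rows = data[mask]           # the two rows where the name is "Bob"
bob_tail = data[mask, 2:]       # same rows, columns from index 2 onward
bob_last = data[mask, 3]        # same rows, only the last column
```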
● Function: ‘logspace’:
Return numbers spaced evenly on a log scale (by default in base 10).
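A minimal sketch: the start and stop arguments are exponents, so the call below spans 10^0 to 10^3 in four steps. The endpoints and count are arbitrary.

```python
import numpy as np

# Four values from 10**0 to 10**3, evenly spaced on a log scale
decades = np.logspace(0, 3, 4)
print(decades.tolist())  # [1.0, 10.0, 100.0, 1000.0]

# The base keyword changes the scale, here to powers of 2
powers_of_two = np.logspace(0, 4, 5, base=2)
```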