Python - Introduction To Numpy For Multi-Dimensional Data: Course Overview
Course Overview
[Video description begins] Topic title: Introduction to NumPy for Multi-dimensional
Data. [Video description ends]
A little about myself first. I have a master's degree in computer science from
Columbia University and have previously worked in companies such as Deutsche
Bank and WebMD in New York. I presently work for Loonycorn, a startup for
high-quality video content.
NumPy is one of the fundamental packages for scientific computing. Its primary
characteristic is that it allows data to be represented in the form of n-dimensional
arrays and then supplies a number of features, which allow us to transform and
manipulate those arrays.
In this course, we begin from the very basics of NumPy. So no prior knowledge of
this package is required. All you need to know is some basic Python and high school
level math. We will start with creating and modifying NumPy arrays and work our
way gradually to more complex operations, such as indexing and slicing, universal
functions, and reshaping arrays.
Once you are done with this course, you'll be comfortable with using
multidimensional arrays, and will understand how to perform both element-wise
mathematical operations, as well as aggregate operations on data using NumPy.
To follow this course on NumPy, it helps to be familiar with certain software tools and
to have certain skills needed to understand array operations. For instance, you
should be comfortable programming in Python 3, as this is the language which is used
throughout this course. You should also have some familiarity with working with
Jupyter notebooks, as this is the dev environment which will be used for all the
demos. In addition, some demos in this course will involve basic matrix operations,
so some familiarity with those will also be useful. [Video description
begins] So, you should have basic knowledge of matrix operations and working with
arrays. [Video description ends]
Since NumPy provides a framework to express data in the form of arrays and supplies
a lot of array-based operations, it becomes the fundamental building block for a
number of other Python libraries. The libraries which make use of NumPy under
the hood are referred to collectively as the NumPy ecosystem.
Some of these libraries include statsmodels, which is used extensively in order to
perform statistical operations. [Video description begins] Statsmodels is used to
estimate statistical models and perform tests. [Video description ends]
Scikit-image is another library which makes use of NumPy, and this is used in the
world of image processing. [Video description begins] Scikit-image is a collection of
algorithms for image processing. [Video description ends]
Scikit-learn is one more Python library which makes heavy use of NumPy. And this is
used for machine learning, as well as for data mining, and data analysis. [Video
description begins] Scikit-learn is a simple and efficient tool for machine learning in
Python. [Video description ends]
Pandas is another tool which is built on top of NumPy. And it supplies a number of
easy to use data structures, which are very helpful in the world of data analysis.
[Video description begins] Pandas is used for data analysis and manipulation. [Video
description ends]
And finally, Matplotlib, which is a data visualization tool, also makes significant use
of NumPy. [Video description begins] Matplotlib is a plotting library for 2D graphs
and visualizations. [Video description ends]
All the demos in this course will be performed in Jupyter Notebooks. Jupyter
Notebooks are a browser-based interactive development environment, which allows
us to execute and view the results of sections of our application code without having to
run the entire application.
If not, you may simply copy this URL, which comes up on typing the command.
[Video description begins] He points to the URL in the output. [Video description
ends] And then bring up your browser, [Video description begins] He switches to a
web browser. [Video description ends] and then paste this URL in the address bar, and
hit Enter.
We can just click on it to view its contents. [Video description begins] He opens the
datasets folder and three files are listed: countries_of_the_world.csv,
float_values.csv, and koala.jpg. A note reads: This data will be used later in this
course. [Video description ends] And you notice that it contains two CSV files and a
JPEG file. At this point, we just navigate back to the parent directory of our
workspace. [Video description begins] He clicks the folder icon in the breadcrumbs of
the current directory and returns to the dashboard. [Video description ends] And
we're now ready to code our first demo for NumPy, for which we will bring up a new
Jupyter Notebook.
So we may go ahead and click New. [Video description begins] He clicks the New
drop-down button. The drop-down menu contains the sections: Notebook and
Other. [Video description ends] And then select the back-end kernel which will be
used for this notebook. Throughout this course I will be using Python 3 as the back-
end. So I'm just going to click on that, and this brings up a new Jupyter Notebook.
[Video description begins] He selects Python 3 in the Notebook section and a new,
untitled Jupyter notebook opens. The menu includes options such as File, Edit, View,
and Insert, and a toolbar is also available. The notebook area contains one
cell. [Video description ends]
And the first thing I'm going to do here is to rename this notebook, which just comes up
with a default name, Untitled, and call it something a little more meaningful. So our
first demo is going to be about ArrayCreation, and this is what I'll call this notebook.
[Video description begins] He clicks the notebook's name: Untitled. A Rename
Notebook dialog box opens. He types the name, "ArrayCreation", in the Enter a new
notebook name text box. [Video description ends] So just click on Rename, and the
new name has been set for this notebook. [Video description begins] The notebook's
name updates to ArrayCreation and the notebook area now contains multiple
cells. [Video description ends]
So we will run this pip install command, and those familiar with Jupyter Notebooks
will recall that you can execute shell commands from a notebook cell by preceding it
with the exclamation point. [Video description begins] From a cell in the notebook
area, he runs the command: !pip install numpy --upgrade. The output is: Requirement
already up-to-date: numpy in /anaconda3/lib/python3.7/site-packages
(1.15.2). [Video description ends] So on running this command, I get the message that
I already have the latest version of NumPy, but if you don't already have it then you
will see all the installation steps here.
Once the install is complete, we are ready to import this NumPy package into our
notebook. And we will be referencing this package using the alias np. [Video
description begins] He runs the command: import numpy as np. [Video description
ends]
We now go ahead and create our first NumPy array. We call this array_one. And this
array will be initialized with a Python list containing 0 followed by all the one-digit odd numbers.
We also make use of the np.array function, in order to initialize this array with this
Python list, [Video description begins] He adds two lines of code. The first line is:
array_one = np.array([0,1,3,5,7,9]). The second line is: array_one. [Video
description ends] and on running that we will view the contents of array_one to see
what this NumPy array will look like. So when we run the cell, we have successfully
created our very first NumPy array. [Video description begins] He runs the two lines
of code that were added and the output is: array([0,1,3,5,7,9]). [Video description
ends] As you can see here, it contains the exact same elements as the Python list
which we passed to it. Except that this is not a list but a NumPy array object.
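The cell described above can be sketched roughly as follows. The type check at the end is an addition for illustration and is not part of the demo.

```python
import numpy as np

# Build a NumPy array from a plain Python list, as in the demo.
array_one = np.array([0, 1, 3, 5, 7, 9])

# The values match the list, but the object is an ndarray rather than a list.
print(array_one)
print(type(array_one))
```

Printing the type is a quick way to confirm that np.array has returned an array object and not simply handed the list back.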
We now go ahead and create our second array, called array_two. [Video description
begins] He adds three lines of code. The first line is: num = [11, 22, 33, 44, 55, 66,
77]. The second line is: array_two = np.array(num). The third line is:
array_two. [Video description ends] And for that, we will be initializing it with a list,
which we passed as a variable. [Video description begins] He highlights the following
section in the code: [11, 22, 33, 44, 55, 66, 77]. [Video description ends] This list,
which contains the first seven multiples of 11, is assigned to a variable named num.
Which we then pass on to this np.array function in order to create our second NumPy
array. [Video description begins] He highlights np.array(num) in the code. [Video
description ends] And on running the cell, our second array has also been created.
[Video description begins] He runs the three lines of code that were added and the
output is: array([11, 22, 33, 44, 55, 66, 77]). [Video description ends]
We now move along and explore some of the other NumPy functions which are
available in order to initialize arrays. And one of these is the np.zeros function. [Video
description begins] He adds two lines of code. The first line is: array_of_zeroes =
np.zeros((2,3)). The second line is: array_of_zeroes. [Video description ends] This
will create an array whose elements are all zeroes. And the size of this array is
determined by the argument which is passed to it. In this case, we are creating an
array of two rows and three columns, and for that we're passing a tuple which contains
the values 2 and 3. So we just run the cell in order to create our array of zeroes,
[Video description begins] He runs the code and the output displays an array with
two rows and three columns. The first row is: array([[0., 0., 0.],. The second row is:
[0., 0., 0.]]). [Video description ends] and we see here that the array with two rows
and three columns has been created.
Similar to the np.zeros function, there is an np.ones function which will create an
array whose elements are all 1. [Video description begins] He adds two lines of code.
The first line is: array_of_ones = np.ones((3,2)). The second line is:
array_of_ones. [Video description ends] Just like with the zeroes function, the
argument to np.ones is the size of the array, which is also known as the array shape,
and in this case, we create an array with three rows and two columns. [Video
description begins] He highlights the following section in the code:
np.ones((3,2)). [Video description ends] So we just run the cell and our array of ones
has also been generated. [Video description begins] He runs the code and the output
displays an array with three rows and two columns. The first row is: array([[1., 1.],.
The second row is: [1., 1.],. The third row is: [1., 1.]]). [Video description ends]
Note here that in our array of zeroes, as well as our array of ones, each of the elements
is of type float. You can tell this by the decimal point, which appears right after the digits.
[Video description begins] In the array of zeroes and the array of ones in the output,
he points to the dot that displays after the digits 0 and 1. [Video description ends] So
if you would like to create an array of integers instead, then we simply need to pass
the dtype argument when using the np.ones or np.zeros function. [Video description
begins] He adds two lines of code. The first line is: array_of_ones = np.ones((3,2),
dtype = np.int32). The second line is: array_of_ones. [Video description ends]
So here we simply recreate our array of ones, [Video description begins] He runs the
code and the output displays an array with three rows and two columns. The first row
is: array([[1, 1],. The second row is: [1, 1],. The third row is: [1, 1]],
dtype=int32). [Video description ends] but this time this is an array of integers, rather
than floats. [Video description begins] He highlights the following section in the
output: dtype=int32). [Video description ends]
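As a small sketch of the point about element types, the following compares the default dtype with an explicitly requested one. Inspecting the dtype attribute is an addition here; the demo reads the type off the printed output instead.

```python
import numpy as np

# Without a dtype argument, np.zeros and np.ones produce floats.
array_of_zeroes = np.zeros((2, 3))

# Passing dtype asks for integers instead.
array_of_ones = np.ones((3, 2), dtype=np.int32)

print(array_of_zeroes.dtype)   # float64
print(array_of_ones.dtype)     # int32
```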
NumPy also provides a function called np.empty, which will create a new array of the
given shape. [Video description begins] He adds two lines of code. The first line is:
array_empty = np.empty((3,2)). The second line is: array_empty. He highlights the
following section: np.empty((3,2)). [Video description ends] However, the elements of
this array will not be initialized to any value. As a result, this operation can run
quicker than the other array initialization functions. However, the elements of
the array will need to be explicitly set at a later point. So when we run this function,
[Video description begins] He runs the code and an array of three rows and two
columns displays as the output. The first row is: array([[1., 1.],. The second row is:
[1., 1.],. The third row is: [1., 1.]]). [Video description ends] we notice that each of
the elements here is one, and this is because that value was already present in the
memory which was allocated for these elements.
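Because the contents of an np.empty array are whatever happened to be in memory, a typical pattern is to allocate first and assign afterwards. The sketch below assumes a fill value of 7.5 purely for illustration.

```python
import numpy as np

# Allocate the array; its initial contents are arbitrary and should not be relied on.
array_empty = np.empty((3, 2))

# Explicitly set every element before using the array.
array_empty[:] = 7.5

print(array_empty)
```

The values printed before the assignment may differ from run to run, which is why the demo's output of all ones should be treated as a coincidence of that session.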
We now explore the np.eye function, which will create a square identity matrix of the
specified size. So this will generate a matrix of four rows and four columns, [Video
description begins] The Jupyter Notebook user interface is open in a web browser and
the ArrayCreation notebook displays. The presenter runs two consecutive lines of
code. The first line is: array_eye = np.eye(4). The second line is: array_eye. [Video
description ends] where all the diagonal elements are 1, while all the other elements
are 0. [Video description begins] The output displays an array of four rows and four
columns. The first row is: array([[1., 0., 0., 0.],. The second row is: [0., 1., 0., 0.].
The third row is: [0., 0., 1., 0.]. The fourth row is: [0., 0., 0., 1.]]). [Video description
ends]
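A minimal sketch of the identity matrix from this step follows. The use of np.diag to pull out the main diagonal is an addition for illustration.

```python
import numpy as np

# A 4 x 4 identity matrix: ones on the main diagonal, zeros elsewhere.
array_eye = np.eye(4)

print(array_eye.shape)       # (4, 4)
print(np.diag(array_eye))    # the four diagonal elements
print(array_eye.sum())       # 4.0, since only the diagonal is non-zero
```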
Another NumPy function is the arange operation, which allows us to create an array
with a specified range and an interval. [Video description begins] He adds two lines of
code. The first line is: array_of_evens = np.arange(2,24,2). The second line is:
array_of_evens. [Video description ends] So the arguments for this function are the
lower bound of the array, the upper bound, and the interval. So in this case our array
will begin with the element 2. And each element will be 2 greater than the previous
one, stopping before the upper bound of 24, which is itself excluded. [Video description begins] He highlights
np.arange(2,24,2). [Video description ends] So when we execute the cell, our array of
even numbers between 2 and 22 has been created. [Video description begins] He
runs the code and the output is: array([ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]). [Video
description ends]
The arange function does not only take integers as arguments, but can also be used
with floating point values. Here, we create an array beginning with 1 and going up to
3, with each of the elements being an increment of 0.3 over the previous one. [Video
description begins] He adds two lines of code. The first line is: array_of_floats =
np.arange(1, 3, 0.3). The second line is: array_of_floats. He highlights np.arange(1,
3, 0.3). [Video description ends] So on running the cell, our array of floats has been
created. [Video description begins] He runs the code and the output is: array([1., 1.3,
1.6, 1.9, 2.2, 2.5, 2.8]). [Video description ends]
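The two arange calls above can be sketched together to show that the upper bound is never included in the result. The checks on the last element and on the length are additions for illustration.

```python
import numpy as np

# Integers from 2 up to, but not including, 24 in steps of 2.
array_of_evens = np.arange(2, 24, 2)
print(array_of_evens[-1])        # 22

# Floats from 1 up to, but not including, 3 in steps of 0.3.
array_of_floats = np.arange(1, 3, 0.3)
print(len(array_of_floats))      # 7
print(array_of_floats)
```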
We now look at how we can create a two dimensional array using a list of Python
tuples. So we once again use the np.array function and we pass in this list of tuples.
[Video description begins] He adds two lines of code. The first line is: array_2D =
np.array([(3,4,6),(2,1,6)]). The second line is: array_2D. He highlights the section:
.array([(3,4,6),(2,1,6)]). [Video description ends] And each of the tuples now
represents a row in this 2D array. So when we run this command, our array of two
rows and three columns has been created. [Video description begins] He runs the
code and the output displays an array of two rows and three columns. The first row
is: array([[3, 4, 6],. The second row is: [2, 1, 6]]). [Video description ends]
I had previously mentioned that the size of an array, in terms of the number of rows
and columns, is referred to as the array shape. So if you'd like to view the shape of a
NumPy array, we can make use of the shape property of the array object. [Video
description begins] He adds a line of code: array_2D.shape. [Video description ends]
Here we check the shape of the 2D array, which we just created. [Video description
begins] He runs the code and the output is: (2, 3). [Video description ends] And we
can see here that this is an array of 2 rows and 3 columns, which is represented as a
tuple.
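As a short sketch of the shape property, the tuple can be unpacked directly into row and column counts. The variable names rows and cols are hypothetical and not part of the demo.

```python
import numpy as np

# Each tuple in the list becomes one row of the 2D array.
array_2D = np.array([(3, 4, 6), (2, 1, 6)])

# shape is a tuple, so it can be unpacked like any other tuple.
rows, cols = array_2D.shape
print(rows, cols)   # 2 3
```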
We have previously used the np.arange function in order to create a one dimensional
array with a specified lower bound, upper bound, and interval. [Video description
begins] He adds a line of code: np.arange(8). [Video description ends] If you simply
pass in one value to this arange function, this argument is treated as the upper bound.
And the lower bound is assumed to be 0, and the interval is assumed to be 1. So when
we run np.arange with an argument of 8, it will create an array for us beginning with
0, and going up to 7, with an increment of one. [Video description begins] He runs
the code and the output is: array([0, 1, 2, 3, 4, 5, 6, 7]). [Video description ends]
Now that we have a one dimensional array of eight elements, what if we would like to
create a two dimensional array out of this? For that, each NumPy array object has a
reshape function which will arrange the elements of the array, in their existing order, into the desired
shape. [Video description begins] He adds two lines of code. The first line is:
array_nd = np.arange(8).reshape(2,4). The second line is: array_nd. [Video
description ends] So our array of eight elements, here, can be reshaped into an array
of two rows and four columns. [Video description begins] He highlights .reshape(2,4)
in the code. [Video description ends] So when we execute this cell, we note that this
reshaping has taken place successfully. [Video description begins] He runs the code
and an array of two rows and four columns displays as the output. The first row is:
array([[0, 1, 2, 3],. The second row is: [4, 5, 6, 7]]). [Video description ends]
One caution when using this reshape operation is that an array needs to have the right
number of elements in order to be reordered into a specified shape. So, in this case, if
we try to reshape an array of six elements to one containing two rows and four
columns, that is, eight elements, then the operation will result in an error. [Video
description begins] He runs the following line of code: array_nd =
np.arange(6).reshape(2,4). The output displays an error message, which includes the
line: ValueError: cannot reshape array of size 6 into shape (2,4). [Video description
ends] So this reshape operation needs to be used with caution: ensure that the
array being reshaped has exactly the number of elements required by the
new shape. [Video description begins] He highlights the np.arange value of (6) and the
reshape value of (2,4) in the code that was executed. [Video description ends]
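The successful and failing reshape calls can be sketched side by side. Catching the ValueError is an addition here so that the cell runs to completion; in the demo the error is simply allowed to surface.

```python
import numpy as np

# 8 elements fit exactly into 2 rows x 4 columns.
array_nd = np.arange(8).reshape(2, 4)
print(array_nd)

# 6 elements cannot fill 2 x 4 = 8 slots, so NumPy raises a ValueError.
reshape_failed = False
try:
    np.arange(6).reshape(2, 4)
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```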
The last function we will view in this demo will make use of this array_nd array
which we see here has two rows and four columns. That is, it has a shape of 2, 4.
[Video description begins] He runs the code: array_nd. The output displays an array
of two rows and four columns. The first row is: array([[0, 1, 2, 3],. The second row
is: [4, 5, 6, 7]]). [Video description ends]
Now, the function which we will explore here is called np.ones_like. [Video
description begins] He adds two lines of code. The first line is: array_ones =
np.ones_like(array_nd). The second line is: array_ones. [Video description ends] This
is similar to the np.ones operation in that it creates an array whose elements are all
equal to one. However, the argument to this function is not a tuple representing the array shape,
but another NumPy array whose shape will simply be taken up. So this
operation will create an array of ones whose shape is the same as the array_nd array.
[Video description begins] He highlights the following section in the code:
np.ones_like(array_nd). [Video description ends]
So on running this cell, [Video description begins] He runs the code and an array of
two rows and four columns displays as the output. The first row is: array([[1, 1, 1,
1],. The second row is: [1, 1, 1, 1]]). [Video description ends] we see that this array of
ones with two rows and four columns, just like array_nd, has been created. [Video
description begins] He points to the array in the previous output. [Video description
ends]
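A brief sketch of np.ones_like follows. Besides the shape, the result also takes on the element type of the array passed in, which is why the demo's output shows integers rather than floats; the explicit comparisons are additions for illustration.

```python
import numpy as np

array_nd = np.arange(8).reshape(2, 4)

# The template array supplies both the shape and the element type.
array_ones = np.ones_like(array_nd)

print(array_ones)
print(array_ones.shape == array_nd.shape)   # True
print(array_ones.dtype == array_nd.dtype)   # True
```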
Printing Arrays
[Video description begins] Topic title: Printing Arrays. Your host for this session is
Kishan Iyer. [Video description ends]
In this demo, we will examine all the options available for us in order to view the
contents of a NumPy Array.
The simplest way to view the contents of the array is to use the Python print function.
So we just print the contents of x here, [Video description begins] He runs the code:
print(x). The output is: [0 1 2 3 4 5 6 7]. [Video description ends] and the elements
are now visible, much as they would be when printing a Python list.
Moving on now to a 3D array, which we once again create using the reshape
operation. [Video description begins] He adds two lines of code. The first line is: z =
np.arange(36).reshape(3,4,3). The second line is: print(z). [Video description ends]
And do note here that we can pass three arguments for the length of each of the three
dimensions in our array. And we once again use the Python print function in order to
print out the contents. [Video description begins] He runs the code and the output
displays an array containing three sets of four rows and three columns. The array
displays values from zero to 35, in succession. For example, in the first set, the first
row is: [[[ 0 1 2]. The second row is: [ 3 4 5]. The third row is: [ 6 7 8]. The fourth
row is: [9 10 11]]. [Video description ends] And this array takes the shape (3,4,3),
and it's essentially an array of two dimensional arrays.
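To make the "array of two dimensional arrays" idea concrete, the sketch below picks out the first of the three blocks. Indexing with z[0] is an addition here and is only used to show the structure.

```python
import numpy as np

# 36 elements arranged as 3 blocks, each with 4 rows and 3 columns.
z = np.arange(36).reshape(3, 4, 3)
print(z.shape)        # (3, 4, 3)

# The first block is itself a 4 x 3 two-dimensional array.
first_block = z[0]
print(first_block.shape)   # (4, 3)
print(first_block)
```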
For the arrays we have viewed so far, all the elements have been fully visible.
However, what if we go ahead and create a rather large array? This one contains
12,100 elements. [Video description begins] He adds a line of code:
np.arange(12100). [Video description ends] And when we print this out, we notice
that not all the elements are visible. [Video description begins] He runs the code and
the output is: array([ 0, 1, 2, ..., 12097, 12098,12099]). [Video description ends]
In fact, these three dots, which are also referred to as an ellipsis, [Video description
begins] He highlights the ellipsis in the output. [Video description ends] are being used
in order to indicate that there are elements between the ones which are visible on
screen.
The ellipsis also appears when we print out two dimensional arrays, which we can
observe when we simply reshape the previous array to a shape of 110 rows and 110
columns. [Video description begins] He adds a line of code:
print(np.arange(12100).reshape(110,110)). [Video description ends] And we observe
from the output that the ellipsis now appears for both the rows as well as the columns.
[Video description begins] He runs the code and the output displays an array of two
sets, separated by an ellipsis. The sets consist of three rows and seven columns each.
For example, in the first set, the first row is: [[ 0 1 2 ... 107 108 109]. The array
displays values from zero to 12099. He highlights the ellipses in the array, which
displays in column four of all rows, as well as between the two sets. [Video
description ends]
However, if you would like to view the entire contents of the array, irrespective of
how large it is, then we need to make use of the np.set_printoptions function, which
allows us to specify how floating point numbers, arrays, and other NumPy objects are
displayed. [Video description begins] He adds a line of code:
np.set_printoptions(threshold = np.nan). [Video description ends]
We call this function by passing in a value for the threshold argument, which sets the
total number of array elements above which an ellipsis is used to summarize the output. In order to
eliminate this threshold, we need to set it to a null value. And for that we make use of
a special NumPy constant called nan, which stands for not a number. [Video
description begins] He highlights the following section in the code: (threshold =
np.nan). A note reads: This will eliminate the threshold at which NumPy uses ellipsis
for large arrays. [Video description ends]
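The demo was recorded with NumPy 1.15.2, where np.nan was accepted as a threshold. More recent NumPy releases reject a NaN threshold with a ValueError and point to sys.maxsize instead, so a sketch for current versions might look like the following. The saved_threshold variable is hypothetical and is only there to put the earlier setting back afterwards.

```python
import sys
import numpy as np

# Remember the current threshold so it can be restored later.
saved_threshold = np.get_printoptions()["threshold"]

# A very large threshold effectively disables summarization.
np.set_printoptions(threshold=sys.maxsize)

text = str(np.arange(12100))
print("..." in text)    # False: every element is written out

# Put the earlier setting back.
np.set_printoptions(threshold=saved_threshold)
```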
So when we set this value for the threshold, [Video description begins] He runs the
code. [Video description ends] we can just go ahead and reprint our array of 110
rows and 110 columns. [Video description begins] He runs the code:
print(np.arange(12100).reshape(110,110)). The output displays the full contents of
the array, which consists of multiple rows and columns. There are no ellipses in this
instance. The first row in the array contains the values zero to 109 and the second
row contains the values 110 to 219, in succession. [Video description ends] And on
this occasion, the entire contents of this array, which is 12,100 elements, can be
viewed. And of course, we'll need to scroll down in the output in order to view the full
contents. [Video description begins] The last row in the array displays the values
11990 to 12099, in succession. [Video description ends] Note that the default value
for the threshold used by NumPy is 1,000.
We now go ahead and print an array of 10 rows and 10 columns. And, as expected,
the entire contents of this array is visible on screen. [Video description begins] He
runs the code: print(np.arange(100).reshape(10,10)). The output displays an array of
ten rows and ten columns, comprising of values from zero to 99, in succession. The
first row is: [[ 0 1 2 3 4 5 6 7 8 9]. [Video description ends]
Now, we can go ahead and set the print options, once more, where we set a value of
50 for the threshold. [Video description begins] He adds two lines of code. The first
line is: np.set_printoptions(threshold = 50). The second line is:
print(np.arange(100).reshape(10,10)). [Video description ends]
So if we reprint this array of 100 elements, [Video description begins] He runs the
code and the output displays an array containing two sets, separated by an ellipsis,
and comprising of values from zero to 99. Each set contains three rows and seven
columns, of which ellipses display in column four of all rows. For example, the first
row in the first set is: [[ 0 1 2 ... 7 8 9]. [Video description ends] we will now see the
ellipses once more. [Video description begins] He highlights the series of ellipses in
column four and the ellipsis between the two sets in the array. [Video description
ends]
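The effect of lowering the threshold can be sketched by converting the same array to text under two settings. Capturing the output with str and restoring the original threshold afterwards are additions for illustration, and the sketch assumes the session starts with a threshold above 100, such as the default of 1,000.

```python
import numpy as np

original = np.get_printoptions()["threshold"]
grid = np.arange(100).reshape(10, 10)

# 100 elements exceed a threshold of 50, so the output is summarized.
np.set_printoptions(threshold=50)
summarized = str(grid)

# Back at the original threshold, the same array prints in full.
np.set_printoptions(threshold=original)
full = str(grid)

print("..." in summarized)   # True
print("..." in full)         # False
```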
Do note that once we execute the set_printoptions function, those options will apply
to all the subsequent print operations which we perform. And with that, we now come to
the end of this demo on printing NumPy arrays.
We begin once more by importing the NumPy package first. [Video description
begins] The Jupyter Notebook user interface is open in a web browser and the
BasicOperations notebook displays. The presenter runs the command: import numpy
as np. [Video description ends] And we will first initialize two arrays called p and q.
And you can see, each of these is a one-dimensional array containing three elements.
[Video description begins] He runs two consecutive lines of code. The first line is: p =
np.array([9,8,7]). The second line is: q = np.array([3,2,4]). [Video description ends]
The first operation between these two arrays which we will perform is an addition
operation, and for that we simply use the plus sign. So what does p + q give us?
[Video description begins] He runs the code: p + q. The output is: array([12, 10,
11]). A note reads: Element-wise addition: [9+3, 8+2, 7+4]. [Video description ends]
As we can see here, this operation has effectively performed an element-by-element
addition. So the shape of the resultant array is exactly the same as that of each of the
operands. [Video description begins] He points to ([9,8,7]) and ([3,2,4]) in this code
that was previously executed: p = np.array([9,8,7]) q = np.array([3,2,4]). [Video
description ends] So when performing a plus operation between two arrays of the
same shape, NumPy assumes that we want to perform an element-by-element
addition.
The same thing also applies when we perform a subtraction between the two arrays.
So p - q results in an array containing 6, 6, and 3, [Video description begins] He runs
the code: p - q. The output is: array([6, 6, 3]). [Video description ends] which is an
element-by-element subtraction. In fact, even other mathematical operations, such as
multiplication, division, and the modulus operation, work in exactly the same way in
NumPy. [Video description begins] He adds three lines of code. The first line is:
print( 'p * q = ', p * q). The second line is: print( 'p / q = ', p / q). The third line is:
print( 'p % q = ', p % q). He runs the code and the output is: p * q = [27 16 28] p / q
= [3. 4. 1.75] p % q = [0 0 3]. [Video description ends]
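The element-by-element operators from this demo can be collected into one runnable sketch, using the same p and q as above.

```python
import numpy as np

p = np.array([9, 8, 7])
q = np.array([3, 2, 4])

# Each operator is applied to matching positions in p and q.
print("p + q =", p + q)   # [12 10 11]
print("p - q =", p - q)   # [6 6 3]
print("p * q =", p * q)   # [27 16 28]
print("p / q =", p / q)   # true division, so the result holds floats
print("p % q =", p % q)   # [0 0 3]
```

Note that division returns floats even though both operands hold integers, which matches the output shown in the demo.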
Now, what if we tried to perform an operation between a NumPy array and a scalar
quantity? So what if we perform a modulus operation between the array p and the
number 2? Well, in this case, NumPy is smart enough to assume that we wish to
perform a modulus of two operation on each of the elements in the array p. [Video
description begins] He runs the code: p % 2. The output is : array([1, 0, 1]). [Video
description ends] And in fact, this is exactly the output which is generated. So this
output has the shape of the array p.
The same thing also applies when we perform boolean operations. So if you would
like to check whether the elements of array p are greater than 8, then we get an array
of true or false values, which has the same shape as the array p. [Video description
begins] He runs the code: p > 8. The output is: array([ True, False, False]). [Video
description ends] And the same thing also applies when we perform a less than
operation. [Video description begins] He runs the code: p < 9. The output is:
array([ False, True, True]). [Video description ends]
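A short sketch of the scalar and boolean cases follows, again using the array p from the demo. The scalar is applied to every element, and comparisons return an array of booleans of the same shape.

```python
import numpy as np

p = np.array([9, 8, 7])

# The scalar 2 is applied to each element in turn.
remainders = p % 2
print(remainders)        # [1 0 1]

# Comparisons also work element by element and yield booleans.
greater_than_8 = p > 8
less_than_9 = p < 9
print(greater_than_8)    # [ True False False]
print(less_than_9)       # [False  True  True]
```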
We now go ahead and perform an addition operation. And once more, an element-by-
element addition has been performed. [Video description begins] He runs the code: x
+ y. The output is: array([[5, 3], [5, 5]]). [Video description ends] The same thing
also happens when we perform x minus y, where each element of y is subtracted from
the corresponding element of x. [Video description begins] He runs the code: x - y.
The output is: array([[-1, -1], [-3, 1]]). [Video description ends] And even the
multiplication operation happens on an element-by-element basis. [Video description
begins] He runs the code: x * y. The output is: array([[6, 2], [4, 6]]). [Video
description ends]
Another NumPy operation which we can perform is a dot product between two arrays,
which for two-dimensional arrays such as these is a matrix multiplication.
And for that, we can call the dot function, which belongs to an array object. So this
will generate the dot product between x and y. [Video description begins] He runs the
code: x.dot(y). The output is: array([[10, 6], [15, 8]]). [Video description ends]
And there is another way in which we can perform a dot product, and that is by calling
the np.dot function. [Video description begins] He adds the code: np.dot(x,y). [Video
description ends] So the dot function belongs to an array object, but it's also part of
the standard NumPy library. While the dot function belonging to an array object takes
in one argument, the np.dot function takes in two arguments. However, as we can see
here, the outputs are identical. [Video description begins] He runs the code and the
output displays the same array as for the code that was executed prior. [Video
description ends]
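The difference between the element-wise operators and the dot product can be sketched as follows. The values of x and y are not shown in this part of the transcript, so the ones below are an assumption, inferred from the outputs displayed in the demo. The @ operator is an extra that the demo does not use.

```python
import numpy as np

# Assumed inputs, inferred from the outputs shown in the demo
x = np.array([[2, 1], [1, 3]])
y = np.array([[3, 2], [4, 2]])

elementwise_sum = x + y        # adds matching positions
elementwise_product = x * y    # multiplies matching positions
matrix_product = x.dot(y)      # rows of x combined with columns of y

print(elementwise_sum)
print(elementwise_product)
print(matrix_product)

# np.dot(x, y) and the @ operator give the same result as x.dot(y)
print(np.array_equal(matrix_product, np.dot(x, y)))
print(np.array_equal(matrix_product, x @ y))
```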
So each of the array operations we have performed so far has created a new array
containing a result. However, we can also use these mathematical operators in order to
modify the contents of an array. Here, we make use of the multiplication operation in
order to multiply each of the array elements by 4. [Video description begins] He adds
two lines of code. The first line is: x*= 4. The second line is: x. [Video description
ends] So when we execute the cell, we see here that the element-by-element operation
has been performed. [Video description begins] He runs the code and the output is:
array([[ 8, 4], [ 4, 12]]). [Video description ends]
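As a small sketch of the distinction between creating a new array and modifying one in place, the snippet below keeps a second name for the same array. The starting values of x are an assumption, and the names alias and scaled are hypothetical.

```python
import numpy as np

x = np.array([[2, 1], [1, 3]])  # assumed starting values
alias = x                       # second name for the same array object

scaled = x * 4                  # creates a new array; x is unchanged
x *= 4                          # overwrites the contents of x in place

print(scaled is x)              # the new array is a different object
print(alias is x)               # the in-place update kept the same object
print(alias)
```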
Now, what if we want to sum up the mileage of all the vehicles in our inventory? For
that, NumPy provides us with a sum function which belongs to every
NumPy array object. [Video description begins] He runs the code and then adds the
following code line: fleet_mileage.sum(). [Video description ends] So when we just
call that, we get the sum of all the elements in a NumPy array. [Video description
begins] He runs the code and the output is: 119606. He highlights the following
section in the code that was previously executed: ([14130, 37234, 21892, 11479,
6890, 27981]). [Video description ends]
There is also a min function, which will return the minimum value in a NumPy array.
[Video description begins] He runs the code: fleet_mileage.min(). The output is 6890
and he highlights the value of 6890 in the code that was executed earlier:
fleet_mileage = np.array([14130, 37234, 21892, 11479, 6890, 27981]). [Video
description ends] And correspondingly, there is a max function as well, so this returns
the highest value in our array. [Video description begins] He runs the code:
fleet_mileage.max(). The output is 37234 and he highlights the same value, 37234, in
the code that was executed earlier: fleet_mileage = np.array([14130, 37234, 21892,
11479, 6890, 27981]). [Video description ends]
Now, what if we wish to calculate the average mileage of all of the vehicles in our
fleet? For that, there is the mean function which we can make use of. [Video
description begins] He adds a line of code: print('Mean: ',
fleet_mileage.mean()). [Video description ends] And we're just going to use this as
part of the print statement. So the mean mileage of all the vehicles in our fleet is about
20,000 miles. [Video description begins] He runs the code and the output is: Mean:
19934.333333333332. [Video description ends]
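The aggregate calls covered so far can be collected into one short sketch. The mileage values are the ones shown in the demo; the result names (total, lowest, highest, average) are hypothetical.

```python
import numpy as np

fleet_mileage = np.array([14130, 37234, 21892, 11479, 6890, 27981])

total = fleet_mileage.sum()
lowest = fleet_mileage.min()
highest = fleet_mileage.max()
average = fleet_mileage.mean()

print(total, lowest, highest)

# The mean is the sum divided by the number of elements
print(np.isclose(average, total / fleet_mileage.size))
```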
So these were aggregate operations performed on a one-dimensional array. How
exactly would they work in two dimensions, though? [Video description begins] He
adds two lines of code. The first line is: num = np.arange(16).reshape(4,4). The
second line is: num. [Video description ends]
For that, we create a two-dimensional array with four rows and four columns. [Video
description begins] He highlights .reshape(4,4) and then runs the code. The output
displays an array of four rows and four columns, displaying values from zero to 15, in
numerical order. The first row is: array([[ 0, 1, 2, 3],. [Video description ends] And
when we run the sum function on this array, it simply sums up the values of each of
the elements and returns us a single value. [Video description begins] He runs the
code: num.sum(). The output is: 120. [Video description ends]
So this may be of use, but when we're working with two dimensions, we would also
like to sum up values across the individual dimensions. To do that, when we call the
sum function, we simply pass along a value for the axis argument. Here we specify
that we would like all the values to be summed up for each of the columns. [Video
description begins] He adds a line of code: num.sum(axis = 0). [Video description
ends] And the result is a one-dimensional array, whose size corresponds to the number
of columns in this array, and which holds the sum of the values in each column. [Video
description begins] He runs the code and the output is: array([24, 28, 32, 36]). [Video
description ends]
If we would like to sum up the values in each of the rows in the array, we simply call
the sum function and set the value of the axis argument to 1. [Video description
begins] He adds a line of code: num.sum(axis = 1). [Video description ends] And this
will return an array with the sum of each row, and the size will correspond to the
number of rows. [Video description begins] He runs the code and the output is:
array([ 6, 22, 38, 54]). [Video description ends]
We can perform aggregations for individual dimensions, even using the other
aggregate functions. [Video description begins] He adds a line of code: num.min(axis
= 1). [Video description ends] So here we get the minimum value in each row of our
array, and the size once more corresponds to the number of rows. [Video
description begins] He runs the code and the output is: array([ 0, 4, 8, 12]). [Video
description ends]
And finally, we confirm that we can also perform the mean operation for each of the
rows in a two-dimensional array. [Video description begins] He adds a line of code:
num.mean(axis = 1). [Video description ends] And once more, this returns the
expected output. [Video description begins] He runs the code and the output is:
array([ 1.5, 5.5, 9.5, 13.5]). [Video description ends]
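Because the demo array is square, it is hard to see from the outputs which dimension survives an aggregation. The sketch below uses a 3 x 4 array instead (an assumption made purely for illustration, with the hypothetical name grid), so the shapes of the results differ.

```python
import numpy as np

# A 3 x 4 array (an assumption; the demo uses a 4 x 4 array)
grid = np.arange(12).reshape(3, 4)

column_sums = grid.sum(axis=0)   # collapses the rows: one value per column
row_sums = grid.sum(axis=1)      # collapses the columns: one value per row
row_minimums = grid.min(axis=1)
row_means = grid.mean(axis=1)

print(column_sums.shape, column_sums)
print(row_sums.shape, row_sums)
print(row_minimums)
print(row_means)
```

The axis argument names the dimension that is collapsed, so axis=0 leaves one value per column and axis=1 leaves one value per row.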
And with that, we have covered a number of the basic operations which can be
performed on both one-dimensional as well as two-dimensional NumPy arrays.
Universal Functions
[Video description begins] Topic title: Universal Functions. Your host for this session
is Kishan Iyer. [Video description ends]
We have previously seen how certain operations can be performed on NumPy arrays
on an element-by-element basis. In this demo, we will be covering universal
functions, which are NumPy library functions, which operate on an element-by-
element basis on NumPy arrays. [Video description begins] The Jupyter Notebook
user interface is open in a web browser and the UniversalFunction notebook
displays. [Video description ends]
In this notebook, we once again begin by importing the NumPy package. [Video
description begins] The presenter runs the command: import numpy as np. [Video
description ends] And we will first create a NumPy array containing the radius values
for a number of circles. [Video description begins] He adds a line of code:
circle_radii = np.array([145,120, 90, 60, 45, 30]). [Video description ends] And once
we have this array of radii, we perform a mathematical operation which will apply to
each of the individual elements in this array. [Video description begins] He runs the
code and then adds two lines of code. The first line is: circle_areas = np.pi *
circle_radii ** 2. The second line is: circle_areas. [Video description ends]
So here, we are making use of the constant np.pi, which is provided by NumPy. We
are also making use of Python's power operator in order to get the square of the radii
from the elements in the radii array. So we have just covered, once more, how we can
use mathematical operators in order to perform operations on an element-by-element
basis on NumPy arrays. [Video description begins] He runs the code and the output
displays an array listing circle area values. For example, the first value in the array
is: 66051.98554173. [Video description ends]
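A minimal sketch of the area calculation described here, using the radii from the demo. The name circle_areas is hypothetical.

```python
import numpy as np

circle_radii = np.array([145, 120, 90, 60, 45, 30])

# circle_areas is a hypothetical name for the result.
# ** squares every radius, and np.pi scales every squared value.
circle_areas = np.pi * circle_radii ** 2

print(circle_areas)
```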
However, we can also make use of NumPy functions, which work on an element-by-
element basis. And to demonstrate that, we first define an array which contains a set
of angles expressed in degrees. [Video description begins] He runs the following line
of code: angles_degrees = np.array([0, 30, 60, 90, 120, 150, 180]). [Video
description ends]
In order to express each of these angles in the unit of radians, we once more perform a
mathematical operation on each of the individual elements of our angles_degrees array.
[Video description begins] He adds two lines of code. The first line is:
angle_radians= angles_degrees * np.pi / 180. The second line is:
angle_radians. [Video description ends] And this operation will result in an array
containing those same angles expressed in radians. [Video description begins] He
runs the code and the output displays an array comprising of angle radian values.
For example, the first value is: 0.. [Video description ends]
Now, the first universal function which we will make use of is the np.sin function. It
requires at least one argument which is a NumPy array, and this will calculate the sine
value for each of the elements in that NumPy array. [Video description begins] He
adds two lines of code. The first line is: print('Sine values: '). The second line is:
np.sin(angle_radians). [Video description ends] That is, it performs an element-by-
element operation. So when we execute the cell, [Video description begins] He runs
the code and the output displays an array of sine values. [Video description ends] we
are presented with the sine values for each of the angles expressed as radians in our
array. [Video description begins] For example, the first sine value in the array is:
0.00000000e+00. [Video description ends]
Similar to the sine function, NumPy also provides a cos function in order to calculate
the cosine values for each element in a NumPy array. [Video description begins] He
runs two consecutive lines of code. The first line is: print('Cosine values: '). The
second line is: np.cos(angle_radians). The output displays an array of cosine values.
In this instance, the first cosine value in the array is: 1.00000000e+00. [Video
description ends] And correspondingly, there is also an np.tan function available in
the NumPy library. [Video description begins] He adds two lines of code. The first
line is: print('Tangent values: '). The second line is: np.tan(angle_radians). [Video
description ends] As you will have noticed, each of these functions requires the angles
to be expressed in radians, by default. [Video description begins] He runs the code
and the output displays an array of tangent values. In this instance, the first tangent
value is: 0.00000000e+00. [Video description ends]
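The conversion and the three trigonometric functions can be sketched together as below. np.deg2rad is a built-in alternative to the manual conversion; it is not used in the demo and appears here only for comparison, under the hypothetical name same_radians.

```python
import numpy as np

angles_degrees = np.array([0, 30, 60, 90, 120, 150, 180])

# Manual conversion, as in the demo
angle_radians = angles_degrees * np.pi / 180

# np.deg2rad is a built-in alternative (not used in the demo)
same_radians = np.deg2rad(angles_degrees)

sine_values = np.sin(angle_radians)
cosine_values = np.cos(angle_radians)
tangent_values = np.tan(angle_radians)

print(np.allclose(angle_radians, same_radians))
print(np.round(sine_values, 4))
print(np.round(cosine_values, 4))
```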
Now that we have covered the basic trigonometric functions which are available, we
can also go ahead and check for the inverse trigonometric functions. To do that, we
first create an array, which contains all of the sine values. And for that, we once again
use the np.sin function. [Video description begins] He adds a line of code:
sine_values = np.sin(angle_radians). [Video description ends] And these
sine_values, [Video description begins] He runs the code. [Video description ends]
are passed as an argument to the np.arcsin function, which will calculate the arcsine or
the sine inverse of each of these sine_values. [Video description begins] He adds two
lines of code. The first line is: arcsine_values = np.arcsin(sine_values). The second
line is: arcsine_values. [Video description ends] So when we execute the cell, [Video
description begins] He runs the code and the output displays an array of arcsine
values. In this instance, the first value is: 0.00000000e+00. [Video description ends]
we note here that an angle has been recovered from each of the sine values. But
once again, these are expressed in the form of radians.
We can view the values in degrees for each of these angles by multiplying each of
these arcsine_values by 180 and then dividing by pi. [Video description begins] He
adds a line of code: arcsine_values * 180 / np.pi. He highlights the following code
that was executed previously: arcsine_values. [Video description ends] And from the
output here, [Video description begins] He runs the code and the output is:
array([0.0000000e+00, 3.0000000e+01, 6.0000000e+01, 9.0000000e+01,
6.0000000e+01, 3.0000000e+01, 7.0167093e-15]). [Video description ends] we
notice that all the original angles between 0 and 90 have been recovered. But since the
sine of 120 degrees is the same as sine of 60, and the sine of 150 is equal to the sine of
30, we see those values listed in this array.
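The round trip through sine and arcsine can be sketched as follows. np.deg2rad and np.rad2deg are used here in place of the manual multiplications from the demo, and recovered_degrees is a hypothetical name.

```python
import numpy as np

angles_degrees = np.array([0, 30, 60, 90, 120, 150, 180])
sine_values = np.sin(np.deg2rad(angles_degrees))

# arcsin returns angles in radians, limited to the range -pi/2 to pi/2
recovered_degrees = np.rad2deg(np.arcsin(sine_values))

# Rounding hides floating-point noise, such as the tiny value recovered for 180 degrees
print(np.round(recovered_degrees, 6))
```

Since arcsin only returns angles between -90 and 90 degrees, any input outside that range comes back as the angle within the range that has the same sine.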
We now move on to some other universal functions for which we will create a new
array x using the np.arange function. [Video description begins] He adds two lines of
code. The first line is: x= np.arange(4, 31, 5). The second line is: x. [Video
description ends] So this contains an array of these six elements. [Video description
begins] He runs the code and the output displays as follows: array([ 4, 9, 14, 19, 24,
29]). [Video description ends] And the universal function we explore right now is the
exponential, or exp function. [Video description begins] He adds two lines of code.
The first line is: expo = np.exp(x). The second line is: expo. [Video description ends]
When we apply this exponential function to the array x, the output will result in an
array containing the values of e to the power x. [Video description begins] He
highlights np.exp(x) in the code. He also highlights the element, x, in the code that
was previously executed. A note reads: expo = e^x. [Video description ends] That is,
each of the elements in the array x will be applied as an exponent to Euler's
number, e. The value of e is about 2.718, and as you can see here,
this array contains a number of rather large values. [Video description begins] He
runs the code and the output is: array([5.45981500e+01, 8.10308393e+03,
1.20260428e+06, 1.78482301e+08, 2.64891221e+10, 3.93133430e+12]). [Video
description ends]
Another universal function which NumPy provides us is the square root function,
which is sqrt. [Video description begins] He adds two lines of code. The first line is:
sqrt = np.sqrt(x). The second line is: sqrt. [Video description ends] And this will
calculate the square root for the elements in a NumPy array. [Video description
begins] He runs the code and the output displays an array of square root
values. [Video description ends]
There is also the median function, which will calculate the median value in an array.
[Video description begins] He adds a line of code: np.median(x). [Video description
ends] So this will sort the array and pick the middle value, if the number of elements
is odd, or will calculate the average of the middle two elements, if the array has an
even length. In our case, our array contains six elements and the middle two elements
are 14 and 19, which is why the median is calculated as 16.5. [Video description
begins] He runs the code and the output displays a value of 16.5. [Video description
ends]
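A short sketch of the three functions just covered, using the same array x. Note that np.exp and np.sqrt work element by element and keep the shape of the input, whereas np.median reduces the whole array to a single value. The names roots and middle are hypothetical.

```python
import numpy as np

x = np.arange(4, 31, 5)        # 4, 9, 14, 19, 24, 29

expo = np.exp(x)               # e raised to each element
roots = np.sqrt(x)             # square root of each element
middle = np.median(x)          # a single value for the whole array

print(expo.shape, roots.shape)
print(roots)
print(middle)
```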
Finally, we will perform a number of aggregate operations on a rather large array. For
that, we load a bunch of float values from a data set. And to do that, we make use of
the genfromtxt function. So this will generate a NumPy array from a text file. The
arguments we pass to it include the location of the file, which is in our datasets
directory, and it's called float_values.csv. And we can also specify what the delimiter
is between each of the values in that file. In our case, it's a comma. [Video description
begins] He adds a line of code: float_values =
np.genfromtxt('datasets/float_values.csv', delimiter = ','). [Video description ends]
So once we run the cell, we can take a look at the contents of this float_values array.
[Video description begins] He runs the code. Then he runs the following code line:
float_values. The output is: array([0.8944, 0.8898, 0.8894, ..., 0.7051, 0.6992,
0.7047]). [Video description ends] And we notice here that this contains a number of
elements, which is indicated by the ellipsis. We can then take a look at the shape of
this array, and we observe here that it contains almost 6,000 elements. [Video
description begins] He runs the code: float_values.shape. The output is:
(5998, ). [Video description ends]
So we now have a rather large one-dimensional array. And we will use it in order to
perform a number of aggregate operations. These include the calculations for the
mean, median, variance, and standard deviation for all the values in our array. [Video
description begins] He adds four lines of code. The first line is: print('Mean = %i'
%np.mean(float_values)). The second line is: print('Median = %i'
%np.median(float_values)). The third line is: print('Variance = %i'
%np.var(float_values)). The fourth line is: print('Standard Deviation = %i'
%np.std(float_values)). [Video description ends] And the aggregate functions we have
not come across earlier are the var function, for variance, and the std function,
for the standard deviation. So on running the cell, [Video description begins] He runs
the code and the output is: Mean = 16 Median = 5 Variance = 1659 Standard
Deviation = 40. [Video description ends] we are able to get these aggregate values for
this rather large one-dimensional array.
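The loading and aggregation steps can be sketched without the original file. The snippet below is an assumption-laden stand-in: it feeds np.genfromtxt an in-memory string holding only the six values visible in the demo output, so the statistics it prints are not those of the real data set. It also shows that the %i format keeps only the integer part of each result.

```python
import io
import numpy as np

# Stand-in for datasets/float_values.csv, which is not available here.
# Only the six values visible in the demo output are used.
csv_text = io.StringIO("0.8944,0.8898,0.8894,0.7051,0.6992,0.7047")
float_values = np.genfromtxt(csv_text, delimiter=',')

mean_value = np.mean(float_values)
variance = np.var(float_values)
std_deviation = np.std(float_values)

# %i keeps only the integer part; %.4f keeps four decimal places
print('Mean = %i' % mean_value)
print('Mean = %.4f' % mean_value)
print('Variance = %.4f' % variance)
print('Standard Deviation = %.4f' % std_deviation)
```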
So the array of cubes is now ready for us. [Video description begins] He runs
the code and the output is: array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729]). [Video
description ends] And the first operation we will perform is to access an element of
this array by the index. So just like with Python lists, we use the square bracket
notation and the element at index 3 is the cube of 3. That is 27. [Video description
begins] He runs the code: x[3]. The output displays the value: 27. He highlights the
value of 27 in the previous output: array([ 0, 1, 8, 27, 64, 125, 216, 343, 512,
729]). [Video description ends]
Now, just like with Python, we can also access elements from the end of the array. So
using an index of -4, we get the fourth from last element of our array, which is 216.
[Video description begins] He runs the code: x[-4]. The output displays the value:
216. He highlights the value of 216 in the previous output: array([ 0, 1, 8, 27, 64,
125, 216, 343, 512, 729]). [Video description ends] So that covers the basic indexing
of NumPy arrays.
We now perform a slice of this array. So here we get the slice of elements from index
1 through to index 7. So these should return the cubes of all the numbers from 1
through 7. [Video description begins] He runs the code: x[1 : 8]. The output is:
array([ 1, 8, 27, 64, 125, 216, 343]). He highlights a section of corresponding values
in the following output: array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729]). The section
is: 1, 8, 27, 64, 125, 216, 343. [Video description ends] We can also use negative
indexing while slicing the array. So this will give us all the values from index three
until the fourth from last index of the array. [Video description begins] He runs the
code: x[3 : -3]. The output is: array([ 27, 64, 125, 216]). [Video description ends]
If we do not specify a starting index for our slice, then NumPy will assume that the
starting index is 0. So this returns the first eight elements in our array. [Video
description begins] He runs the code: x[ : 8]. The output is: array([ 0, 1, 8, 27, 64,
125, 216, 343]). [Video description ends] And similarly, if we do not explicitly set the
end index, the slice is assumed to run to the end of the array. [Video description
begins] He runs the code: x[3 : ]. The output is: array([ 27, 64, 125, 216, 343, 512,
729]). [Video description ends] So this will give us the slice of the array beginning
from index 3 through to the end of the array.
NumPy also allows the use of the start, stop, and step notation in order to get a slice of
the array. So the indexes of this slice of the array will include the start of the array,
and will go on until the index of 9, in increments of 2. [Video description begins] He
adds the following line of code: x[ : 10: 2]. [Video description ends] So the slice
contains the elements of an array at index locations 0, 2, 4, 6, and 8. [Video
description begins] He runs the code and the output is: array([ 0, 8, 64, 216,
512]). [Video description ends]
When using this notation, we can also specify a negative step value. And this
particular slice will contain the elements of the original array, only in reverse order.
[Video description begins] He runs the code: x[ ::-1]. The output is: array([729, 512,
343, 216, 125, 64, 27, 8, 1, 0]). [Video description ends]
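The indexing and slicing forms covered in this demo are gathered below as a quick reference sketch, using the same array of cubes.

```python
import numpy as np

x = np.arange(10) ** 3

print(x[3])        # single element by position
print(x[-4])       # counting back from the end
print(x[1:8])      # start is included, stop is excluded
print(x[3:-3])     # a negative stop counts back from the end
print(x[:8])       # an omitted start defaults to 0
print(x[3:])       # an omitted stop runs to the end
print(x[:10:2])    # every second element
print(x[::-1])     # a negative step reverses the array
```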
In order to access the first row of the array, we can use a single index, that is, we
access the element at index 0, and this returns the row containing all the company
names. [Video description begins] He runs the code: companies[0]. The output is:
array([ 'Samsung', 'Microsoft', 'IBM', 'Spotify', 'Flipkart' ], dtype='<U9' ). [Video
description ends]
If you would like to get the data from a specific column, we need to specify a multi-
dimensional slice. So for the first dimension, we just say that we would like all of the
rows to be picked up. But for the column dimension, we say that we only want the
column at index 2. [Video description begins] He adds the following line of code:
companies[:, 2]. [Video description ends] Running this will return the details of IBM,
which is at column number three, or the column at index two of our array. [Video
description begins] He runs the code and the output is: array([ 'IBM', '1911', '380000'
], dtype='<U9' ). [Video description ends]
If you would like to retrieve the value within a particular cell of our array, rather than
get a whole row or column, we can specify the location of that cell in terms of its row
and column index. So here, the cell, which is at row 0 and column number 2, contains
the text IBM. [Video description begins] He runs the code: companies[0, 2]. The
output is: 'IBM'. He highlights the same name, 'IBM', in the following line of code that
was executed earlier: companies = np.array([[ 'Samsung', 'Microsoft', 'IBM',
'Spotify', 'Flipkart' ],. [Video description ends]
So far, when using this two-dimensional array, we have retrieved either a complete row
or a complete column. However, if you would only like a subset of rows and columns
to be returned, then we can use this notation, where we specify a range of row indices,
followed by a range of column indices. [Video description begins] He adds the
following line of code: companies[0:2, 2:4]. [Video description ends] So running this
will return a slice of the array, which includes the data in the rows corresponding to
index 0 and 1, and the columns, 2 and 3. [Video description begins] He runs the code
and the output is: array([[ 'IBM', 'Spotify' ], [ '1911', '2006' ]], dtype='<U9' ). [Video
description ends]
We now take a look at another slice of this array, which includes all of the rows and
columns 2 and 3. [Video description begins] He runs the code: companies[:, 2:4].
The output is: array([[ 'IBM', 'Spotify' ], [ '1911', '2006' ], ['380000', '3000' ]],
dtype='<U9' ). [Video description ends] So this returns all the details of IBM and
Spotify.
Negative indexing can also be used with two-dimensional arrays. So, here, we return
all the columns corresponding to the last row of our array. [Video description
begins] He runs the code: companies[-1, :]. The output is: array(['489000', '131000',
'380000', '3000', '30000'], dtype='<U9' ). [Video description ends] So this returns
the entire row containing the employee count for the companies.
And lastly, we explore the use of the ellipsis when it comes to indexing and slicing.
So the ellipsis in NumPy is used to denote all the remaining dimensions of the array. So in this case, we are
getting the elements at row 0 and we specify all the columns by using the ellipsis
symbol. [Video description begins] He runs the code: companies[0, ...]. The output
is: array([ 'Samsung', 'Microsoft', 'IBM', 'Spotify', 'Flipkart' ], dtype='<U9' ). [Video
description ends] So this returns all the company names. And similarly, we can also
use the ellipsis in order to denote all the rows in the array. So this will return all the
rows corresponding to column with the index 2. That is the details of the company,
IBM. [Video description begins] He runs the code: companies[..., 2]. The output is:
array([ 'IBM', '1911', '380000' ], dtype='<U9' ). [Video description ends]
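The two-dimensional indexing forms from this demo can be sketched together as below. The companies array is the one from the course; the result names are hypothetical. The dtype of the string array may differ between NumPy versions, so it is not relied upon here.

```python
import numpy as np

companies = np.array([['Samsung', 'Microsoft', 'IBM', 'Spotify', 'Flipkart'],
                      [1938, 1975, 1911, 2006, 2007],
                      [489000, 131000, 380000, 3000, 30000]])

names = companies[0]                   # first row
ibm_details = companies[:, 2]          # every row, column 2
ibm_name = companies[0, 2]             # single cell
top_right_block = companies[0:2, 2:4]  # rows 0-1, columns 2-3
employee_counts = companies[-1, :]     # last row

# The ellipsis stands in for the dimensions that are not spelled out
print(np.array_equal(companies[0, ...], names))
print(np.array_equal(companies[..., 2], ibm_details))
print(top_right_block)
```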
In this demo, we will explore various options which are available, in order to iterate
through arrays in NumPy. As always, we will import the NumPy package first. [Video
description begins] The Jupyter Notebook user interface is open in a web browser and
the Iterating notebook displays. The presenter runs the command: import numpy as
np. [Video description ends]
And the array we will first work with contains the cubes of all numbers, beginning
from 0 through to 9. So for that, we use the np.arange function. [Video description
begins] He adds two lines of code. The first line is: x= np.arange(10)**3. The second
line is: x. [Video description ends] And once we have this array, [Video description
begins] He runs the code and the output is: array([ 0, 1, 8, 27, 64, 125, 216, 343, 512,
729]). [Video description ends] we will iterate through this array first by using the for
loop. [Video description begins] He adds two lines of code. The first line is: for i in
x:. The second line is: print(i). [Video description ends]
So we will capture in the variable i the element which is returned in each iteration of
the for loop, and then print it out. As anyone familiar with Python lists would expect,
each of the elements of our array is what is assigned to the variable i at each iteration.
[Video description begins] He runs the code and the output displays a list of values,
as follows: 0 1 8 27 64 125 216 343 512 729. [Video description ends]
We will now make things a little more complex, where we will create a two-
dimensional array. [Video description begins] He adds three lines of code. The first
line is: companies = np.array([[ 'Samsung', 'Microsoft', 'IBM', 'Spotify', 'Flipkart' ],.
The second line is: [1938, 1975, 1911, 2006, 2007],. The third line is: [489000,
131000, 380000, 3000, 30000]]). [Video description ends] This contains three rows
and five columns. And contains information about five different companies, including
their names, the year of their founding, and the number of people they employ.
So when we try to iterate over this 2D array, [Video description begins] He runs the
code. [Video description ends] each iteration will return a row of the array. We create
a variable i, which we initialize to 0, and this will keep count of the iterations over the array.
[Video description begins] He adds four lines of code. The first line is: i = 0. The
second line is: for row in companies:. The third line is: print( 'Row', i, ': ',row). The
fourth line is: i +=1. [Video description ends] So once we execute this for loop, we
get the confirmation that each iteration returns one row of this 2D array. [Video
description begins] He runs the code and three lines of output displays. The first line
is: Row 0 : [ 'Samsung' 'Microsoft' 'IBM' 'Spotify' 'Flipkart' ]. The second line is: Row
1 : ['1938' '1975' '1911' '2006' '2007']. The third line is: Row 2 : ['489000' '131000'
'380000' '3000' '30000']. [Video description ends]
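As an alternative sketch, Python's built-in enumerate can supply the row counter that the demo maintains by hand. This is not what the demo does; the list labelled_rows is a hypothetical helper added so the result can be inspected afterwards.

```python
import numpy as np

companies = np.array([['Samsung', 'Microsoft', 'IBM', 'Spotify', 'Flipkart'],
                      [1938, 1975, 1911, 2006, 2007],
                      [489000, 131000, 380000, 3000, 30000]])

# enumerate supplies the counter that the demo maintains by hand
labelled_rows = []
for i, row in enumerate(companies):
    labelled_rows.append((i, row))
    print('Row', i, ':', row)
```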
We will now perform a flatten operation on our 2D array. So, the flatten function is
something which will reduce a 2D array into a single dimension. And this will be
done row-wise. [Video description begins] He adds a line of code:
companies.flatten(). [Video description ends]
To understand what that means we can just execute the cell. [Video description
begins] He runs the code and the output displays an array in which all the values of
the initial array have been merged, as follows: array([ 'Samsung', 'Microsoft', 'IBM',
'Spotify', 'Flipkart', '1938', '1975', '1911', '2006', '2007', '489000', '131000', '380000',
'3000', '30000'], dtype='<U9' ). A note reads: 2D array has been flattened to one
dimension. [Video description ends] We observe that the initial elements of our
flattened 1D array, is the first row of the original 2D array, followed by the second
row, and then succeeded by the third row.
Now, if you would like to perform a flattening of the array, but we would like to do
that in a column-wise manner, we can make use of the flatten function once again. But
this time, we specify the value of the order argument to be F. [Video description
begins] He adds two lines of code. The first line is: for data in
companies.flatten(order = 'F'):. The second line is: print(data). [Video description
ends]
F here stands for Fortran-style ordering, named after the Fortran programming language.
[Video description begins] He highlights order = 'F' in the code. [Video description
ends] The previous row-wise order was C-style ordering, named after the C
programming language. So when we perform our Fortran-style ordering, [Video
description begins] He runs the code and the output lists the following values:
Samsung 1938 489000 Microsoft 1975 131000 IBM 1911 380000 Spotify 2006 3000
Flipkart 2007 30000. [Video description ends] we see here that the original two-
dimensional array has been flattened, column-wise.
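The two orderings are easier to compare on a small numeric array. The sketch below uses a hypothetical 2 x 3 array named grid, and also shows that flatten returns a copy rather than a view of the original.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)      # small hypothetical array

row_wise = grid.flatten()              # C-style: one row after another
column_wise = grid.flatten(order='F')  # Fortran-style: one column after another

# flatten returns a copy, so changing it leaves grid untouched
row_wise[0] = 99

print(row_wise)
print(column_wise)
print(grid[0, 0])
```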
Before we perform some more iterations, we create one more array. So this is a two-
dimensional array with four rows and four columns. [Video description begins] He
adds two lines of code. The first line is: num = np.arange(16).reshape(4,4). The
second line is: num. [Video description ends] And once our array has been created,
[Video description begins] He runs the code and an array of four rows and four
columns display, comprising of values zero to 15, in numerical order. The first row is:
array([[ 0, 1, 2, 3],. The second row is: [ 4, 5, 6, 7],. The third row is: [ 8, 9, 10, 11],.
The fourth row is: [12, 13, 14, 15]]). [Video description ends] we will iterate through
this using the np.nditer function.
You will notice from the ordering of these elements that nditer, by default, performs a
row-wise or C-style ordering. If you would like to perform a column-wise ordering,
then we simply call the nditer function by passing the array, and then specify the value
of the order argument to be F, for Fortran style. [Video description begins] He adds
two lines of code. The first line is: for i in np.nditer(num, order = 'F'):. The second
line is: print(i). [Video description ends] And as you can see from the output, this
does traverse through the array in a column-wise manner. [Video description
begins] He runs the code and the output lists values from zero to 15, in the following
order: 0 4 8 12 1 5 9 13 2 6 10 14 3 7 11 15. [Video description ends]
Rather than iterating through the individual elements, or 0-dimensional arrays, we can
iterate through the rows or columns, by making use of the flags parameter for the
nditer function. So if we pass in a value of external_loop to the flags parameter, and
we specify the ordering to be Fortran style, [Video description begins] He adds two
lines of code. The first line is: for i in np.nditer(num, order = 'F', flags =
[ 'external_loop' ]):. The second line is: print(i). [Video description ends] then we will
be iterating through each of the columns of this array. [Video description begins] He
runs the code and values from zero to 15 displays in four lines of output. The first line
is: [ 0 4 8 12]. The second line is: [ 1 5 9 13]. The third line is: [ 2 6 10 14]. The
fourth line is: [ 3 7 11 15]. [Video description ends]
Note that the flags parameter accepts a list of flags. So, we can specify a number of
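different flags in order to define the behavior of our nditer object. Also, had we
performed a C-style ordering, the elements would have been handed to us in row-wise
order. For a contiguous array such as this one, nditer returns them as a single
one-dimensional chunk rather than one row at a time.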
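The effect of the order argument and the external_loop flag can be sketched as below. The list names are hypothetical, and the chunks are copied so they can be inspected after the loop has finished. The last case leaves the order at its default, which is row-wise for this array.

```python
import numpy as np

num = np.arange(16).reshape(4, 4)

# Element by element, in column-wise (Fortran-style) order
column_order = [int(value) for value in np.nditer(num, order='F')]

# With external_loop, nditer yields one-dimensional chunks instead of single elements
fortran_chunks = [chunk.copy() for chunk in
                  np.nditer(num, flags=['external_loop'], order='F')]

# Default ordering, which is row-wise for this array
default_chunks = [chunk.copy() for chunk in
                  np.nditer(num, flags=['external_loop'])]

print(column_order)
print(len(fortran_chunks), fortran_chunks[0])
print(len(default_chunks), default_chunks[0].shape)
```

With Fortran-style ordering each chunk is one column. With the default ordering on this contiguous array, the whole array arrives as one chunk, so external_loop is not a general way to loop over rows.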
One thing to keep in mind when using the nditer object is that the array which is
returned by it is, by default, read-only. This is because it points to the array which is
passed as an argument to nditer. And any changes which are performed will not be
on a copy of the array, but will, in fact, be to the array itself. In this for loop, we
calculate the square for each of the elements in our 2D array, by multiplying each
element with itself. [Video description begins] He adds two lines of code. The first
line is: for array in np.nditer(num):. The second line is: array[...] = array *
array. [Video description ends] However, when we execute this cell, [Video
description begins] He runs the code and the output is a ValueError message, noting
that the assignment destination is read-only. [Video description ends] we get an error,
and this is because the array is read-only for us.
However, if we would like to update an array, then we can call the nditer function, by
passing it the array, and by setting the op_flags argument to a list containing readwrite.
[Video description begins] He adds two lines of code. The first line is: for array in
np.nditer(num, op_flags = [ 'readwrite' ]):. The second line is: array[...] = array *
array. [Video description ends] Note here that op_flags also takes a list, but we are only
setting one of the op flags here.
So when we execute this for loop, we get no errors this time, and we can confirm what
the num array looks like. And we see here that the square of each of the elements is now
available. [Video description begins] He runs the code: num. The output displays an
array over four lines. The first line is: array([[ 0, 1, 4, 9],. The second line is: [ 16,
25, 36, 49],. The third line is: [ 64, 81, 100, 121],. The fourth line is: [144, 169, 196,
225]]). [Video description ends] So we were successfully able to write to our array
using the nditer object.
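The read-only and readwrite behaviors can be put side by side in one small sketch. It assumes the same starting values for num as in the demo output, that is 0 to 15 in a 4x4 array. Wrapping the writable iterator in a with statement is an addition which is not shown in the demo; nditer supports being used as a context manager, and this makes sure any pending writes are flushed when the block ends.

```python
import numpy as np

# Assumption: num starts out as the 4x4 array of 0..15 used in the demo
num = np.arange(16).reshape(4, 4)

# By default the operand is read-only, so the assignment raises ValueError
try:
    for element in np.nditer(num):
        element[...] = element * element
except ValueError as err:
    print("read-only iterator:", err)

# With op_flags=['readwrite'], the writes go through to num itself
with np.nditer(num, op_flags=['readwrite']) as it:
    for element in it:
        element[...] = element * element

print(num)
```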
Reshaping Arrays
[Video description begins] Topic title: Reshaping Arrays. Your host for this session is
Kishan Iyer. [Video description ends]
We have already come across a few operations which reshape an array, such as the
reshape function, as well as the flatten function. We will now explore some of the
other operations which are available in order to change the shape of an array.
So once this array has been created, [Video description begins] He runs the code and
an array of two rows and five columns displays as the output. The first row is:
array([['IBM', 'Apple Inc.', 'Intel', 'Dell', 'Microsoft'],. The second row is: [ 'New
York', 'California', 'California', 'Texas', 'Washington' ]], dtype='<U10' ). [Video
description ends] we can examine the shape of this array using the shape property of
the array object. So our array contains 2 rows and 5 columns, which is denoted in this
tuple. [Video description begins] He runs the code: tech_companies.shape. The
output is: (2, 5). [Video description ends] And the first operation which we will
perform on this array is to use the ravel function.
So the ravel function is similar to the flatten function [Video description begins] He
adds a line of code: tech_companies.ravel(). [Video description ends] which reduces a
multidimensional array to a single dimensional array. The difference between
ravel and flatten is that flatten is only available as a method of the array object, and
it always returns a copy of the data. [Video description begins] He runs the code and the
output displays an array in which all the values of the initial array have been merged, as
follows: array(['IBM', 'Apple Inc.', 'Intel', 'Dell', 'Microsoft', 'New York', 'California',
'California', 'Texas', 'Washington' ], dtype='<U10' ). A note reads: ravel() is similar
to flatten(). [Video description ends] On the other hand, ravel is also available as a
function of the NumPy library, np.ravel, which can be used with any array-like object
which is passed to it, and it returns a view of the original data whenever that is possible.
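To make the comparison between ravel and flatten concrete, here is a minimal sketch. The tech_companies array is reconstructed from the output shown in the demo, since the cell which creates it is not part of this section. The names flat_copy and flat_view are only used for illustration.

```python
import numpy as np

# Reconstructed from the demo output; the original creation cell is not shown
tech_companies = np.array([
    ['IBM', 'Apple Inc.', 'Intel', 'Dell', 'Microsoft'],
    ['New York', 'California', 'California', 'Texas', 'Washington'],
])

flat_copy = tech_companies.flatten()  # always a new, independent array
flat_view = tech_companies.ravel()    # a view here, as the data is contiguous

print(np.shares_memory(tech_companies, flat_copy))  # False
print(np.shares_memory(tech_companies, flat_view))  # True

# The library-level np.ravel also accepts array-like objects such as lists
print(np.ravel([[1, 2], [3, 4]]))
```

In practice this means that writing to the result of flatten never touches the original array, while writing to the result of ravel may do so.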
The next thing we do with our array is to get it transposed. That is a form of the array
where the rows are the columns of the original, and the columns are the rows of the
original. So the transpose for an array can, in fact, be obtained from the dot uppercase
T property. [Video description begins] He adds a line of code:
tech_companies.T. [Video description ends] So we do not need to call a function, but
we can simply access this T property. And we see here that the transpose of the array
is available through it. [Video description begins] He runs the code. The output
displays an array over five lines. The first line is: array([[ 'IBM', 'New York'],. The
second line is: [ 'Apple Inc.', 'California' ],. The third line is: [ 'Intel', 'California' ],.
The fourth line is: [ 'Dell', 'Texas' ],. The fifth line is: [ 'Microsoft', 'Washington' ]],
dtype='<U10' ). [Video description ends]
And we can also perform a raveling or flattening of this transpose. And we do that by
calling the ravel function [Video description begins] He adds a line of code:
tech_companies.T.ravel(). [Video description ends] because the transpose of an array
is also an array object. [Video description begins] He runs the code and the output
displays an array in which the values of the previous array have been merged, as
follows: array([ 'IBM', 'New York', 'Apple Inc.', 'California', 'Intel', 'California',
'Dell', 'Texas', 'Microsoft', 'Washington' ], dtype='<U10' ). [Video description ends]
So we see from the output that we have obtained the flattened form of the array
transpose.
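The same idea can be sketched with a small numeric array, which makes the element order easier to follow. The array grid is hypothetical and simply stands in for any array with 2 rows and 5 columns.

```python
import numpy as np

# Hypothetical 2x5 array standing in for the array used in the demo
grid = np.arange(10).reshape(2, 5)

transposed = grid.T                        # a property, not a function call
print(transposed.shape)                    # (5, 2)
print(np.shares_memory(grid, transposed))  # True: the transpose is a view

# Raveling the transpose walks down the columns of the original array
print(transposed.ravel())

# This gives the same order as raveling the original in Fortran order
print(np.array_equal(transposed.ravel(), grid.ravel(order='F')))
```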
So let us just revise the shape of our original array. So it contains 2 rows and 5
columns. [Video description begins] He runs the code: tech_companies.shape. The
output is: (2, 5). [Video description ends] And now, we can simply perform a reshape
function on it, in order to reshape it to an array containing 5 rows and 2 columns.
[Video description begins] He adds a line of code:
tech_companies.reshape(5,2). [Video description ends]
So from the output, we gather that the elements have been processed in a C-order.
That is they have been processed row-wise. [Video description begins] He runs the
code and the output displays an array of five rows and two columns. The first row is:
array([['IBM', 'Apple Inc.' ],. The second row is: [ 'Intel', 'Dell' ],. The third row is:
[ 'Microsoft', 'New York' ],. The fourth row is: [ 'California', 'California' ],. The fifth
row is: [ 'Texas', 'Washington' ]], dtype='<U10' ). [Video description ends]
So here, we reshape an array of 18 elements to one containing 3 rows and 6 columns.
[Video description begins] He adds a line of code:
np.arange(18).reshape(3,6). [Video description ends] And this operation is successful
because the product of the arguments passed to reshape is equal to the number of
elements in the array which is being reshaped. [Video description begins] He runs the
code and the output displays an array of three rows and six columns. The array
contains values from zero to 17, in series. For example, the first row is: array([[ 0, 1,
2, 3, 4, 5],. [Video description ends]
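As a short sketch of the two points made about reshape so far, the cell below checks the element count and shows how the order in which the elements are processed affects the result. The order='F' call is an addition for comparison and is not part of the demo.

```python
import numpy as np

values = np.arange(18)

# The product of the new dimensions must match the number of elements
print(3 * 6 == values.size)  # True

# Default C order: the elements fill the new array row by row
c_order = values.reshape(3, 6)
print(c_order)

# Fortran order, for comparison: the elements fill it column by column
f_order = values.reshape(3, 6, order='F')
print(f_order)
```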
We now define one more array containing the names of eight different companies. So
this is a one-dimensional array containing eight elements. [Video description
begins] He adds two lines of code. The first line is: companies = np.array([ 'IBM',
'Apple', 'Intel', 'Sony', 'Microsoft', 'HP', 'Hitachi', 'Panasonic' ]). The second line is:
companies. [Video description ends]
So once this companies array has been created, [Video description begins] He runs
the code and the output is: array([ 'IBM', 'Apple', 'Intel', 'Sony', 'Microsoft', 'HP',
'Hitachi', 'Panasonic' ], dtype='<U9' ). [Video description ends] we are going to
perform a reshape operation on it. But this time we're going to specify a negative
value for one of the dimension lengths. So when you specify a negative value here, it
means that particular dimension will be inferred by the reshape function.
So here, we set the number of rows to be minus 1, which means that the number of
rows of the reshaped array will be inferred using the number of elements available
and the number of columns which has been specified. [Video description
begins] He adds a line of code: companies.reshape(-1, 4). [Video description ends] So
since the number of elements is 8 and the number of columns is 4, the reshape
function has inferred that the number of rows in this array should be 2. [Video
description begins] He runs the code and an array of two rows and four columns
displays as the output. The first row is: array([[ 'IBM', 'Apple', 'Intel', 'Sony' ],. The
second row is: [ 'Microsoft', 'HP', 'Hitachi', 'Panasonic' ]], dtype='<U9' ). A note
reads: Array shape is (2,4). [Video description ends]
Similarly, we can perform the reshape operation by specifying a negative value for the
number of columns and set the number of rows as 4. [Video description begins] He
adds a line of code: companies.reshape(4, -1). [Video description ends] And the
reshape function is able to infer that the number of columns in this array needs to be 2.
[Video description begins] He runs the code and the output displays an array of four
rows and two columns. The first row is: array([[ 'IBM', 'Apple' ],. The second row is:
[ 'Intel', 'Sony' ],. The third row is: [ 'Microsoft', 'HP' ],. The fourth row is:
[ 'Hitachi', 'Panasonic' ]], dtype='<U9' ). A note reads: Second dimension inferred to
give a shape of (4, 2). [Video description ends]
However, as always when using the reshape function, one needs to be careful that the
number of rows and columns to which the array needs to be reshaped are, in fact,
compatible with the dimensions of the original array. So here, if we try to reshape an
array of 8 elements into 5 rows and then the number of columns should be inferred,
[Video description begins] He adds a line of code: companies.reshape(5, -1). [Video
description ends] then this operation results in an error. [Video description begins] He
runs the code and the output displays an error message, which includes the line:
ValueError: cannot reshape array of size 8 into shape (5,newaxis). [Video description
ends]
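The three reshape calls from this demo can be collected in one cell, as a minimal sketch. The names two_rows, four_rows and failed are only used for illustration, and the try block is an addition which lets the cell run to the end despite the error.

```python
import numpy as np

companies = np.array(['IBM', 'Apple', 'Intel', 'Sony',
                      'Microsoft', 'HP', 'Hitachi', 'Panasonic'])

two_rows = companies.reshape(-1, 4)   # rows inferred: 8 / 4 = 2
four_rows = companies.reshape(4, -1)  # columns inferred: 8 / 4 = 2
print(two_rows.shape, four_rows.shape)

# 8 elements cannot be split evenly into 5 rows, so this raises ValueError
failed = False
try:
    companies.reshape(5, -1)
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```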
In this exercise, you will create an array, x, which has the same shape as another
array, y, and all elements of the array x should be set to 1. This can be accomplished
by writing one line of code using NumPy.
You will then write one more line of code, and this will be in order to retrieve a part
of an array. So this code should be able to return the last 20 rows, and columns 2
through 5, both included, from a two-dimensional array.
You will then recall the two primary ways in which a two-dimensional array can be
flattened using NumPy. [Video description begins] So, the third task is to recall the
two main ways in which a 2D array can be flattened to a 1D array. [Video description
ends]
And finally, you will imagine that you have made a call to a function which has
returned a NumPy array. You're not exactly sure what the dimensions of this array are,
but you need to reshape it into one which contains exactly four columns. And you
leave it up to NumPy in order to infer how many rows this reshaped array should
contain.
Tasks 1, 2, and 4 can be performed using exactly one line of code. Do try and perform
these tasks all on your own.
The first task in the exercise was to create one array, x, in the shape of another array,
y, where all the elements of x are initialized to 1. The way to do that in NumPy is to
make use of the ones_like function. And then pass to it, as an argument, the array
whose shape you would like to mimic. [Video description begins] The following code
is used to create an array x of the same shape as another array y, with all elements
set to one: x= numpy.ones_like(y). [Video description ends] Of course, there are other
more indirect ways in which you could have performed the same task. But the use of
the ones_like function is the most straightforward and the cleanest way to do it.
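A small sketch of this solution follows. The contents of y are hypothetical, since the exercise does not say what y holds, and x_alt shows one of the more indirect alternatives for comparison.

```python
import numpy as np

# Hypothetical array y; any shape and dtype would do
y = np.arange(6, dtype=float).reshape(2, 3)

# The one-line solution: same shape and dtype as y, every element set to 1
x = np.ones_like(y)

# A more indirect alternative which gives the same result
x_alt = np.ones(y.shape, dtype=y.dtype)

print(x)
print(np.array_equal(x, x_alt))  # True
```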
The next task involved getting a slice of a two-dimensional array, and we do this by
specifying a range of rows and columns. [Video description begins] The following
code is used to retrieve the last 20 rows and the second to fifth columns of a 2D
array: array_slice = array_orig[-20: , 2:6]. [Video description ends] The range of
rows begins from the 20th to last row and goes on until the end, which is why we
specify a start index of minus 20, and leave the ending index for the rows as blank.
And then, in order to retrieve columns 2 through 5, we specify a range of 2 through 6,
since the end index of a slice is not included.
[Video description begins] He points to the following section in the code: [-20: ,
2:6]. [Video description ends]
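Here is a minimal sketch of this slice on a hypothetical array. The 30x8 shape of array_orig is an assumption chosen so that the result is easy to check; each element equals 8 times its row index plus its column index.

```python
import numpy as np

# Hypothetical 2-D array with 30 rows and 8 columns
array_orig = np.arange(30 * 8).reshape(30, 8)

# Last 20 rows, and the columns at indices 2, 3, 4 and 5
array_slice = array_orig[-20:, 2:6]

print(array_slice.shape)   # (20, 4)
print(array_slice[0])      # row index 10 of the original, columns 2 to 5
```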
The next task was to list the two main ways in which we can flatten a two-dimensional
array. These are the row-major, or C-style, ordering and the column-major, or F-style,
ordering. F-style here refers to Fortran-style ordering.
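The two orderings can be compared on a small hypothetical array, as sketched below. The flatten function takes an order parameter for this purpose.

```python
import numpy as np

# Hypothetical 2x3 array
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

row_major = grid.flatten(order='C')     # row by row
column_major = grid.flatten(order='F')  # column by column

print(row_major)     # [1 2 3 4 5 6]
print(column_major)  # [1 4 2 5 3 6]
```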
The last task in this exercise involved reshaping an array where one of the dimensions
is not known in advance. In order to get NumPy to infer the dimension, we can call
the reshape function and then pass in a value of minus 1, the value which the NumPy documentation specifies, for
the dimension which needs to be inferred. Since we know that our reshaped array will
contain 4 columns, we pass that dimension along to reshape, and then we set the row
value to minus 1, so that NumPy can deduce what that value should be. [Video
description begins] So, to reshape an array returned by a function to one containing
four columns, and let NumPy infer the number of rows, the following code should be
used: new_array = returned_array.reshape (-1, 4). [Video description ends]
You will know by now, of course, that this reshape operation will only work if the
number of elements in the returned array is divisible by the number of columns in the
reshaped array, that is, if it is divisible by 4. This dimension inference feature in
NumPy is very helpful when you do not know the exact dimensions of an array which you
are getting. However, it can only be relied upon if you know that the array which you
will receive will be of a certain format, or will meet certain conditions, such as this
divisibility requirement.
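One way to guard this reshape is sketched below. The function get_data is hypothetical and stands in for whichever call returns the array; here it returns a three-dimensional array to show that the inference works whatever the original dimensions are, as long as the element count is divisible by 4.

```python
import numpy as np

def get_data():
    # Hypothetical function whose output dimensions we do not know in advance
    return np.arange(24).reshape(2, 3, 4)

returned_array = get_data()

# Only reshape when the element count can be split into 4 columns
if returned_array.size % 4 == 0:
    new_array = returned_array.reshape(-1, 4)
else:
    new_array = None  # the caller has to handle this case

print(new_array.shape)  # (6, 4)
```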