3 Introduction To Numpy
In this series of articles, we will cover the basics of data analysis using Python. The lessons build up gradually toward
a solid analytical mindset for students. This lesson covers the essentials of scientific computing in Python using
NumPy.
What is NumPy?
NumPy is short for Numerical Python and, as the name indicates, it deals with scientific computing. The basic object in
NumPy is the ndarray, short for n-dimensional array, which in a mathematical context means a multi-dimensional array.
Mathematical operations such as differentiation, optimization, and solving simultaneous equations are easiest to carry out when they are
expressed in matrix form, which was the purpose of programming languages like MATLAB.
Unlike other Python objects, the ndarray has some interesting properties that ease mathematical computation:
NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a
new array and delete the original.
The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory.
NumPy arrays facilitate advanced mathematical and other types of operations on large numbers of data. Typically, such operations are
executed more efficiently and with less code than is possible using Python’s built-in sequences.
Data analytics itself will not involve many mathematical computation problems, but later on we will start working with data in
tables. You will find that a table is more or less a two-dimensional array, which is why it is essential to know a bit about arrays before they
turn into tables of data in future lessons.
What is an array?
An array is a mathematical object that holds numbers organized in rows and columns. The structure of the array should allow
selecting (indexing) any of the inner items. Later on we will see how to do this in code.
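As a minimal sketch of how such arrays can be built in code, the snippet below creates a two-row array and a one-row array from nested Python lists with `np.array`. The names `arr2d` and `arr1d` match the shape calls later in this lesson, but the exact values are an assumption.

```python
import numpy as np

# Assumed example values; each inner list becomes one row
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
# A single inner list still gives a 2-D array, with exactly one row
arr1d = np.array([[1, 2, 3]])

print(arr2d)
print(type(arr2d))
print(arr1d)
print(type(arr1d))
```

Both objects are of type `numpy.ndarray`; only their shapes differ.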
https://colab.research.google.com/drive/1Eu1iJwqopohMA9DYh1T5AhPVwsSwgF3J#printMode=true 1/9
7/18/24, 11:44 AM 3-introduction-to-numpy.ipynb - Colab
[[1 2 3]
[4 5 6]]
<class 'numpy.ndarray'>
[[1 2 3]]
<class 'numpy.ndarray'>
print(arr2d.shape)
print(arr1d.shape)
(2, 3)
(1, 3)
[[0. 0.]
[0. 0.]]
[[1. 1.]
[1. 1.]
[1. 1.]
[1. 1.]
[1. 1.]]
[[-9 -9 -9 -9]
[-9 -9 -9 -9]
[-9 -9 -9 -9]
[-9 -9 -9 -9]
[-9 -9 -9 -9]]
[[1. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 1. 0.]
[0. 0. 0. 0. 1.]]
[[0.5488135 0.71518937]
[0.60276338 0.54488318]]
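Arrays of the form printed above can be produced with NumPy's array-creation helpers. The sketch below is one possible way to get output of this kind; the variable names and the random seed are assumptions, not taken from the original cells.

```python
import numpy as np

zeros = np.zeros((2, 2))         # 2x2 array filled with 0.0
ones = np.ones((5, 2))           # 5x2 array filled with 1.0
nines = np.full((5, 4), -9)      # 5x4 array filled with the constant -9
identity = np.eye(5)             # 5x5 identity matrix
np.random.seed(0)                # assumed seed, only to make runs repeatable
rand = np.random.random((2, 2))  # 2x2 array of uniform values in [0, 1)

for arr in (zeros, ones, nines, identity, rand):
    print(arr)
```

Note that `np.zeros`, `np.ones`, and `np.eye` return floats by default, while `np.full` takes its data type from the fill value.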
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
This is called slicing. It can be done using the following syntax.
arr[start_index_of_rows:end_index_for_rows, start_index_for_columns:end_index_for_columns]
Python is a zero-indexed language, so your first column is column zero, and the same applies to rows.
The end boundary in the above syntax is exclusive, so the slice stops just before that boundary.
[[5 6 7 8]]
Now, let's try selecting the second column in the same manner.
[[ 2]
[ 6]
[10]]
Now, let's select the slice made up of the first two rows and the first two columns.
[[1 2]
[5 6]]
The selection follows the usual mathematical (row, column) indexing, shifted so that counting starts from zero.
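The three selections above can be reproduced with a sketch like the following. The array name `arr` and the result names are assumptions; the values match the 3x4 array printed earlier in this section.

```python
import numpy as np

# Assumed name `arr` for the 3x4 array used in this section
arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

second_row = arr[1:2, :]   # rows 1 up to (not including) 2, all columns
second_col = arr[:, 1:2]   # all rows, columns 1 up to (not including) 2
top_left = arr[0:2, 0:2]   # first two rows and first two columns

print(second_row)
print(second_col)
print(top_left)
```

Because every index here is a slice rather than a single integer, each result stays two-dimensional.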
[[1 2]
[3 4]
[5 6]]
0*0
1*1
2*0
[1 4 5]
import numpy as np

# Define an array
otherNewArray = np.array([[1, 2], [3, 4], [5, 6]])
# Let's print it
print(otherNewArray)
# Construct a boolean index (to check for elements greater than 2)
bool_idx = (otherNewArray > 2)
# Print the result of the boolean index
print(bool_idx)
# Now we will use this index to print all elements greater than 2
print(otherNewArray[bool_idx])
[[1 2]
[3 4]
[5 6]]
[[False False]
[ True True]
[ True True]]
[3 4 5 6]
Below is a list of all data types in NumPy and the characters used to represent them.
i - integer
b - boolean
u - unsigned integer
f - float
c - complex float
m - timedelta
M - datetime
O - object
S - string (bytes)
U - unicode string
V - fixed chunk of memory for other type ( void )
In general, we will not use all of them. Only the most common ones, such as integer, float, and string, are used heavily.
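To connect these characters to actual arrays, the sketch below reads the `kind` attribute of each array's `dtype`, which holds the one-character code from the list above. The sample values and the dictionary names are illustrative assumptions.

```python
import numpy as np

# Illustrative arrays, one per common data type
samples = {
    "integer": np.array([1, 2]),
    "float": np.array([1.0, 2.0]),
    "boolean": np.array([True, False]),
    "complex": np.array([1 + 2j]),
    "unicode string": np.array(["a", "b"]),
    "byte string": np.array([b"a", b"b"]),
}

# dtype.kind is the single character that identifies the type family
kinds = {name: arr.dtype.kind for name, arr in samples.items()}
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```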
x = np.array([1, 2])
print(x.dtype)
int64
Here, the data type of the elements, which must all share one type, is int64 (the default integer type on most 64-bit platforms).
y = np.array([1.0, 2.0])
print(y.dtype)
float64
While creating a NumPy array, we can force a specific data type. Let's see the following example.
|S3
Here, the elements of the array z are fixed-width byte strings of 3 bytes (S3). Let's define a float array.
float32
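The `dtype` argument of `np.array` is what forces the type in both cases. The following is a minimal sketch that yields the same two data types; the element values and the name `w` are assumptions, since only the printed types are known.

```python
import numpy as np

# Assumed values: strings of at most 3 characters, stored as bytes
z = np.array(["one", "two"], dtype="S")
print(z.dtype)

# Request 32-bit floats instead of the default float64
w = np.array([1, 2, 3], dtype=np.float32)
print(w.dtype)
```

With `dtype="S"`, NumPy picks the width from the longest element, which is why the type here is 3 bytes wide.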
Now, we will define two arrays on which all the mathematical operations will be applied.
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)
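Basic arithmetic on these two arrays works element by element. The sketch below shows one way to write each operation, assuming the same `x` and `y`; the result names are hypothetical. Each operator also has an equivalent function form, noted in the comments.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

total = x + y        # same as np.add(x, y)
diff = x - y         # same as np.subtract(x, y)
prod = x * y         # same as np.multiply(x, y); elementwise, not a matrix product
quot = x / y         # same as np.divide(x, y)
roots = np.sqrt(x)   # elementwise square root

for result in (total, diff, prod, quot, roots):
    print(result)
    print('=' * 10)
```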
[[ 6. 8.]
[10. 12.]]
==========
[[ 6. 8.]
[10. 12.]]
==========
[[-4. -4.]
[-4. -4.]]
==========
[[-4. -4.]
[-4. -4.]]
==========
[[ 5. 12.]
[21. 32.]]
==========
[[ 5. 12.]
[21. 32.]]
==========
[[0.2 0.33333333]
[0.42857143 0.5 ]]
==========
[[0.2 0.33333333]
[0.42857143 0.5 ]]
==========
[[1. 1.41421356]
[1.73205081 2. ]]
print(x)
print('='*10)
print(x.T)
[[1. 2.]
[3. 4.]]
==========
[[1. 3.]
[2. 4.]]
array([[19., 22.],
[43., 50.]])
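The last result above is consistent with the matrix product of x and y, as opposed to the elementwise product. A minimal sketch, assuming the same `x` and `y` as defined earlier (the name `product` is hypothetical):

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# Matrix product: each entry is a row of x times a column of y,
# for example 1*5 + 2*7 = 19 in the top-left corner
product = x @ y      # for 2-D arrays, same as np.dot(x, y) or x.dot(y)
print(product)
```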
Broadcasting
The term "broadcasting" describes how NumPy handles arrays of differing shapes during arithmetic operations: subject to certain constraints,
the smaller array is broadcast across the larger array so that they have compatible shapes.
Because NumPy is implemented in C, broadcasting offers a way to vectorize array operations so that looping happens in C rather than in Python.
This results in efficient algorithm implementations without needless copies of data.
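As a rough illustration of the compatibility constraint, shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The arrays below are made-up examples, not taken from this lesson.

```python
import numpy as np

a = np.ones((4, 3))
row = np.array([1, 0, 1])          # shape (3,)
col = np.arange(4).reshape(4, 1)   # shape (4, 1)

print((a + row).shape)    # (3,) is stretched over the 4 rows
print((a + col).shape)    # (4, 1) is stretched over the 3 columns
print((col + row).shape)  # both operands are stretched

try:
    a + np.array([1, 2])  # trailing dimensions 3 and 2 do not match
except ValueError as err:
    print("incompatible shapes:", err)
```

All three compatible sums have shape (4, 3); the last addition raises a `ValueError` because neither trailing dimension is 1.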
In the following example, we need to add the elements of the vector v to each row of the array x. We will do this using two methods:
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
==========
[1 0 1]
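The two arrays printed above can be defined as in the sketch below, using the names `x` and `v` that the following cells rely on.

```python
import numpy as np

# A 4x3 matrix and a vector with one entry per column of x
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = np.array([1, 0, 1])

print(x)
print('=' * 10)
print(v)
```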
Let's create an empty array y with the same shape as x that will hold the result of the addition.
%%time
# This cell magic measures the execution time of the whole cell
# Create an empty matrix with the same shape as x
y = np.empty_like(x)
# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v
print(y)
[[ 2 2 4]
[ 5 5 7]
[ 8 8 10]
[11 11 13]]
CPU times: user 275 µs, sys: 37 µs, total: 312 µs
Wall time: 308 µs
%%time
z = x + v
print(z)
[[ 2 2 4]
[ 5 5 7]
[ 8 8 10]
[11 11 13]]
CPU times: user 692 µs, sys: 0 ns, total: 692 µs
Wall time: 672 µs
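On a 4x3 array both cells finish in well under a millisecond, so these measurements likely reflect fixed overhead such as printing more than the addition itself. The sketch below repeats the comparison on a larger array with `timeit`; the array size of 100,000 rows and the function names are arbitrary assumptions, and actual timings depend on the machine.

```python
import timeit

import numpy as np

x = np.arange(300_000).reshape(100_000, 3)
v = np.array([1, 0, 1])


def add_with_loop():
    """Add v to every row of x with an explicit Python loop."""
    y = np.empty_like(x)
    for i in range(x.shape[0]):
        y[i, :] = x[i, :] + v
    return y


def add_with_broadcasting():
    """Add v to every row of x in a single broadcast operation."""
    return x + v


# Total time for 3 runs of each approach
loop_time = timeit.timeit(add_with_loop, number=3)
bcast_time = timeit.timeit(add_with_broadcasting, number=3)
print(f"loop: {loop_time:.4f} s")
print(f"broadcasting: {bcast_time:.4f} s")
```

Both functions return the same array; at this size the broadcast version is typically much faster because the row loop runs inside NumPy rather than in Python.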
This notebook is part of my Python for Data Analysis course. If you find it useful, you can upvote it! Also, you can follow me on LinkedIn and
Twitter.
1. Introduction to Python
2. Iterative Operations & Functions in Python