Getting started with NumPy in Data Analytics

This document provides an overview of using Python for data science, specifically focusing on the NumPy library, which is essential for working with arrays in Python. It covers topics such as NumPy's data types, array creation, indexing, slicing, reshaping, and performance advantages over traditional Python lists. The document also includes examples and explanations of how to manipulate and access data within NumPy arrays.

Uploaded by

yaraha5692
Course code : CSE1006

Course title : Foundations of Data Analytics

Module-5
Using Python for Data Science

17-04-2025 Dr. V. Srilakshmi 1


Using Python for Data Science
• Overview of Python, Introduction to NumPy, NumPy standard data
types, the basics of NumPy Arrays: NumPy Array Attributes, Array
Indexing: Accessing Single Elements, Array Slicing: Accessing
Subarrays, Reshaping of Arrays, Array Concatenation and Splitting,
Aggregations, Computations on Arrays, NumPy’s Structured Arrays

Introduction to NumPy
• NumPy stands for Numerical Python.
• NumPy is a Python library used for working with arrays.
• It also provides functions for linear algebra, Fourier transforms, and matrix
operations.
• NumPy was created in 2005 by Travis Oliphant. It is an open-source project and
you can use it freely.
• NumPy is written partially in Python, but most of the parts that require fast
computation are written in C or C++.
Uses of NumPy:
• Python lists can serve the purpose of arrays, but they are slow to process
for large numerical workloads.
• NumPy aims to provide an array object that is up to 50x faster than traditional
Python lists; the actual speedup depends on the operation and the array size.
• The array object in NumPy is called ndarray. It comes with many supporting
functions that make working with arrays easy.
• Arrays are used very frequently in data science, where speed and memory use are
very important.
Why is NumPy Faster Than Lists?
• NumPy arrays are stored in one contiguous block of memory, unlike lists,
so processes can access and manipulate them very efficiently.
• This behaviour is called locality of reference in computer science.
• This is the main reason why NumPy is faster than lists. In addition, its core
routines are compiled code optimized for modern CPU architectures.
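The speed difference described above can be checked with a rough timing sketch. The array size and the doubling operation are illustrative choices, not taken from the slides, and the measured times will differ from machine to machine.

```python
import timeit

import numpy as np

n = 100_000  # illustrative size, chosen only for this sketch

py_list = list(range(n))
np_arr = np.arange(n)

# Double every element: a Python-level loop versus one vectorized operation.
list_result = [x * 2 for x in py_list]
arr_result = np_arr * 2

# Both approaches produce the same values.
print(list_result[:5], arr_result[:5])

# Time each approach over 20 repetitions; the numbers vary by machine.
list_time = timeit.timeit(lambda: [x * 2 for x in py_list], number=20)
arr_time = timeit.timeit(lambda: np_arr * 2, number=20)
print(f"list: {list_time:.4f} s, ndarray: {arr_time:.4f} s")
```

On most systems the ndarray version should be noticeably faster, but the exact ratio is not fixed.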
Getting started with NumPy:
• If you have Python and pip already installed on a system, then installation
of NumPy is very easy.
• Install it using this command: C:\Users\Your Name>pip install numpy
• If this command fails, use a Python distribution that already has NumPy
installed, such as Anaconda (which also bundles the Spyder IDE).
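One simple way to confirm that the installation worked is to import the library and print its version. This is a minimal check, assuming NumPy was installed into the same Python environment that runs the script.

```python
import numpy as np

# If the import succeeds, NumPy is installed; the version string confirms it.
print(np.__version__)
```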
NumPy standard Datatypes
Data Types in Python
• By default, Python has these data types:
• strings - used to represent text data; the text is written inside quotation marks. e.g. "ABCD"
• integer - used to represent integer numbers. e.g. -1, -2, -3
• float - used to represent real numbers. e.g. 1.2, 42.42
• boolean - used to represent True or False.
• complex - used to represent complex numbers. e.g. 1.0 + 2.0j, 1.5 + 2.5j
Data Types in NumPy
• NumPy has some extra data types and refers to data types with a single character, like i for
integers, u for unsigned integers, etc.
• Below is a list of the data types in NumPy and the characters used to represent them.
i - integer
b - boolean
u - unsigned integer
f - float
c - complex float
m - timedelta
M - datetime
O - object
S - string (byte string)
U - unicode string
V - fixed chunk of memory for other type ( void )
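The character codes listed above can be seen in practice through the dtype argument of np.array() and the dtype.kind attribute. The following is a small sketch; the sample values are made up for illustration.

```python
import numpy as np

# Request a 4-byte integer with the character code 'i' plus a size in bytes.
ints = np.array([1, 2, 3], dtype='i4')
print(ints.dtype)        # int32
print(ints.dtype.kind)   # i

# Request an 8-byte float with the character code 'f'.
floats = np.array([1, 2, 3], dtype='f8')
print(floats.dtype)      # float64

# Unicode strings and byte strings report the kinds 'U' and 'S'.
names = np.array(['a', 'bc'])
raw = np.array([b'a', b'bc'])
print(names.dtype.kind)  # U
print(raw.dtype.kind)    # S

# Booleans report the kind 'b'.
flags = np.array([True, False])
print(flags.dtype.kind)  # b
```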
NumPy Arrays
• Python lists can be used as a substitute for arrays, but they fail to deliver the
performance required when computing on large sets of numerical data.
• To address this issue we use the NumPy library. NumPy offers an array object
called ndarray. ndarrays are similar to standard Python sequences but differ in
certain key respects.

What is a NumPy Array?


• A NumPy array is a multi-dimensional data structure that is the core of scientific
computing in Python.
• All values in an array are homogeneous (of the same data type).
• NumPy arrays provide efficient memory management, support various data types, and
are flexible with indexing and slicing.
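The homogeneity rule can be observed by building an array from mixed Python values: NumPy picks one data type that can hold every element. This is a brief sketch with made-up values.

```python
import numpy as np

# Integers mixed with a float: every element is stored as a float.
mixed = np.array([1, 2, 3.5])
print(mixed)        # [1.  2.  3.5]
print(mixed.dtype)  # float64

# A number mixed with text: every element is stored as a unicode string.
text = np.array([1, 'a'])
print(text.dtype.kind)  # U
```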
Dimensions in Arrays
• NumPy arrays can have multiple dimensions, allowing users to store data in
multilayered structures.
• 0D (zero-dimensional) Scalar - A single element
• 1D (one-dimensional) Vector - A list of integers
• 2D (two-dimensional) Matrix - A spreadsheet of data
• 3D (three-dimensional) Tensor - Storing a color image

Create a NumPy ndarray Object


• NumPy is used to work with arrays. The array object in NumPy is called ndarray.

• We can create a NumPy ndarray object by using the array() function.

Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
• type(): This built-in Python function tells us the type of the object passed to it. In the
code above, it shows that arr is of type numpy.ndarray.
• To create an ndarray, we can pass a list, tuple or any array-like object into the array()
function, and it will be converted into an ndarray.
arr = np.array((1, 2, 3, 4, 5))
print(arr)
Check Number of Dimensions:
• NumPy arrays provide the ndim attribute, an integer that tells us
how many dimensions the array has.
Example:
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim) #0
print(b.ndim) #1
print(c.ndim) #2
print(d.ndim) #3
NumPy Array Indexing
Accessing Array Elements:
• Array indexing is the same as accessing an array element.
• You can access an array element by referring to its index number.
• The indexes in NumPy arrays start with 0, meaning that the first element has
index 0, and the second has index 1 etc.
• Example:
import numpy as np
arr = np.array([1, 2, 3, 4])
# Visit every index from 0 to 3 and print the element stored there.
for i in range(0, 4):
    print(arr[i])  # prints 1, 2, 3, 4, one per line
Access 2-D Arrays:
• To access elements from 2-D arrays we can use comma separated integers
representing the dimension and the index of the element.
• Example:
import numpy as np

arr1 = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print('2nd element on 1st row: ', arr1[0, 1]) # 2nd element on 1st row: 2

Access 3-D Arrays:
• To access elements from 3-D arrays we can use comma separated integers
representing the dimensions and the index of the element.
• Example:
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2]) #6
Negative Indexing: Use negative indexing to access an array from the end.
• Example:
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print('Last element from 2nd dim: ', arr[1, -1]) #10


NumPy Array Slicing
Slicing arrays:
• Slicing in Python means taking elements from one given index to another given index.
• We pass a slice instead of an index, like this: [start:end].
• We can also define the step, like this: [start:end:step].
• If we don't pass start, it is considered 0.
• If we don't pass end, it is considered the length of the array in that dimension.
• If we don't pass step, it is considered 1.
• Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5]) # [2 3 4 5]
print(arr[4:]) # [5 6 7]

Slicing 1D arrays:
• Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
Slice elements from the beginning to index 4 (not included):
print(arr[:4]) # [1 2 3 4]
Use the minus operator to refer to an index from the end:
print(arr[-3:-1]) # [5 6]
Use the step value to determine the step of the slicing:
print(arr[1:6:2]) # [2 4 6]
Return every other element from the entire array:
print(arr[::2]) # [1 3 5 7]

Slicing 2D arrays:
import numpy as np
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
Example:
From the second row, slice elements from index 1 to 4 (not included):
print(arr[1, 1:4]) # [7 8 9]
From both rows, return index 2:
print(arr[0:2, 2]) # [3 8]
From both rows, slice index 1 to index 4 (not included); this returns a 2-D array:
print(arr[0:2, 1:4]) # [[2 3 4] [7 8 9]]
Return every other row (on a 2-D array, a single slice applies to the first dimension):
print(arr[::2]) # [[1 2 3 4 5]]
Return every other element within each row:
print(arr[:, ::2]) # [[1 3 5] [6 8 10]]

NumPy Array Shape

• The shape of an array is the number of elements in each dimension.
• NumPy arrays have an attribute called shape that returns a tuple in which each
entry gives the number of elements in the corresponding dimension.
Example:
#Print the shape of a 2-D array
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape) # (2,4)
• The example above returns (2, 4), which means that the array has 2
dimensions, where the first dimension has 2 elements and the second
has 4.

arr = np.array([1, 2, 3, 4], ndmin=5)
print(arr) # [[[[[1 2 3 4]]]]]
print('shape of array :', arr.shape) # shape of array : (1, 1, 1, 1, 4)
• The example above uses ndmin=5 to create a 5-dimensional array from the vector
1, 2, 3, 4. The first four dimensions each have 1 element, and the last dimension has 4.
NumPy Array Reshaping
• Reshaping means changing the shape of an array.
• The shape of an array is the number of elements in each dimension.
• By reshaping we can add or remove dimensions or change number of
elements in each dimension.
Example: Reshape From 1-D to 2-D
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
print(newarr)
Output:
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
Example: Reshape From 1-D to 3-D
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)
print(newarr)
Output:
[[[ 1 2]
[ 3 4]
[ 5 6]]

[[ 7 8]
[ 9 10]
[11 12]]]
Note : An 8-element 1-D array can be reshaped into a 2-D array with 2 rows of 4 elements,
but not into a 2-D array with 3 rows of 3 elements, as that would require 3x3 = 9
elements.
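The note above can be demonstrated directly. In this sketch the 2 x 4 reshape succeeds and the 3 x 3 reshape raises a ValueError; the variable reshape_failed is introduced only for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# 2 rows x 4 columns = 8 elements, so this reshape works.
ok = arr.reshape(2, 4)
print(ok)

# 3 rows x 3 columns would need 9 elements, so NumPy raises ValueError.
try:
    arr.reshape(3, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```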
• You are allowed to have one "unknown" dimension.
• Meaning that you do not have to specify an exact number for one of the
dimensions in the reshape method.
• Pass -1 as the value, and NumPy will calculate this number for you.
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
newarr = arr.reshape(2, 2, -1) # at most one dimension may be given as -1
print(newarr)
Output:
[[[1 2]
[3 4]]

[[5 6]
[7 8]]]

Flattening the arrays
• Flattening array means converting a multidimensional array into a 1D
array.
• We can use reshape(-1) to do this.
Example:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
newarr = arr.reshape(-1)
print(newarr)
Output:
[1 2 3 4 5 6]

Iterating Arrays
• Iterating means going through elements one by one.
• Multi-dimensional NumPy arrays can be iterated with Python's basic for loop.
• If we iterate on a 1-D array it will go through each element one by one.
Example:
import numpy as np
arr = np.array([1, 2, 3])
for x in arr:
    print(x)
Output:
1
2
3

Example: Iterate 2-D array
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Each iteration yields one row (a 1-D array).
for x in arr:
    print(x)
Output:
[1 2 3]
[4 5 6]

Example: Iterate 2-D array element wise
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
# The outer loop yields rows; the inner loop yields the scalars in each row.
for x in arr:
    for y in x:
        print(y)
Example: Iterate 3-D array
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
# Each iteration yields one 2-D block.
for x in arr:
    print(x)
Output:
[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]
Example: Iterate 3-D array element wise
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
# Three nested loops are needed to reach the scalars of a 3-D array.
for x in arr:
    for y in x:
        for z in y:
            print(z)
Array Concatenation
• Joining means putting contents of two or more arrays in a single array.
• In SQL we join tables based on a key, whereas in NumPy we join arrays by axes.
• We pass a sequence of arrays that we want to join to the concatenate()
function, along with the axis. If axis is not explicitly passed, it is taken as 0.
Example: Join two 1D arrays
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)
Output:
[1 2 3 4 5 6]

Example: Join two 2D arrays (axis=1):
import numpy as np
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2), axis=1)
print(arr)
Output:
[[1 2 5 6]
[3 4 7 8]]
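For comparison, the same two 2-D arrays can be joined along the default axis 0. This sketch reuses the arrays from the example above; the names rows and cols are chosen here for illustration.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

# Default axis=0: the rows of arr2 are placed below the rows of arr1.
rows = np.concatenate((arr1, arr2))
print(rows.shape)  # (4, 2)

# axis=1: the columns of arr2 are placed to the right of arr1.
cols = np.concatenate((arr1, arr2), axis=1)
print(cols.shape)  # (2, 4)
```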
Joining Arrays Using Stack Functions:
• Stacking is the same as concatenation, except that stacking joins the arrays
along a new axis.
• Stacking two 1-D arrays produces a 2-D array: with axis=0 the arrays are placed
one over the other as rows, and with axis=1 they become columns.
• We pass a sequence of arrays that we want to join to the stack() function along
with the axis. If axis is not explicitly passed it is taken as 0.
Example: Join two 1D arrays
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.stack((arr1, arr2), axis=1)
print(arr)
Output:
[[1 4]
[2 5]
[3 6]]
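The difference between stacking and concatenation shows up in the shape of the result. The sketch below reuses the two 1-D arrays from the example; the result names are illustrative.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# concatenate() joins along an existing axis: the result is still 1-D.
joined = np.concatenate((arr1, arr2))
print(joined.shape)   # (6,)

# stack() with the default axis=0 creates a new first axis: one row per array.
as_rows = np.stack((arr1, arr2))
print(as_rows.shape)  # (2, 3)

# stack() with axis=1 creates a new second axis: one column per array.
as_cols = np.stack((arr1, arr2), axis=1)
print(as_cols.shape)  # (3, 2)
```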
Array Splitting
• Splitting is the reverse operation of joining.
• Joining merges multiple arrays into one, and splitting breaks one array into
multiple arrays.
• We use array_split() for splitting arrays: we pass it the array we want to
split and the number of splits.
Example: Split the array in 3 parts
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)
Output:
[array([1, 2]), array([3, 4]), array([5, 6])]
• If the array cannot be divided evenly, array_split() still works: the earlier
parts get one extra element and the later parts are smaller.
Example: Split the array in 4 parts
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 4)
print(newarr)
Output:
[array([1, 2]), array([3, 4]), array([5]), array([6])]

• Use the same syntax when splitting 2-D arrays.
• Use the array_split() method, pass in the array you want to split and the number of
splits you want to do.
Example: Split the 2-D array into three 2-D arrays.
import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
newarr = np.array_split(arr, 3)
print(newarr)
Output:
[array([[1, 2],
[3, 4]]), array([[5, 6],
[7, 8]]), array([[ 9, 10],
[11, 12]])]
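array_split() also accepts an axis argument, so a 2-D array can be split by columns instead of rows. The small array below is made up for this sketch.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# axis=1 splits along the columns, giving three arrays of shape (2, 1).
parts = np.array_split(arr, 3, axis=1)
for part in parts:
    print(part)
```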

Array Aggregations
• NumPy is a powerful library in Python for numerical and mathematical operations, and
it provides various aggregation functions to perform operations on arrays.
• Aggregation functions in NumPy allow you to perform computations across the entire
array or along a specified axis. Here are some commonly used NumPy aggregation
functions.
• numpy.sum():
• This function returns the sum of array elements over the specified axis.
• Syntax : numpy.sum(arr, axis, dtype, out)
• arr : input array.
• axis : axis along which we want to calculate the sum. If it is not given, arr is
treated as flattened and all elements are summed. For a 2-D array, axis = 0 sums
down each column and axis = 1 sums across each row.
• dtype : [data-type, optional] Type we desire while computing the sum.
• out : A different array in which we want to place the result. The array must have
the same dimensions as the expected output. Default is None.
• Example:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Sum of all elements in the array
total_sum = np.sum(arr)
print(total_sum) #21
# Sum along a specific axis (axis=0 for columns, axis=1 for rows)
column_sum = np.sum(arr, axis=0)
row_sum = np.sum(arr, axis=1)
print(column_sum) #[ 5 7 9 ]
print(row_sum) #[ 6 15 ]
• numpy.mean():
• Compute the arithmetic mean (average) of the given data (array elements) along the
specified axis.
• Syntax : numpy.mean(arr, axis, out)
• arr : input array.
• axis : axis along which we want to calculate the mean. If it is not given, arr is treated as
flattened and the mean of all elements is returned. For a 2-D array, axis = 0 averages down each
column and axis = 1 averages across each row.
• out : A different array in which we want to place the result. The array must have the same
dimensions as the expected output. Default is None.
• Example:
# Python Program illustrating numpy.mean() method
import numpy as np
arr = [20, 2, 7, 1, 34]
print("arr : ", arr) #arr : [20, 2, 7, 1, 34]
print("mean of arr : ", np.mean(arr)) #mean of arr : 12.8
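The axis parameter described above matters once the input has more than one dimension. The following sketch applies numpy.mean() to a small 2-D array made up for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# No axis: the mean of all six elements.
overall = np.mean(arr)
print(overall)      # 3.5

# axis=0: one mean per column.
col_means = np.mean(arr, axis=0)
print(col_means)    # [2.5 3.5 4.5]

# axis=1: one mean per row.
row_means = np.mean(arr, axis=1)
print(row_means)    # [2. 5.]
```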
• numpy.median():
• Compute the median of the given data (array elements) along the specified axis.
• Syntax : numpy.median(arr, axis, out)
• arr : input array.
• axis : axis along which we want to calculate the median. If it is not given, arr is treated as
flattened and the median of all elements is returned. For a 2-D array, axis = 0 works down each
column and axis = 1 works across each row.
• out : A different array in which we want to place the result. The array must have the same
dimensions as the expected output. Default is None.
• Example:
# Python Program illustrating numpy.median() method
import numpy as np
arr = [20, 2, 7, 1, 34]
print("arr : ", arr) #arr : [20, 2, 7, 1, 34]
print("median of arr : ", np.median(arr)) #median of arr : 7.0
• numpy.min() and numpy.max(): Compute the minimum and maximum
values of an array.
• numpy.ceil(): Returns the ceiling of each element of the array.
• numpy.floor(): Returns the floor of each element of the array.
• numpy.fix(): Rounds each element of the array to the nearest integer
towards zero.
• numpy.exp(): Calculates the exponential of every element in the
input array.
• numpy.sort(): Returns a sorted copy of the array.
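The functions listed above can be tried on a small array. The values below are made up for this sketch; note that min and max return a single value, while ceil, floor, fix and exp return one result per element.

```python
import numpy as np

values = np.array([-1.7, 0.2, 2.5])

# Aggregations: one value for the whole array.
smallest = np.min(values)   # -1.7
largest = np.max(values)    # 2.5

# Element-wise rounding functions: one result per element.
ceiled = np.ceil(values)    # -1, 1, 3
floored = np.floor(values)  # -2, 0, 2
fixed = np.fix(values)      # -1, 0, 2 (rounded towards zero)

# Exponential of each element: e**0 and e**1.
powers = np.exp(np.array([0.0, 1.0]))

# Sorting returns a new, sorted array and leaves the input unchanged.
unsorted = np.array([3, 1, 2])
ordered = np.sort(unsorted)

print(smallest, largest)
print(ceiled, floored, fixed)
print(powers)
print(ordered)
```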

Computations on Numpy Arrays
• The below numpy functions are used to perform arithmetic operations on
array in NumPy
• np.add(): Compute the addition of 2 arrays
• np.subtract() : Compute the subtraction of 2 arrays
• np.multiply() : Compute the multiplication of 2 arrays
• np.divide() : Compute the division of 2 arrays
• numpy.power() : This function treats elements in the first input array as
the base and returns it raised to the power of the corresponding element
in the second input array.
• numpy.mod() : This function returns the remainder of division of the
corresponding elements in the two input arrays. The function
numpy.remainder() produces the same result.
• Example: Arithmetic Operations:
# Python code to perform arithmetic operations on NumPy array
import numpy as np
arr1 = np.array([[0.0, 1.0], [3.0, 4.0]])
print('First array:')
print(arr1)
print('\nSecond array:')
arr2 = np.array([12, 12])
print(arr2)
print('\nAdding the two arrays:')
print(np.add(arr1, arr2))
print('\nSubtracting the two arrays:')
print(np.subtract(arr1, arr2))
print('\nMultiplying the two arrays:')
print(np.multiply(arr1, arr2))
print('\nDividing the two arrays:')
print(np.divide(arr1, arr2))
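In the example above the two arrays have different shapes, (2, 2) and (2,). NumPy handles this through broadcasting: the 1-D array is applied to every row of the 2-D array. The sketch below shows the values this produces and that the functions match the +, -, * and / operators; the result names are illustrative.

```python
import numpy as np

arr1 = np.array([[0.0, 1.0], [3.0, 4.0]])
arr2 = np.array([12, 12])

# arr2 is broadcast across both rows of arr1.
total = np.add(arr1, arr2)          # [[12, 13], [15, 16]]
difference = np.subtract(arr1, arr2)  # [[-12, -11], [-9, -8]]
product = np.multiply(arr1, arr2)   # [[0, 12], [36, 48]]
quotient = np.divide(arr1, arr2)    # [[0, 1/12], [0.25, 1/3]]

# The arithmetic operators call the same element-wise operations.
print(np.array_equal(total, arr1 + arr2))
print(np.array_equal(quotient, arr1 / arr2))
```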

• Example: Power Operations
import numpy as np
arr = np.array([5, 10, 15])
print('First array is:')
print(arr)
print('\nApplying power function:')
print(np.power(arr, 2))
print('\nSecond array is:')
arr1 = np.array([1, 2, 3])
print(arr1)
print('\nApplying power function again:')
print(np.power(arr, arr1))

• Example: mod/remainder Operations
import numpy as np
arr = np.array([5, 15, 20])
arr1 = np.array([2, 5, 9])
print('First array:')
print(arr)
print('\nSecond array:')
print(arr1)
print('\nApplying mod() function:')
print(np.mod(arr, arr1))
print('\nApplying remainder() function:')
print(np.remainder(arr, arr1))
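For reference, the following sketch works out the values that the power and mod examples produce, using the same input arrays. The result names are chosen here for illustration.

```python
import numpy as np

# Power: a scalar exponent applies to every element.
base = np.array([5, 10, 15])
squared = np.power(base, 2)
print(squared)   # [ 25 100 225]

# Power: an array exponent is applied element by element (5**1, 10**2, 15**3).
exponents = np.array([1, 2, 3])
raised = np.power(base, exponents)
print(raised)    # [   5  100 3375]

# mod() and remainder() give the same result as the % operator.
arr = np.array([5, 15, 20])
arr1 = np.array([2, 5, 9])
mod_result = np.mod(arr, arr1)
rem_result = np.remainder(arr, arr1)
print(mod_result)  # [1 0 2]
```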

NumPy’s Structured Array
• NumPy’s structured array is similar to a struct in C. It is used for
grouping data of different data types and sizes.
• A structured array uses data containers called fields. Each field has a
name and can hold data of its own data type and size.
• Fields are accessed by name using indexing, e.g. arr['name'] for a structured
array arr. Dot notation (arr.name) is available only on record arrays (np.recarray).
• To create a structured array in NumPy, we need to define a dtype (data
type) that specifies the names and types of each field.
• Example :
import numpy as np
dt = np.dtype([('name', 'U20'), ('age', np.int32), ('grade', np.float64)])
• In this example, we defined a dtype with three fields: 'name' as a Unicode
string of length 20 characters, 'age' as a 32-bit integer, and 'grade' as a 64-
bit floating-point number.
• Now, we can create a structured array using this dtype −
• data = np.array([('Alice', 25, 4.8), ('Bob', 23, 3.9), ('Charlie', 27, 4.5)],
dtype=dt)
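Once the structured array exists, its fields can be read by name. The sketch below reuses the dtype and data shown above; the filter on age and the name older are illustrative additions.

```python
import numpy as np

dt = np.dtype([('name', 'U20'), ('age', np.int32), ('grade', np.float64)])
data = np.array([('Alice', 25, 4.8), ('Bob', 23, 3.9), ('Charlie', 27, 4.5)],
                dtype=dt)

# Indexing with a field name returns that field for every record.
print(data['name'])        # ['Alice' 'Bob' 'Charlie']

# Indexing with an integer returns one whole record.
print(data[0])

# A field behaves like an ordinary array, so aggregations work on it.
print(data['age'].mean())  # 25.0

# Boolean masks built from one field can select whole records.
older = data[data['age'] > 24]
print(older['name'])       # ['Alice' 'Charlie']
```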
• Example:
import numpy as np
dt = np.dtype([('name', 'U20'), ('age', np.int32), ('grade', np.float64)])
a = np.array([('Sana', 2, 21.0), ('Mansi', 7, 29.0)], dtype=dt)
# Sorting according to the name
b = np.sort(a, order='name')
print('Sorting according to the name', b)
# Sorting according to the age
b = np.sort(a, order='age')
print('\nSorting according to the age', b)
