Python 5th Sem
1. What is NumPy?
• Efficiency: Arrays in NumPy are much more efficient in terms of memory usage and
computational speed compared to Python lists.
• Mathematical Operations: NumPy provides a wide range of optimized mathematical
operations for arrays like addition, subtraction, matrix multiplication, and more.
• Support for Multi-dimensional Arrays: NumPy supports arrays of arbitrary dimensions
(1D, 2D, 3D, and beyond), making it suitable for scientific computing.
• Broadcasting and Vectorization: These features allow efficient element-wise operations
without the need for loops.
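As a rough illustration of the vectorization point above, the sketch below computes the same squares once with a plain Python loop and once with a single NumPy expression. The variable names and values are illustrative only.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# Plain Python: an explicit loop over a list
squares_list = [v * v for v in values]

# NumPy: one vectorized expression over the whole array, no explicit loop
arr = np.array(values)
squares_arr = arr * arr

print(squares_list)  # [1, 4, 9, 16, 25]
print(squares_arr)   # [ 1  4  9 16 25]
```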
4. Applications of NumPy:
• Scientific Computing: Used for data analysis, scientific research, simulations, and
numerical computations.
• Machine Learning and AI: Libraries like TensorFlow and PyTorch, which are essential for
deep learning and AI applications, interoperate closely with NumPy arrays.
• Data Analysis: Libraries like Pandas, which are built on top of NumPy, use it to
manipulate and process data.
• Image and Signal Processing: NumPy is also employed in image processing tasks
where pixel data can be represented as arrays.
1. Installing NumPy
The most common way to install NumPy is using pip, the Python package installer, which
downloads the latest version of NumPy from the Python Package Index (PyPI).
Use the pip command: If you have Python installed and pip is configured properly, you can
install NumPy with this command:
pip install numpy
If you are using Python 3, it might be pip3 instead of pip depending on your setup:
pip3 install numpy
Verifying the Installation: To check if NumPy has been installed successfully, open Python in
the command line (just type python or python3 in the terminal) and run the following code:
import numpy
print(numpy.__version__)
Once NumPy is installed, you need to import it into your Python scripts or interactive
environment (like Jupyter Notebooks) before you can use its functionality.
Basic Import:
import numpy
This imports the entire NumPy library, allowing you to access its functions, classes, and methods
by referencing numpy.
It’s common practice to import NumPy with the alias np, as it saves typing time and makes the
code cleaner.
import numpy as np
Here, numpy is imported and given the shorter alias np. Now, instead of typing numpy.array(),
you can simply write np.array().
• Readability and Convenience: NumPy functions are frequently used, so using the np
alias makes the code less verbose and easier to write.
• Consistency: In the Python community, np is a standard alias for NumPy. If you look at
tutorials, documentation, or projects, you’ll often see NumPy imported this way.
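import numpy as np
# Creating a 1D array (example values)
arr = np.array([1, 2, 3, 4, 5])
print(arr) # Output: [1 2 3 4 5]
# Squaring each element
arr_squared = np.square(arr)
print(arr_squared) # Output: [ 1  4  9 16 25]
In this example:
• We create a 1D array with np.array().
• We perform an operation (np.square()) on the array, which computes the square of each
element.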
NDArray
An ndarray (short for N-dimensional array) is the core data structure in NumPy, a powerful
library for numerical computation in Python. It represents a multi-dimensional, homogeneous
array of fixed-size items, which allows for efficient storage and manipulation of numerical data.
Creating an ndarray
Using np.array():
import numpy as np
# Creating a 1D array
arr1 = np.array([1, 2, 3, 4])
print(arr1)
# Creating a 2D array
arr2 = np.array([[1, 2], [3, 4]])
print(arr2)
Output:
[1 2 3 4]
[[1 2]
 [3 4]]
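Properties of ndarray
Shape: The shape of an ndarray is a tuple of integers representing the size of each dimension.
arr2.shape # (2, 2)
Size: The total number of elements in the array.
arr2.size # 4
Data Type (dtype): NumPy arrays are homogeneous, meaning all elements are of the same type.
arr1.dtype # dtype('int64') (or dtype('int32'), depending on the platform)
Number of Dimensions (ndim): The number of axes the array has.
arr2.ndim # 2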
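To make the homogeneity point concrete, here is a small sketch showing that mixing integers and a float in one array produces a single common dtype. The array names and values are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])  # one float makes every element a float

print(ints.dtype)   # an integer type; the exact width depends on the platform
print(mixed.dtype)  # float64
print(mixed)        # [1.  2.5 3. ]
print(mixed.shape, mixed.size, mixed.ndim)  # (3,) 3 1
```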
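Example:
import numpy as np
arr = np.array([1, 2, 3]) # example values
print("Data type:", arr.dtype) # Data type: int64 (or int32 depending on system)
# Array filled with zeros
arr_zeros = np.zeros((2, 3))
print(arr_zeros)
# Output:
# [[0. 0. 0.]
#  [0. 0. 0.]]
# Evenly spaced values: start, stop (exclusive), step
arr_range = np.arange(0, 10, 2)
print(arr_range)
# Output: [0 2 4 6 8]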
Operations on ndarray
You can access elements or subsets of an ndarray using indexing and slicing, similar to lists in
Python.
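1D Array:
arr = np.array([10, 20, 30, 40, 50])
# Accessing elements
print(arr[1]) # Output: 20
# Slicing
print(arr[1:4]) # Output: [20 30 40]
2D Array:
arr_2d = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
# Accessing elements
print(arr_2d[1, 2]) # Output: 60
# Slicing
print(arr_2d[1:, 1:])
# Output:
# [[50 60]
#  [80 90]]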
Broadcasting:
arr = np.array([1, 2, 3])
print(arr + 5)
# Output: [6 7 8]
The scalar 5 is broadcast to each element in the array, and the operation is performed
element-wise.
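Broadcasting is not limited to scalars. The sketch below, with illustrative values, adds a 1D array to each row of a 2D array whose last dimension has the same length.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)

# The 1D row is broadcast across both rows of the matrix
result = matrix + row
print(result)
# [[11 22 33]
#  [14 25 36]]
```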
Vectorization:
With ndarray, you can perform operations over the entire array without writing loops:
arr = np.array([1, 2, 3])
print(arr * 2)
# Output: [2 4 6]
Basic Operations
NumPy allows users to perform a variety of operations on arrays, both element-wise and matrix-wide, to
manipulate data efficiently.
1. Arithmetic Operations
These are element-wise operations, where each element in an array is operated upon
independently.
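import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 + arr2 # addition
print(result) # Output: [5 7 9]
result = arr2 - arr1 # subtraction
print(result) # Output: [3 3 3]
result = arr1 ** 2 # exponentiation
print(result) # Output: [1 4 9]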
2. Comparison Operations
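Comparison operators such as ==, < and >= also work element-wise and return a Boolean array of the same shape. A minimal sketch with illustrative values:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([3, 2, 1])

print(arr1 == arr2)  # [False  True False]
print(arr1 < arr2)   # [ True False False]
print(arr1 >= 2)     # [False  True  True]
```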
3. Aggregate Operations
NumPy provides several aggregate functions that operate over arrays to return a single value or a
new array.
• Sum (np.sum()): Computes the sum of all elements in the array. Can be used across an
entire array or along specific axes.
arr = np.array([[1, 2, 3], [4, 5, 6]])
result = np.sum(arr)
print(result) # Output: 21
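• Axis-wise Sum: You can also compute sums along specific axes (rows or columns).
result = np.sum(arr, axis=0) # sum down each column
print(result) # Output: [5 7 9]
• Minimum (np.min()): Returns the smallest element in the array.
result = np.min(arr)
print(result) # Output: 1
• Maximum (np.max()): Returns the largest element in the array.
result = np.max(arr)
print(result) # Output: 6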
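The axis argument works the same way for the other aggregates. The sketch below reuses the same 2x3 array; np.mean() is not covered above and is included only as one more aggregate of the same kind.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

row_sums = np.sum(arr, axis=1)  # collapse the columns: one sum per row
col_max = np.max(arr, axis=0)   # collapse the rows: one maximum per column
mean = np.mean(arr)             # average of all six elements

print(row_sums)  # [ 6 15]
print(col_max)   # [4 5 6]
print(mean)      # 3.5
```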
Indexing
Indexing is the method of accessing individual elements or groups of elements from an array. In
NumPy, arrays can be indexed using integers, slices, or boolean arrays.
A 1D array works similarly to a regular Python list. Elements are accessed by their index.
Example:
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr[0]) # Output: 10
print(arr[4]) # Output: 50
You can also use negative indexing to access elements from the end of the array:
print(arr[-1]) # Output: 50
print(arr[-2]) # Output: 40
In a 2D array (matrix), you use two indices to access an element: one for the row and another for
the column.
Example:
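A minimal sketch of row and column indexing in a 2D array; the array name and values are illustrative.

```python
import numpy as np

arr2D = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

print(arr2D[0, 0])    # 1 (first row, first column)
print(arr2D[1, 2])    # 6 (second row, third column)
print(arr2D[-1, -1])  # 9 (last row, last column)
print(arr2D[1])       # [4 5 6] (a single index selects a whole row)
```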
In a 3D array, you need three indices to access an element: one for the depth, one for the row,
and one for the column.
Example:
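A minimal sketch of depth, row and column indexing in a 3D array; the array name and values are illustrative.

```python
import numpy as np

# Shape (2, 2, 3): 2 blocks (depth), each with 2 rows and 3 columns
arr3D = np.array([[[1, 2, 3], [4, 5, 6]],
                  [[7, 8, 9], [10, 11, 12]]])

print(arr3D.shape)     # (2, 2, 3)
print(arr3D[0, 1, 2])  # 6 (first block, second row, third column)
print(arr3D[1, 0, 0])  # 7 (second block, first row, first column)
```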
Slicing
Slicing allows you to access a sub-array by specifying a range of indices. It is a powerful feature
for extracting subsets of data from arrays.
Slicing in 1D arrays works similarly to Python lists. You can specify a start, stop, and step in the
format arr[start:stop:step].
Example:
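A short sketch of the arr[start:stop:step] form on a 1D array, using illustrative values. The stop index is exclusive, and any of the three parts may be omitted.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

print(arr[1:4])  # [20 30 40] (indices 1, 2, 3)
print(arr[:3])   # [10 20 30] (from the start)
print(arr[::2])  # [10 30 50] (every second element)
print(arr[-2:])  # [50 60]    (last two elements)
```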
In a 2D array, you can slice both rows and columns. The format is arr[row_start:row_end,
col_start:col_end].
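Example:
arr2D = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr2D[1:, :2]) # Output: [[4 5], [7 8]] (Last two rows and first two columns)
A negative step reverses an array:
arr = np.array([1, 2, 3, 4, 5])
print(arr[::-1]) # Output: [5 4 3 2 1]
Slicing Multiple Dimensions: You can combine slicing across multiple dimensions.
print(arr2D[::2, 1:]) # Output: [[2 3], [8 9]] (Every other row, last two columns)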
Iterating
Iteration refers to looping over the elements of an array. In NumPy, you can iterate through
arrays easily, and the behavior depends on the array’s dimensions.
Iterating over a 1D array is straightforward. Each iteration gives you one element of the array.
Example:
arr = np.array([10, 20, 30])
for element in arr:
    print(element)
# Output:
# 10
# 20
# 30
3.2 Iterating over 2D Arrays
When iterating over a 2D array, each iteration returns a 1D array corresponding to a row.
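Example:
arr2D = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
for row in arr2D:
    print(row)
# Output:
# [1 2 3]
# [4 5 6]
# [7 8 9]
To visit every element individually, use a nested loop:
for row in arr2D:
    for element in row:
        print(element)
# Output: 1 2 3 4 5 6 7 8 9 (one value per line)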
NumPy provides a powerful iterator called nditer for efficient iteration over arrays of any
dimension. This simplifies iteration over multi-dimensional arrays.
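Example:
arr = np.array([[1, 2, 3], [4, 5, 6]]) # example values
for element in np.nditer(arr):
    print(element)
# Output: 1 2 3 4 5 6 (one value per line)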
Conditions in NumPy:
Conditions in NumPy are similar to regular Python conditions but applied element-wise to
arrays. When a condition is applied to a NumPy array, it returns a Boolean array, where each
element is either True or False based on whether the condition is satisfied.
Example:
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
condition = arr > 25
print(condition)
Output:
[False False  True  True  True]
In this case, the condition arr > 25 checks each element of the array, returning a Boolean array
where elements greater than 25 are marked as True.
A Boolean array is an array of the same shape as the original array but with Boolean values
(True or False). These are typically generated by applying comparison operators to the array.
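Example:
arr = np.array([1, 2, 3, 4, 5, 6]) # example values
bool_arr = arr % 2 == 0
print(bool_arr)
Output:
[False  True False  True False  True]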
Here, the condition checks if each element is divisible by 2, returning True for even numbers and
False otherwise.
One of the most powerful uses of Boolean arrays in NumPy is masking, where you can filter an
array based on a condition. This technique allows you to select elements that satisfy the
condition and discard others.
Example:
arr = np.array([10, 20, 30, 40, 50])
filtered_arr = arr[arr > 25]
print(filtered_arr)
Output:
[30 40 50]
Here, arr > 25 returns a Boolean array, and using this as an index, you can extract the values
from arr where the condition is True.
You can also combine multiple conditions using logical operators like & (and), | (or), and ~
(not).
Example:
arr = np.array([10, 20, 30, 40, 50])
filtered_arr = arr[(arr > 20) & (arr < 50)]
print(filtered_arr)
Output:
[30 40]
Here, both conditions arr > 20 and arr < 50 are combined using the & operator, and the elements
satisfying both are returned.
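The | and ~ operators follow the same pattern. A minimal sketch with illustrative values; note that each comparison needs its own parentheses because &, | and ~ bind more tightly than the comparison operators.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

either = arr[(arr < 20) | (arr > 40)]  # elements below 20 OR above 40
negated = arr[~(arr > 25)]             # elements that are NOT greater than 25

print(either)   # [10 50]
print(negated)  # [10 20]
```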
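Example:
arr = np.array([10, 20, 30, 40, 50])
print(np.any(arr > 25)) # at least one element is greater than 25
print(np.all(arr > 25)) # not every element is greater than 25
Output:
True
False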
Example:
arr = np.array([10, 20, 30, 40, 50])
arr[arr > 25] = 0
print(arr)
Output:
[10 20  0  0  0]
Here, we replaced elements greater than 25 with 0 by applying the condition arr > 25 as a mask.
Shape Manipulation
Shape manipulation covers techniques for changing how an array's elements are arranged into
dimensions without changing the underlying data.
The shape of an array is a tuple representing the dimensions of the array (e.g., (rows, columns) in
2D arrays). Some common operations include reshaping, flattening, and transposing arrays.
a. reshape()
The reshape() function allows you to change the shape of an array without changing its data. You
specify a new shape, and NumPy rearranges the elements accordingly. However, the new shape
must be compatible with the original array's total number of elements.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
reshaped = arr.reshape((3, 2))
print(reshaped)
Output:
[[1 2]
 [3 4]
 [5 6]]
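The compatibility rule can be checked directly. In this sketch (illustrative values), -1 lets NumPy infer one dimension from the total number of elements, and an incompatible shape raises ValueError.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# -1 asks NumPy to infer that dimension from the total size (6 / 2 = 3)
inferred = arr.reshape((2, -1))
print(inferred.shape)  # (2, 3)

# 6 elements cannot fill a 4 x 2 array, so this raises ValueError
try:
    arr.reshape((4, 2))
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```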
b. flatten()
flatten() collapses a multi-dimensional array into a 1D array. This can be useful when you
want to process or analyze data in a linear format.
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
flattened = arr_2d.flatten()
print(flattened)
Output:
[1 2 3 4 5 6]
c. transpose()
The transpose() function swaps the dimensions of an array, which is commonly used when
dealing with matrices (for instance, swapping rows and columns in 2D arrays).
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
transposed = np.transpose(arr_2d)
print(transposed)
Output:
[[1 4]
 [2 5]
 [3 6]]
Array Manipulation
Array manipulation allows for modifying and combining arrays by performing operations such
as splitting, stacking, and adding/removing elements.
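a. concatenate()
concatenate() joins two or more arrays along an existing axis.
Example:
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])
concatenated = np.concatenate((arr1, arr2), axis=0)
print(concatenated)
Output:
[[1 2]
 [3 4]
 [5 6]]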
Role of axis in Concatenation
When you concatenate arrays, you need to specify the axis along which the concatenation
happens.
• If axis=0, the arrays are concatenated along the rows (vertically). This means the arrays
are stacked one on top of the other, so the result has more rows.
• If axis=1, the arrays are concatenated along the columns (horizontally). This means the
arrays are placed side by side, so the result has more columns.
The shape of the arrays in the non-concatenated axes must be compatible for concatenation to
work. For instance, when concatenating along axis=0, the number of columns in the arrays must
be the same, and when concatenating along axis=1, the number of rows must be the same.
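The shape requirement can be seen by trying both axes on arrays that match in rows but not in columns. This is a sketch with illustrative values; the flag variable exists only to record whether the error occurred.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
b = np.array([[7, 8], [9, 10]])       # shape (2, 2)

# Same number of rows, so joining along axis=1 works
side_by_side = np.concatenate((a, b), axis=1)
print(side_by_side.shape)  # (2, 5)

# Different number of columns, so joining along axis=0 raises ValueError
try:
    np.concatenate((a, b), axis=0)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```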
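Example 1: Concatenation along axis=0 (Vertical Stack)
When axis=0, NumPy joins arrays by stacking them along the rows. Therefore, the arrays must
have the same number of columns.
import numpy as np
arr1 = np.array([[1, 2, 3],
                 [4, 5, 6]])
arr2 = np.array([[7, 8, 9],
                 [10, 11, 12]])
concatenated = np.concatenate((arr1, arr2), axis=0)
print(concatenated)
Output:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]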
Here, the arrays are concatenated along the rows (vertically), resulting in a shape of (4, 3).
Example 2: Concatenation along axis=1 (Horizontal Stack)
When axis=1, NumPy joins arrays by stacking them along the columns. Therefore, the arrays
must have the same number of rows.
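import numpy as np
arr1 = np.array([[1, 2, 3],
                 [4, 5, 6]])
arr2 = np.array([[7, 8],
                 [9, 10]])
concatenated = np.concatenate((arr1, arr2), axis=1)
print(concatenated)
Output:
[[ 1  2  3  7  8]
 [ 4  5  6  9 10]]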
Here, the arrays are concatenated along the columns (horizontally), resulting in a shape of (2, 5).
b. stack()
stack() is used to join arrays along a new axis. Unlike concatenate(), which joins along an
existing axis, stack() adds a new dimension.
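Syntax:
numpy.stack((array1, array2), axis=0)
Example:
arr1 = np.array([1, 2])
arr2 = np.array([3, 4])
stacked = np.stack((arr1, arr2), axis=1)
print(stacked)
Output:
[[1 3]
 [2 4]]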
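The difference between joining along an existing axis and adding a new one shows up in the result shapes. A minimal sketch with illustrative values:

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4])

joined = np.concatenate((a, b))    # existing axis: still 1D
stacked0 = np.stack((a, b), axis=0)  # new first axis: each input becomes a row
stacked1 = np.stack((a, b), axis=1)  # new second axis: each input becomes a column

print(joined.shape)    # (4,)
print(stacked0.shape)  # (2, 2)
print(stacked0)
# [[1 2]
#  [3 4]]
print(stacked1)
# [[1 3]
#  [2 4]]
```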
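c. hstack() and vstack()
hstack() joins arrays horizontally (column-wise), and vstack() joins them vertically (row-wise).
Syntax:
numpy.hstack((array1, array2))
numpy.vstack((array1, array2))
Example:
arr1 = np.array([1, 2])
arr2 = np.array([3, 4])
print("Horizontal stack:", np.hstack((arr1, arr2)))
print("Vertical stack:")
print(np.vstack((arr1, arr2)))
Output:
Horizontal stack: [1 2 3 4]
Vertical stack:
[[1 2]
 [3 4]]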
d. split()
The split() function splits an array into multiple sub-arrays. You can specify either the number of
equally sized sub-arrays or the exact positions where the splits should happen.
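Syntax:
numpy.split(array, indices_or_sections, axis=0)
Example:
arr = np.array([1, 2, 3, 4, 5, 6]) # example values
split_arr = np.split(arr, 3)
print(split_arr)
Output:
[array([1, 2]), array([3, 4]), array([5, 6])]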
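Passing a list of positions instead of a count splits the array at those indices. The sketch below uses illustrative values and also shows that asking for a number of sections that does not divide the length evenly raises ValueError.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

# Split before index 2 and before index 5
parts = np.split(arr, [2, 5])
for part in parts:
    print(part)
# [10 20]
# [30 40 50]
# [60]

# 6 elements cannot be divided into 4 equal sections
try:
    np.split(arr, 4)
    uneven_raised = False
except ValueError:
    uneven_raised = True
print(uneven_raised)  # True
```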
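e. append() and insert()
append() adds values to the end of an array, and insert() adds values before a given index. Both
return a new array and leave the original unchanged.
Syntax:
numpy.append(array, values)
numpy.insert(array, index, values)
Example:
arr = np.array([1, 2, 3])
appended = np.append(arr, [4, 5])
print(appended) # [1 2 3 4 5]
inserted = np.insert(arr, 1, 10)
print(inserted) # [ 1 10  2  3]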
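f. delete()
delete() returns a new array with the elements at the given indices removed.
Syntax:
numpy.delete(array, indices)
Example:
arr = np.array([1, 2, 3, 4, 5])
deleted = np.delete(arr, [1, 3])
print(deleted)
Output:
[1 3 5]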
Structured Arrays
Structured arrays (closely related to record arrays) in NumPy allow for heterogeneous data types
within one array.
This is different from standard NumPy arrays, which are homogenous (i.e., they contain only one
data type like integers or floats). With structured arrays, you can define fields with different data
types, making them similar to tables or records in databases.
These are useful when handling structured data like CSV files or databases where each column
can have different data types (integers, floats, strings, etc.).
Structured arrays are essentially arrays with a compound data type (a collection of other data
types), enabling you to access each field by name.
Creating Structured Arrays
You can create a structured array by specifying a dtype (data type) that consists of field names
and the corresponding data types for each field.
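Let’s create a structured array for storing employee records with name, age, and salary, plus a
pair of scores for each employee.
import numpy as np
# 'scores' is an assumed name for the two-element float field that appears in the outputs
employee_dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f4'), ('scores', 'f4', (2,))]
employees = np.array([
    ('John', 28, 50000.0, [85.0, 90.0]),
    ('Sara', 32, 60000.0, [88.0, 92.0]),
    ('Mike', 25, 45000.0, [80.0, 85.0]),
], dtype=employee_dtype)
print(employees)
Output:
[('John', 28, 50000., [85., 90.]) ('Sara', 32, 60000., [88., 92.])
 ('Mike', 25, 45000., [80., 85.])]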
You can access each field (like a column in a table) by its name.
# Access the 'name' field (column)
print(employees['name'])
Output:
['John' 'Sara' 'Mike']
You can also access individual rows (records) like regular arrays:
print(employees[0])
Output:
('John', 28, 50000., [85., 90.])
You can also read a single field of one record:
print(employees[1]['salary'])
Output:
60000.0
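You can add new records (rows) to a structured array using functions like np.append(), which
returns a new array rather than modifying the original.
new_employee = np.array([('Tom', 29, 55000.0, [89.0, 85.0])], dtype=employees.dtype)
updated_employees = np.append(employees, new_employee)
print(updated_employees)
Output:
[('John', 28, 50000., [85., 90.]) ('Sara', 32, 60000., [88., 92.])
 ('Mike', 25, 45000., [80., 85.]) ('Tom', 29, 55000., [89., 85.])]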
Advanced Features of Structured Arrays
You can access multiple fields (like selecting multiple columns in a table) by passing a list of
field names.
print(employees[['name', 'salary']])
Output:
[('John', 50000.) ('Sara', 60000.) ('Mike', 45000.)]
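You can sort structured arrays by one or more fields using np.sort() or np.argsort().
# Sort by a single field
sorted_employees = np.sort(employees, order='age')
print(sorted_employees)
Output:
[('Mike', 25, 45000., [80., 85.]) ('John', 28, 50000., [85., 90.])
 ('Sara', 32, 60000., [88., 92.])]
# Sort by several fields: 'age' first, then 'salary' to break ties
sorted_employees = np.sort(employees, order=['age', 'salary'])
print(sorted_employees)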
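A structured array can also be viewed as a record array (np.recarray), which exposes each field
as an attribute:
rec_employees = employees.view(np.recarray)
print(rec_employees.name)
Output:
['John' 'Sara' 'Mike']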
This makes accessing fields more Pythonic, similar to how attributes are accessed in objects.
You can save and load structured arrays to and from files using functions like np.save() and
np.load(). Structured arrays can also be loaded from text files (CSV, TSV, etc.) using
np.genfromtxt() or np.loadtxt().
Saving to a File:
np.save('employees.npy', employees)
Loading from a File:
loaded_employees = np.load('employees.npy')
print(loaded_employees)
Data Processing: Structured arrays are ideal for reading and processing heterogeneous data,
especially when dealing with datasets that have various data types, like scientific data or
financial records.
1. Saving and Loading Arrays Using .npy Format
NumPy has an efficient binary format called .npy for saving arrays to files. This format preserves
the shape, data type, and endianness of the array, making it fast and memory-efficient.
import numpy as np
# Create an array
arr = np.array([1, 2, 3, 4, 5])
# Save it to a .npy file
np.save('array.npy', arr)
To load the array back from a .npy file, use the np.load() function.
loaded_arr = np.load('array.npy')
print(loaded_arr)
Output: [1 2 3 4 5]
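To check the claim that .npy preserves shape and data type, the sketch below saves a small float32 array and compares it with the loaded copy. The temporary directory and the file name example.npy are assumptions made so the example cleans up after itself.

```python
import os
import tempfile

import numpy as np

original = np.array([[1.5, 2.5], [3.5, 4.5]], dtype=np.float32)

# Write to a temporary directory that is removed afterwards
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.npy")
    np.save(path, original)
    restored = np.load(path)

print(restored.dtype)                      # float32
print(restored.shape)                      # (2, 2)
print(np.array_equal(original, restored))  # True
```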
2. Saving and Loading Multiple Arrays Using .npz Format
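If you want to store multiple arrays in a single file, you can use the .npz format. An .npz file is a
zip archive that stores multiple arrays in one file, each identified by a name. np.savez() writes the
arrays uncompressed, while np.savez_compressed() compresses them.
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
np.savez('arrays.npz', array1=arr1, array2=arr2)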
You can load the arrays back from the .npz file using np.load(). The arrays will be stored in a
dictionary-like object.
loaded_data = np.load('arrays.npz')
print(loaded_data['array1'])
print(loaded_data['array2'])
Output:
[1 2 3]
[4 5 6]
Sometimes, you may want to save array data in text format (like CSV or TSV files) for better
human readability or compatibility with other tools.
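arr = np.array([[1, 2, 3], [4, 5, 6]])
np.savetxt('array.txt', arr, delimiter=',')
In this example:
• np.savetxt() writes the array to array.txt, separating values with commas.
To read the file back, use np.loadtxt(), which loads values as floats by default:
loaded_arr = np.loadtxt('array.txt', delimiter=',')
print(loaded_arr)
Output:
[[1. 2. 3.]
 [4. 5. 6.]]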
You can also specify the data type using the dtype argument if the array contains other types like
integers or strings.
NumPy provides a way to handle CSV files, which is common for data storage. You can load
and save CSV files using genfromtxt() and savetxt().
The genfromtxt() function is useful when working with CSV files containing missing values,
headers, or non-numeric data.
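# Load data from a CSV file with missing values
# ('data.csv' is a placeholder file name; missing entries are filled with 0)
data = np.genfromtxt('data.csv', delimiter=',', filling_values=0)
print(data)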
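The following self-contained sketch shows how genfromtxt() treats a header row and a missing value. The CSV text is made up for illustration and is read from an in-memory buffer (io.StringIO) instead of a file on disk.

```python
import io

import numpy as np

# Illustrative CSV content: a header row and one missing score
csv_text = "id,score\n1,9.5\n2,\n3,7.0\n"

# By default, a missing numeric entry becomes nan
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(data.shape)            # (3, 2)
print(np.isnan(data[1, 1]))  # True

# filling_values replaces missing entries with a chosen value
filled = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                       skip_header=1, filling_values=0)
print(filled[1, 1])          # 0.0
```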