Python 5th Sem

Uploaded by

dr.anuragkr
Copyright © All Rights Reserved

UNIT-2

THE NUMPY LIBRARY


NumPy (short for Numerical Python) is a powerful library in Python that provides support for
large, multi-dimensional arrays and matrices, along with a collection of mathematical functions
to operate on these arrays.

1. What is NumPy?

• NumPy is the fundamental package for numerical computing in Python.
• It allows users to work with arrays (N-dimensional arrays) that can store data more
efficiently than Python's built-in lists.
• NumPy is the building block for many other libraries such as Pandas (for data
manipulation) and Matplotlib (for plotting and visualization).
• By using NumPy, we can efficiently handle large datasets, perform complex
computations, and build powerful machine learning models or simulations. It's an
essential tool for anyone working in scientific computing, data analysis, or AI.

2. Why Use NumPy?

• Efficiency: Arrays in NumPy are much more efficient in terms of memory usage and
computational speed compared to Python lists.
• Mathematical Operations: NumPy provides a wide range of optimized mathematical
operations for arrays like addition, subtraction, matrix multiplication, and more.
• Support for Multi-dimensional Arrays: NumPy supports arrays of arbitrary dimensions
(1D, 2D, 3D, and beyond), making it suitable for scientific computing.
• Broadcasting and Vectorization: These features allow efficient element-wise operations
without the need for loops.
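
As a small illustration of the vectorization point above, the sketch below doubles a set of numbers twice: once with an explicit Python loop over a list, and once with a single array expression. The variable names are illustrative only.

```python
import numpy as np

values = list(range(1, 6))            # plain Python list: [1, 2, 3, 4, 5]

# Loop version: build the result one element at a time
doubled_list = [v * 2 for v in values]

# Vectorized version: one expression operates on the whole array
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)   # [2, 4, 6, 8, 10]
print(doubled_arr)    # [ 2  4  6  8 10]
```

Both versions give the same values; the array version avoids the explicit loop, which is where the speed advantage on large data comes from.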

3. Key Features of NumPy:

• Ndarray (N-dimensional Array): The core object in NumPy is the ndarray, a
homogeneous array of fixed-size elements.
• Mathematical Functions: NumPy provides a large library of mathematical functions
such as sum(), mean(), max(), and linear algebra operations like matrix multiplication.
• Random Number Generation: NumPy has powerful capabilities for generating random
numbers for simulations or algorithms that require randomness.
• Broadcasting: This allows NumPy to perform arithmetic operations on arrays with
different shapes.
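
A brief sketch of two of the features listed above: random number generation and broadcasting. It assumes the generator interface np.random.default_rng(); the seed value 42 is arbitrary and only makes the run repeatable.

```python
import numpy as np

# Random numbers: a seeded generator gives repeatable results
rng = np.random.default_rng(seed=42)   # the seed value is an arbitrary choice
samples = rng.random(3)                # 3 floats in the half-open interval [0, 1)
print(samples.shape)                   # (3,)

# Broadcasting: a (2, 3) array combined with a (3,) array
matrix = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
shifted = matrix + row                 # the row is added to every row of matrix
print(shifted)
# [[11 22 33]
#  [14 25 36]]
```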

4. Applications of NumPy:

• Scientific Computing: Used for data analysis, scientific research, simulations, and
numerical computations.
• Machine Learning and AI: Libraries like TensorFlow and PyTorch, which are essential for
deep learning and AI applications, interoperate closely with NumPy arrays.
• Data Analysis: Libraries like Pandas, which are built on top of NumPy, use it to
manipulate and process data.
• Image and Signal Processing: NumPy is also employed in image processing tasks
where pixel data can be represented as arrays.

5. Getting Started with NumPy:

1. Installing NumPy

Using pip (Python's package installer):

The most common way to install NumPy is using pip, the Python package installer, which
downloads the latest version of NumPy from the Python Package Index (PyPI).

Steps for Installation:

Open Command Prompt or Terminal:

For Windows, you can use the Command Prompt or PowerShell.

On macOS or Linux, open the Terminal.

Use the pip command: If you have Python installed and pip is configured properly, you can
install NumPy with this command:

pip install numpy


After running this command, pip will fetch NumPy from the Python Package Index (PyPI) and
install it on your system.

If you are using Python 3, it might be pip3 instead of pip depending on your setup:

pip3 install numpy

Verifying the Installation: To check if NumPy has been installed successfully, open Python in
the command line (just type python or python3 in the terminal) and run the following code:

import numpy

print(numpy.__version__)

This will print the version of NumPy you have installed.

2. Importing NumPy in Python

Once NumPy is installed, you need to import it into your Python scripts or interactive
environment (like Jupyter Notebooks) before you can use its functionality.

Basic Import:

import numpy

This imports the entire NumPy library, allowing you to access its functions, classes, and methods
by referencing numpy.

Alias Import (Common Practice):

It’s common practice to import NumPy with the alias np, as it saves typing time and makes the
code cleaner.

import numpy as np

Here, numpy is imported and given the shorter alias np. Now, instead of typing numpy.array(),
you can simply write np.array().

Why Use an Alias?

• Readability and Convenience: NumPy functions are frequently used, so using the np
alias makes the code less verbose and easier to write.
• Consistency: In the Python community, np is a standard alias for NumPy. If you look at
tutorials, documentation, or projects, you’ll often see NumPy imported this way.

Example of Importing and Using NumPy:

import numpy as np

# Creating a 1D array

arr = np.array([1, 2, 3, 4, 5])

print(arr)

# Performing a basic operation

arr_squared = np.square(arr)

print(arr_squared)

In this example:

We import NumPy with the alias np.

We create a NumPy array using np.array().

We perform an operation (np.square()) on the array, which computes the square of each
element.

NDArray

An ndarray (short for N-dimensional array) is the core data structure in NumPy, a powerful
library for numerical computation in Python. It represents a multi-dimensional, homogeneous
array of fixed-size items, which allows for efficient storage and manipulation of numerical data.

Key Features of NDArray:


• Homogeneous: All elements in an ndarray must be of the same data type.
• Fixed Size: The size of an ndarray is fixed at the time of its creation.
• Dimensions: It can have one or more dimensions (1D, 2D, 3D, etc.), and each dimension
is called an axis.
• Shape: The shape of an ndarray is a tuple that represents the size of the array along each
axis. For example, a 3x4 array has a shape of (3, 4).
• Efficient Operations: NumPy arrays enable vectorized operations, meaning you can
apply operations to the entire array at once without the need for loops.
• Efficient Storage: Elements of a newly created array are stored in one contiguous block of
memory, which makes operations like slicing, broadcasting, and vectorization fast.
• Element-wise Operations: Mathematical operations can be applied element-wise,
making computations faster and more efficient.
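
To make the "homogeneous" and "shape" points above concrete, here is a minimal sketch. It shows that mixing integers and floats produces a single floating-point dtype, and that shape and ndim describe the axes of the array.

```python
import numpy as np

# Homogeneous: an int/float mix is upcast to one common float type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)    # float64
print(mixed)          # [1.  2.5 3. ]

# Shape and dimensions: 3 rows along axis 0, 4 columns along axis 1
grid = np.zeros((3, 4))
print(grid.shape)     # (3, 4)
print(grid.ndim)      # 2
```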

Creating an ndarray

You can create an ndarray using several methods in NumPy:

Using np.array():

import numpy as np

# Creating a 1D array

arr1 = np.array([1, 2, 3, 4])

print(arr1)

# Creating a 2D array

arr2 = np.array([[1, 2], [3, 4]])

print(arr2)

Output:

[1 2 3 4]

[[1 2]
 [3 4]]

Properties of ndarray

Shape: The shape of an ndarray is a tuple of integers representing the size of each dimension.

arr2.shape # (2, 2)

Size: The number of elements in the array.

arr2.size # 4

Data Type (dtype): NumPy arrays are homogeneous, meaning all elements are of the same type.

arr1.dtype # dtype('int64')

Dimensions (ndim): Returns the number of dimensions.

arr2.ndim # 2

Example:

arr = np.array([[1, 2, 3], [4, 5, 6]])

print("Shape: ", arr.shape) # Shape: (2, 3)

print("Size: ", arr.size) # Size: 6

print("Number of dimensions: ", arr.ndim) # Number of dimensions: 2

print("Data type: ", arr.dtype) # Data type: int64 (or int32 depending on system)

Common Functions to Create Arrays:

NumPy provides several functions to quickly create arrays:

np.zeros(shape): Creates an array filled with zeros.

np.ones(shape): Creates an array filled with ones.


np.arange(start, stop, step): Creates an array with a range of values.

np.linspace(start, stop, num): Creates an array with evenly spaced values.

arr_zeros = np.zeros((2, 3))

print(arr_zeros)

# Output:
# [[0. 0. 0.]
#  [0. 0. 0.]]

arr_range = np.arange(0, 10, 2)

print(arr_range)

# Output: [0 2 4 6 8]
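
The other two functions listed above, np.ones() and np.linspace(), can be sketched the same way. Note that np.linspace() includes the stop value by default, whereas np.arange() excludes it.

```python
import numpy as np

arr_ones = np.ones((2, 2))
print(arr_ones)
# [[1. 1.]
#  [1. 1.]]

# Five evenly spaced values from 0 to 1, with both endpoints included
arr_lin = np.linspace(0, 1, 5)
print(arr_lin)
# [0.   0.25 0.5  0.75 1.  ]
```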

Operations on ndarray

Indexing and Slicing

You can access elements or subsets of an ndarray using indexing and slicing, similar to lists in
Python.

1D Array:

arr = np.array([10, 20, 30, 40, 50])

# Accessing elements

print(arr[1]) # Output: 20

# Slicing

print(arr[1:4]) # Output: [20 30 40]


2D Array:

arr_2d = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

# Accessing elements

print(arr_2d[1, 2]) # Output: 60

# Slicing rows and columns

print(arr_2d[1:, 1:])

# Output:
# [[50 60]
#  [80 90]]

Broadcasting:

NumPy uses a technique called broadcasting to perform element-wise operations on arrays of
different shapes. For example:

arr = np.array([1, 2, 3])

print(arr + 5)

# Output: [6 7 8]

The scalar 5 is broadcast to each element in the array, and the operation is performed
element-wise.
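
Broadcasting is not limited to scalars. The sketch below, with illustrative names, combines a column of shape (3, 1) with a row of shape (3,), and also shows that shapes which cannot be matched raise a ValueError.

```python
import numpy as np

column = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])              # shape (3,)

# Both operands are stretched to the common shape (3, 3)
table = column + row
print(table)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]

# Shapes that cannot be stretched to match raise an error
try:
    np.array([1, 2, 3]) + np.array([1, 2])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)   # True
```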

Vectorization:

With ndarray, you can perform operations over the entire array without writing loops:

arr = np.array([1, 2, 3])

print(arr * 2)

# Output: [2 4 6]
Basic Operations
NumPy allows users to perform a variety of operations on arrays, both element-wise and matrix-wide, to
manipulate data efficiently.

1. Arithmetic Operations

These are element-wise operations, where each element in an array is operated upon
independently.

• Addition (+): Adds corresponding elements of two arrays.

import numpy as np

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

result = arr1 + arr2

print(result) # Output: [5 7 9]

• Subtraction (-): Subtracts corresponding elements of one array from another.

result = arr2 - arr1

print(result) # Output: [3 3 3]

• Multiplication (*): Multiplies corresponding elements of two arrays.

result = arr1 * arr2

print(result) # Output: [ 4 10 18]

• Division (/): Divides corresponding elements of one array by another.

result = arr2 / arr1

print(result) # Output: [4. 2.5 2. ]


• Power (**): Raises each element in an array to a scalar power, or to the power of the
corresponding element in another array.

result = arr1 ** 2

print(result) # Output: [1 4 9]

2. Comparison Operations

These operations return boolean arrays based on element-wise comparisons.

• Greater Than (>): Compares elements of one array to another or a scalar.

result = arr1 > 2

print(result) # Output: [False False True]

• Less Than (<): Compares elements of one array to another or a scalar.

result = arr2 < 5

print(result) # Output: [ True False False]

• Equal To (==): Checks if corresponding elements in two arrays are equal.

result = arr1 == arr2

print(result) # Output: [False False False]

3. Aggregate Operations

NumPy provides several aggregate functions that operate over arrays to return a single value or a
new array.

• Sum (np.sum()): Computes the sum of all elements in the array. Can be used across an
entire array or along specific axes.
arr = np.array([[1, 2, 3], [4, 5, 6]])

result = np.sum(arr)

print(result) # Output: 21

• Axis-wise Sum: You can also compute sums along specific axes (rows or columns).

result = np.sum(arr, axis=0) # Sum down each column (axis=0 collapses the rows)

print(result) # Output: [5 7 9]

• Minimum (np.min()): Returns the minimum value of an array.

result = np.min(arr)

print(result) # Output: 1

• Maximum (np.max()): Returns the maximum value of an array.

result = np.max(arr)

print(result) # Output: 6
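
The axis argument works the same way for the other aggregate functions. A short sketch using the same 2x3 array, adding np.mean(), which was mentioned earlier among the mathematical functions:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

row_sums = np.sum(arr, axis=1)     # one sum per row
overall_mean = np.mean(arr)        # mean of all six elements
col_max = np.max(arr, axis=0)      # largest value in each column
row_min = np.min(arr, axis=1)      # smallest value in each row

print(row_sums)       # [ 6 15]
print(overall_mean)   # 3.5
print(col_max)        # [4 5 6]
print(row_min)        # [1 4]
```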

Indexing

Indexing is the method of accessing individual elements or groups of elements from an array. In
NumPy, arrays can be indexed using integers, slices, or boolean arrays.

1.1 Indexing a 1D Array

A 1D array works similarly to a regular Python list. Elements are accessed by their index.
Example:

import numpy as np

arr = np.array([10, 20, 30, 40, 50])

print(arr[0]) # Output: 10

print(arr[4]) # Output: 50

You can also use negative indexing to access elements from the end of the array:

print(arr[-1]) # Output: 50

print(arr[-2]) # Output: 40

1.2 Indexing a 2D Array

In a 2D array (matrix), you use two indices to access an element: one for the row and another for
the column.

Example:

arr2D = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr2D[0, 2]) # Output: 3 (1st row, 3rd column)

print(arr2D[2, 1]) # Output: 8 (3rd row, 2nd column)

Using Negative Indexing:

print(arr2D[-1, -2]) # Output: 8 (Last row, 2nd last column)

1.3 Indexing a 3D Array

In a 3D array, you need three indices to access an element: one for the depth, one for the row,
and one for the column.

Example:

arr3D = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])


print(arr3D[0, 1, 0]) # Output: 3

print(arr3D[1, 0, 1]) # Output: 6

Slicing
Slicing allows you to access a sub-array by specifying a range of indices. It is a powerful feature
for extracting subsets of data from arrays.

2.1 Slicing a 1D Array

Slicing in 1D arrays works similarly to Python lists. You can specify a start, stop, and step in the
format arr[start:stop:step].

Example:

arr = np.array([10, 20, 30, 40, 50])

print(arr[1:4]) # Output: [20 30 40]

print(arr[:3]) # Output: [10 20 30]

print(arr[::2]) # Output: [10 30 50] (Every second element)

2.2 Slicing a 2D Array

In a 2D array, you can slice both rows and columns. The format is arr[row_start:row_end,
col_start:col_end].

Example:

arr2D = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr2D[1:, :2]) # Output: [[4 5], [7 8]] (Last two rows and first two columns)

You can slice along a specific axis. For instance:

Extracting a specific row:

print(arr2D[1, :]) # Output: [4 5 6] (Second row)


Extracting a specific column:

print(arr2D[:, 2]) # Output: [3 6 9] (Third column)

2.3 Advanced Slicing Techniques

Reversing an Array: You can reverse an array by specifying a negative step.

arr = np.array([1, 2, 3, 4, 5])

print(arr[::-1]) # Output: [5 4 3 2 1]

Slicing Multiple Dimensions: You can combine slicing across multiple dimensions.

arr3D = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(arr3D[:, 1, :]) # Output: [[3 4], [7 8]]
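
One point worth keeping in mind when slicing: a basic slice is a view onto the original array rather than a copy, so writing to the slice also changes the original. The sketch below shows this, and uses copy() when an independent array is wanted.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# A slice shares memory with the original array
view = arr[1:4]
view[0] = 99
print(arr)            # [10 99 30 40 50]

# copy() creates an independent array
independent = arr[1:4].copy()
independent[0] = -1
print(arr)            # [10 99 30 40 50]  (unchanged by the copy)
print(independent)    # [-1 30 40]
```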

Iterating
Iteration refers to looping over the elements of an array. In NumPy, you can iterate through
arrays easily, and the behavior depends on the array’s dimensions.

3.1 Iterating over 1D Arrays

Iterating over a 1D array is straightforward. Each iteration gives you one element of the array.

Example:

arr = np.array([10, 20, 30])

for element in arr:
    print(element)

# Output:
# 10
# 20
# 30

3.2 Iterating over 2D Arrays

When iterating over a 2D array, each iteration returns a 1D array corresponding to a row.

Example:

arr2D = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

for row in arr2D:
    print(row)

# Output:
# [1 2 3]
# [4 5 6]
# [7 8 9]

To iterate over each element, you can use nested loops:

for row in arr2D:
    for element in row:
        print(element)

# Output: the values 1 through 9, one per line

3.3 Iterating with nditer

NumPy provides a powerful iterator called nditer for efficient iteration over arrays of any
dimension. This simplifies iteration over multi-dimensional arrays.

Example:

arr2D = np.array([[1, 2, 3], [4, 5, 6]])

for element in np.nditer(arr2D):
    print(element)

# Output: the values 1 through 6, one per line
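
When the position of each element is needed as well as its value, one option is np.ndenumerate(), which yields an index tuple together with each element. This is a minimal sketch of that alternative, not a replacement for nditer.

```python
import numpy as np

arr2D = np.array([[1, 2], [3, 4]])

pairs = []
for index, value in np.ndenumerate(arr2D):
    pairs.append((index, int(value)))   # index is a (row, column) tuple
    print(index, value)

# Output:
# (0, 0) 1
# (0, 1) 2
# (1, 0) 3
# (1, 1) 4
```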

Conditions and Boolean Arrays


In the NumPy library, conditions and Boolean arrays are used extensively to handle arrays and
manipulate data in a highly efficient manner.

Conditions in NumPy:

Conditions in NumPy are similar to regular Python conditions but applied element-wise to
arrays. When a condition is applied to a NumPy array, it returns a Boolean array, where each
element is either True or False based on whether the condition is satisfied.

Example:

import numpy as np

arr = np.array([10, 20, 30, 40, 50])

condition = arr > 25

print(condition)

Output:

[False False True True True]

In this case, the condition arr > 25 checks each element of the array, returning a Boolean array
where elements greater than 25 are marked as True.

Boolean Arrays in NumPy:

A Boolean array is an array of the same shape as the original array but with Boolean values
(True or False). These are typically generated by applying comparison operators to the array.

Example:

bool_arr = arr % 2 == 0 # Check if elements are even

print(bool_arr)
Output:

[ True  True  True  True  True]

Here, the condition checks if each element is divisible by 2, returning True for even numbers and
False otherwise. Every element of arr is even, so every entry is True.

Boolean Arrays for Filtering (Masking):

One of the most powerful uses of Boolean arrays in NumPy is masking, where you can filter an
array based on a condition. This technique allows you to select elements that satisfy the
condition and discard others.

Example:

filtered_arr = arr[arr > 25] # Select elements greater than 25

print(filtered_arr)

Output:

[30 40 50]

Here, arr > 25 returns a Boolean array, and using this as an index, you can extract the values
from arr where the condition is True.

Combining Multiple Conditions:

You can also combine multiple conditions using logical operators like & (and), | (or), and ~
(not).

Example:

arr = np.array([10, 20, 30, 40, 50])

# Elements greater than 20 and less than 50

filtered_arr = arr[(arr > 20) & (arr < 50)]

print(filtered_arr)

Output:

[30 40]
Here, both conditions arr > 20 and arr < 50 are combined using the & operator, and the elements
satisfying both are returned.
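
The | and ~ operators work the same way. A short sketch follows. Each comparison needs its own parentheses, because &, | and ~ bind more tightly than the comparison operators.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Elements less than 20 OR greater than 40
either = arr[(arr < 20) | (arr > 40)]
print(either)      # [10 50]

# Elements that are NOT greater than 25
not_large = arr[~(arr > 25)]
print(not_large)   # [10 20]
```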

Boolean Array Methods in NumPy:


NumPy provides several built-in methods that work well with Boolean arrays:

np.any(): Returns True if at least one element is True.

np.all(): Returns True if all elements are True.

Example:

arr = np.array([10, 20, 30, 40, 50])

condition = arr > 15

print(np.any(condition)) # True if any element satisfies the condition

print(np.all(condition)) # True if all elements satisfy the condition

Output:

True

False

Boolean Array as a Mask:

You can use Boolean arrays to modify or mask elements in an array.

Example:

arr = np.array([10, 20, 30, 40, 50])

arr[arr > 25] = 0 # Replace all elements greater than 25 with 0

print(arr)

Output:
[10 20 0 0 0]

Here, we replaced elements greater than 25 with 0 by applying the condition arr > 25 as a mask.
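
Assigning through a mask changes the array in place. If the original should be kept, one alternative is np.where(), which builds a new array by choosing between two values for each element. A minimal sketch:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Where the condition is True take 0, otherwise keep the original value
replaced = np.where(arr > 25, 0, arr)

print(replaced)   # [10 20  0  0  0]
print(arr)        # [10 20 30 40 50]  (original is unchanged)
```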

Shape Manipulation
Shape manipulation refers to changing the shape or structure of a NumPy array. These techniques
rearrange how the elements are laid out across dimensions without changing the underlying data.

The shape of an array is a tuple representing the dimensions of the array (e.g., (rows, columns) in
2D arrays). Some common operations include reshaping, flattening, and transposing arrays.

a. reshape()

The reshape() function allows you to change the shape of an array without changing its data. You
specify a new shape, and NumPy rearranges the elements accordingly. However, the new shape
must be compatible with the original array's total number of elements.

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])


reshaped = arr.reshape(3, 2)
print(reshaped)

Output:

[[1 2]
[3 4]
[5 6]]
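
The compatibility rule described above can be sketched as follows. Passing -1 for one dimension asks NumPy to work out that dimension from the total number of elements, and an incompatible shape raises a ValueError.

```python
import numpy as np

arr = np.arange(6)            # [0 1 2 3 4 5], six elements

# -1 lets NumPy infer the missing dimension: 6 elements / 2 rows = 3 columns
auto = arr.reshape(2, -1)
print(auto.shape)             # (2, 3)

# A 4x2 shape would need 8 elements, so this fails
try:
    arr.reshape(4, 2)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)         # True
```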

b. flatten()

flatten() collapses a multi-dimensional array into a 1D array. This can be useful when you
want to process or analyze data in a linear format.

arr = np.array([[1, 2, 3], [4, 5, 6]])


flattened = arr.flatten()
print(flattened)

Output:

[1 2 3 4 5 6]
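
flatten() always returns a copy of the data. A related method, ravel(), returns a view where possible, so changes made through it can reach the original array. The sketch below contrasts the two on a freshly created (contiguous) array.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# flatten() copies: editing the result leaves arr alone
flat_copy = arr.flatten()
flat_copy[0] = 100
print(arr[0, 0])    # 1

# ravel() on this contiguous array gives a view: editing it changes arr
flat_view = arr.ravel()
flat_view[0] = 100
print(arr[0, 0])    # 100
```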
c. transpose()

The transpose() function swaps the dimensions of an array, which is commonly used when
dealing with matrices (for instance, swapping rows and columns in 2D arrays).

arr = np.array([[1, 2, 3], [4, 5, 6]])


transposed = np.transpose(arr)
print(transposed)

Output:

[[1 4]
 [2 5]
 [3 6]]

Array Manipulation
Array manipulation allows for modifying and combining arrays by performing operations such
as splitting, stacking, and adding/removing elements.

a. concatenate()

concatenate() joins two or more arrays along an existing axis.

numpy.concatenate((array1, array2), axis=0)

Example:

arr1 = np.array([[1, 2], [3, 4]])

arr2 = np.array([[5, 6]])

concatenated = np.concatenate((arr1, arr2), axis=0)

print(concatenated)

Output:

[[1 2]
 [3 4]
 [5 6]]

Role of axis in Concatenation

When you concatenate arrays, you need to specify the axis along which the concatenation
happens.

• If axis=0, the arrays are concatenated along the rows (vertically). This means the arrays
are stacked one on top of the other, so the result contains the rows of every input array.
• If axis=1, the arrays are concatenated along the columns (horizontally). This means the
arrays are placed side by side, so the result contains the columns of every input array.

The shape of the arrays in the non-concatenated axes must be compatible for concatenation to
work. For instance, when concatenating along axis=0, the number of columns in the arrays must
be the same, and when concatenating along axis=1, the number of rows must be the same.

Example 1: Concatenation along axis=0 (Vertical Stack)

When axis=0, NumPy joins arrays by stacking them along the rows. Therefore, the arrays must
have the same number of columns.

import numpy as np

arr1 = np.array([[1, 2, 3],
                 [4, 5, 6]])

arr2 = np.array([[7, 8, 9],
                 [10, 11, 12]])

# Concatenate along axis 0 (rows)

concatenated = np.concatenate((arr1, arr2), axis=0)

print(concatenated)

Output:

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

Here, the arrays are concatenated along the rows (vertically), resulting in a shape of (4, 3).
Example 2: Concatenation along axis=1 (Horizontal Stack)

When axis=1, NumPy joins arrays by stacking them along the columns. Therefore, the arrays
must have the same number of rows.

import numpy as np

arr1 = np.array([[1, 2, 3],
                 [4, 5, 6]])

arr2 = np.array([[7, 8],
                 [9, 10]])

# Concatenate along axis 1 (columns)

concatenated = np.concatenate((arr1, arr2), axis=1)

print(concatenated)

Output:

[[ 1  2  3  7  8]
 [ 4  5  6  9 10]]

Here, the arrays are concatenated along the columns (horizontally), resulting in a shape of (2, 5).

b. stack()

stack() is used to join arrays along a new axis. Unlike concatenate(), which joins along an
existing axis, stack() adds a new dimension.

Syntax:

numpy.stack((array1, array2), axis=0)

Example:

arr1 = np.array([1, 2])

arr2 = np.array([3, 4])


stacked = np.stack((arr1, arr2), axis=1)

print(stacked)

Output:

[[1 3]
 [2 4]]
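
The difference between joining along a new axis and along an existing one is easiest to see in the resulting shapes. A small sketch with two 3-element arrays (names are illustrative):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

stacked_rows = np.stack((a, b), axis=0)   # each input becomes a row
stacked_cols = np.stack((a, b), axis=1)   # each input becomes a column
joined = np.concatenate((a, b))           # no new axis, just a longer 1D array

print(stacked_rows.shape)   # (2, 3)
print(stacked_cols.shape)   # (3, 2)
print(joined.shape)         # (6,)
```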

c. hstack() and vstack()

hstack() joins arrays horizontally (along columns).

vstack() joins arrays vertically (along rows).

Syntax:

numpy.hstack((array1, array2))

numpy.vstack((array1, array2))

Example:

arr1 = np.array([1, 2])

arr2 = np.array([3, 4])

h_stacked = np.hstack((arr1, arr2))

v_stacked = np.vstack((arr1, arr2))

print(h_stacked) # Horizontal stack

print(v_stacked) # Vertical stack

Output:

[1 2 3 4]
[[1 2]
 [3 4]]

The first line is the horizontal stack; the 2x2 array is the vertical stack.

d. split()

The split() function splits an array into multiple sub-arrays. You can specify either the number of
equally sized sub-arrays or the exact positions where the splits should happen.

Syntax:

numpy.split(array, indices_or_sections, axis=0)

Example:

arr = np.array([1, 2, 3, 4, 5, 6])

split_arr = np.split(arr, 3)

print(split_arr)

Output:

[array([1, 2]), array([3, 4]), array([5, 6])]
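
The second form mentioned above, splitting at exact positions, takes a list of indices instead of a count. The sketch below also shows that asking for equal sections that do not divide the array evenly raises a ValueError.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Split before index 2 and before index 5
parts = np.split(arr, [2, 5])
print([p.tolist() for p in parts])   # [[1, 2], [3, 4, 5], [6]]

# Six elements cannot be divided into 4 equal sections
try:
    np.split(arr, 4)
    uneven_failed = False
except ValueError:
    uneven_failed = True
print(uneven_failed)                 # True
```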

e. append() and insert()

append() returns a new array with values added to the end.

insert() returns a new array with values inserted at specific positions.

Syntax:

numpy.append(array, values)

numpy.insert(array, index, values)

Example:

arr = np.array([1, 2, 3])

appended = np.append(arr, [4, 5])

inserted = np.insert(arr, 1, 10)


print(appended) # [1 2 3 4 5]

print(inserted) # [ 1 10 2 3]

f. delete()

The delete() function removes elements at specified indices.

Syntax:

numpy.delete(array, index, axis=None)

Example:

arr = np.array([1, 2, 3, 4, 5])

deleted = np.delete(arr, [1, 3])

print(deleted)

Output:

[1 3 5]
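
With a 2D array, the axis argument shown in the syntax decides whether a row or a column is removed. A minimal sketch; as with the 1D case, delete() returns a new array and leaves the original unchanged.

```python
import numpy as np

arr2D = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

without_row = np.delete(arr2D, 1, axis=0)   # drop the second row
without_col = np.delete(arr2D, 0, axis=1)   # drop the first column

print(without_row)
# [[1 2 3]
#  [7 8 9]]
print(without_col)
# [[2 3]
#  [5 6]
#  [8 9]]
print(arr2D.shape)   # (3, 3)  (original is unchanged)
```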

Structured Arrays
Structured arrays in NumPy allow for heterogeneous data types within one array. Record arrays
(np.recarray) are a closely related variant that adds attribute-style access to the fields.

This is different from standard NumPy arrays, which are homogenous (i.e., they contain only one
data type like integers or floats). With structured arrays, you can define fields with different data
types, making them similar to tables or records in databases.

These are useful when handling structured data like CSV files or databases where each column
can have different data types (integers, floats, strings, etc.).

Structured arrays are essentially arrays with a compound data type (a collection of other data
types), enabling you to access each field by name.
Creating Structured Arrays

You can create a structured array by specifying a dtype (data type) that consists of field names
and the corresponding data types for each field.

Example 1: Simple Structured Array

Let’s create a structured array for storing employee records with name, age, and salary.

import numpy as np

# Define a structured dtype

dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8')]

# Create a structured array

employees = np.array([('John', 28, 50000.0),
                      ('Sara', 32, 60000.0),
                      ('Mike', 25, 45000.0)], dtype=dtype)

print(employees)

Output:

[('John', 28, 50000.) ('Sara', 32, 60000.) ('Mike', 25, 45000.)]

Here, the dtype specifies:

name is a Unicode string of at most 10 characters ('U10'),

age is a 4-byte integer ('i4'),

salary is an 8-byte floating-point number ('f8').

Accessing Fields in a Structured Array

You can access each field (like a column in a table) by its name.
# Access the 'name' field (column)

print(employees['name'])

Output:

['John' 'Sara' 'Mike']

You can also access individual rows (records) like regular arrays:

# Access the first record (row)

print(employees[0])

Output:

('John', 28, 50000.)

You can combine both to access a specific field of a specific record:

# Access the 'salary' of the second record

print(employees[1]['salary'])

Output:

60000.0

Adding New Records

You can add new records (rows) to a structured array using functions like np.append(), which
returns a new array rather than modifying the original. The new record must match the dtype
of the existing array.

new_employee = np.array([('Tom', 29, 55000.0)], dtype=dtype)

updated_employees = np.append(employees, new_employee)

print(updated_employees)

Output:

[('John', 28, 50000.) ('Sara', 32, 60000.) ('Mike', 25, 45000.)
 ('Tom', 29, 55000.)]

Advanced Features of Structured Arrays

1. Accessing Multiple Fields Simultaneously

You can access multiple fields (like selecting multiple columns in a table) by passing a list of
field names.

# Access both 'name' and 'salary' fields

print(employees[['name', 'salary']])

Output:

[('John', 50000.) ('Sara', 60000.) ('Mike', 45000.)]

2. Sorting Structured Arrays

You can sort structured arrays by one or more fields using np.sort() or np.argsort().

# Sort employees by 'salary'

sorted_employees = np.sort(employees, order='salary')

print(sorted_employees)

Output:

[('Mike', 25, 45000.) ('John', 28, 50000.) ('Sara', 32, 60000.)]

You can also sort by multiple fields:

# Sort by 'age' and then by 'salary'

sorted_employees = np.sort(employees, order=['age', 'salary'])

print(sorted_employees)
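
np.argsort(), mentioned above, returns the indices that would sort the array instead of the sorted records themselves. The sketch below rebuilds the same employee array so that it runs on its own.

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('salary', 'f8')]
employees = np.array([('John', 28, 50000.0),
                      ('Sara', 32, 60000.0),
                      ('Mike', 25, 45000.0)], dtype=dtype)

# Indices of the records in ascending salary order
order = np.argsort(employees, order='salary')
print(order)                       # [2 0 1]

# Use the indices to reorder any field
print(employees['name'][order])    # ['Mike' 'John' 'Sara']
```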

3. Using np.recarray for Attribute Access


The np.recarray object is an enhanced version of structured arrays that allows accessing fields as
attributes.

# Convert structured array to recarray

rec_employees = employees.view(np.recarray)

# Access 'name' field using dot notation

print(rec_employees.name)

Output:

['John' 'Sara' 'Mike']

This makes accessing fields more Pythonic, similar to how attributes are accessed in objects.

Loading and Saving Structured Arrays

You can save and load structured arrays to and from files using functions like np.save() and
np.load(). Structured arrays can also be loaded from text files (CSV, TSV, etc.) using
np.genfromtxt() or np.loadtxt().

Saving to a File:

np.save('employees.npy', employees)

Loading from a File:

loaded_employees = np.load('employees.npy')

print(loaded_employees)
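
The text-file route mentioned above can be sketched with np.genfromtxt(). The CSV content here is hypothetical and held in memory with io.StringIO so that no file is needed. With names=True the field names come from the header row, and dtype=None asks NumPy to infer a type for each column.

```python
import io
import numpy as np

# Hypothetical CSV content, kept in memory instead of a file on disk
csv_text = "name,age,salary\nJohn,28,50000.0\nSara,32,60000.0\n"

records = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                        names=True, dtype=None, encoding='utf-8')

print(records.dtype.names)   # ('name', 'age', 'salary')
print(records['age'])        # [28 32]
```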

Applications of Structured Arrays

• Data Processing: Structured arrays are ideal for reading and processing heterogeneous data,
especially when dealing with datasets that have various data types, like scientific data or
financial records.
• Simulation: In simulations, structured arrays can store properties of different objects or
entities with varying types.
• Database Interaction: They act like rows in a database, making it easier to represent
records with fields of different data types.
• CSV File Handling: Structured arrays are useful when working with CSV files, where
each column can have different data types.

Reading and Writing Array Data on Files


In NumPy, reading and writing array data to and from files is a common task, especially when
working with large datasets. NumPy provides several methods to save arrays in different formats
and to load them efficiently from files.

1. Saving and Loading Binary Data Using .npy Format

NumPy has an efficient binary format called .npy for saving arrays to files. This format preserves
the shape, data type, and endianness of the array, making it fast and memory-efficient.

Saving a NumPy Array to .npy File

To save an array in .npy format, use the np.save() function.

import numpy as np

# Create an array

arr = np.array([1, 2, 3, 4, 5])

# Save the array to a file

np.save('array.npy', arr)

This creates a binary file array.npy that stores the array.

Loading a NumPy Array from .npy File

To load the array back from a .npy file, use the np.load() function.

# Load the array from the file

loaded_arr = np.load('array.npy')

print(loaded_arr)

Output: [1 2 3 4 5]
2. Saving and Loading Multiple Arrays Using .npz Format

If you want to store multiple arrays in a single file, you can use the .npz format. This is a
zip archive that stores multiple arrays in one file, each identified by a name. The archive is
compressed only when it is written with np.savez_compressed().

Saving Multiple Arrays to .npz File

You can use np.savez() or np.savez_compressed() to save multiple arrays.

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

# Save multiple arrays to a .npz file

np.savez('arrays.npz', array1=arr1, array2=arr2)

Loading Arrays from .npz File

You can load the arrays back from the .npz file using np.load(). The arrays will be stored in a
dictionary-like object.

# Load multiple arrays from the .npz file

loaded_data = np.load('arrays.npz')

# Access arrays by name

print(loaded_data['array1'])

print(loaded_data['array2'])

Output:

[1 2 3]

[4 5 6]
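
np.savez_compressed() is used in the same way as np.savez(). The sketch below writes to a temporary directory so that it leaves no files behind; the file name and the array names first and second are illustrative.

```python
import os
import tempfile
import numpy as np

arr1 = np.arange(5)
arr2 = np.ones((2, 2))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'arrays_compressed.npz')
    np.savez_compressed(path, first=arr1, second=arr2)

    # Opening with "with" closes the archive when the block ends
    with np.load(path) as data:
        names = sorted(data.files)
        restored = data['first']

print(names)       # ['first', 'second']
print(restored)    # [0 1 2 3 4]
```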

3. Saving and Loading Text Files

Sometimes, you may want to save array data in text format (like CSV or TSV files) for better
human readability or compatibility with other tools.

Saving an Array to a Text File


You can save arrays as text using np.savetxt(). This is commonly used for storing arrays in CSV
format or space-separated values.

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Save array as a CSV file

np.savetxt('array.csv', arr, delimiter=',', fmt='%d')

In this example:

delimiter=',' specifies that the values should be comma-separated (CSV format).

fmt='%d' specifies that the numbers should be written as integers.

Loading an Array from a Text File

You can load arrays from a text file using np.loadtxt().

# Load array from a CSV file

loaded_arr = np.loadtxt('array.csv', delimiter=',')

print(loaded_arr)

Output:

[[1. 2. 3.]
 [4. 5. 6.]]

You can also specify the data type using the dtype argument if the array contains other types like
integers or strings.
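
A minimal sketch of the dtype argument: by default np.loadtxt() returns floats, and dtype=int keeps whole numbers as integers. The text is read from an in-memory io.StringIO object here, as a stand-in for a CSV file.

```python
import io
import numpy as np

csv_text = "1,2,3\n4,5,6\n"

as_float = np.loadtxt(io.StringIO(csv_text), delimiter=',')
as_int = np.loadtxt(io.StringIO(csv_text), delimiter=',', dtype=int)

print(as_float.dtype)   # float64
print(as_int)
# [[1 2 3]
#  [4 5 6]]
```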

4. Working with CSV Files (Comma-Separated Values)

NumPy provides a way to handle CSV files, which is common for data storage. You can load
and save CSV files using genfromtxt() and savetxt().

Loading CSV Files with genfromtxt()

The genfromtxt() function is useful when working with CSV files containing missing values,
headers, or non-numeric data.
# Load data from a CSV file with missing values

data = np.genfromtxt('data.csv', delimiter=',', skip_header=1, filling_values=-1)

print(data)

skip_header=1: Skips the first row (usually the header row).

filling_values=-1: Fills missing values with -1.
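
The same call can be tried without a file on disk. In this sketch the CSV text is hypothetical and supplied through io.StringIO; the second column of the first data row is empty, so it is filled with -1.

```python
import io
import numpy as np

# Hypothetical CSV content with a header row and one missing value
csv_text = "Col1,Col2,Col3\n1,,3\n4,5,6\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                     skip_header=1, filling_values=-1)

print(data.tolist())   # [[1.0, -1.0, 3.0], [4.0, 5.0, 6.0]]
```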

Saving CSV Files with savetxt()

To save NumPy arrays as CSV files, you can use savetxt().

# Save array to CSV file

np.savetxt('data.csv', arr, delimiter=',', header='Col1,Col2,Col3', comments='')

header='Col1,Col2,Col3': Adds column headers to the CSV file.

comments='': Prevents # from being added to the header.
