03 Numpy

NumPy is a Python library essential for numerical computing, providing support for multi-dimensional arrays and various mathematical operations. Key features include performance optimization, broadcasting, and integration with other libraries, making it faster and more memory-efficient than Python lists. The document also covers basic usage, array types, operations, and key attributes of NumPy arrays.

NumPy

Overview of NumPy and Its Usage

What is NumPy?
NumPy (Numerical Python) is a Python library used for numerical computing. It provides
support for multi-dimensional arrays, mathematical operations, and linear algebra, making it
essential for scientific computing, data analysis, and machine learning.

Key Features of NumPy:


• N-dimensional array (ndarray) – Supports multi-dimensional arrays with fast operations.
• Mathematical functions – Includes functions for arithmetic, trigonometry, statistics, and
algebra.
• Broadcasting – Enables element-wise operations on arrays of different shapes.
• Linear algebra operations – Supports matrix manipulations, dot products, and eigenvalues.
• Performance optimization – Faster than Python lists due to efficient memory management
and vectorized operations.
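The broadcasting feature listed above can be illustrated with a minimal sketch. The array names (column, row, grid, scaled) are chosen here for illustration and do not come from the original notes.

```python
import numpy as np

# A (3, 1) column and a (4,) row are broadcast to a common (3, 4) shape
column = np.array([[0], [10], [20]])
row = np.array([1, 2, 3, 4])
grid = column + row

# A scalar is broadcast against every element
scaled = row * 2

print(grid.shape)
print(grid)
print(scaled)
```

Neither input is physically copied to the larger shape; NumPy only behaves as if the smaller operand were repeated along the missing axes.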

Why Use NumPy?


• Faster than Python lists – Uses optimized C-based implementation.
• Consumes less memory – Stores elements efficiently using fixed-size data types.
• Vectorized operations – Eliminates the need for slow Python loops.
• Integration with other libraries – Used in Pandas, SciPy, TensorFlow, and other data
science tools.
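As a rough illustration of the memory claim, the sketch below compares a Python list of 1000 integers with an int64 array of the same values. The list figure is an approximation that assumes CPython, where each integer is a separate object; exact numbers vary by interpreter version.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, so both parts are counted
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
# An ndarray stores the raw values in one contiguous buffer
array_bytes = np_arr.nbytes

print("list :", list_bytes, "bytes (approximate, CPython-specific)")
print("array:", array_bytes, "bytes")
```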

Basic Usage of NumPy


1. Importing NumPy
import numpy as np

2. Creating NumPy Arrays


arr1 = np.array([1, 2, 3, 4, 5]) # 1D Array
arr2 = np.array([[1, 2, 3], [4, 5, 6]]) # 2D Array (Matrix)

print(arr1)
print(arr2)

3. Checking Array Shape and Type


print(arr1.shape) # Output: (5,)
print(arr2.shape) # Output: (2, 3)
print(arr1.dtype) # Output: int64 (or int32 depending on system)

4. Mathematical Operations on Arrays


arr = np.array([10, 20, 30])
print(arr * 2) # Output: [20 40 60]
print(np.sqrt(arr)) # Output: approximately [3.16 4.47 5.48] (Square root)

5. Creating Special Arrays


zeros_array = np.zeros((3, 3)) # 3x3 array of zeros
ones_array = np.ones((2, 2)) # 2x2 array of ones
random_array = np.random.rand(3, 3) # 3x3 array of random values

print(zeros_array)
print(ones_array)
print(random_array)

6. Statistical Functions
arr = np.array([1, 2, 3, 4, 5])
print(np.mean(arr)) # Output: 3.0 (Mean)
print(np.median(arr)) # Output: 3.0 (Median)
print(np.std(arr)) # Output: 1.414 (Standard Deviation)
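Note that np.std() computes the population standard deviation by default (it divides by N). A short sketch of the difference from the sample standard deviation, using the ddof parameter:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

population_std = np.std(arr)        # divides by N (default, ddof=0)
sample_std = np.std(arr, ddof=1)    # divides by N - 1

print(round(float(population_std), 3))  # 1.414
print(round(float(sample_std), 3))      # 1.581
```

Many statistics packages report the sample value by default, so pass ddof=1 when comparing results with them.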

7. Matrix Operations (Linear Algebra)


A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(np.dot(A, B)) # Matrix multiplication


print(np.linalg.inv(A)) # Inverse of a matrix
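A quick way to check a computed inverse is to multiply it back and compare with the identity matrix. The sketch below uses the same A and B as above; the @ operator and np.allclose are an illustrative choice, not part of the original example.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

product = A @ B                  # same result as np.dot(A, B) for 2D arrays
A_inv = np.linalg.inv(A)

# Floating-point rounding means A @ A_inv is only approximately the identity
is_identity = np.allclose(A @ A_inv, np.eye(2))

print(product)
print(A_inv)
print(is_identity)
```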

What Are Arrays?


An array is a collection of elements stored in a structured way. It is used to store multiple values
of the same data type in a single variable, making it more efficient than lists when working with
large amounts of data.
In Python, arrays are commonly implemented using NumPy arrays (ndarray) because they
provide faster and more memory-efficient operations than Python lists.

Key Features of Arrays:


1. Fixed Size – An array's size is fixed when it is created; adding or removing elements produces a new array.
2. Efficient Memory Usage – Stores elements in contiguous memory, making operations
faster.
3. Supports Vectorized Operations – Perform element-wise operations without loops.
4. Multi-dimensional Support – Can represent 1D, 2D (matrices), and multi-dimensional
data structures.
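The fixed-size property can be seen with np.append, which returns a new array instead of growing the existing one. A minimal sketch (the name bigger is illustrative):

```python
import numpy as np

arr = np.array([1, 2, 3])

# np.append does not grow arr in place; it builds and returns a new array
bigger = np.append(arr, 4)

print(arr)      # the original is unchanged
print(bigger)
```

Because every append copies the data, building an array element by element in a loop is slow; collecting values in a list and converting once is usually preferable.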
Types of Arrays in NumPy:
1. 1D Array (Vector)
A one-dimensional array is a simple list of elements.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(arr.shape) # Output: (5,) → 1D array with 5 elements

2. 2D Array (Matrix)
A two-dimensional array represents data in rows and columns.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix)
print(matrix.shape) # Output: (2, 3) → 2 rows, 3 columns

3. 3D Array (Tensor)
A three-dimensional array is useful for handling multi-layered data, like images.
tensor = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(tensor.shape) # Output: (2, 2, 2) → 2 layers, 2 rows, 2 columns

Array Operations in NumPy


1. Mathematical Operations on Arrays
arr = np.array([10, 20, 30])
print(arr * 2) # Output: [20 40 60] → Element-wise multiplication
print(arr + 5) # Output: [15 25 35] → Element-wise addition

2. Checking Array Properties


print(arr.shape) # Number of elements in each dimension
print(arr.size) # Total number of elements
print(arr.dtype) # Data type of elements

3. Creating Special Arrays


zeros = np.zeros((3, 3)) # 3x3 matrix of zeros
ones = np.ones((2, 2)) # 2x2 matrix of ones
identity = np.eye(3) # 3x3 Identity matrix

Why Use NumPy Arrays Instead of Python Lists?


Feature                     NumPy Arrays                       Python Lists
Speed                       Faster (C-based implementation)    Slower
Memory Usage                More efficient (fixed data type)   Higher memory usage
Operations                  Supports vectorized operations     Requires loops
Multi-dimensional Support   Yes                                No (nested lists are required)

Difference Between List and ndArray


Feature                     NumPy Array                                     Python List
Data Type                   Homogeneous (same type)                         Heterogeneous (different types allowed)
Memory Efficiency           More memory efficient (contiguous memory)       Less memory efficient (pointer-based)
Performance                 Faster for numerical operations                 Slower for numerical operations
Operations                  Supports vectorized (element-wise) operations   No built-in element-wise operations
Size                        Fixed size (cannot change once created)         Dynamic size (can add/remove elements)
Multi-dimensional Support   Supports 1D, 2D, 3D arrays                      Limited to 1D (nested lists for 2D, 3D)
Indexing and Slicing        Advanced (fancy, boolean indexing)              Basic indexing and slicing
Mathematical Functions      Extensive built-in functions                    Limited mathematical functions
Use Case                    Numerical/scientific computing                  General-purpose programming
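Two of the differences in the table are easy to trip over in practice: the + and * operators mean different things for lists and arrays, and an array converts mixed inputs to one common type. A small sketch with illustrative variable names:

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

list_sum = py_list + py_list     # concatenation
array_sum = np_arr + np_arr      # element-wise addition
list_twice = py_list * 2         # repetition
array_twice = np_arr * 2         # element-wise multiplication

# Mixed inputs are converted to one common dtype
mixed = np.array([1, 2.5])

print(list_sum, array_sum)
print(list_twice, array_twice)
print(mixed.dtype)
```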

A common mistake occurs while passing arguments to array()


Forgetting to put square brackets.
Make sure a single argument containing the list of values is passed.
# Incorrect way: raises an error, because the values are passed as separate arguments
a = np.array(1,2,3,4)
# Correct way: the values are wrapped in one list
a = np.array([1,2,3,4])
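The failure can be observed safely by catching the exception. This is a sketch only: the exact exception type and message depend on the NumPy version, so both TypeError and ValueError are caught here as an assumption.

```python
import numpy as np

try:
    a = np.array(1, 2, 3, 4)       # values passed as separate arguments
except (TypeError, ValueError) as exc:
    # The exception type and message vary between NumPy versions
    error_name = type(exc).__name__
    print("Rejected:", error_name)

a = np.array([1, 2, 3, 4])         # one list argument
print(a)
```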
Key Attributes of ndarray in NumPy
A NumPy ndarray (N-dimensional array) has several key attributes that provide information about
its structure, size, and data type.

1. ndarray.shape → Shape of the Array


📌 Returns a tuple representing the dimensions of the array.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape) # Output: (2, 3) → 2 rows, 3 columns

2. ndarray.ndim → Number of Dimensions


📌 Returns the number of axes (dimensions) of the array.
print(arr.ndim) # Output: 2 (because it’s a 2D array)

3. ndarray.size → Total Number of Elements


📌 Returns the total count of elements in the array.
print(arr.size) # Output: 6 (2x3 = 6 elements)

4. ndarray.dtype → Data Type of Elements


📌 Returns the data type of the array’s elements.
print(arr.dtype) # Output: int64 (or int32 based on the system)

🔹 Supports various data types: int32, int64, float32, float64, complex, etc.
5. ndarray.itemsize → Memory Size of Each Element
📌 Returns the size (in bytes) of one element in the array.
print(arr.itemsize) # Output: 8 (for int64), 4 (for int32)

🔹 Helps in memory optimization and storage calculations.


6. ndarray.nbytes → Total Memory Used
📌 Returns the total memory (in bytes) occupied by the array.
print(arr.nbytes) # Output: 48 (6 elements * 8 bytes each = 48 bytes)
7. ndarray.T → Transpose of the Array
📌 Returns the transposed version of the array (rows ↔ columns).
print(arr.T)
# Output:
# [[1 4]
# [2 5]
# [3 6]]

Summary of Key Attributes


Attribute   Description
shape       Tuple of array dimensions, e.g. (rows, columns) for a 2D array
ndim        Number of dimensions (axes)
size        Total number of elements in the array
dtype       Data type of elements (e.g., int32, float64)
itemsize    Memory size of each element (in bytes)
nbytes      Total memory occupied (size × itemsize)
T           Transposed version of the array
Conclusion: These attributes help in efficiently managing and analyzing NumPy arrays, making
them powerful tools for data science and numerical computing!
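The attributes are related to one another, which the following sketch checks on a small array. The dtype is set to int64 explicitly so that the byte counts do not depend on the platform default.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

elements = int(np.prod(arr.shape))        # product of the shape entries
total_bytes = arr.size * arr.itemsize     # should equal arr.nbytes

print(arr.ndim, len(arr.shape))   # ndim is the length of the shape tuple
print(arr.size, elements)
print(arr.nbytes, total_bytes)
print(arr.T.shape)                # transposing reverses the shape
```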
Data Types in NumPy (dtype)
NumPy provides various data types (dtype) to store different kinds of numerical and textual data
efficiently.

1. Numeric Data Types


A. Integer Types
Data Type   Description                                 Memory
int8        8-bit signed integer (-128 to 127)          1 byte
int16       16-bit signed integer (-32,768 to 32,767)   2 bytes
int32       32-bit signed integer (-2.1B to 2.1B)       4 bytes
int64       64-bit signed integer (large range)         8 bytes

🔹 Example:
import numpy as np
arr = np.array([1, 2, 3], dtype=np.int32)
print(arr.dtype) # Output: int32

B. Floating-Point Types
Data Type   Description                       Memory
float16     16-bit floating point             2 bytes
float32     32-bit floating point             4 bytes
float64     64-bit floating point (default)   8 bytes

🔹 Example:
arr = np.array([1.2, 2.3, 3.4], dtype=np.float32)
print(arr.dtype) # Output: float32

2. Boolean Type
Data Type   Description
bool_       Boolean (True or False)

🔹 Example:
arr = np.array([True, False, True], dtype=np.bool_)
print(arr.dtype) # Output: bool
3. Complex Number Type
Data Type    Description                         Memory
complex64    Complex number with 32-bit floats   8 bytes
complex128   Complex number with 64-bit floats   16 bytes

🔹 Example:
arr = np.array([1+2j, 3+4j], dtype=np.complex64)
print(arr.dtype) # Output: complex64

4. String Type
Data Type   Description
str_        Fixed-size Unicode string

🔹 Example:
arr = np.array(["apple", "banana", "cherry"], dtype=np.str_)
print(arr.dtype) # Output: <U6 (Unicode strings of up to 6 characters)
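"Fixed-size" has a practical consequence: a string longer than the array's width is silently cut when assigned. A minimal sketch, where the replacement value "watermelon" is an illustrative choice:

```python
import numpy as np

arr = np.array(["apple", "banana", "cherry"])
print(arr.dtype)          # width comes from the longest strings (6 characters)

arr[0] = "watermelon"     # longer than 6 characters
print(arr[0])             # the stored value is cut to fit the fixed width
```

If longer values may arrive later, a wider dtype such as "U20" can be requested when the array is created.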

5. Object Type (Mixed Data)


Data Type   Description
object_     Stores mixed data types

🔹 Example:
arr = np.array([1, "hello", 3.14], dtype=np.object_)
print(arr.dtype) # Output: object

6. Changing Data Type (astype)


You can convert an array’s data type using .astype().
arr = np.array([1, 2, 3], dtype=np.int32)
arr_float = arr.astype(np.float64)
print(arr_float.dtype) # Output: float64
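Converting in the other direction, from float to integer, discards the fractional part instead of rounding. The sketch below shows this and one possible way to round first; the sample values are illustrative.

```python
import numpy as np

floats = np.array([1.9, -1.9, 2.5])

ints = floats.astype(np.int32)                 # fractional part is discarded
rounded = np.round(floats).astype(np.int32)    # round first, then convert

print(ints)
print(rounded)
print(floats)    # astype returned a new array; floats is unchanged
```

np.round rounds exact halves to the nearest even value, which is why 2.5 becomes 2 here.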

Summary of NumPy Data Types


Category         Data Types
Integer          int8, int16, int32, int64
Floating-Point   float16, float32, float64
Boolean          bool_
Complex          complex64, complex128
String           str_
Mixed            object_
Indexing and Slicing in NumPy Arrays
NumPy provides indexing and slicing to access and manipulate elements efficiently in arrays.

1. Indexing in NumPy
NumPy supports zero-based indexing, meaning the first element has an index of 0.

A. Indexing in 1D Arrays
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

print(arr[0]) # Output: 10 (First element)


print(arr[-1]) # Output: 50 (Last element)
print(arr[2]) # Output: 30 (Third element)

B. Indexing in 2D Arrays
For 2D arrays, use [row, column] indexing.
arr_2d = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
])

print(arr_2d[0, 1]) # Output: 2 (Element in 1st row, 2nd column)


print(arr_2d[2, -1]) # Output: 9 (Element in last row, last column)

C. Indexing in 3D Arrays
For 3D arrays, use [depth, row, column] indexing.
arr_3d = np.array([
[[1, 2, 3], [4, 5, 6]], # First layer
[[7, 8, 9], [10, 11, 12]] # Second layer
])

print(arr_3d[1, 0, 2]) # Output: 9 (Second layer, first row, third column)

2. Slicing in NumPy
Slicing allows extracting subarrays using the format:
array[start:stop:step]

• start → Starting index (included)


• stop → Ending index (excluded)
• step → Step size (default = 1)

A. Slicing a 1D Array
arr = np.array([10, 20, 30, 40, 50, 60, 70])

print(arr[1:5]) # Output: [20 30 40 50] (Index 1 to 4)


print(arr[:4]) # Output: [10 20 30 40] (Start from beginning)
print(arr[3:]) # Output: [40 50 60 70] (Until the end)
print(arr[::2]) # Output: [10 30 50 70] (Every second element)
print(arr[::-1]) # Output: [70 60 50 40 30 20 10] (Reverse order)
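One point worth adding: a basic slice is a view that shares memory with the original array, so writing to the slice changes the original. The sketch below contrasts this with an explicit copy; the names view and independent are illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60, 70])

view = arr[1:4]              # basic slice: shares memory with arr
view[0] = 99                 # also changes arr[1]

independent = arr[1:4].copy()
independent[1] = -1          # arr is not affected

print(arr)
print(np.shares_memory(arr, view), np.shares_memory(arr, independent))
```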

B. Slicing a 2D Array
arr_2d = np.array([
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]
])

print(arr_2d[1:, 2:]) # Output: [[ 7 8] [11 12]] (Rows 1 onwards, Columns 2 onwards)
print(arr_2d[:, 1:3]) # Output: [[ 2 3] [ 6 7] [10 11]] (All rows, Columns 1 and 2)
print(arr_2d[0, :]) # Output: [1 2 3 4] (First row, all columns)

C. Slicing a 3D Array
arr_3d = np.array([
[[1, 2, 3], [4, 5, 6]],
[[7, 8, 9], [10, 11, 12]]
])

print(arr_3d[:, 1, :]) # Output: [[ 4 5 6] [10 11 12]] (All layers, second row, all columns)
print(arr_3d[:, :, 1]) # Output: [[ 2 5] [ 8 11]] (All layers, all rows, second column)

3. Fancy Indexing (Advanced Indexing)


You can select multiple specific elements using index arrays.
arr = np.array([10, 20, 30, 40, 50])
indices = [0, 2, 4]

print(arr[indices]) # Output: [10 30 50]

For 2D arrays:
arr_2d = np.array([
[10, 20, 30],
[40, 50, 60],
[70, 80, 90]
])

rows = [0, 1, 2]
cols = [2, 1, 0]

print(arr_2d[rows, cols]) # Output: [30 50 70] (Picks (0,2), (1,1), (2,0))
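Index lists passed together are paired element by element, which is why the example above picks three single elements. To select every combination of the given rows and columns instead, np.ix_ can be used. A sketch with illustrative index lists:

```python
import numpy as np

arr_2d = np.array([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90]
])

rows = [0, 2]
cols = [0, 2]

paired = arr_2d[rows, cols]            # elements (0,0) and (2,2)
block = arr_2d[np.ix_(rows, cols)]     # rows 0 and 2 crossed with columns 0 and 2

print(paired)
print(block)
```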

4. Boolean Indexing
Retrieve elements based on conditions.
arr = np.array([10, 20, 30, 40, 50])

print(arr[arr > 25]) # Output: [30 40 50]

For 2D arrays:
arr_2d = np.array([
[10, 20, 30],
[40, 50, 60]
])

print(arr_2d[arr_2d > 30]) # Output: [40 50 60]
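Conditions can be combined, and a mask can also be used on the left-hand side of an assignment. The following sketch assumes the same 1D values as above; the thresholds are illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Combine conditions with & (and) and | (or); each condition needs parentheses
middle = arr[(arr > 15) & (arr < 45)]
edges = arr[(arr < 15) | (arr > 45)]

# A mask can also select the elements to overwrite
capped = arr.copy()
capped[capped > 30] = 30

print(middle)
print(edges)
print(capped)
```

Python's own and / or keywords do not work element-wise on arrays, which is why the & and | operators are used.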

Summary Table: Indexing & Slicing in NumPy


Operation           Syntax Example    Output
Single Indexing     arr[2]            30 (3rd element)
Negative Indexing   arr[-1]           50 (Last element)
1D Slicing          arr[1:4]          [20 30 40]
2D Slicing          arr_2d[:, 1:3]    Extracts columns 1-2
Reverse Order       arr[::-1]         Reverse array
Fancy Indexing      arr[[0, 2, 4]]    [10 30 50]
Boolean Indexing    arr[arr > 25]     [30 40 50]
Example Use Case of a 1D NumPy Array in Python
Use Case: Stock Price Analysis
A company wants to analyze the daily closing stock prices over a week to calculate key statistics
like average price, highest price, lowest price, and percentage change.

Step 1: Import NumPy and Create a 1D Array


import numpy as np

# Closing stock prices for a week (in dollars)


stock_prices = np.array([150, 152, 148, 155, 157, 160, 158])

print("Stock Prices:", stock_prices)

🔹 1D NumPy array is used to store the daily closing stock prices.


Step 2: Calculate Key Statistics
# Calculate the average stock price
average_price = np.mean(stock_prices)

# Find the highest and lowest prices


highest_price = np.max(stock_prices)
lowest_price = np.min(stock_prices)

# Calculate percentage change from first to last day


percentage_change = ((stock_prices[-1] - stock_prices[0]) / stock_prices[0]) * 100

print(f"Average Price: ${average_price:.2f}")


print(f"Highest Price: ${highest_price}")
print(f"Lowest Price: ${lowest_price}")
print(f"Percentage Change: {percentage_change:.2f}%")

🔹 NumPy functions like mean(), max(), and min() make statistical calculations easy.
🔹 Vectorized operations enable fast percentage change calculations.
Step 3: Identify Days When Stock Price Was Above the Average
# Find days where stock price was above average
above_avg_days = stock_prices[stock_prices > average_price]

print("Days when stock price was above average:", above_avg_days)

🔹 Boolean indexing allows filtering values efficiently.


Output:
Stock Prices: [150 152 148 155 157 160 158]
Average Price: $154.29
Highest Price: $160
Lowest Price: $148
Percentage Change: 5.33%
Days when stock price was above average: [155 157 160 158]
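The same data also supports day-to-day analysis. The sketch below is one possible extension, not part of the original program: it uses np.diff for daily changes and np.flatnonzero to report day numbers, where the original filter reports prices.

```python
import numpy as np

stock_prices = np.array([150, 152, 148, 155, 157, 160, 158])

daily_change = np.diff(stock_prices)                   # price[i+1] - price[i]
daily_pct = daily_change / stock_prices[:-1] * 100     # relative to the previous day

# 1-based day numbers on which the price was above the weekly average
days_above_avg = np.flatnonzero(stock_prices > stock_prices.mean()) + 1

print(daily_change)
print(np.round(daily_pct, 2))
print(days_above_avg)
```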

Complete Python Program: Stock Price Analysis Using a 1D NumPy Array


📌 This program calculates:
• Average stock price over the week
• Highest and lowest stock prices
• Percentage change from the first to the last day
• Days when the stock price was above the average

Python Code
import numpy as np

# Stock prices (Closing prices over a week)


stock_prices = np.array([150, 152, 148, 155, 157, 160, 158])

print("Stock Prices Over the Week:", stock_prices)

# 1. Calculate the average stock price


average_price = np.mean(stock_prices)
print(f"\nAverage Stock Price: ${average_price:.2f}")

# 2. Find the highest and lowest stock prices


highest_price = np.max(stock_prices)
lowest_price = np.min(stock_prices)
print(f"Highest Stock Price: ${highest_price}")
print(f"Lowest Stock Price: ${lowest_price}")

# 3. Calculate the percentage change from the first to the last day
percentage_change = ((stock_prices[-1] - stock_prices[0]) / stock_prices[0]) * 100
print(f"Percentage Change Over the Week: {percentage_change:.2f}%")

# 4. Identify days when stock price was above average


above_avg_days = stock_prices[stock_prices > average_price]
print("Stock Prices Above Average:", above_avg_days)

# 5. Identify the day with the maximum stock price


highest_day = np.argmax(stock_prices) + 1 # Adding 1 for a 1-based index
print(f"Day with Highest Stock Price: Day {highest_day}")

# 6. Identify the day with the minimum stock price


lowest_day = np.argmin(stock_prices) + 1
print(f"Day with Lowest Stock Price: Day {lowest_day}")

Expected Output
Stock Prices Over the Week: [150 152 148 155 157 160 158]
Average Stock Price: $154.29
Highest Stock Price: $160
Lowest Stock Price: $148
Percentage Change Over the Week: 5.33%
Stock Prices Above Average: [155 157 160 158]
Day with Highest Stock Price: Day 6
Day with Lowest Stock Price: Day 3
What is a 3D Array?
A 3D array is a three-dimensional NumPy array, which consists of multiple 2D matrices stacked
together. It is structured as:
(depth, rows, columns) → Like a stack of 2D arrays.
📌 Example of a 3D array:
import numpy as np

# Creating a 3D array (2 matrices of size 3x3)


array_3d = np.array([
[[1, 2, 3], [4, 5, 6], [7, 8, 9]], # First 3x3 matrix
[[10, 11, 12], [13, 14, 15], [16, 17, 18]] # Second 3x3 matrix
])

print(array_3d.shape) # Output: (2, 3, 3)


print(array_3d)

Structure of array_3d
[
[[ 1 2 3] # First matrix
[ 4 5 6]
[ 7 8 9]],

[[10 11 12] # Second matrix
[13 14 15]
[16 17 18]]
]

Here, 2 layers (depth), each with 3 rows and 3 columns.

Uses of a 3D Array
1. Image Processing (RGB Images)
📌 In computer vision, images are stored as 3D arrays (Height × Width × Channels).
image = np.zeros((256, 256, 3)) # A blank 256x256 RGB image

✅ Used in: OpenCV, TensorFlow, and deep learning for image classification.
2. Medical Imaging (MRI, CT Scans)
📌 3D arrays store multiple slices of scans in medical imaging.
mri_scan = np.zeros((100, 256, 256)) # 100 slices of 256x256 resolution

✅ Used in: MRI, CT scans, and 3D reconstructions.


3. Weather Data Analysis
📌 Stores temperature data across multiple locations and times.
weather_data = np.random.rand(12, 7, 24) # 12 months, 7 days, 24-hour readings

✅ Used in: Climate modeling and meteorology.


4. Video Processing (Frame Sequences)
📌 A video is a sequence of images (frames). Grayscale frames can be stored as a 3D array (frames × height × width); color video adds a channel axis.
video_frames = np.zeros((60, 1080, 1920)) # 60 grayscale frames at 1080p resolution

✅ Used in: Video compression, object detection, and AI applications.


5. Deep Learning (Batch Processing)
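📌 Neural networks process multiple images at once by adding a batch axis. A batch of grayscale images is a 3D array; a batch of RGB images, as below, is a 4D array (batch × height × width × channels).
batch_images = np.zeros((32, 224, 224, 3)) # 32 RGB images of 224x224 resolution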

✅ Used in: CNNs, facial recognition, and robotics.

Working with 3D Arrays in NumPy


A 3D array in NumPy is an array with three dimensions:
• Depth (Layers) – Multiple 2D matrices stacked together
• Rows – The number of rows in each 2D matrix
• Columns – The number of columns in each 2D matrix

1. Creating a 3D NumPy Array


You can create a 3D array using np.array() by stacking multiple 2D arrays.
import numpy as np

# Creating a 3D array (2 layers, each with 3 rows and 4 columns)


array_3d = np.array([
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], # First 3x4 matrix
[[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]] # Second 3x4 matrix
])

print("3D Array:\n", array_3d)


print("\nShape of the array:", array_3d.shape) # Output: (2, 3, 4)

🔹 Shape Explanation: (2, 3, 4) means 2 layers, 3 rows, and 4 columns.
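When the 2D layers already exist as separate arrays, np.stack is one way to combine them, and np.zeros can allocate the same shape directly. This is a sketch with illustrative names (layer_a, layer_b); the values match the example above.

```python
import numpy as np

layer_a = np.arange(1, 13).reshape(3, 4)     # values 1..12
layer_b = np.arange(13, 25).reshape(3, 4)    # values 13..24

stacked = np.stack([layer_a, layer_b])       # new leading axis -> (2, 3, 4)
filled = np.zeros((2, 3, 4))                 # same shape, allocated directly

print(stacked.shape)
print(stacked[1, 2, 3])    # last element of the second layer
print(filled.shape)
```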


2. Accessing Elements in a 3D Array
You can access elements using indexing:
Syntax: array_3d[layer, row, column]
print(array_3d[0, 1, 2]) # Access element from 1st layer, 2nd row, 3rd column → Output: 7
print(array_3d[1, 2, 3]) # Access element from 2nd layer, 3rd row, 4th column → Output: 24

3. Slicing in a 3D Array
Slicing allows you to extract subarrays.

Extract an entire layer (2D Matrix)


print(array_3d[0]) # Extracts the first layer

Extract a specific column from all layers


print(array_3d[:, :, 2]) # Extracts the third column from all layers

Extract a subarray
print(array_3d[:, 1:, 2:]) # Extracts last two rows & last two columns from both layers

4. Basic Operations on 3D Arrays


NumPy allows operations across dimensions.

1. Summing Elements
print(np.sum(array_3d)) # Sum of all elements in the array
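print(np.sum(array_3d, axis=0)) # Sum across layers (depth) → shape (3, 4)
print(np.sum(array_3d, axis=1)) # Sum over the rows of each layer (column totals) → shape (2, 4)
print(np.sum(array_3d, axis=2)) # Sum over the columns of each layer (row totals) → shape (2, 3)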
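A useful rule of thumb is that the axis passed to a reduction is the one that disappears from the result's shape. The sketch below rebuilds the same 2×3×4 values with np.arange (an illustrative shortcut) and prints the resulting shapes.

```python
import numpy as np

array_3d = np.arange(1, 25).reshape(2, 3, 4)   # values 1..24 in shape (2, 3, 4)

by_layer = np.sum(array_3d, axis=0)    # layers collapse  -> (3, 4)
by_row = np.sum(array_3d, axis=1)      # rows collapse    -> (2, 4)
by_col = np.sum(array_3d, axis=2)      # columns collapse -> (2, 3)
kept = np.sum(array_3d, axis=2, keepdims=True)  # length-1 axis kept -> (2, 3, 1)

print(by_layer.shape, by_row.shape, by_col.shape, kept.shape)
print(by_col)
```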

2. Finding Maximum and Minimum


print(np.max(array_3d)) # Maximum value in the array
print(np.min(array_3d, axis=1)) # Minimum of each column within each layer → shape (2, 4)

3. Mean and Standard Deviation


print(np.mean(array_3d, axis=0)) # Mean across layers
print(np.std(array_3d)) # Standard deviation of the whole array
5. Reshaping and Transposing a 3D Array
Reshape a 3D Array
reshaped_array = array_3d.reshape(4, 2, 3) # Change shape while maintaining total elements
print(reshaped_array.shape) # Output: (4, 2, 3)

Transpose a 3D Array
transposed_array = array_3d.transpose(1, 0, 2) # Swaps rows and layers
print(transposed_array.shape) # Output: (3, 2, 4)
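Reshaping and transposing can produce the same shape while arranging the elements differently. The sketch below, using an array with the same values as above, also shows -1 as a placeholder that lets NumPy infer one dimension.

```python
import numpy as np

array_3d = np.arange(1, 25).reshape(2, 3, 4)

flat_rows = array_3d.reshape(-1, 4)          # -1 is inferred as 6 -> (6, 4)
transposed = array_3d.transpose(1, 0, 2)     # axes reordered    -> (3, 2, 4)
reshaped = array_3d.reshape(3, 2, 4)         # same shape, elements kept in reading order

print(flat_rows.shape, transposed.shape, reshaped.shape)
print(transposed[0, 1])   # row 0 of layer 1
print(reshaped[0, 1])     # the second group of four values
```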

6. Use Case Example: Analyzing Weather Data


📌 Scenario: A city records daily temperatures over 7 days for 3 different locations (North, South,
and East).
We store the data in a 3D NumPy array and perform an analysis.
# Creating a 3D array (7 days, 3 locations, 2 readings per day)
temperature_data = np.array([
[[30, 32], [28, 29], [25, 27]], # Day 1
[[31, 33], [27, 28], [26, 27]], # Day 2
[[29, 31], [26, 28], [24, 26]], # Day 3
[[32, 34], [29, 30], [27, 29]], # Day 4
[[30, 32], [28, 29], [25, 27]], # Day 5
[[31, 33], [27, 28], [26, 27]], # Day 6
[[29, 31], [26, 28], [24, 26]] # Day 7
])

# Compute average temperature per location over 7 days


avg_temp_per_location = np.mean(temperature_data, axis=0)
print("\nAverage Temperature Per Location:\n", avg_temp_per_location)

# Compute maximum temperature recorded


max_temp = np.max(temperature_data)
print("\nMaximum Temperature Recorded:", max_temp)

# Compute temperature variation (Standard Deviation)


temp_variation = np.std(temperature_data, axis=0)
print("\nTemperature Variation Per Location:\n", temp_variation)
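Averaging over axis=0 alone still leaves two values per location (one per reading). To get a single figure per location, both the day axis and the reading axis can be reduced at once. A sketch using the same data; the locations list is an illustrative addition.

```python
import numpy as np

locations = ["North", "South", "East"]
temperature_data = np.array([
    [[30, 32], [28, 29], [25, 27]],  # Day 1
    [[31, 33], [27, 28], [26, 27]],  # Day 2
    [[29, 31], [26, 28], [24, 26]],  # Day 3
    [[32, 34], [29, 30], [27, 29]],  # Day 4
    [[30, 32], [28, 29], [25, 27]],  # Day 5
    [[31, 33], [27, 28], [26, 27]],  # Day 6
    [[29, 31], [26, 28], [24, 26]],  # Day 7
])

# Average over days (axis 0) and readings (axis 2): one value per location
avg_per_location = temperature_data.mean(axis=(0, 2))
warmest = locations[int(np.argmax(avg_per_location))]

for name, value in zip(locations, avg_per_location):
    print(f"{name}: {value:.2f}")
print("Warmest on average:", warmest)
```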
Use Case Example: 3D NumPy Array in Data Analysis
📌 Scenario: Sales Data Analysis
A company tracks daily sales data for 3 products across 4 stores over 7 days. We store and
analyze this data using a 3D NumPy array.

Python Program for demo of 3D array: Sales Data Analysis


import numpy as np

# Create a 3D NumPy array for sales data


# Shape: (7 days, 4 stores, 3 products)
sales_data = np.array([
[[200, 150, 100], [220, 160, 90], [210, 140, 110], [230, 170, 120]], # Day 1
[[180, 130, 120], [190, 150, 100], [200, 120, 130], [210, 160, 140]], # Day 2
[[250, 180, 130], [240, 190, 140], [260, 200, 150], [270, 210, 160]], # Day 3
[[300, 200, 150], [310, 210, 160], [320, 220, 170], [330, 230, 180]], # Day 4
[[280, 190, 140], [270, 180, 130], [260, 170, 120], [250, 160, 110]], # Day 5
[[220, 170, 130], [230, 180, 140], [240, 190, 150], [250, 200, 160]], # Day 6
[[210, 160, 120], [220, 170, 130], [230, 180, 140], [240, 190, 150]] # Day 7
])

# 1. Compute total sales for each product across all stores and days
total_sales_per_product = np.sum(sales_data, axis=(0, 1))
print("\nTotal Sales for Each Product:", total_sales_per_product)

# 2. Compute average daily sales for each store


average_sales_per_store = np.mean(sales_data, axis=0)
print("\nAverage Daily Sales per Store:\n", average_sales_per_store)

# 3. Find the best-selling product


best_selling_product = np.argmax(total_sales_per_product) + 1
print(f"\nBest Selling Product: Product {best_selling_product}")

# 4. Find the store with the highest total sales


total_sales_per_store = np.sum(sales_data, axis=(0, 2))
best_store = np.argmax(total_sales_per_store) + 1
print(f"\nBest Performing Store: Store {best_store}")

# 5. Compute total sales for each day


daily_sales = np.sum(sales_data, axis=(1, 2))
print("\nTotal Sales per Day:", daily_sales)

Expected Output (averages rounded to two decimals)


Total Sales for Each Product: [6820 4960 3770]

Average Daily Sales per Store:
[[234.29 168.57 127.14]
[240.00 177.14 127.14]
[245.71 174.29 138.57]
[254.29 188.57 145.71]]

Best Selling Product: Product 1

Best Performing Store: Store 4

Total Sales per Day: [1900 1830 2380 2780 2260 2260 2140]
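One way to sanity-check aggregations like these is to confirm that every breakdown adds up to the same grand total. The sketch below repeats the sales data and performs that check; the variable names are illustrative.

```python
import numpy as np

# Shape: (7 days, 4 stores, 3 products)
sales_data = np.array([
    [[200, 150, 100], [220, 160, 90], [210, 140, 110], [230, 170, 120]],
    [[180, 130, 120], [190, 150, 100], [200, 120, 130], [210, 160, 140]],
    [[250, 180, 130], [240, 190, 140], [260, 200, 150], [270, 210, 160]],
    [[300, 200, 150], [310, 210, 160], [320, 220, 170], [330, 230, 180]],
    [[280, 190, 140], [270, 180, 130], [260, 170, 120], [250, 160, 110]],
    [[220, 170, 130], [230, 180, 140], [240, 190, 150], [250, 200, 160]],
    [[210, 160, 120], [220, 170, 130], [230, 180, 140], [240, 190, 150]],
])

per_product = sales_data.sum(axis=(0, 1))   # collapse days and stores
per_store = sales_data.sum(axis=(0, 2))     # collapse days and products
per_day = sales_data.sum(axis=(1, 2))       # collapse stores and products
grand_total = sales_data.sum()

# Each breakdown partitions the same data, so all three must match the total
consistent = bool(per_product.sum() == grand_total
                  and per_store.sum() == grand_total
                  and per_day.sum() == grand_total)

print(per_product, per_store, per_day)
print(grand_total, consistent)
```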
Use Case of a 2D NumPy Array in Python
Use Case: Student Grade Analysis
📌 Scenario:
A university records students' scores in 3 subjects (Math, Science, and English) for 5 students.
Using a 2D NumPy array, we will analyze the data by calculating average scores, highest scores,
lowest scores, and subject-wise performance.

Step 1: Import NumPy and Create a 2D Array


import numpy as np

# Rows represent students, columns represent subjects (Math, Science, English)


grades = np.array([
[85, 90, 78], # Student 1
[88, 76, 92], # Student 2
[90, 88, 85], # Student 3
[70, 65, 80], # Student 4
[95, 98, 95] # Student 5
])

print("Student Grades (Rows: Students, Columns: Subjects):\n", grades)

🔹 2D Array Structure: Each row represents a student, and each column represents a subject.
Step 2: Compute Key Statistics
1. Calculate the Average Score for Each Student
average_per_student = np.mean(grades, axis=1)
print("Average Score per Student:", average_per_student)

📌 axis=1 → Computes the mean row-wise (for each student).


2. Calculate the Average Score for Each Subject
average_per_subject = np.mean(grades, axis=0)
print("Average Score per Subject (Math, Science, English):",
average_per_subject)

📌 axis=0 → Computes the mean column-wise (for each subject).


3. Find the Highest and Lowest Scores in Each Subject
highest_per_subject = np.max(grades, axis=0)
lowest_per_subject = np.min(grades, axis=0)
print("Highest Scores per Subject (Math, Science, English):",
highest_per_subject)
print("Lowest Scores per Subject (Math, Science, English):", lowest_per_subject)

📌 max() and min() functions help identify top and bottom-performing students in each
subject.

4. Find the Best Performing Student (Highest Total Score)


total_scores = np.sum(grades, axis=1)
best_student_index = np.argmax(total_scores) # Finds index of the highest total
score
print(f"Best Performing Student: Student {best_student_index + 1} with Total Score {total_scores[best_student_index]}")

📌 sum(axis=1) calculates total scores for each student, and argmax() finds the highest
scorer.

Output:
Student Grades (Rows: Students, Columns: Subjects):
[[85 90 78]
[88 76 92]
[90 88 85]
[70 65 80]
[95 98 95]]

Average Score per Student: [84.33 85.33 87.67 71.67 96.00]


Average Score per Subject (Math, Science, English): [85.6 83.4 86.0]
Highest Scores per Subject (Math, Science, English): [95 98 95]
Lowest Scores per Subject (Math, Science, English): [70 65 78]
Best Performing Student: Student 5 with Total Score 288

Complete Program

Complete Python Program: Student Grade Analysis Using a 2D NumPy Array


📌 This program calculates:
• Average scores per student
• Average scores per subject
• Highest & lowest scores in each subject
• Best performing student

Python Code
import numpy as np

# Define a 2D NumPy array for student grades


# Rows represent students, Columns represent subjects (Math, Science, English)
grades = np.array([
[85, 90, 78], # Student 1
[88, 76, 92], # Student 2
[90, 88, 85], # Student 3
[70, 65, 80], # Student 4
[95, 98, 95] # Student 5
])

print("Student Grades (Rows: Students, Columns: Subjects):\n", grades)

# 1. Calculate the average score for each student


average_per_student = np.mean(grades, axis=1)
print("\nAverage Score per Student:", average_per_student)

# 2. Calculate the average score for each subject


average_per_subject = np.mean(grades, axis=0)
print("\nAverage Score per Subject (Math, Science, English):",
average_per_subject)

# 3. Find the highest and lowest scores in each subject


highest_per_subject = np.max(grades, axis=0)
lowest_per_subject = np.min(grades, axis=0)
print("\nHighest Scores per Subject (Math, Science, English):",
highest_per_subject)
print("Lowest Scores per Subject (Math, Science, English):", lowest_per_subject)

# 4. Identify the best-performing student (highest total score)


total_scores = np.sum(grades, axis=1)
best_student_index = np.argmax(total_scores) # Index of the best student
print(f"\nBest Performing Student: Student {best_student_index + 1} with Total Score {total_scores[best_student_index]}")

# 5. Identify students scoring above the class average


class_avg = np.mean(grades)
students_above_avg = np.where(np.mean(grades, axis=1) > class_avg)[0] + 1 # Adding 1 to match student numbering
print("\nStudents Scoring Above Class Average:", students_above_avg)

Expected Output
Student Grades (Rows: Students, Columns: Subjects):
[[85 90 78]
[88 76 92]
[90 88 85]
[70 65 80]
[95 98 95]]

Average Score per Student: [84.33 85.33 87.67 71.67 96.00]

Average Score per Subject (Math, Science, English): [85.6 83.4 86.0]

Highest Scores per Subject (Math, Science, English): [95 98 95]


Lowest Scores per Subject (Math, Science, English): [70 65 78]

Best Performing Student: Student 5 with Total Score 288

Students Scoring Above Class Average: [2 3 5]
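The analysis can be extended from finding the single best student to ranking all of them. The sketch below is one possible approach using np.argsort, with argmax along axis=1 to find each student's strongest subject; it is an illustration, not part of the original program.

```python
import numpy as np

subjects = ["Math", "Science", "English"]
grades = np.array([
    [85, 90, 78],  # Student 1
    [88, 76, 92],  # Student 2
    [90, 88, 85],  # Student 3
    [70, 65, 80],  # Student 4
    [95, 98, 95],  # Student 5
])

total_scores = grades.sum(axis=1)
order = np.argsort(total_scores)[::-1]     # indices from highest to lowest total
ranking = order + 1                        # 1-based student numbers

# Column index of each student's highest score
best_subject = np.argmax(grades, axis=1)

print("Ranking:", ranking)
print("Totals :", total_scores[order])
print("Best subject per student:", [subjects[i] for i in best_subject])
```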


NumPy and Working with Files
NumPy provides efficient methods to read from and write to files in different formats, including
binary (.npy, .npz) and text (.txt, .csv). These operations are crucial for data storage,
analysis, and machine learning workflows.

1. Working with Binary Files (.npy and .npz)


Binary formats are faster and more efficient for storing large datasets compared to text files.

A. Saving and Loading a Single NumPy Array (.npy Format)


Saving an Array
import numpy as np

# Creating an array
arr = np.array([10, 20, 30, 40, 50])

# Saving as a binary file


np.save('data.npy', arr)

print("Array saved successfully!")

Loading an Array
loaded_arr = np.load('data.npy')
print(loaded_arr) # Output: [10 20 30 40 50]

B. Saving and Loading Multiple Arrays (.npz Format)


Saving Multiple Arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([[4, 5, 6], [7, 8, 9]])

# Save multiple arrays in a single .npz archive (np.savez does not compress; use np.savez_compressed for compression)


np.savez('multiple_data.npz', first=arr1, second=arr2)

print("Multiple arrays saved!")

Loading Multiple Arrays


loaded = np.load('multiple_data.npz')

print(loaded['first']) # Output: [1 2 3]
print(loaded['second']) # Output: [[4 5 6] [7 8 9]]
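A round trip through a compressed archive might look like the sketch below. It writes to a temporary directory so that no files are left behind; the use of tempfile and the with-statement around np.load are choices made for this illustration.

```python
import os
import tempfile
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([[4, 5, 6], [7, 8, 9]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "multiple_data.npz")
    np.savez_compressed(path, first=arr1, second=arr2)   # compressed variant of np.savez

    # Using np.load as a context manager closes the archive when done
    with np.load(path) as loaded:
        names = sorted(loaded.files)     # keys stored in the archive
        first = loaded["first"]
        second = loaded["second"]

print(names)
print(np.array_equal(first, arr1), np.array_equal(second, arr2))
```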
2. Working with Text Files (.txt)
Text files are human-readable and commonly used for structured datasets.

A. Saving a NumPy Array to a Text File


arr = np.array([[1, 2, 3], [4, 5, 6]])

# Save as text file (space-separated)


np.savetxt('data.txt', arr, fmt='%d')

print("Text file saved!")

B. Loading a NumPy Array from a Text File


loaded_txt = np.loadtxt('data.txt', dtype=int)
print(loaded_txt)

3. Working with CSV Files (.csv)


CSV files are widely used in data science and analytics.

A. Saving a NumPy Array as a CSV File


arr = np.array([[1, 2, 3], [4, 5, 6]])

# Save as CSV (comma-separated)


np.savetxt('data.csv', arr, delimiter=',', fmt='%d')

print("CSV file saved!")

B. Loading a NumPy Array from a CSV File


loaded_csv = np.loadtxt('data.csv', delimiter=',', dtype=int)
print(loaded_csv)
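CSV files usually carry a header row. One way to write one with np.savetxt is the header argument combined with comments='' (otherwise the header is prefixed with '# '); np.loadtxt can then skip it with skiprows=1. This is a sketch with an assumed file name and assumed column names a, b, c:

```python
import os
import tempfile

import numpy as np

table = np.array([[1, 2, 3], [4, 5, 6]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "with_header.csv")  # hypothetical file name

    # comments='' writes the header as a plain first line, without '# '.
    np.savetxt(path, table, delimiter=",", fmt="%d", header="a,b,c", comments="")

    with open(path) as f:
        header_line = f.readline().strip()

    # Skip the header row when reading the numbers back.
    restored = np.loadtxt(path, delimiter=",", skiprows=1, dtype=int)

print(header_line)  # a,b,c
print(restored)
```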

4. Handling Missing Data in CSV Files


Use np.genfromtxt() instead of np.loadtxt() when a CSV file may contain missing values; the filling_values argument sets the value substituted for each empty field.
data = np.genfromtxt('data.csv', delimiter=',', dtype=int, filling_values=0)
print(data)
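To see what filling_values changes, it helps to load the same text with and without it. In this sketch the CSV text is an assumed in-memory sample (np.genfromtxt accepts file-like objects), and the default float dtype is used, for which an empty field becomes nan.

```python
import io

import numpy as np

csv_text = "1,2,3\n4,,6\n"  # assumed sample: the middle field of row 2 is empty

# Without filling_values, an empty float field is read as nan.
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# With filling_values, the empty field is replaced by the given value.
filled = np.genfromtxt(io.StringIO(csv_text), delimiter=",", filling_values=0)

missing_count = int(np.isnan(raw).sum())

print(missing_count)  # 1
print(filled[1, 1])   # 0.0
```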

5. Reading and Writing Large Files Efficiently


For large datasets, use memory-mapped files (memmap) to handle data without loading it all into
memory.
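arr = np.memmap('large_data.dat', dtype='float32', mode='w+', shape=(10000, 10000))
arr[:] = np.random.rand(10000, 10000) # Write data (this builds the full random array in memory first)
arr.flush() # Write pending changes to disk
del arr # Release the memory map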

# Read data
arr_loaded = np.memmap('large_data.dat', dtype='float32', mode='r',
shape=(10000, 10000))
print(arr_loaded[0, 0]) # Access without loading the entire file
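Filling a memory-mapped file from one large in-memory array still needs enough RAM for that array. A possible alternative is to write the file in row blocks, so only one block exists in memory at a time. The sizes, the file name, and the fill values below are assumptions kept small for illustration.

```python
import os
import tempfile

import numpy as np

rows, cols, chunk = 1000, 100, 250  # small illustrative sizes

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "chunked.dat")  # hypothetical file name

    writer = np.memmap(path, dtype="float32", mode="w+", shape=(rows, cols))
    for start in range(0, rows, chunk):
        stop = min(start + chunk, rows)
        # Only this block is built in memory; here it is filled with its start index.
        writer[start:stop] = np.full((stop - start, cols), start, dtype="float32")
    writer.flush()  # push pending changes to disk
    del writer

    reader = np.memmap(path, dtype="float32", mode="r", shape=(rows, cols))
    first_values = reader[::chunk, 0].tolist()  # first value of each block
    file_size = os.path.getsize(path)
    del reader

print(first_values)  # [0.0, 250.0, 500.0, 750.0]
print(file_size)     # 400000
```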

Summary Table
File Format | Save Function | Load Function
.npy (Binary) | np.save('file.npy', arr) | np.load('file.npy')
.npz (Multiple Arrays) | np.savez('file.npz', a=arr1, b=arr2) | data = np.load('file.npz'); data['a']
.txt (Text File) | np.savetxt('file.txt', arr, fmt='%d') | np.loadtxt('file.txt', dtype=int)
.csv (Comma-separated) | np.savetxt('file.csv', arr, delimiter=',', fmt='%d') | np.loadtxt('file.csv', delimiter=',', dtype=int)
.csv (with missing values) | - | np.genfromtxt('file.csv', delimiter=',', filling_values=0)
.dat (Memory-mapped) | np.memmap('file.dat', dtype, mode, shape) | np.memmap('file.dat', dtype, mode, shape)

1. Reading a Large CSV File for Stock Market Analysis
📌 Use Case: Import stock prices and analyze them.
Example:
import numpy as np

# Save a sample stock data CSV file


data = """Date,Open,High,Low,Close
2024-02-01,150,155,148,152
2024-02-02,152,158,151,157
2024-02-03,157,160,155,159"""
with open("stocks.csv", "w") as f:
    f.write(data)

# Read CSV file, skipping the header row


stock_data = np.loadtxt("stocks.csv", delimiter=',', skiprows=1, dtype=float,
usecols=(1, 2, 3, 4))

# Calculate average closing price


average_close = np.mean(stock_data[:, 3])
print("Stock Data:\n", stock_data)
print("Average Closing Price:", average_close)

✔ Key Insights:
• Reads a stock price dataset from a CSV file.
• Extracts Open, High, Low, Close prices.
• Computes the average closing price.
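Selecting the closing price by position (column 3) is easy to get wrong when columns move. As an alternative sketch, np.genfromtxt with names=True reads the header row and allows access by column name. The in-memory CSV text repeats the sample data above; dtype=None and encoding="utf-8" are choices made for this illustration, not part of the original example.

```python
import io

import numpy as np

csv_text = """Date,Open,High,Low,Close
2024-02-01,150,155,148,152
2024-02-02,152,158,151,157
2024-02-03,157,160,155,159"""

# names=True takes column names from the header row;
# dtype=None lets NumPy infer a type per column (the dates stay strings).
prices = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True,
                       dtype=None, encoding="utf-8")

average_close = prices["Close"].mean()
daily_range = prices["High"] - prices["Low"]

print(prices.dtype.names)  # ('Date', 'Open', 'High', 'Low', 'Close')
print(average_close)       # 156.0
print(daily_range)         # [7 7 5]
```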

2. Reading IoT Sensor Data from a Text File


📌 Use Case: Analyze temperature sensor readings.
Example:
# Save a sample temperature dataset
np.savetxt("temperature_data.txt", np.random.uniform(20, 30, 10), fmt="%.2f")

# Read temperature data from the text file


temperature_data = np.loadtxt("temperature_data.txt", dtype=float)

# Compute max and min temperature


max_temp = np.max(temperature_data)
min_temp = np.min(temperature_data)

print("Temperature Data:", temperature_data)


print("Max Temperature:", max_temp)
print("Min Temperature:", min_temp)

✔ Key Insights:
• Reads temperature sensor data stored in a .txt file.
• Finds the maximum and minimum recorded temperature.
3. Loading Image Data for Machine Learning (Binary .npy)
📌 Use Case: Load preprocessed image data from a NumPy binary file.
Example:
# Create a random image array (100x100 pixels, grayscale)
image = np.random.randint(0, 256, (100, 100), dtype=np.uint8) # upper bound is exclusive, so 256 covers 0-255

# Save as a binary NumPy file


np.save("image_data.npy", image)

# Load the image data


loaded_image = np.load("image_data.npy")

print("Image Shape:", loaded_image.shape)


print("First 5 pixels:", loaded_image[0, :5])

✔ Key Insights:
• Saves a random 100x100 grayscale image to a .npy file.
• Loads the image back into a NumPy array.

4. Handling Missing Data in CSV Files


📌 Use Case: Load a dataset with missing values and fill them.
Example:
# Save a sample CSV file with missing values
data = """10,20,30
40,,60
70,80,90"""
with open("data_with_missing.csv", "w") as f:
    f.write(data)

# Load the file, replacing missing values with 0


data = np.genfromtxt("data_with_missing.csv", delimiter=",", filling_values=0)

print("Data with Missing Values Replaced:\n", data)

✔ Key Insights:
• Uses np.genfromtxt() to handle missing values.
• Replaces missing values with 0.
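Replacing a missing value with 0 can distort later statistics. One common alternative, sketched here as an assumption rather than as part of the original example, is to leave missing fields as nan and fill each one with the mean of its column. The CSV text repeats the sample above and is read from memory.

```python
import io

import numpy as np

csv_text = "10,20,30\n40,,60\n70,80,90\n"

# Missing float fields are read as nan when filling_values is not given.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# Column means computed while ignoring nan entries.
col_means = np.nanmean(data, axis=0)

# Replace each nan with the mean of the column it sits in.
nan_rows, nan_cols = np.where(np.isnan(data))
data[nan_rows, nan_cols] = col_means[nan_cols]

print(col_means)   # [40. 50. 60.]
print(data[1, 1])  # 50.0
```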

5. Reading Large Files Efficiently Using Memory Mapping


📌 Use Case: Load large datasets without loading them fully into memory.
Example:
# Create a large random dataset and save it
large_data = np.random.rand(10000, 10000).astype('float32')
large_data.tofile("large_dataset.dat")

# Read the file using memory mapping


big_data = np.memmap("large_dataset.dat", dtype="float32", mode="r",
shape=(10000, 10000))

# Access a small portion without loading everything


print("First row first 5 elements:", big_data[0, :5])

✔ Key Insights:
• Uses np.memmap() for big data processing.
• Avoids loading the full dataset into RAM.
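A raw .dat file does not store dtype or shape, so both must be passed to np.memmap() again on every read. When the data is saved as .npy instead, np.load with mmap_mode="r" memory-maps the file and recovers that metadata from the file header. A minimal sketch, with an assumed file name and a tiny array standing in for a large one:

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "table.npy")  # hypothetical file name
    np.save(path, np.arange(12, dtype=np.float32).reshape(3, 4))

    # mmap_mode="r" maps the file read-only instead of reading it all into RAM.
    mapped = np.load(path, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)
    shape, dtype = mapped.shape, mapped.dtype
    row_sum = float(mapped[1].sum())  # touches only the second row
    del mapped                        # release the map before cleanup

print(is_memmap)     # True
print(shape, dtype)  # (3, 4) float32
print(row_sum)       # 22.0
```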

Summary of Use Cases


Use Case | File Format | Reading Method
Stock Market Analysis | .csv | np.loadtxt()
IoT Sensor Data Analysis | .txt | np.loadtxt()
Image Processing (ML/DL) | .npy | np.load()
Handling Missing Values | .csv | np.genfromtxt()
Big Data Processing | .dat | np.memmap()
