NumPy - Imputing Missing Data
Imputing Missing Data in Arrays
Imputing missing data in arrays involves filling in the missing values with estimated or calculated values based on the available data. This process helps in the following ways −
- Preserve Data: Avoids loss of information that might be important for analysis.
- Improve Analysis: Ensures complete datasets, which can lead to more accurate analyses.
- Handle Missing Data: Addresses gaps in data that could distort results if left unhandled.
Imputing Missing Data with Mean
Imputing missing data with the mean is a technique that fills in each missing value with the mean (average) of the available, non-missing values. The mean is a measure of central tendency: it summarizes a set of numbers with a single central value.
It is calculated by adding together all the numbers in a dataset and then dividing the sum by the count of those numbers.
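As a quick check of this definition, the sketch below uses the same data as the example that follows. It shows that a plain np.mean() propagates NaN, so the mean of the available data has to be computed over the non-NaN entries only, which is what np.nanmean() does. The variable names are illustrative.

```python
import numpy as np

# Same data as the example below: two values are missing
arr = np.array([1.0, 2.5, np.nan, 4.7, np.nan, 6.2])

# A regular mean propagates NaN, so it cannot be used directly
plain_mean = np.mean(arr)

# Mean of the available data: sum of the non-NaN values divided by their count
valid = arr[~np.isnan(arr)]
manual_mean = valid.sum() / valid.size

# np.nanmean performs the same calculation while ignoring NaN values
nan_mean = np.nanmean(arr)

print("np.mean:", plain_mean)
print("Manual mean:", manual_mean)
print("np.nanmean:", nan_mean)
```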
Example
In the following example, we calculate the mean of non-NaN values in an array and then use this mean to replace NaN values −
import numpy as np

# Creating an array with NaN values
arr = np.array([1.0, 2.5, np.nan, 4.7, np.nan, 6.2])

# Calculating the mean of non-NaN values
mean_value = np.nanmean(arr)

# Imputing NaN values with the mean
imputed_arr = np.where(np.isnan(arr), mean_value, arr)

print("Original Array:\n", arr)
print("Mean Value:", mean_value)
print("Imputed Array:\n", imputed_arr)
Following is the output obtained −
Original Array:
 [1.  2.5 nan 4.7 nan 6.2]
Mean Value: 3.5999999999999996
Imputed Array:
 [1.  2.5 3.6 4.7 3.6 6.2]
Imputing Missing Data with Median
Imputing Missing Data with Median is a technique used to fill in missing values in a dataset by replacing them with the median value of the available data.
The median is the middle value in a dataset when it is ordered, or the average of the two middle values if the dataset has an even number of observations.
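A minimal sketch of the two cases in this definition, using small made-up arrays: with an odd count the median is the middle sorted value, with an even count it is the average of the two middle values, and np.nanmedian() drops NaNs before applying the same rule.

```python
import numpy as np

# Odd number of observations: the median is the middle value after sorting
odd = np.array([7.0, 1.0, 4.0])

# Even number of observations: the median is the average of the two middle values
even = np.array([7.0, 1.0, 4.0, 10.0])

print("Sorted odd:", np.sort(odd), "-> median", np.median(odd))
print("Sorted even:", np.sort(even), "-> median", np.median(even))

# With missing values, np.nanmedian ignores NaNs before finding the middle value(s)
with_nan = np.array([7.0, np.nan, 1.0, 4.0])
print("nanmedian:", np.nanmedian(with_nan))
```

Here the even case gives (4 + 7) / 2 = 5.5, and the NaN case behaves like the odd array [7, 1, 4].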
Example
In this example, we are calculating the median of non-NaN values in an array and then using this median to replace NaN values −
import numpy as np

# Creating an array with NaN values
arr = np.array([1.0, 2.5, np.nan, 4.7, np.nan, 6.2])

# Calculating the median of non-NaN values
median_value = np.nanmedian(arr)

# Imputing NaN values with the median
imputed_arr = np.where(np.isnan(arr), median_value, arr)

print("Original Array:\n", arr)
print("Median Value:", median_value)
print("Imputed Array:\n", imputed_arr)
This will produce the following result −
Original Array:
 [1.  2.5 nan 4.7 nan 6.2]
Median Value: 3.6
Imputed Array:
 [1.  2.5 3.6 4.7 3.6 6.2]
Imputing Missing Data with a Constant
Imputing Missing Data with a Constant is a technique used to fill in missing values in a dataset by replacing them with a predefined constant value.
The same fixed value (for example, 0) is used for every missing entry, regardless of the surrounding data.
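NumPy also provides np.nan_to_num(), whose nan keyword sets the replacement value for NaNs. The sketch below compares it with the np.where() approach used in this section. Note that nan_to_num also replaces +inf and -inf with very large finite numbers by default, so the two approaches only agree when the array has no infinities.

```python
import numpy as np

arr = np.array([1.0, 2.5, np.nan, 4.7, np.nan, 6.2])
constant_value = 0.0

# np.nan_to_num replaces NaN with the value passed as `nan`;
# by default it returns a new array and leaves `arr` unchanged
filled = np.nan_to_num(arr, nan=constant_value)

# Equivalent result using the np.where approach
filled_where = np.where(np.isnan(arr), constant_value, arr)

print(filled)
print(np.array_equal(filled, filled_where))
```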
Example
In the example below, we define a constant value for imputation and replace NaN values in an array with this constant −
import numpy as np

# Creating an array with NaN values
arr = np.array([1.0, 2.5, np.nan, 4.7, np.nan, 6.2])

# Defining a constant value for imputation
constant_value = 0

# Imputing NaN values with the constant
imputed_arr = np.where(np.isnan(arr), constant_value, arr)

print("Original Array:\n", arr)
print("Constant Value:", constant_value)
print("Imputed Array:\n", imputed_arr)
Following is the output of the above code −
Original Array:
 [1.  2.5 nan 4.7 nan 6.2]
Constant Value: 0
Imputed Array:
 [1.  2.5 0.  4.7 0.  6.2]
Imputing Missing Data in Multi-dimensional Arrays
Imputing Missing Data in Multi-dimensional Arrays involves filling in missing values within arrays that have more than one dimension, such as 2D matrices or higher-dimensional arrays.
Example: Imputing Missing Data in a 2D Array
In the following example, we calculate the mean of each column in a 2D array while ignoring NaN values. We then replace the NaN values with the mean of their respective columns −
import numpy as np

# Creating a 2D array with NaN values
arr_2d = np.array([[1.0, np.nan, 3.5],
                   [np.nan, 5.1, 6.3],
                   [7.2, 8.1, np.nan]])

# Print the original array before it is modified in place
print("Original 2D Array:\n", arr_2d)

# Calculating the mean of each column, ignoring NaN values
column_means = np.nanmean(arr_2d, axis=0)

# Finding the (row, column) indices of the NaN values
inds = np.where(np.isnan(arr_2d))

# Replace NaN values with the mean of the respective column
arr_2d[inds] = np.take(column_means, inds[1])

print("Column Means:", column_means)
print("Imputed 2D Array:\n", arr_2d)
The output obtained is as shown below −
Original 2D Array:
 [[1.  nan 3.5]
 [nan 5.1 6.3]
 [7.2 8.1 nan]]
Column Means: [4.1 6.6 4.9]
Imputed 2D Array:
 [[1.  6.6 3.5]
 [4.1 5.1 6.3]
 [7.2 8.1 4.9]]
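The example above fills each NaN with its column mean. If rows are the natural unit instead (say, each row holds one series of readings; this is an illustrative assumption), the same idea works along axis=1. A minimal sketch: keepdims=True keeps the row means as a (3, 1) column, so np.where() can broadcast them across each row without building index arrays.

```python
import numpy as np

arr_2d = np.array([[1.0, np.nan, 3.5],
                   [np.nan, 5.1, 6.3],
                   [7.2, 8.1, np.nan]])

# axis=1 computes one mean per row; keepdims=True keeps the result
# with shape (3, 1) so it broadcasts across the columns of each row
row_means = np.nanmean(arr_2d, axis=1, keepdims=True)

# Replace each NaN with the mean of its own row
imputed_rows = np.where(np.isnan(arr_2d), row_means, arr_2d)

print("Row Means:\n", row_means)
print("Row-wise Imputed 2D Array:\n", imputed_rows)
```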
Example: Imputing Missing Data in a 3D Array
Here, we calculate the median of each column (each index along the last axis) across all rows and slices of a 3D array, ignoring NaNs. We then replace each NaN value with the median of its column −
import numpy as np

# Create a 3D array with some NaN values
arr_3d = np.array([[[1.0, 2.0, np.nan], [np.nan, 5.0, 6.0], [7.0, np.nan, 9.0]],
                   [[np.nan, 2.0, 3.0], [4.0, np.nan, np.nan], [7.0, 8.0, np.nan]]])

# Calculate one median per index of the last axis, taken over axes 0 and 1, ignoring NaN values
median_value = np.nanmedian(arr_3d, axis=(0, 1))

# Find the positions of the NaN values
nan_indices = np.isnan(arr_3d)

# Replace NaN values with the median of the corresponding column
for i in range(arr_3d.shape[2]):  # Iterate over the third dimension
    # arr_3d[:, :, i] is a view, so the masked assignment updates arr_3d in place
    arr_3d[:, :, i][nan_indices[:, :, i]] = median_value[i]

print("3D array after median imputation:")
print(arr_3d)
After executing the above code, we get the following output −
3D array after median imputation:
[[[1.  2.  6. ]
  [5.5 5.  6. ]
  [7.  3.5 9. ]]

 [[5.5 2.  3. ]
  [4.  3.5 6. ]
  [7.  8.  6. ]]]
Imputing with Linear Interpolation
Imputing missing data using linear interpolation involves estimating the missing values based on the values that surround them. This technique is useful for data that is sequential or spatial, where the missing values can be inferred from the values that precede and follow them.
- Linear interpolation is a method of estimating unknown values that fall between known values.
- In one-dimensional data, it involves drawing a straight line between two known points and using this line to estimate the value at a point in between.
- For multi-dimensional data, linear interpolation can extend this concept to higher dimensions.
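For two known points (x0, y0) and (x1, y1), the straight-line estimate at x is y = y0 + (x - x0) * (y1 - y0) / (x1 - x0). The sketch below writes this formula as a small helper (the name lerp is hypothetical, chosen for illustration) and compares it with np.interp(), which applies the same formula piecewise between neighboring known points.

```python
import numpy as np

def lerp(x, x0, y0, x1, y1):
    """Hypothetical helper: estimate y at x on the line through (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# Known values 1.0 at position 0 and 3.5 at position 2; estimate position 1
estimate = lerp(1, 0, 1.0, 2, 3.5)

# np.interp uses the same straight-line formula between the surrounding known points
estimate_np = np.interp(1, [0, 2], [1.0, 3.5])

print(estimate, estimate_np)
```

Both give 1.0 + 1 * 2.5 / 2 = 2.25, the value the example below fills in at index 1.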
Example
In the example below, we use linear interpolation to fill missing values (NaNs) in a 1D array. We achieve this by estimating NaN values based on the surrounding non-NaN values −
import numpy as np

# Creating an array with NaN values
arr = np.array([1.0, np.nan, 3.5, np.nan, 5.0])
print("Original Array:\n", arr)

# Boolean mask of the missing values and the position (index) of every element
nans = np.isnan(arr)
positions = np.arange(arr.size)

# Interpolating missing values from the known values on either side of them
filled = arr.copy()
filled[nans] = np.interp(positions[nans], positions[~nans], arr[~nans])

print("Array with Interpolated Values:\n", filled)
We get the output as shown below −
Original Array:
 [1.  nan 3.5 nan 5. ]
Array with Interpolated Values:
 [1.   2.25 3.5  4.25 5.  ]
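A NaN at the start or end of the array has a known neighbor on only one side. np.interp() does not extrapolate there: by default, positions before the first known point get the first known value and positions after the last known point get the last known value. A minimal sketch with made-up data; passing left=np.nan and right=np.nan leaves such edge values missing instead.

```python
import numpy as np

# Missing values at both ends, with no known neighbor on one side
arr = np.array([np.nan, 2.0, np.nan, 6.0, np.nan])
nans = np.isnan(arr)
positions = np.arange(arr.size)

# Default: positions outside the known range take the nearest known value
filled = arr.copy()
filled[nans] = np.interp(positions[nans], positions[~nans], arr[~nans])
print(filled)

# left/right set the value returned outside the known range; NaN keeps the edges missing
filled_edges_nan = arr.copy()
filled_edges_nan[nans] = np.interp(positions[nans], positions[~nans], arr[~nans],
                                   left=np.nan, right=np.nan)
print(filled_edges_nan)
```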