
Pairwise Distances of N-Dimensional Space Array in Python
Pairwise distance calculation is used in many domains, including data analysis, machine learning and image processing. It means computing the distance between every pair of points in a dataset. In this article we will look at several methods to calculate pairwise distances in Python for arrays that represent data in multiple dimensions. We will also look at the distance functions available in the SciPy library, such as pdist and cdist.
Pairwise Distance
To calculate pairwise distances in n-dimensional space, we find the distance between each pair of points. The distance metric you choose should depend on the type of data and the problem you want to solve.
Some of the commonly used distance metrics include:
Euclidean distance: the straight-line distance between two points.
Manhattan distance: the sum of the absolute differences along each dimension.
Minkowski distance: a generalization of both Euclidean and Manhattan distances. With parameter p = 1 it equals the Manhattan distance, and with p = 2 it equals the Euclidean distance.
These metrics help us identify the dissimilarity or similarity between data points in different ways based on the problem at hand.
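As a rough illustration of how these three metrics relate, the sketch below implements each one for a single pair of points in plain Python. The function names euclidean, manhattan and minkowski are hypothetical helpers chosen for this example, not part of any library.

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared differences along each dimension
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute differences along each dimension
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p):
    # p-th root of the sum of absolute differences raised to the power p
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a = (1, 2, 3)
b = (4, 5, 6)
print(euclidean(a, b))      # 5.196152422706632
print(manhattan(a, b))      # 9
print(minkowski(a, b, 1))   # 9.0, same as Manhattan
print(minkowski(a, b, 2))   # same as Euclidean, up to floating-point rounding
```

Setting p = 1 or p = 2 in minkowski reproduces the other two metrics, which is why Minkowski distance is described as their generalization.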
Let's see some of the methods to calculate pairwise distances.
Method 1: Manual Calculation
We can calculate pairwise distances manually by implementing the distance formula ourselves. For example, the Euclidean distance between two points (x1, y1) and (x2, y2) is given by the following formula:
distance = sqrt((x2 - x1)^2 + (y2 - y1)^2)
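We apply this formula to every pair of points to obtain the pairwise distances. In n dimensions, the formula sums the squared differences over all n coordinates before taking the square root. Because the number of pairs grows quadratically with the number of points, this approach becomes slow for large datasets and higher-dimensional arrays, especially when it is written as plain Python loops.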
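A minimal pure-Python sketch of the manual approach, generalized to any number of dimensions. The helper name pairwise_euclidean is hypothetical. Since the distance from point i to point j equals the distance from j to i, it computes only the upper triangle and mirrors it, but it still performs on the order of n^2 distance computations.

```python
import math

def pairwise_euclidean(points):
    # Build an n x n matrix where entry [i][j] is the distance between points i and j
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Sum the squared differences over every dimension, then take the square root
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            # The matrix is symmetric, so fill both halves at once
            dist[i][j] = d
            dist[j][i] = d
    return dist

pts = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for row in pairwise_euclidean(pts):
    print(row)
```

The diagonal stays 0.0 because each point is at distance zero from itself.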
Method 2: NumPy and SciPy Libraries
This method uses the NumPy and SciPy libraries, two popular and efficient tools for scientific computing in Python. They provide optimized functions for computing pairwise distances, which are faster and simpler to use than hand-written loops.
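Before turning to SciPy, here is a rough sketch of how NumPy broadcasting alone can replace the nested loops of the manual method. It is an illustration, not a SciPy API.

```python
import numpy as np

pts = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# pts[:, None, :] has shape (n, 1, d) and pts[None, :, :] has shape (1, n, d);
# subtracting them broadcasts to (n, n, d): every pairwise coordinate difference
diff = pts[:, None, :] - pts[None, :, :]

# Square, sum over the last axis (the dimensions), and take the square root
dist = np.sqrt((diff ** 2).sum(axis=-1))
print(dist)
```

Note that this builds an intermediate array of size n x n x d, so for many points a dedicated function such as SciPy's cdist is usually the more memory-friendly choice.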
To use NumPy and SciPy for pairwise distances, we first convert the data into a 2-D NumPy array with np.array, where each row is a point and each column is a dimension. We can then pass this array to SciPy's cdist function to compute the pairwise distances.
Example
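import numpy as np
from scipy.spatial.distance import cdist

# Three points in 3-dimensional space, one point per row
pts = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Distance between every point in pts and every point in pts
dist = cdist(pts, pts, metric='euclidean')
print(dist)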
Output
[[ 0.          5.19615242 10.39230485]
 [ 5.19615242  0.          5.19615242]
 [10.39230485  5.19615242  0.        ]]
In the above example, we create an array pts containing three points in 3-dimensional space and pass it to cdist twice, using the Euclidean distance metric. The result is a 3-by-3 matrix in which the entry at row i and column j is the distance between point i and point j.
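SciPy also provides pdist for the common case where all distances are within a single set of points. It returns a condensed 1-D array that stores each pair only once, and squareform from the same module expands it into the full square matrix. A short sketch using the same points:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

pts = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Condensed form: distances for pairs (0, 1), (0, 2), (1, 2), in that order
condensed = pdist(pts, metric='euclidean')
print(condensed)

# Expand to the full symmetric matrix with zeros on the diagonal
square = squareform(condensed)
print(square)
```

The condensed form uses roughly half the memory of the square matrix, which can matter when the number of points is large.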
Method 3: Scikit-learn Library
Scikit-learn is a Python library for machine learning. It offers a broad range of functionality for data analysis and modeling, including the pairwise_distances function in sklearn.metrics, which computes the distances between all pairs of points in a single call.
For instance, to calculate pairwise distances using the Manhattan distance metric, we can call scikit-learn's pairwise_distances function with metric='manhattan' instead of writing the computation ourselves.
Example
from sklearn.metrics import pairwise_distances

# Three points in 3-dimensional space
pts = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Manhattan distance between every pair of points
dis = pairwise_distances(pts, metric='manhattan')
print(dis)
Output
[[ 0.  9. 18.]
 [ 9.  0.  9.]
 [18.  9.  0.]]
In this example, pts contains three points in 3-dimensional space, and pairwise_distances computes the Manhattan distance between every pair of them. The resulting matrix contains the distance between every pair of points: entry [i, j] is the distance between point i and point j.
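pairwise_distances also accepts a second array, Y. In that case it returns a rectangular matrix whose entry [i, j] is the distance between row i of X and row j of Y, much like SciPy's cdist. The arrays below are made-up example data.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Hypothetical example data: two query points and three reference points in 3-D
X = np.array([[0, 0, 0], [1, 1, 1]])
Y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Shape (2, 3): one row per point in X, one column per point in Y
dis = pairwise_distances(X, Y, metric='manhattan')
print(dis)
```

This is useful when you only need distances from a few query points to a reference set, rather than the full square matrix.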
Method 4: Scipy.spatial.distance Module
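In this method we use the scipy.spatial.distance module directly. It provides many distance metrics for pairwise distance calculation. When the metric argument is omitted, cdist uses the Euclidean distance, so the output below matches the one from Method 2.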
Example
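from scipy.spatial.distance import cdist

# cdist also accepts plain Python lists; each inner list is one point
pts = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# No metric given, so the default Euclidean distance is used
dis = cdist(pts, pts)
print(dis)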
Output
[[ 0.          5.19615242 10.39230485]
 [ 5.19615242  0.          5.19615242]
 [10.39230485  5.19615242  0.        ]]
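Passing the metric argument switches cdist to other distance functions. The sketch below uses the same points to compute the Manhattan distance (named 'cityblock' in SciPy) and the Minkowski distance with p = 3, where p is passed to cdist as a keyword argument.

```python
from scipy.spatial.distance import cdist

pts = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Manhattan distance: SciPy names this metric 'cityblock'
manhattan = cdist(pts, pts, metric='cityblock')

# Minkowski distance of order p; p=1 matches cityblock and p=2 matches Euclidean
mink3 = cdist(pts, pts, metric='minkowski', p=3)

print(manhattan)
print(mink3)
```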
Method 5: Using NearestNeighbors Class
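In this method we use the NearestNeighbors class from the scikit-learn library. Besides finding the nearest neighbors of each point, this class also returns the distances to those neighbors, so setting n_neighbors to the number of points gives the distance from every point to all points.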
Example
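from sklearn.neighbors import NearestNeighbors

# Three points in 2-dimensional space
pts = [[1, 2], [4, 5], [7, 8]]

# Ask for as many neighbors as there are points, so every distance is returned
nbrs = NearestNeighbors(n_neighbors=len(pts)).fit(pts)

# kneighbors returns (distances, indices); only the distances are kept here
dis, _ = nbrs.kneighbors(pts)
print(dis)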
Output
[[0.         4.24264069 8.48528137]
 [0.         4.24264069 4.24264069]
 [0.         4.24264069 8.48528137]]
Explanation
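In the above program we create an instance of NearestNeighbors, fit it on pts, and call its kneighbors method to find the distances between every point and its nearest neighbors. Each row of the output corresponds to one point, and its distances are sorted in ascending order rather than by point index. The first value in each row is 0 because every point is its own nearest neighbor. In the first row, the second value (4.24264069) is the distance from (1, 2) to (4, 5), and the third value (8.48528137) is the distance from (1, 2) to (7, 8). To match each distance to a specific point, use the indices array that kneighbors also returns.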
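If you only need the distance from each point to its closest other point, one option is to call kneighbors without a query array. In that case scikit-learn does not count a point as its own neighbor, so the zero self-distances disappear. A short sketch with the same points:

```python
from sklearn.neighbors import NearestNeighbors

pts = [[1, 2], [4, 5], [7, 8]]
nbrs = NearestNeighbors(n_neighbors=1).fit(pts)

# With no X argument, each fitted point is queried against the others,
# excluding itself; ask for the single nearest other point
dis, idx = nbrs.kneighbors(n_neighbors=1)
print(dis)
print(idx)
```

Every point here has a nearest other point at distance sqrt(18), about 4.24264069. The middle point (4, 5) is equally far from both other points, so which index is reported for it is a tie.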
These are some of the methods for calculating pairwise distances of n-dimensional arrays in Python. You can use whichever method suits your program best: the manual approach helps in understanding the formula, while the SciPy and scikit-learn functions are better suited to larger datasets.