SciPy is a Python library for scientific and numerical computing. It is built on top of NumPy and extends it with routines for tasks such as computing matrix rank and inverse, solving polynomial equations and LU decomposition. Its high-level functions reduce the amount of code you have to write and make data analysis easier.
1. What is SciPy?
SciPy is an open-source data-processing library for Python that is often used as an alternative to environments such as MATLAB, Octave and R-Lab. It offers efficient, easy-to-use functions for problems like numerical integration, interpolation, optimization, linear algebra and statistics. When building ML models, the benefit of SciPy is that these routines are available from a general-purpose programming language, which keeps programs and applications less complex.
1.1 Installation of SciPy
To install SciPy on your system, use pip, the Python package manager. Before proceeding, make sure Python is already installed. Then follow these steps:
Step 1: Open a terminal or Command Prompt on your system.
Step 2: Run the following command to install SciPy.
pip install scipy
Note: pip will download and install SciPy along with its dependencies. This may take some time, depending on your internet connection.
Step 3: To verify the installation, import SciPy in a Python script or interactive shell and print its version.
import scipy
print(scipy.__version__)
If a version number is printed, SciPy has been installed successfully.
2. How does Data Analysis work with SciPy?
2.1 Data Preparation
- Import the necessary libraries: import numpy as np and import scipy as sp.
- Load or generate your dataset using NumPy or pandas.
2.2 Exploratory Data Analysis (EDA)
- Use descriptive statistics from SciPy’s stats module to gain insights into the dataset.
- Calculate measures such as mean, median, standard deviation, skewness, kurtosis, etc.
Python
import numpy as np
from scipy import stats
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
# Descriptive statistics
mean_val = np.mean(data)
std_dev = np.std(data)
# One-sample t-test against a hypothesized population mean of 5
t_stat, p_value = stats.ttest_1samp(data, popmean=5)
print("t_stat:", t_stat)
print("p_value:", p_value)
Output:
t_stat: 0.0
p_value: 1.0
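The bullets above mention skewness and kurtosis, which the snippet does not compute. A minimal sketch, reusing the same nine-value array, might use `stats.describe`, `stats.skew` and `stats.kurtosis`. Note that `describe` reports the sample variance (ddof=1), whereas `np.std` defaults to ddof=0.

```python
import numpy as np
from scipy import stats

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# describe() bundles nobs, minmax, mean, variance, skewness and kurtosis
summary = stats.describe(data)
print("mean:", summary.mean)
print("variance (ddof=1):", summary.variance)

# Skewness is 0 for symmetric data; kurtosis is Fisher's definition
# (a normal distribution would give 0)
median_val = np.median(data)
skewness = stats.skew(data)
kurt = stats.kurtosis(data)
print("median:", median_val)
print("skewness:", skewness)
print("kurtosis:", kurt)
```

Because the data are evenly spaced and symmetric around 5, the skewness is zero and the kurtosis is negative (flatter than a normal distribution).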
2.3 Statistical Hypothesis Testing
Use SciPy’s stats module for various hypothesis tests such as t-tests, chi-square tests, ANOVA, etc.
t_stat, p_value = stats.ttest_ind(group1, group2)
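The line above assumes that `group1` and `group2` already exist. A self-contained sketch, with two small made-up samples standing in for real measurements, could look like this:

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for two independent groups (illustrative only)
group1 = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.3])
group2 = np.array([6.2, 6.8, 6.5, 7.1, 6.9, 6.4])

# Two-sided independent-samples t-test (assumes equal variances by default)
t_stat, p_value = stats.ttest_ind(group1, group2)
print("t_stat:", t_stat)
print("p_value:", p_value)

# Welch's variant drops the equal-variance assumption
t_welch, p_welch = stats.ttest_ind(group1, group2, equal_var=False)
print("Welch p_value:", p_welch)
```

A small p-value suggests that the two group means differ. Whether the equal-variance or Welch version is appropriate depends on the data.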
2.4 Regression Analysis
Utilize the linregress function for linear regression analysis.
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
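As a rough illustration, the sketch below fits a line to made-up points that lie close to y = 2x + 1. The arrays `x` and `y` are hypothetical and only serve to make the call runnable.

```python
import numpy as np
from scipy import stats

# Hypothetical data lying close to the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8, 11.1])

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print("slope:", slope)
print("intercept:", intercept)
print("R^2:", r_value**2)

# Use the fitted line to predict a new value
x_new = 6.0
print("prediction at x=6:", slope * x_new + intercept)
```

The fitted slope and intercept should come out near 2 and 1, and `r_value**2` gives the coefficient of determination.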
2.5 Signal and Image Processing
- Use the scipy.signal module for signal processing operations.
- Explore scipy.ndimage for image processing.
from scipy import signal, ndimage
result = signal.convolve2d(image, kernel, mode='same', boundary='wrap')
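The call above needs an `image` and a `kernel`. A minimal sketch, assuming a tiny synthetic image and a 3x3 averaging kernel (both made up for illustration), shows what the convolution does:

```python
import numpy as np
from scipy import signal

# Hypothetical 5x5 "image" with a single bright pixel in the centre
image = np.zeros((5, 5))
image[2, 2] = 9.0

# 3x3 averaging (box blur) kernel
kernel = np.ones((3, 3)) / 9.0

# mode='same' keeps the output the same size as the input;
# boundary='wrap' treats the image as periodic at the edges
result = signal.convolve2d(image, kernel, mode='same', boundary='wrap')
print(result)
```

The bright pixel is spread evenly over its 3x3 neighbourhood, and the total intensity of the image is preserved.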
2.6 Optimization
Employ the optimization functions in SciPy to find optimal parameter values.
from scipy.optimize import minimize
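def objective_function(x):
    # Sum of squares, minimized at x = [0, 0]
    return x[0]**2 + x[1]**2
# The second argument is the initial guess for the optimizer
result = minimize(objective_function, [0, 0])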
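In the snippet above the initial guess [0, 0] is already the minimizer, so the optimizer has little to do. The sketch below uses a shifted quadratic (a hypothetical objective chosen for illustration) and a starting point away from the minimum, then inspects the returned result object:

```python
import numpy as np
from scipy.optimize import minimize

def objective_function(x):
    # Shifted quadratic bowl with its minimum at (1, -2)
    return (x[0] - 1.0)**2 + (x[1] + 2.0)**2

# Start away from the minimum so the optimizer has work to do
x0 = np.array([5.0, 5.0])
result = minimize(objective_function, x0)

print("converged:", result.success)
print("x:", result.x)      # location of the minimum found
print("f(x):", result.fun)  # objective value at that location
```

With no method given and no bounds or constraints, `minimize` falls back to a quasi-Newton method, and `result.x` should land close to (1, -2).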
3. Import SciPy
Once SciPy is installed, import the SciPy module(s) you need. The examples that follow also use NumPy and a small 2x2 matrix A:
Python
import numpy as np
from scipy import linalg
A = [[1, 2],
     [5, 6]]
3.1 Linear Algebra
Determinant of a Matrix
Python
from scipy import linalg
A = [[1, 2,],
[5, 6,]]
linalg.det(A)
Output:
-4.0
4. Compute pivoted LU decomposition of a Matrix
LU decomposition factors a matrix into constituent parts that make complex matrix operations easier to compute. Decomposition methods, also called matrix factorization methods, are a foundation of numerical linear algebra, even for basic operations such as solving systems of linear equations, calculating the inverse and calculating the determinant of a matrix. The decomposition is A = P L U, where P is a permutation matrix, L is lower triangular with unit diagonal elements and U is upper triangular.
Python
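# 3x3 example matrix; multiplying the printed factors P, L and U gives back this matrix
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 8]])
P, L, U = linalg.lu(A)
print(P)
print(L)
print(U)
# P @ L is the row-permuted lower-triangular factor
print(np.dot(P, L))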
Output:
[[0. 1. 0.]
[0. 0. 1.]
[1. 0. 0.]]
[[1. 0. 0. ]
[0.14285714 1. 0. ]
[0.57142857 0.5 1. ]]
[[7. 8. 8. ]
[0. 0.85714286 1.85714286]
[0. 0. 0.5 ]]
[[0.14285714 1. 0. ]
[0.57142857 0.5 1. ]
[1. 0. 0. ]]
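The printed factors are consistent with the 3x3 matrix [[1, 2, 3], [4, 5, 6], [7, 8, 8]], since multiplying P, L and U from the output gives back that matrix. Assuming that matrix, a short sketch can check the properties stated above: the product reproduces A, L is unit lower triangular and U is upper triangular.

```python
import numpy as np
from scipy import linalg

# Matrix inferred from the printed factors (an assumption, see lead-in)
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 8]])

P, L, U = linalg.lu(A)

# The three factors should multiply back to A
reconstructs = np.allclose(P @ L @ U, A)

# L has ones on the diagonal and zeros above it; U has zeros below the diagonal
l_is_unit_lower = np.allclose(L, np.tril(L)) and np.allclose(np.diag(L), 1.0)
u_is_upper = np.allclose(U, np.triu(U))

print(reconstructs, l_is_unit_lower, u_is_upper)
```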
5. Eigenvalues and eigenvectors of this matrix
Let’s find the eigenvalues and eigenvectors of this matrix:
Python
eigen_values, eigen_vectors = linalg.eig(A)
print(eigen_values)
print(eigen_vectors)
Output:
array([ 15.55528261+0.j, -1.41940876+0.j, -0.13587385+0.j])
array([[-0.24043423, -0.67468642, 0.51853459],
[-0.54694322, -0.23391616, -0.78895962],
[-0.80190056, 0.70005819, 0.32964312]])
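The eigenvalues shown sum to 14 and multiply to about 3, which matches the trace and determinant of the 3x3 matrix [[1, 2, 3], [4, 5, 6], [7, 8, 8]]. Assuming that matrix, the sketch below checks the defining relation A v = λ v for each pair. Each column of the returned array is one eigenvector.

```python
import numpy as np
from scipy import linalg

# Matrix assumed from the outputs shown (see lead-in)
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 8]])

eigen_values, eigen_vectors = linalg.eig(A)

# Column i of eigen_vectors pairs with eigen_values[i]
checks = []
for i in range(len(eigen_values)):
    lam = eigen_values[i]
    v = eigen_vectors[:, i]
    checks.append(np.allclose(A @ v, lam * v))
    print(lam, checks[-1])

# The eigenvalues sum to the trace of A
print(np.isclose(eigen_values.sum().real, np.trace(A)))
```

The order and sign of the eigenvectors can differ between runs or library versions, so it is safer to verify the relation than to compare raw numbers.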
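Systems of linear equations can also be solved with linalg.solve. The example below goes back to the 2x2 matrix from the determinant example:
Python
# 2x2 coefficient matrix from the determinant example
A = np.array([[1, 2], [5, 6]])
# Each column of v is a separate right-hand side
v = np.array([[2, 4], [3, 8]])
print(v)
s = linalg.solve(A, v)
print(s)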
Output:
[[2 4]
[3 8]]
[[-1.5 -2. ]
[ 1.75 3. ]]
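To confirm a solution, substitute it back into the system. The sketch below repeats the 2x2 example and checks that A @ s reproduces v. It also compares against the explicit inverse, which gives the same answer here but is generally slower and less accurate than `solve`.

```python
import numpy as np
from scipy import linalg

A = np.array([[1, 2],
              [5, 6]])
v = np.array([[2, 4],
              [3, 8]])

# Solve A @ s = v for s (each column of v is one right-hand side)
s = linalg.solve(A, v)

# Substituting the solution back should reproduce v
print(np.allclose(A @ s, v))

# Same result via the explicit inverse, shown only for comparison
s_inv = linalg.inv(A) @ v
print(np.allclose(s, s_inv))
```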
6. Sparse Linear Algebra
SciPy has some routines for computing with sparse and potentially very large matrices. The necessary tools are in the submodule scipy.sparse.
Let’s look at how to construct a large sparse matrix:
Python
from scipy import sparse
A = sparse.lil_matrix((1000, 1000))
print(A)
A[0, :100] = np.random.rand(100)
A[1, 100:200] = A[0, :100]
A.setdiag(np.random.rand(1000))
print(A)
Output:
(0, 0) 0.7948113035416484
(0, 1) 0.22210781047073025
(0, 2) 0.1198653673336828
(0, 3) 0.33761517140362796
(0, 4) 0.9429097039125192
(0, 5) 0.32320293202075523
(0, 6) 0.5187906217433661
(0, 7) 0.7030189588951778
(0, 8) 0.363629602379294
(0, 9) 0.9717820827209607
(0, 10) 0.9624472949421112
(0, 11) 0.25178229582536416
(0, 12) 0.49724850589238545
(0, 13) 0.30087830981676966
(0, 14) 0.2848404943774676
(0, 15) 0.036886947354532795
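The same construction is easier to inspect on a small matrix. The sketch below builds a hypothetical 4x4 example in LIL format, counts its stored entries and converts it to CSR and to a dense array:

```python
import numpy as np
from scipy import sparse

# LIL format is convenient for building a matrix entry by entry
M = sparse.lil_matrix((4, 4))
M[0, 1] = 2.0
M[2, 3] = 5.0
M.setdiag([1.0, 1.0, 1.0, 1.0])

# Only the non-zero entries are stored
print("stored entries:", M.nnz)

# CSR is better suited to arithmetic and solving
C = M.tocsr()
dense = C.toarray()
print(dense)
```

Only 6 of the 16 entries are stored. For the 1000x1000 matrix above, the saving is far larger.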
6.1 Linear Algebra for Sparse Matrices
Python
from scipy.sparse.linalg import spsolve
# Convert from LIL to CSR, which is efficient for arithmetic and solving
A = A.tocsr()
b = np.random.rand(1000)
ans = spsolve(A, b)
print(ans)
Output:
[-4.67207136e+01 -3.69332972e+02 3.69393775e-01 6.32141409e-02
3.33772205e+00 5.10104872e-01 3.07850190e+00 1.94608719e+01
1.49997674e+00 1.04751174e+00 9.23616608e-01 8.14103772e-01
8.42662424e-01 2.28221903e+00 4.92361307e+01 6.74574814e-01
3.06515031e-01 3.36481843e-02 9.55613073e-01 7.22911464e-01
2.70518013e+00 1.25039001e+00 1.37825326e-01 3.95005049e-01
4.04480605e+00 7.72817743e-01 2.14200400e-01 7.06283767e-01
1.12635170e-01 5.98880840e+00 4.37382510e-01 8.05571435e-01
..............................................................................................................................]
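A matrix with random diagonal entries can be badly conditioned, which is why some values in the output above are large. As a sketch under a different assumption, the example below solves a diagonally dominant tridiagonal system (a made-up but well-behaved test case) and checks the residual:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 1000

# Tridiagonal matrix with 4 on the diagonal and -1 next to it
A = sparse.diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1],
                 shape=(n, n), format="csr")
b = rng.random(n)

x = spsolve(A, b)

# A small residual means x really solves A @ x = b
residual = np.linalg.norm(A @ x - b)
print("residual norm:", residual)
```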
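7. Integration
The scipy.integrate module provides routines for numerical integration. The example below uses dblquad to integrate x*y**2 over x from 0 to 2 and y from 0 to 1. It returns the value of the integral together with an estimate of the absolute error.
Python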
from scipy import integrate
f = lambda y, x: x*y**2
i = integrate.dblquad(f, 0, 2, lambda x: 0, lambda x: 1)
print(i)
Output:
(0.6666666666666667, 7.401486830834377e-15)
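The result can be checked by hand: the integral of x over [0, 2] is 2 and the integral of y**2 over [0, 1] is 1/3, so the exact value is 2/3. The sketch below repeats the call and compares it with that value. Note that `dblquad` expects the integrand to take the inner variable `y` as its first argument.

```python
from scipy import integrate

def f(y, x):
    # Inner variable y comes first, outer variable x second
    return x * y**2

# x runs from 0 to 2; for each x, y runs from 0 to 1
value, abs_err = integrate.dblquad(f, 0, 2, lambda x: 0, lambda x: 1)

exact = 2.0 / 3.0
print("numerical:", value)
print("estimated error:", abs_err)
print("difference from exact:", abs(value - exact))
```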
SciPy can do much more, including Fourier transforms and Bessel functions.
To learn more, refer to the SciPy tutorial.