SciPy Practicum Module


SciPy (Science Python)

What is SciPy?
SciPy is a scientific computation library that uses NumPy underneath.
SciPy stands for Scientific Python.
It provides more utility functions for optimization, stats and signal processing.
Like NumPy, SciPy is open source so we can use it freely.
SciPy was created by NumPy's creator Travis Oliphant.

Why Use SciPy?


If SciPy uses NumPy underneath, why can we not just use NumPy?
SciPy adds optimized implementations of functions that are frequently used with NumPy and in Data Science.

Which Language is SciPy Written in?


SciPy is predominantly written in Python, but performance-critical segments are written in compiled languages such as C (as well as C++ and Fortran).

Where is the SciPy Codebase?


The source code for SciPy is located at this GitHub repository: https://github.com/scipy/scipy
GitHub enables many people to work on the same codebase.

Import SciPy
Once SciPy is installed, import the SciPy module(s) you want to use in your applications by adding a "from scipy import module" statement:
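For instance, the statement below imports the constants module. It is a minimal sketch that assumes SciPy is already installed (for example with pip).

```python
# Import only the submodule you need from the scipy package
from scipy import constants
```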

Now we have imported the constants module from SciPy, and the application is ready to use it:
Example
How many cubic meters are in one liter:
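A minimal sketch of this example, assuming SciPy is installed; it simply prints the value stored in constants.liter.

```python
from scipy import constants

# One liter expressed in cubic meters
print(constants.liter)  # 0.001
```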

constants: SciPy offers a set of mathematical and physical constants and units; one of them is liter, which returns 1 liter expressed in cubic meters.
You will learn more about constants in the next chapter.

Checking SciPy Version


The version string is stored under the __version__ attribute.
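A short sketch that prints the installed version. The exact string depends on your installation, so no output is shown here.

```python
import scipy

# __version__ is a plain string such as "1.x.y"
print(scipy.__version__)
```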

Note: two underscore characters are used in __version__.

Constants in SciPy
As SciPy is more focused on scientific implementations, it provides many built-in scientific constants.
These constants can be helpful when you are working with Data Science.
PI is an example of a scientific constant.
Example
Print the constant value of PI:
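A minimal sketch of this example, assuming SciPy is installed. constants.pi holds the same value as Python's math.pi.

```python
from scipy import constants

# The ratio of a circle's circumference to its diameter
print(constants.pi)  # 3.141592653589793
```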

Maman Somantri / Modul Praktikum Algoritma dan Pemrograman (Algorithms and Programming Practicum Module)


Constant Units
A list of all units under the constants module can be seen using the dir() function.
Example
List all constants:
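A minimal sketch using the built-in dir() function. The list is long and varies slightly between SciPy versions, so no output is shown; the variable name names is chosen for illustration.

```python
from scipy import constants

# dir() lists every name defined in the module, units and constants included
names = dir(constants)
print(names)
```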

Unit Categories
The units are placed under these categories:
• Metric
• Binary
• Mass
• Angle
• Time
• Length
• Pressure
• Area
• Volume
• Speed
• Temperature
• Energy
• Power
• Force

Metric (SI) Prefixes:


Return the specified prefix as a plain multiplier (e.g. centi returns 0.01)
Example
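A minimal sketch printing a selection of SI prefixes; which prefixes to show is a choice made for this sketch, and all of them are attributes of scipy.constants.

```python
from scipy import constants

# SI prefixes are dimensionless multipliers
print(constants.giga)   # 1000000000.0
print(constants.mega)   # 1000000.0
print(constants.kilo)   # 1000.0
print(constants.centi)  # 0.01
print(constants.milli)  # 0.001
print(constants.micro)  # 1e-06
print(constants.nano)   # 1e-09
```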

Binary Prefixes:
Return the specified binary prefix as a plain multiplier (e.g. kibi returns 1024, which is 2^10)



Example
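A minimal sketch printing a few binary prefixes. In scipy.constants these are stored as integer powers of 2.

```python
from scipy import constants

# Binary prefixes are integer powers of 2
print(constants.kibi)  # 1024
print(constants.mebi)  # 1048576
print(constants.gibi)  # 1073741824
print(constants.tebi)  # 1099511627776
```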

Mass:
Return the specified unit in kg (e.g. gram returns 0.001)
Example
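A minimal sketch printing some mass units in kilograms; the selection (gram, metric_ton, pound, ounce) is made for this sketch. Exact digits are shown only where the value is a round number.

```python
from scipy import constants

# Mass units expressed in kilograms
print(constants.gram)        # 0.001
print(constants.metric_ton)  # 1000.0
print(constants.pound)       # about 0.4536
print(constants.ounce)       # about 0.02835 (a sixteenth of a pound)
```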

Angle:
Return the specified unit in radians (e.g. degree returns 0.017453292519943295)
Example
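A minimal sketch printing angle units in radians. arcmin and arcsec are additional attributes of scipy.constants chosen for illustration.

```python
from scipy import constants

# Angle units expressed in radians
print(constants.degree)  # 0.017453292519943295 (pi / 180)
print(constants.arcmin)  # one sixtieth of a degree
print(constants.arcsec)  # one sixtieth of an arcminute
```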

Time:
Return the specified unit in seconds (e.g. hour returns 3600.0)
Example
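A minimal sketch printing several time units in seconds; the list of units shown is a selection made for this sketch.

```python
from scipy import constants

# Time units expressed in seconds
print(constants.minute)       # 60.0
print(constants.hour)         # 3600.0
print(constants.day)          # 86400.0
print(constants.week)         # 604800.0
print(constants.year)         # 31536000.0 (365 days)
print(constants.Julian_year)  # 31557600.0 (365.25 days)
```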



Length:
Return the specified unit in meters (e.g. nautical_mile returns 1852.0)
Example
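A minimal sketch printing some length units in meters. Values derived by multiplication (foot, mile, light_year) can show tiny floating-point noise, so they are marked as approximate.

```python
from scipy import constants

# Length units expressed in meters
print(constants.inch)           # 0.0254
print(constants.foot)           # about 0.3048
print(constants.mile)           # about 1609.344
print(constants.nautical_mile)  # 1852.0
print(constants.light_year)     # about 9.46e15
```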

Pressure:
Return the specified unit in pascals (e.g. psi returns 6894.757293168361)
Example
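A minimal sketch printing pressure units in pascals; atm, bar and torr are extra attributes chosen to sit next to psi.

```python
from scipy import constants

# Pressure units expressed in pascals
print(constants.atm)   # 101325.0
print(constants.bar)   # 100000.0
print(constants.torr)  # about 133.32 (atm / 760)
print(constants.psi)   # about 6894.76
```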

Area:
Return the specified unit in square meters (e.g. hectare returns 10000.0)
Example
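A minimal sketch printing two area units in square meters; acre is an additional attribute shown for comparison.

```python
from scipy import constants

# Area units expressed in square meters
print(constants.hectare)  # 10000.0
print(constants.acre)     # about 4046.86
```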

Volume:
Return the specified unit in cubic meters (e.g. liter returns 0.001)



Example
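A minimal sketch printing some volume units in cubic meters. Note that constants.gallon is the US gallon; gallon_imp is the imperial one.

```python
from scipy import constants

# Volume units expressed in cubic meters
print(constants.liter)       # 0.001
print(constants.gallon)      # US gallon, about 0.003785
print(constants.gallon_imp)  # imperial gallon, about 0.004546
print(constants.barrel)      # 42 US gallons, about 0.159
```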

Speed:
Return the specified unit in meters per second (e.g. speed_of_sound returns 340.5)
Example
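A minimal sketch printing speed units in meters per second; kmh, mph and knot are extra attributes chosen for illustration.

```python
from scipy import constants

# Speed units expressed in meters per second
print(constants.kmh)             # about 0.2778 (1 km/h)
print(constants.mph)             # about 0.44704
print(constants.speed_of_sound)  # 340.5
print(constants.knot)            # about 0.5144
```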

Temperature:
Return the specified unit in Kelvin (e.g. zero_Celsius returns 273.15)
Example
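A minimal sketch for the temperature constants. It also calls constants.convert_temperature(), a conversion helper not mentioned in this module, to show that zero_Celsius is an offset rather than a conversion factor.

```python
from scipy import constants

# Temperature values expressed in Kelvin
print(constants.zero_Celsius)       # 273.15
print(constants.degree_Fahrenheit)  # about 0.5556 (1 / 1.8)

# Convert 100 degrees Celsius to Kelvin (about 373.15)
print(constants.convert_temperature(100, 'Celsius', 'Kelvin'))
```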

Energy:
Return the specified unit in joules (e.g. calorie returns 4.184)
Example
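A minimal sketch printing energy units in joules; eV, erg and Btu are extra attributes chosen for illustration.

```python
from scipy import constants

# Energy units expressed in joules
print(constants.calorie)  # 4.184 (thermochemical calorie)
print(constants.eV)       # about 1.602e-19 (one electron volt)
print(constants.erg)      # 1e-07
print(constants.Btu)      # about 1055.06
```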

Power:
Return the specified unit in watts (e.g. horsepower returns 745.6998715822701)
Example
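A minimal sketch printing the horsepower constant in watts; hp is the short alias of horsepower.

```python
from scipy import constants

# Power units expressed in watts
print(constants.hp)          # about 745.70
print(constants.horsepower)  # same value under the long name
```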

Force:
Return the specified unit in newtons (e.g. kilogram_force returns 9.80665)



Example
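A minimal sketch printing force units in newtons; dyn, lbf and kgf are aliases defined in scipy.constants and chosen here for illustration.

```python
from scipy import constants

# Force units expressed in newtons
print(constants.dyn)             # 1e-05
print(constants.lbf)             # about 4.448 (pound-force)
print(constants.kgf)             # 9.80665
print(constants.kilogram_force)  # same as kgf
```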

Optimizers in SciPy
Optimizers are a set of procedures defined in SciPy that either find the minimum value of a function, or the root
of an equation.

Optimizing Functions
Essentially, many algorithms in Machine Learning boil down to a complex equation (a loss function) that needs to be minimized with the help of given data.

Roots of an Equation
NumPy is capable of finding roots for polynomials and linear equations, but it cannot find roots for nonlinear equations, like this one:
x + cos(x)
For that you can use SciPy's optimize.root function.
This function takes two required arguments:
fun - a function representing an equation.
x0 - an initial guess for the root.
The function returns an object with information regarding the solution.
The actual solution is given under attribute x of the returned object:
Example
Find root of the equation x + cos(x):
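A minimal sketch of this example. The helper name eqn is hypothetical; myroot matches the name used in the next example. The root lies near -0.739.

```python
import numpy as np
from scipy.optimize import root

def eqn(x):
    # root() passes x as a 1-element NumPy array; np.cos works element-wise
    return x + np.cos(x)

# 0.0 is the initial guess x0
myroot = root(eqn, 0.0)

# The solution is stored in the x attribute
print(myroot.x)  # approximately [-0.73908513]
```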

Note: The returned object has much more information about the solution.
Example
Print all information about the solution (not just x which is the root)
print(myroot)

Minimizing a Function
A function, in this context, represents a curve, and curves have high points and low points.
High points are called maxima.
Low points are called minima.
The highest point on the whole curve is called the global maximum, whereas the other high points are called local maxima.
The lowest point on the whole curve is called the global minimum, whereas the other low points are called local minima.

Finding Minima
We can use the scipy.optimize.minimize() function to minimize a function.
The minimize() function takes the following arguments:



fun - a function representing an equation.
x0 - an initial guess for the location of the minimum.
method - name of the method to use. Legal values:
'CG'
'BFGS'
'Newton-CG'
'L-BFGS-B'
'TNC'
'COBYLA'
'SLSQP'
callback - function called after each iteration of optimization.
options - a dictionary defining extra params:
{
"disp": boolean - print detailed description
"gtol": number - the tolerance on the gradient norm used to stop the iterations
}
Example
Minimize the function x^2 + x + 2 with BFGS:
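A minimal sketch of this example. The helper name eqn and the variable mymin are hypothetical. Since x^2 + x + 2 has its minimum at x = -0.5 with value 1.75, the result should land close to those numbers.

```python
from scipy.optimize import minimize

def eqn(x):
    # minimize() passes x as a 1-element array; return a plain float
    return x[0]**2 + x[0] + 2

# Start the search at x0 = 0 and use the BFGS method
mymin = minimize(eqn, 0, method='BFGS')

print(mymin.x)    # close to [-0.5]
print(mymin.fun)  # close to 1.75
```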

What is Sparse Data


Sparse data is data that has mostly unused elements (elements that don't carry any information).
It can be an array like this one:
[1, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 0]
Sparse Data: is a data set where most of the item values are zero.
Dense Array: is the opposite of a sparse array: most of the values are not zero.
In scientific computing, we often come across sparse data, for example when dealing with partial derivatives in linear algebra.

How to Work With Sparse Data


SciPy has a module, scipy.sparse, that provides functions to deal with sparse data.
There are primarily two types of sparse matrices that we use:
CSC - Compressed Sparse Column. For efficient arithmetic, fast column slicing.
CSR - Compressed Sparse Row. For fast row slicing, faster matrix vector products
We will use the CSR matrix in this tutorial.

CSR Matrix
We can create a CSR matrix by passing an array into the function scipy.sparse.csr_matrix().
Example
Create a CSR matrix from an array:
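A minimal sketch of this example, using a 1-D array whose nonzero items match the description that follows (values 1, 1 and 2 at positions 5, 6 and 8). The printed layout differs between SciPy versions, so no output is shown.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([0, 0, 0, 0, 0, 1, 1, 0, 2])

# Only the nonzero items and their positions are stored
mat = csr_matrix(arr)
print(mat)
```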

From the result we can see that there are 3 items with a value.
The 1st item is in row 0, position 5, and has the value 1.



The 2nd item is in row 0, position 6, and has the value 1.
The 3rd item is in row 0, position 8, and has the value 2.

Sparse Matrix Methods


Viewing stored data (not the zero items) with the data property:
Example
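A minimal sketch of the data property; the 3x3 array is an illustrative input chosen for this sketch.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

# data holds only the stored (nonzero) values, row by row
print(csr_matrix(arr).data)  # [1 1 2]
```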

Counting nonzeros with the count_nonzero() method.
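A minimal sketch of count_nonzero(), reusing an illustrative 3x3 array.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

# Count the entries that are not zero
count = csr_matrix(arr).count_nonzero()
print(count)  # 3
```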

Eliminating duplicate entries with the sum_duplicates() method.
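A matrix built from a dense array never contains duplicates, so this sketch builds one directly from (data, indices, indptr) with column 0 stored twice on purpose; the specific values are illustrative. sum_duplicates() works in place and merges duplicates by adding them.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A 1x3 matrix where column 0 is stored twice (values 1 and 2)
data = np.array([1, 2, 3])
indices = np.array([0, 0, 2])
indptr = np.array([0, 3])
mat = csr_matrix((data, indices, indptr), shape=(1, 3))
print(mat.nnz)  # 3 stored entries, including the duplicate

# Merge duplicate entries in place by summing them
mat.sum_duplicates()
print(mat.nnz)        # 2
print(mat.toarray())  # [[3 0 3]]
```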

Converting from csr to csc with the tocsc() method:
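A minimal sketch of tocsc(); the conversion changes the storage layout (column-oriented instead of row-oriented) but not the values.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

# Convert the row-compressed matrix into a column-compressed one
newarr = csr_matrix(arr).tocsc()
print(newarr)
```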

Note: Apart from the sparse-specific operations mentioned above, sparse matrices support many of the operations that normal matrices support, e.g. reshaping, summing, arithmetic, broadcasting etc.

Working with Graphs


Graphs are an essential data structure.
SciPy provides us with the module scipy.sparse.csgraph for working with such data structures.

Adjacency Matrix
An adjacency matrix is an n x n matrix, where n is the number of elements in a graph.
The values represent the connections between the elements.
Example:



For a graph like this, with elements A, B and C, the connections are:
A & B are connected with weight 1.
A & C are connected with weight 2.
C & B are not connected.
The adjacency matrix would look like this:
   A B C
A:[0 1 2]
B:[1 0 0]
C:[2 0 0]

Below are some of the most commonly used methods for working with adjacency matrices.

Connected Components
Find all of the connected components with the connected_components() method.
Example
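A minimal sketch that reuses the A, B, C adjacency matrix above. connected_components() returns the number of components and a label for each element.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Adjacency matrix of the A, B, C graph
arr = np.array([
    [0, 1, 2],
    [1, 0, 0],
    [2, 0, 0],
])

n_components, labels = connected_components(csr_matrix(arr))
print(n_components)  # 1: all three elements are connected
print(labels)        # every element carries label 0
```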

Dijkstra
Use the dijkstra() method to find the shortest path in a graph from one element to another.
It takes the following arguments:
1. return_predecessors: boolean (True to also return the predecessor of each element, from which the whole path of traversal can be reconstructed; otherwise False).
2. indices: index of the element to compute paths from (only paths starting at that element are returned).
3. limit: max weight of path.



Example
Find the shortest path from element 1 to 2:
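A minimal sketch reusing the A, B, C adjacency matrix above. It assumes "element 1" means the first element, i.e. index 0 (A), and computes the shortest distances from there to every element; -9999 marks "no predecessor".

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

arr = np.array([
    [0, 1, 2],
    [1, 0, 0],
    [2, 0, 0],
])

# Distances and predecessors starting from index 0 only
dist, pred = dijkstra(csr_matrix(arr), return_predecessors=True, indices=0)
print(dist)  # [0. 1. 2.]
print(pred)  # -9999 for the start, then 0 for B and for C
```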

Floyd Warshall
Use the floyd_warshall() method to find shortest path between all pairs of elements.
Example
Find the shortest path between all pairs of elements:
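A minimal sketch reusing the A, B, C adjacency matrix. The result is a full distance matrix; for example, B to C costs 3 because the path has to go through A.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import floyd_warshall

arr = np.array([
    [0, 1, 2],
    [1, 0, 0],
    [2, 0, 0],
])

# Row i, column j holds the shortest distance from element i to element j
dist, pred = floyd_warshall(csr_matrix(arr), return_predecessors=True)
print(dist)
print(pred)
```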

Bellman Ford
The bellman_ford() method can also find the shortest path between all pairs of elements,
but this method can handle negative weights as well.
Example
Find the shortest path from element 1 to 2 in a graph with a negative weight:
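A minimal sketch, assuming an illustrative graph where the edge from index 0 to index 1 has weight -1 and "element 1" is index 0. The graph is directed by default, and the cycle 0 to 1 to 0 has total weight 0, so there is no negative cycle.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import bellman_ford

# Illustrative directed graph with one negative edge (0 -> 1)
arr = np.array([
    [0, -1, 2],
    [1, 0, 0],
    [2, 0, 0],
])

dist, pred = bellman_ford(csr_matrix(arr), return_predecessors=True, indices=0)
print(dist)  # distances 0, -1 and 2
print(pred)  # -9999 for the start, then 0 and 0
```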

Depth First Order


The depth_first_order() method returns a depth first traversal from a node.



This function takes the following arguments:
1. the graph.
2. the starting element to traverse the graph from.
Example
Traverse the graph depth first for the given adjacency matrix:
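A minimal sketch with an illustrative 4x4 adjacency matrix (chosen for this sketch), starting the traversal at element 1. The function returns the visiting order and each element's predecessor in the traversal tree.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import depth_first_order

# Illustrative directed graph with 4 elements
arr = np.array([
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [2, 1, 1, 0],
    [0, 1, 0, 1],
])

order, pred = depth_first_order(csr_matrix(arr), 1)
print(order)  # elements in the order they are visited, starting at 1
print(pred)   # predecessor of each element (-9999 for the start)
```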

Breadth First Order


The breadth_first_order() method returns a breadth first traversal from a node.
This function takes the following arguments:
1. the graph.
2. the starting element to traverse the graph from.
Example
Traverse the graph breadth first for the given adjacency matrix:
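A minimal sketch using the same illustrative 4x4 matrix as a depth-first example would, starting at element 1. Because element 1 links directly to 0, 2 and 3, all three are reached in the first level of the breadth-first search.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import breadth_first_order

# Illustrative directed graph with 4 elements
arr = np.array([
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [2, 1, 1, 0],
    [0, 1, 0, 1],
])

order, pred = breadth_first_order(csr_matrix(arr), 1)
print(order)  # starts at 1, then its direct neighbors
print(pred)   # every other element has 1 as predecessor
```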

Working with Spatial Data


Spatial data refers to data that is represented in a geometric space.
E.g. points on a coordinate system.
We deal with spatial data problems in many tasks.
E.g. finding whether a point is inside a boundary or not.
SciPy provides us with the module scipy.spatial, which has functions for working with
spatial data.



Triangulation
Triangulating a polygon means dividing the polygon into multiple triangles, with which we can compute the area of the polygon.
A triangulation with points means creating a surface composed of triangles in which every given point is a vertex of at least one triangle in the surface.
One method to generate these triangulations through points is the Delaunay() triangulation.
Example
Create a triangulation from the following points:
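A minimal sketch with five illustrative points that form a convex pentagon, so the triangulation should contain exactly 3 triangles. Plotting is left out to keep the sketch free of extra dependencies.

```python
import numpy as np
from scipy.spatial import Delaunay

points = np.array([[2, 4], [3, 4], [3, 0], [2, 2], [4, 1]])

tri = Delaunay(points)

# Each row of simplices lists the indices of the 3 points of one triangle
print(tri.simplices)
```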

Convex Hull
A convex hull is the smallest convex polygon that covers all of the given points.
Use the ConvexHull() method to create a convex hull.
Example
Create a convex hull for the following points:
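A minimal sketch with illustrative points: the four corners of a 4 x 4 square plus two interior points, so only the corners should end up on the hull. In 2-D, the volume attribute is the enclosed area.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Corners of a square plus two points inside it
points = np.array([[0, 0], [4, 0], [4, 4], [0, 4], [2, 2], [1, 3]])

hull = ConvexHull(points)
print(hull.vertices)  # indices of the points on the hull
print(hull.volume)    # enclosed area in 2-D, about 16
```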



KDTrees
KDTrees are a data structure optimized for nearest neighbor queries.
E.g. in a set of points, using KDTrees we can efficiently ask which points are nearest to a certain given point.
The KDTree() method returns a KDTree object.
The query() method returns the distance to the nearest neighbor and the index of that neighbor in the original set of points.
Example
Find the nearest neighbor to point (1,1):
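A minimal sketch with four illustrative points. The closest one to (1, 1) is (1, -1), at distance 2, and it sits at index 0.

```python
from scipy.spatial import KDTree

points = [(1, -1), (2, 3), (-2, 3), (2, -3)]
kdtree = KDTree(points)

# query() returns (distance, index) of the nearest neighbor
dist, idx = kdtree.query((1, 1))
print(dist, idx)  # 2.0 0
```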



Distance Metrics
There are many distance metrics used to find various types of distances between two points in data science, e.g. Euclidean distance, cosine distance etc.
The distance between two vectors may not only be the length of the straight line between them; it can also be the angle between them from the origin, or the number of unit steps required etc.
The performance of many Machine Learning algorithms depends greatly on the distance metric, e.g. "K Nearest Neighbors" or "K Means".
Let us look at some of the distance metrics:

Euclidean Distance
Find the euclidean distance between given points.
Example
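A minimal sketch with two illustrative points. The straight-line distance is sqrt(9^2 + 2^2) = sqrt(85).

```python
from scipy.spatial.distance import euclidean

p1 = (1, 0)
p2 = (10, 2)

res = euclidean(p1, p2)
print(res)  # about 9.2195
```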

Cityblock Distance (Manhattan Distance)


It is the distance computed using 4 directions of movement.
E.g. we can only move up, down, right, or left, not diagonally.



Example
Find the cityblock distance between given points:
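A minimal sketch with the same kind of illustrative points. Moving only along the axes gives |10 - 1| + |2 - 0| = 11.

```python
from scipy.spatial.distance import cityblock

p1 = (1, 0)
p2 = (10, 2)

res = cityblock(p1, p2)
print(res)  # 11
```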

Cosine Distance
It is one minus the cosine of the angle between the two points A and B, treated as vectors from the origin.
Example
Find the cosine distance between given points:
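A minimal sketch with two illustrative points. The cosine of the angle between them is 10 / sqrt(104), so the cosine distance is 1 minus that value, a small number because the vectors point in similar directions.

```python
from scipy.spatial.distance import cosine

p1 = (1, 0)
p2 = (10, 2)

res = cosine(p1, p2)
print(res)  # about 0.0194
```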

Hamming Distance
It is the proportion of positions at which two sequences differ.
It's a way to measure distance for binary sequences.
Example
Find the hamming distance between given points:
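A minimal sketch with two illustrative boolean sequences. They differ in 2 of their 3 positions, so the distance is 2/3.

```python
from scipy.spatial.distance import hamming

p1 = (True, False, True)
p2 = (False, True, True)

res = hamming(p1, p2)
print(res)  # about 0.667
```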
