Python - Scientific Functions
Pn Marhainis Jamaludin
Faculty of Computer and Mathematical Sciences
Introduction
• Python is widely used in scientific and numeric
computing. Some of the common libraries are:
• NumPy = multi-dimensional, array-oriented computing,
designed for high-level mathematical functions
and scientific computation.
• SciPy = high-level scientific computing
• Pandas = data analysis and manipulation – organizes data
in tabular form so that it can be manipulated.
• Matplotlib = data visualization – plotting
• Pandas is built on top of the NumPy package, so much of the
structure of NumPy is used or replicated in Pandas.
Data in pandas is often used to feed statistical analysis
in SciPy, plotting functions from Matplotlib, and
machine learning algorithms in Scikit-learn.
NUMPY
What is NumPy?
• Extension package to Python for multi-dimensional
arrays
• Its style of working is also known as array-oriented computing
• The numpy package must be imported into Python before use
Creating arrays
• 1-Dimensional array
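A minimal sketch of importing NumPy and creating a 1-dimensional array. The values are illustrative and not taken from the original slide; `np` is the conventional alias for the package.

```python
import numpy as np  # conventional alias for the numpy package

# Build a 1-dimensional array from a Python list (illustrative values).
a = np.array([1, 2, 3, 4])

print(a)        # [1 2 3 4]
print(a.ndim)   # 1  (number of dimensions)
print(a.shape)  # (4,)  (four elements along the single axis)
```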
• Let's say we have a fruit stand that sells apples and oranges. We want to have a column for
each fruit and a row for each customer purchase. To organize this for pandas, we could use a
dictionary with one key per fruit and a list of purchase quantities as each value.
• Each (key, value) item in data corresponds to a column in the resulting DataFrame.
• The Index of this DataFrame was given to us on creation as the numbers 0-3, but
we could also create our own when we initialize the DataFrame.
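The fruit stand example can be sketched as follows. The purchase quantities and customer names are assumptions chosen for illustration; only the column names (apples, oranges) and the four-row shape come from the text above.

```python
import pandas as pd

# One key per fruit; each list holds one value per customer purchase.
# The quantities are made up for illustration.
data = {
    "apples": [3, 2, 0, 1],
    "oranges": [0, 3, 7, 2],
}

# Default index: the numbers 0-3.
purchases = pd.DataFrame(data)
print(purchases)

# Supplying our own index at creation (hypothetical customer names).
purchases_named = pd.DataFrame(data, index=["June", "Robert", "Lily", "David"])
print(purchases_named.loc["June"])  # the row for one customer
```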
• tail() – by default outputs the last five rows of your DataFrame;
passing a number, e.g. tail(2), outputs the last 2 rows
• info() – provides the important details about the dataset loaded into the
DataFrame: number of non-null values, data type of each column and how
much memory is used
• shape – a simple tuple (rows, columns) showing how many rows and
columns the loaded dataset has
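A short sketch of these three inspection tools on a small DataFrame. The data is made up for illustration. Note that shape is an attribute, so it is written without parentheses.

```python
import pandas as pd

# Illustrative dataset with 7 rows and 2 columns.
df = pd.DataFrame({
    "apples": [3, 2, 0, 1, 5, 4, 2],
    "oranges": [0, 3, 7, 2, 1, 6, 8],
})

last_five = df.tail()   # default: last five rows
last_two = df.tail(2)   # last two rows only
print(last_two)

df.info()               # prints non-null counts, dtypes and memory usage

print(df.shape)         # (7, 2) -> (rows, columns)
```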
Missing Data
• Missing data in Pandas is represented by:
• None
• NaN
• NaN is an acronym for Not a Number
• It is a special floating-point value recognized by all systems that use the standard IEEE
floating-point representation.
• These functions detect missing data:
• isnull()
• notnull()
• Calculation with missing values:
• Summation – NaN is treated as 0
• If all the data are NaN, sum() returns 0 by default in recent pandas versions; passing min_count=1 makes the result NaN
• Cleaning/Filling missing values:
• Replace NaN with scalar values – for example replace with 0
• Fill NA backward (bfill / backfill) or forward (ffill / pad)
• Drop the missing values:
• Use dropna() function to exclude the missing values
• Replace missing values with generic values:
• Use fillna() function to replace the missing values
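The detection and dropping functions listed above can be sketched on a small Series. The values are illustrative; both None and NaN are included to show that pandas treats them alike in a numeric Series.

```python
import numpy as np
import pandas as pd

# In a float Series, None is stored as NaN.
s = pd.Series([1.0, np.nan, 3.0, None])

missing_mask = s.isnull()    # True where the value is missing
present_mask = s.notnull()   # True where the value is present
print(missing_mask.tolist())  # [False, True, False, True]

cleaned = s.dropna()         # excludes the missing values
print(cleaned.tolist())      # [1.0, 3.0]
```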
Example:
Calculation with missing values:
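A minimal sketch of summation with missing values, using made-up numbers. The all-NaN behaviour shown assumes a recent pandas version (0.22 or later), where sum() returns 0 unless min_count is given.

```python
import numpy as np
import pandas as pd

# NaN is skipped, i.e. treated as 0 in the sum.
s = pd.Series([1.0, np.nan, 3.0])
total = s.sum()
print(total)  # 4.0

# All values missing: the default result is 0.0 in recent pandas.
all_missing = pd.Series([np.nan, np.nan])
total_default = all_missing.sum()

# Requiring at least one valid value makes the result NaN instead.
total_strict = all_missing.sum(min_count=1)
```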
Replace missing values with a scalar value. This example replaces them with 0, but any
other value can be used:
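A sketch of the scalar replacement with fillna(). The column names and values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame with one missing value in each column.
df = pd.DataFrame({
    "one": [1.0, np.nan, 3.0],
    "two": [np.nan, 5.0, 6.0],
})

# Replace every NaN with the scalar 0; fillna returns a new DataFrame.
filled = df.fillna(0)
print(filled)
```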
Example:
Filling NA with Backward or Forward:
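A sketch of forward and backward filling on illustrative values. It uses the ffill() and bfill() methods; older material often writes fillna(method='pad') or fillna(method='backfill'), which recent pandas releases deprecate.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

# Forward fill (pad): carry the last valid value forward.
forward = s.ffill()
print(forward.tolist())   # [1.0, 1.0, 1.0, 4.0]

# Backward fill (backfill): pull the next valid value backward.
backward = s.bfill()
print(backward.tolist())  # [1.0, 4.0, 4.0, 4.0]
```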