Python - Scientific Functions

Python has several scientific computing libraries that are useful for data science tasks. NumPy provides multi-dimensional arrays and mathematical functions. Pandas allows for data analysis and manipulation by organizing data into tabular DataFrame structures. Matplotlib enables data visualization through plotting capabilities. Pandas builds on NumPy and is often used with SciPy, Matplotlib, and scikit-learn. Common tasks involve creating arrays and DataFrames, reading and writing data files, handling missing values, and summarizing datasets.

Uploaded by

anis hannani
Copyright
© All Rights Reserved

PYTHON PROGRAMMING

(PYTHON 3.X using Jupyter Notebook)


Scientific Functions

DSC551 – PROGRAMMING FOR DATA SCIENCE

Pn Marhainis Jamaludin
Faculty of Computer and Mathematical Sciences
Introduction
• Python is widely used in scientific and numeric
computing. Some of the common libraries are:
• NumPy = multi-dimensional, array-oriented computing
functionality designed for high-level mathematical functions
and scientific computation.
• SciPy = high-level scientific computing
• Pandas = data analysis and manipulation: organizes data
in tabular form so that it can be manipulated.
• Matplotlib = data visualization (plotting)
• Pandas is built on top of the NumPy package, so much of the
structure of NumPy is used or replicated in Pandas.
Data in pandas is often used to feed statistical analysis
in SciPy, plotting functions from Matplotlib, and
machine learning algorithms in scikit-learn.
NUMPY
What is NumPy?
• An extension package for Python that provides
multi-dimensional arrays
• This style of programming is also known as array-oriented computing
• The numpy package must be imported before it can be used
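
A minimal sketch of the import step. The alias np is only a widely used convention, not a requirement:

```python
import numpy as np  # "np" is the conventional short alias

# Printing the version is a quick check that the package is installed.
print(np.__version__)
```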
Creating arrays
• 1-Dimensional array

• 2-Dimensional array and above
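
The original slide screenshots are not available, so the following is only an illustrative sketch of creating arrays with np.array(). The values are made up for the example:

```python
import numpy as np

# 1-dimensional array built from a flat list
a = np.array([0, 1, 2, 3])
print(a.ndim)   # 1
print(a.shape)  # (4,)

# 2-dimensional array built from a list of lists (2 rows, 3 columns)
b = np.array([[0, 1, 2], [3, 4, 5]])
print(b.ndim)   # 2
print(b.shape)  # (2, 3)

# 3-dimensional array: one more level of nesting
c = np.array([[[1], [2]], [[3], [4]]])
print(c.shape)  # (2, 2, 1)
```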


Functions to create arrays
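
The slide content for this heading did not survive, so which functions it listed is an assumption. The sketch below shows a few commonly used NumPy array-creation functions:

```python
import numpy as np

evenly_spaced = np.arange(10)        # integers 0..9
stepped = np.arange(1, 9, 2)         # start, end (exclusive), step -> 1, 3, 5, 7
by_count = np.linspace(0, 1, 6)      # 6 points from 0 to 1, both ends included
ones = np.ones((3, 3))               # 3x3 array filled with 1.0
zeros = np.zeros((2, 2))             # 2x2 array filled with 0.0
identity = np.eye(3)                 # 3x3 identity matrix

print(stepped)
print(by_count)
```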
Basic data types
Example:
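
The original example image is missing. As a hedged illustration, every NumPy array has a dtype attribute describing the type of its elements, and the type can be chosen explicitly when the array is created:

```python
import numpy as np

a = np.array([1, 2, 3])               # integers: a platform-dependent integer type
b = np.array([1.0, 2.0, 3.0])         # floating-point values
c = np.array([1, 2, 3], dtype=float)  # integers stored as floats on request
d = np.array([True, False, True])     # booleans
e = np.array([1 + 2j, 3 + 4j])        # complex numbers

for arr in (a, b, c, d, e):
    print(arr.dtype)
```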
Indexing and Slicing
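
The slide body is missing here, so this is only a sketch of the usual indexing and slicing rules. Indices start at 0, and a slice is written start:stop:step with the stop index excluded:

```python
import numpy as np

a = np.arange(10)             # [0 1 2 3 4 5 6 7 8 9]
first = a[0]                  # first element
last = a[-1]                  # negative indices count from the end
every_third = a[2:9:3]        # indices 2, 5, 8
reversed_a = a[::-1]          # whole array, backwards

b = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns
element = b[1, 2]             # row 1, column 2
first_column = b[:, 0]        # all rows, column 0
block = b[0:2, 1:3]           # rows 0-1, columns 1-2

print(every_third)
print(block)
```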
PANDAS
What is Pandas?
• Pandas is a software library written for the Python programming
language for data manipulation and analysis.
• Main components of pandas
1. Series = column
2. DataFrame = two-dimensional table made up of a collection of
Series

• The pandas package must be imported before it can be used:
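
A minimal sketch of the import, followed by a single Series. The alias pd is the usual convention, and the column name and values are made up for illustration:

```python
import pandas as pd  # "pd" is the conventional short alias

# A Series is one labelled column of values.
apples = pd.Series([3, 2, 0, 1], name="apples")
print(apples)
```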


Creating DataFrame
• A common way to create a DataFrame in Python is from a dict

• Let's say we have a fruit stand that sells apples and oranges. We want to have a column for
each fruit and a row for each customer purchase. To organize this as a dictionary for
pandas we could do something like:
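
The original code image is missing. A sketch of such a dictionary, with purchase quantities invented for illustration:

```python
# One key per fruit (column); each list holds one value per customer purchase (row).
# The quantities are hypothetical.
data = {
    "apples": [3, 2, 0, 1],
    "oranges": [0, 3, 7, 2],
}
print(data)
```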

• And then pass it to the pandas DataFrame constructor:
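
A sketch of the constructor call. The variable name purchases and the quantities are assumptions made for this example:

```python
import pandas as pd

# Hypothetical purchase quantities, one list per column.
data = {
    "apples": [3, 2, 0, 1],
    "oranges": [0, 3, 7, 2],
}

purchases = pd.DataFrame(data)
print(purchases)
```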

• Each (key, value) item in data corresponds to a column in the resulting DataFrame.
• The Index of this DataFrame was given to us on creation as the numbers 0-3, but
we could also create our own when we initialize the DataFrame.

Let's have customer names as our index:
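
A sketch using the index parameter of the DataFrame constructor. The customer names and quantities are made up:

```python
import pandas as pd

# Hypothetical purchase quantities, one list per column.
data = {
    "apples": [3, 2, 0, 1],
    "oranges": [0, 3, 7, 2],
}

# One index label per row, in the same order as the values.
purchases = pd.DataFrame(data, index=["June", "Robert", "Lily", "David"])
print(purchases)
```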

• So now we could locate a customer's order by using their name:
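
A sketch of a label-based lookup with .loc, again with hypothetical names and quantities:

```python
import pandas as pd

purchases = pd.DataFrame(
    {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]},  # hypothetical data
    index=["June", "Robert", "Lily", "David"],
)

# .loc selects a row by its index label and returns it as a Series.
june = purchases.loc["June"]
print(june)
```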


How to read data?
• You can load data from various file formats into a
DataFrame in Python.
• Common sources: CSV files, JSON files, or SQL databases
• Read data from csv file:
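
The original example is missing. The sketch below first writes a small CSV file to a temporary folder so that it can run on its own. The file name and its contents are hypothetical:

```python
import os
import tempfile

import pandas as pd

# Create a small sample file so the example is self-contained.
csv_path = os.path.join(tempfile.mkdtemp(), "purchases.csv")
with open(csv_path, "w") as f:
    f.write("customer,apples,oranges\nJune,3,0\nRobert,2,3\nLily,0,7\nDavid,1,2\n")

# Default: rows get a numeric index 0, 1, 2, ...
df = pd.read_csv(csv_path)

# index_col=0 uses the first column (customer) as the index instead.
df_indexed = pd.read_csv(csv_path, index_col=0)
print(df_indexed)
```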
Convert back to file format
• Once you have finished working with a DataFrame, you can
save it back to a file format such as CSV, JSON, or SQL
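
A sketch of the three writers, assuming a small hypothetical DataFrame. The output file names and the table name are made up, and an in-memory SQLite database stands in for a real SQL database:

```python
import os
import sqlite3
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]},  # hypothetical data
    index=["June", "Robert", "Lily", "David"],
)

out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "new_purchases.csv")
json_path = os.path.join(out_dir, "new_purchases.json")

df.to_csv(csv_path)    # write a CSV file
df.to_json(json_path)  # write a JSON file

# to_sql needs a database connection; the index is saved as an extra column.
con = sqlite3.connect(":memory:")
df.to_sql("new_purchases", con)
round_trip = pd.read_sql_query("SELECT * FROM new_purchases", con)
con.close()
print(round_trip)
```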
Some Common functions
• head() - by default outputs the first five rows of your DataFrame

head(10) outputs the first 10 rows of your DataFrame

• tail() - by default outputs the last five rows of your DataFrame
tail(2) outputs the last 2 rows of your DataFrame

• info() - provides the important details about the dataset loaded into
the DataFrame: the number of non-null values, the data type of each column, and how
much memory is used

• shape - an attribute (not a function) holding a simple tuple (rows, columns): how many rows and
columns the loaded dataset has
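
The functions above can be sketched on a small made-up DataFrame of 20 rows (the column names x and y are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"x": range(20), "y": [v * v for v in range(20)]})

print(df.head())    # first 5 rows
print(df.head(10))  # first 10 rows
print(df.tail())    # last 5 rows
print(df.tail(2))   # last 2 rows

df.info()           # prints non-null counts, dtypes and memory usage
print(df.shape)     # (20, 2) -- note: no parentheses, shape is an attribute
```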
Missing Data
• Missing data in Pandas is represented by:
• None
• NaN
• Is an acronym for Not a Number
• It is a special floating-point value recognized by all systems that use the standard IEEE
floating-point representation.
• These functions detect missing data:
• isnull()
• notnull()
• Calculation with missing values:
• Summation: NaN is treated as 0
• If all values are NaN, sum() returns 0 in pandas 0.22 and later (older versions returned NaN); pass min_count=1 to get NaN instead
• Cleaning/Filling missing values:
• Replace NaN with a scalar value, for example 0
• Fill NA backward (backfill) or forward (pad)
• Drop the missing values:
• Use the dropna() function to exclude the missing values
• Replace missing values with generic values:
• Use the fillna() function to replace the missing values
Example:
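
The original example image is missing. As an illustrative sketch with made-up values, isnull() and notnull() return a True/False mask marking the missing and the present values:

```python
import numpy as np
import pandas as pd

# None is converted to NaN when stored in a numeric Series.
s = pd.Series([1.0, np.nan, 3.0, None])

missing = s.isnull()   # True where the value is missing
present = s.notnull()  # True where the value is present

print(missing)
print(present)
```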
Example:
Calculation with missing values:
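
A sketch with made-up values. By default sum() and mean() skip NaN. In pandas 0.22 and later the sum of an all-NaN Series is 0, and the min_count parameter can be used to get NaN instead:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
total = s.sum()     # NaN is skipped, so this is 1.0 + 3.0
average = s.mean()  # mean of the two present values

all_nan = pd.Series([np.nan, np.nan])
empty_total = all_nan.sum()              # 0.0 in pandas 0.22 and later
strict_total = all_nan.sum(min_count=1)  # NaN: fewer than 1 valid value

print(total, average, empty_total, strict_total)
```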

Replace missing values with a scalar value. This example replaces them with 0, but any
other value can be used:
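
A sketch using fillna() with a scalar. The column names and values are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "one": [1.0, np.nan, 3.0],
    "two": [np.nan, 5.0, 6.0],
})

# fillna returns a new DataFrame; the original df is left unchanged.
filled = df.fillna(0)
print(filled)
```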
Example:
Filling NA with Backward or Forward:
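
A sketch with made-up values, using the ffill() and bfill() methods. Older material often writes fillna(method='pad') or fillna(method='backfill'), which is deprecated in recent pandas releases:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

forward = s.ffill()   # carry the last valid value forward (pad)
backward = s.bfill()  # pull the next valid value backward (backfill)

print(forward.tolist())   # [1.0, 1.0, 1.0, 4.0]
print(backward.tolist())  # [1.0, 4.0, 4.0, 4.0]
```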

Drop/exclude the missing values:
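
A sketch of dropna() on a hypothetical DataFrame. By default it drops every row that contains a missing value; axis=1 drops columns instead:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "one": [1.0, np.nan, 3.0],
    "two": [4.0, 5.0, 6.0],
})

rows_kept = df.dropna()        # drops row 1, which has a NaN in column "one"
cols_kept = df.dropna(axis=1)  # drops column "one", which contains a NaN

print(rows_kept)
print(cols_kept)
```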


Example:
Replace missing values with generic values:
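
The original example is missing, so what it showed is an assumption. One possible sketch passes a dict to fillna() so that each column gets its own replacement value, here 0 for one column and the column mean for the other. The data is made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "apples": [3.0, np.nan, 0.0, 1.0],
    "oranges": [0.0, 3.0, np.nan, 2.0],
})

# Keys are column names, values are the replacement for NaN in that column.
filled = df.fillna({"apples": 0, "oranges": df["oranges"].mean()})
print(filled)
```

DataFrame.replace() is a related method that swaps arbitrary placeholder values, not only NaN.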
References
• https://www.tutorialspoint.com/python_pandas/python_pandas_missing_data.htm
