Python Data Science with Jupyter Notebook

Python has several scientific computing libraries that are useful for data science tasks. NumPy provides multi-dimensional arrays and mathematical functions. Pandas allows for data analysis and manipulation by organizing data into tabular DataFrame structures. Matplotlib enables data visualization through plotting capabilities. Pandas builds on NumPy and is often used with SciPy, Matplotlib, and scikit-learn. Common tasks involve creating arrays and DataFrames, reading and writing data files, handling missing values, and summarizing datasets.

PYTHON PROGRAMMING

(PYTHON 3.X using Jupyter Notebook)

Scientific Functions

DSC551 – PROGRAMMING FOR DATA SCIENCE

Pn Marhainis Jamaludin
Faculty of Computer and Mathematical Sciences
Introduction
• Python is widely used in scientific and numeric
computing. Some of the common libraries are:
• NumPy = multi-dimensional, array-oriented computing
with high-level mathematical functions for
scientific computation.
• SciPy = high-level scientific computing
• Pandas = data analysis and manipulation: it organizes data
in tabular form so that the data can be manipulated.
• Matplotlib = data visualization (plotting)
• Pandas is built on top of the NumPy package, so much of
the structure of NumPy is used or replicated in Pandas.
Data in pandas is often used to feed statistical analysis
in SciPy, plotting functions from Matplotlib, and
machine learning algorithms in scikit-learn.
NUMPY
What is NumPy?
• An extension package to Python for multi-dimensional
arrays
• This style of programming is also known as array-oriented computing
• The numpy package must be imported into Python before use
Creating arrays
• 1-Dimensional array

• 2-Dimensional array and above
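A minimal sketch of both cases, using np.array on nested Python lists. The values are illustrative assumptions, not taken from the slides:

```python
import numpy as np

# 1-dimensional array built from a flat Python list
a = np.array([1, 2, 3, 4])
print(a.ndim, a.shape)   # 1 (4,)

# 2-dimensional array built from a list of equal-length lists
b = np.array([[1, 2, 3], [4, 5, 6]])
print(b.ndim, b.shape)   # 2 (2, 3)

# 3-dimensional array: one more level of nesting
c = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(c.ndim, c.shape)   # 3 (2, 2, 2)
```

The ndim attribute gives the number of dimensions and shape gives the size along each one.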

Functions to create array
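A few NumPy functions commonly used to create arrays are sketched below. Which functions the course covers is an assumption; the arguments are illustrative:

```python
import numpy as np

evenly_spaced = np.arange(0, 10, 2)   # start, stop (exclusive), step
points = np.linspace(0, 1, 5)         # 5 evenly spaced points, both ends included
zeros = np.zeros((2, 3))              # 2x3 array filled with 0.0
ones = np.ones((3, 2))                # 3x2 array filled with 1.0
identity = np.eye(3)                  # 3x3 identity matrix

print(evenly_spaced)   # [0 2 4 6 8]
print(points)
print(zeros.shape, ones.shape, identity.shape)
```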
Basic data types
Example:
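As a sketch with illustrative values: NumPy infers the data type (dtype) of an array from its contents, and the type can also be set explicitly or converted afterwards:

```python
import numpy as np

ints = np.array([1, 2, 3])                 # integer dtype (exact width depends on the platform)
floats = np.array([1.0, 2.0, 3.0])         # float64
forced = np.array([1, 2, 3], dtype=float)  # integers stored as float64
flags = np.array([True, False])            # bool
converted = floats.astype(int)             # convert an existing array to integers

print(ints.dtype, floats.dtype, forced.dtype, flags.dtype, converted.dtype)
```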
Indexing and Slicing
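Indexing starts at 0, negative indices count from the end, and a slice start:stop:step excludes the stop position. A minimal sketch with illustrative arrays:

```python
import numpy as np

a = np.arange(10)         # [0 1 2 3 4 5 6 7 8 9]
first = a[0]              # 0
last = a[-1]              # 9
middle = a[2:5]           # [2 3 4]
every_other = a[::2]      # [0 2 4 6 8]

grid = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns
element = grid[1, 2]      # row 1, column 2 -> 6
second_row = grid[1]      # [4 5 6 7]
first_col = grid[:, 0]    # [0 4 8]
```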
PANDAS
What is Pandas?
• Pandas is a software library written for the Python programming
language for data manipulation and analysis.
• Main components of pandas
1. Series = column
2. DataFrame = multi-dimensional table made up of a collection of
Series

• Need to import pandas library package into python:
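The import is conventionally written with the alias pd. The short sketch below also builds a single Series to show what one column looks like; the values and the name are illustrative assumptions:

```python
import pandas as pd

# A Series is one labelled column of values
apples = pd.Series([3, 2, 0, 1], name="apples")
print(apples)
```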

Creating DataFrame
• A common way to create a DataFrame in Python is from a dict

• Let's say we have a fruit stand that sells apples and oranges. We want to have a column for
each fruit and a row for each customer purchase. To organize this as a dictionary for
pandas we could do something like:

• And then pass it to the pandas DataFrame constructor:
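The dictionary and the constructor call described above might look like the following sketch. The purchase quantities are illustrative assumptions:

```python
import pandas as pd

# One key per fruit (column), one list entry per customer purchase (row)
data = {
    "apples": [3, 2, 0, 1],
    "oranges": [0, 3, 7, 2],
}

purchases = pd.DataFrame(data)
print(purchases)
```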

• Each (key, value) item in data corresponds to a column in the resulting DataFrame.
• The Index of this DataFrame was given to us on creation as the numbers 0-3, but
we could also create our own when we initialize the DataFrame.

Let's have customer names as our index:

• So now we could locate a customer's order by using their name:
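A sketch of both steps: the index argument sets the row labels and loc looks a row up by its label. The customer names and quantities are hypothetical:

```python
import pandas as pd

data = {
    "apples": [3, 2, 0, 1],
    "oranges": [0, 3, 7, 2],
}

# Use customer names (hypothetical) as the row labels instead of 0-3
purchases = pd.DataFrame(data, index=["June", "Robert", "Lily", "David"])

# loc selects a row by its index label and returns it as a Series
june_order = purchases.loc["June"]
print(june_order)
```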

How to read data?
• You can load data from various file formats into a
DataFrame in Python.
• Common file formats: CSV, JSON or SQL files
• Read data from a CSV file:
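A minimal sketch using pd.read_csv. So that the example runs on its own, it first writes a tiny CSV file to a temporary folder; the file name and contents are hypothetical:

```python
import os
import tempfile

import pandas as pd

# Create a small CSV file to read (hypothetical name and contents)
csv_path = os.path.join(tempfile.mkdtemp(), "purchases.csv")
with open(csv_path, "w") as f:
    f.write("customer,apples,oranges\nJune,3,0\nRobert,2,3\n")

# index_col=0 uses the first column (customer) as the row labels
df = pd.read_csv(csv_path, index_col=0)
print(df)
```

Without index_col, pandas keeps the customer column as ordinary data and numbers the rows from 0.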
Convert back to file format
• Once you have finished working with a DataFrame, you can
save it back to a file format such as CSV, JSON or SQL
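As a sketch, to_csv, to_json and to_sql write a DataFrame out in each of these formats. The file names, the table name and the in-memory SQLite database are assumptions made for illustration:

```python
import os
import sqlite3
import tempfile

import pandas as pd

df = pd.DataFrame({"apples": [3, 2], "oranges": [0, 3]}, index=["June", "Robert"])

out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "purchases.csv")
json_path = os.path.join(out_dir, "purchases.json")

df.to_csv(csv_path)     # write as CSV (the index is saved as the first column)
df.to_json(json_path)   # write as JSON

# Write to a SQL table in an in-memory SQLite database
con = sqlite3.connect(":memory:")
df.to_sql("purchases", con)
from_sql = pd.read_sql("SELECT * FROM purchases", con)
con.close()

# Read the CSV back to confirm the round trip
from_csv = pd.read_csv(csv_path, index_col=0)
```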
Some Common functions
• head() – by default outputs the first five rows of your DataFrame

head(10) outputs the first 10 rows of your DataFrame

• tail() – by default outputs the last five rows of your DataFrame
tail(2) outputs the last 2 rows of your DataFrame

• info() – provides key details about the dataset loaded into the
DataFrame: the number of non-null values and the data type of each
column, and how much memory is used

• shape – an attribute (not a method) holding a tuple (rows, columns):
how many rows and columns the loaded dataset has
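The functions above can be tried on a small DataFrame. This is a sketch with made-up data of 20 rows and 2 columns:

```python
import pandas as pd

# Illustrative dataset: 20 rows, 2 columns
df = pd.DataFrame({"id": range(1, 21), "score": [x * 0.5 for x in range(20)]})

first_five = df.head()     # first 5 rows (default)
first_ten = df.head(10)    # first 10 rows
last_five = df.tail()      # last 5 rows (default)
last_two = df.tail(2)      # last 2 rows

df.info()                  # prints non-null counts, dtypes and memory usage
print(df.shape)            # (20, 2)
```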
Missing Data
• Missing data in Pandas is represented by:
• None
• NaN
• NaN is an acronym for Not a Number
• It is a special floating-point value recognized by all systems that use the standard IEEE
floating-point representation.
• These functions detect missing data:
• isnull()
• notnull()
• Calculation with missing values:
• Summation: NaN is treated as 0
• If all the values are NaN, sum() returns 0 in pandas 0.22 and later (earlier versions
returned NaN); pass min_count=1 to get NaN instead
• Cleaning/Filling missing values:
• Replace NaN with scalar values – for example replace with 0
• Fill NA with backward (backfill) or forward (pad)
• Drop the missing values:
• Use dropna() function to exclude the missing values
• Replace missing values with generic values:
• Use fillna() function to replace the missing values
Example:
Example:
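A minimal sketch of detecting missing data with isnull() and notnull(), using an illustrative DataFrame in which both None and NaN appear:

```python
import numpy as np
import pandas as pd

# None in a numeric column is stored as NaN
df = pd.DataFrame({"one": [1.0, np.nan, 3.0], "two": [None, 5.0, 6.0]})

missing = df["one"].isnull()    # True where the value is missing
present = df["one"].notnull()   # True where the value is present
print(missing)
print(present)

# Count the missing values in each column
print(df.isnull().sum())
```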
Calculation with missing values:
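A sketch of how sum() handles NaN, with illustrative values. The min_count parameter controls the result when there are no valid values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
total = s.sum()                      # NaN is skipped: 1.0 + 3.0 = 4.0

all_missing = pd.Series([np.nan, np.nan])
default_sum = all_missing.sum()              # 0.0 in pandas 0.22 and later
strict_sum = all_missing.sum(min_count=1)    # NaN: fewer than 1 valid value

print(total, default_sum, strict_sum)
```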

Replace missing values with a scalar value. This example replaces them with 0, but any
other value can be used:
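A minimal sketch with an illustrative DataFrame. fillna returns a new DataFrame and leaves the original unchanged:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [1.0, np.nan, 3.0], "two": [np.nan, 5.0, 6.0]})

# Replace every NaN with the scalar 0
filled = df.fillna(0)
print(filled)
```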
Example:
Filling NA with Backward or Forward:
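A sketch using the ffill() and bfill() methods on illustrative values. Recent pandas releases deprecate the older fillna(method='pad') and fillna(method='backfill') spellings in favour of these methods:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

forward = s.ffill()    # carry the last valid value forward (pad)
backward = s.bfill()   # pull the next valid value backward (backfill)

print(forward.tolist())    # [1.0, 1.0, 1.0, 4.0]
print(backward.tolist())   # [1.0, 4.0, 4.0, 4.0]
```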

Drop/exclude the missing values:
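A minimal sketch of dropna() with an illustrative DataFrame. By default it drops rows that contain a missing value, and axis=1 drops columns instead:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "one": [1.0, np.nan, 3.0],
    "two": [4.0, 5.0, np.nan],
    "three": [7.0, 8.0, 9.0],
})

rows_kept = df.dropna()          # keep only rows with no missing values
cols_kept = df.dropna(axis=1)    # keep only columns with no missing values

print(rows_kept)
print(cols_kept)
```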

Example:
Replace missing values with generic values:
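One possible reading of this step, sketched with illustrative data: fillna() accepts a dict giving a different fill value per column, and the related replace() method swaps a placeholder value for another. The placeholder -999 is an assumption:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [1.0, np.nan, 3.0], "two": [4.0, np.nan, 6.0]})

# Fill each column with its own value: the column mean for "one", 0 for "two"
filled = df.fillna({"one": df["one"].mean(), "two": 0})
print(filled)

# replace() swaps one value for another, here a hypothetical placeholder -999
readings = pd.Series([10, -999, 30])
cleaned = readings.replace(-999, 0)
print(cleaned.tolist())   # [10, 0, 30]
```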
References
• https://www.tutorialspoint.com/python_pandas/python_pandas_missing_data.htm
