Class 6 Pandas
Python Pandas
What is Pandas?
Pandas is a powerful data manipulation and analysis library for Python.
It provides data structures like Series and DataFrame for handling and analyzing data.
Why Pandas?
The beauty of Pandas is that it simplifies working with data frames. It makes many of the
time-consuming, repetitive tasks involved in handling data quick and simple, as the
applications below show.
Applications of Pandas
Data Cleaning: Pandas provides functions to clean messy, incomplete, or inconsistent data:
it can handle missing values, remove duplicates, and standardize formats so the data is
ready for effective analysis.
Data Exploration: Pandas makes it easy to compute summary statistics, spot trends, and
visualize data with its built-in plotting functions or through Matplotlib and Seaborn.
Data Preparation: Pandas can pivot and melt tables, convert variables, and merge datasets
on common columns to prepare data for analysis.
Data Analysis: Pandas supports descriptive statistics, time series analysis, group-by
operations, and custom functions.
Data Visualization: Pandas has basic built-in plotting capabilities and integrates with
data visualization libraries such as Matplotlib, Seaborn, and Plotly for more advanced
visualizations.
Time Series Analysis: Pandas supports date/time indexing, resampling, and frequency conversion.
localhost:8889/notebooks/Python/Python/Class 6 Pandas.ipynb 1/13
9/28/24, 3:08 PM Class 6 Pandas - Jupyter Notebook
1. Installation
To install pandas, run the following command in a terminal:
pip install pandas
2. Importing pandas
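The import cell itself was not preserved, but the rest of the notes call functions as pd.read_csv, so pandas is imported under the conventional alias pd. A minimal sketch:

```python
# Import pandas under its conventional alias "pd"
import pandas as pd

# Print the installed version to confirm the import worked
print(pd.__version__)
```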
Series
0 1
1 3
2 5
3 6
4 8
dtype: int64
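A Series is a one-dimensional labeled array. The output above (values 1, 3, 5, 6, 8 with the default index 0 to 4) is what you get by passing a plain list to pd.Series; since the original code cell is missing, this is a sketch of one way to produce it:

```python
import pandas as pd

# With no index given, pandas assigns a default integer index 0..n-1
s = pd.Series([1, 3, 5, 6, 8])
print(s)

# Values are looked up by index label
print(s[2])  # 5
```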
DataFrame
Name Age
0 John 28
1 Anna 24
2 Peter 35
3 Linda 32
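A DataFrame is a two-dimensional table whose columns are Series. The table above can be built from a dictionary that maps column names to lists of values. The variable name people is hypothetical; the original cell is not shown.

```python
import pandas as pd

# Each key becomes a column name, each list becomes that column's values
data = {
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 35, 32],
}
people = pd.DataFrame(data)
print(people)

# Selecting one column of a DataFrame returns a Series
print(type(people["Age"]))
```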
4. Reading Data
Pandas can read data from various file formats, including CSV, Excel, SQL databases, and
more.
In [4]: df = pd.read_csv('train.csv')
In [5]: df
Out[5]:
     PassengerId  Survived  Pclass  Name                                               Sex      Age  SibSp  Parch  Ticket               Fare
0              1         0       3  Braund, Mr. Owen Harris                            male    22.0      1      0  A/5 21171          7.2500
1              2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0  PC 17599          71.2833
2              3         1       3  Heikkinen, Miss. Laina                             female  26.0      0      0  STON/O2. 3101282   7.9250
3              4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0      1      0  113803            53.1000
4              5         0       3  Allen, Mr. William Henry                           male    35.0      0      0  373450             8.0500
...          ...       ...     ...  ...                                                ...      ...    ...    ...  ...                   ...
886          887         0       2  Montvila, Rev. Juozas                              male    27.0      0      0  211536            13.0000
887          888         1       1  Graham, Miss. Margaret Edith                       female  19.0      0      0  112053            30.0000
888          889         0       3  Johnston, Miss. Catherine Helen "Carrie"           female   NaN      1      2  W./C. 6607        23.4500
889          890         1       1  Behr, Mr. Karl Howell                              male    26.0      0      0  111369            30.0000
890          891         0       3  Dooley, Mr. Patrick                                male    32.0      0      0  370376             7.7500
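pd.read_csv accepts a file path such as 'train.csv', but also any file-like object, which makes it easy to try without the real file. The sketch below parses a few illustrative rows (not the actual train.csv) from an in-memory string; note how an empty field becomes NaN, just like the missing Age values in the output above.

```python
import io

import pandas as pd

# Illustrative CSV text standing in for 'train.csv'; the last Age is left blank
csv_text = """PassengerId,Survived,Pclass,Age
1,0,3,22.0
2,1,1,38.0
3,1,3,
"""
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)

# Empty fields are read as NaN (missing values)
print(sample["Age"].isna().sum())  # 1
```

Other formats have matching readers, for example pd.read_excel for spreadsheets and pd.read_sql for database queries.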
In [7]: df1
5. Data Inspection
You can inspect your data using various methods:
Out[8]:
     PassengerId  Survived  Pclass  Name                                               Sex      Age  SibSp  Parch  Ticket               Fare
0              1         0       3  Braund, Mr. Owen Harris                            male    22.0      1      0  A/5 21171          7.2500
1              2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0  PC 17599          71.2833
2              3         1       3  Heikkinen, Miss. Laina                             female  26.0      0      0  STON/O2. 3101282   7.9250
3              4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0      1      0  113803            53.1000
4              5         0       3  Allen, Mr. William Henry                           male    35.0      0      0  373450             8.0500
Out[9]:
     PassengerId  Survived  Pclass  Name                                               Sex      Age  SibSp  Parch  Ticket              Fare
886          887         0       2  Montvila, Rev. Juozas                              male    27.0      0      0  211536            13.00
887          888         1       1  Graham, Miss. Margaret Edith                       female  19.0      0      0  112053            30.00
888          889         0       3  Johnston, Miss. Catherine Helen "Carrie"           female   NaN      1      2  W./C. 6607        23.45
889          890         1       1  Behr, Mr. Karl Howell                              male    26.0      0      0  111369            30.00
890          891         0       3  Dooley, Mr. Patrick                                male    32.0      0      0  370376             7.75
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 714 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
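The three outputs above (the first five rows, the last five rows, and the column summary) match what df.head(), df.tail() and df.info() print; the code cells were not preserved, so treat those calls as an assumption. The sketch below runs the same inspection methods on a small stand-in DataFrame with illustrative values, plus describe() and shape, two other common inspection tools.

```python
import numpy as np
import pandas as pd

# Small stand-in for the Titanic data (illustrative values only)
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Age": [22.0, 38.0, 26.0, np.nan, 35.0, np.nan],
    "Sex": ["male", "female", "female", "female", "male", "male"],
})

print(df.head())      # first 5 rows by default
print(df.tail(2))     # last n rows
df.info()             # dtypes, non-null counts and memory usage (prints directly)
print(df.describe())  # count, mean, std, min, quartiles, max of numeric columns
print(df.shape)       # (number of rows, number of columns)
```

In the real info() output, Age has 714 non-null values out of 891 entries, which means 177 ages are missing.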
Selecting columns
Out[13]: 0 Food
1 Food
2 Food
3 Utilities
4 Utilities
5 Rent
6 Utilities
7 Food
8 Food
Name: Category, dtype: object
0 Food sandwich
1 Food samosa
2 Food Groceries
7 Food sandwich
8 Food samosa
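The Out[13] output is the Category column of an expenses-style DataFrame whose definition is not shown. The sketch below uses a hypothetical stand-in (the Item and Amount columns and all values are made up) to show the two basic ways of selecting columns.

```python
import pandas as pd

# Hypothetical expenses table; column names other than Category are assumptions
expenses = pd.DataFrame({
    "Category": ["Food", "Food", "Utilities", "Rent"],
    "Item": ["sandwich", "samosa", "Electricity", "Apartment"],
    "Amount": [120, 40, 900, 15000],
})

# Single brackets with one column name return a Series
print(expenses["Category"])

# A list of column names returns a DataFrame with only those columns
print(expenses[["Category", "Item"]])
```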
Selecting rows
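Rows are usually selected with .iloc (by integer position) or .loc (by index label). A minimal sketch on the Name/Age table from the DataFrame example earlier in these notes:

```python
import pandas as pd

people = pd.DataFrame(
    {"Name": ["John", "Anna", "Peter", "Linda"], "Age": [28, 24, 35, 32]}
)

# .iloc uses integer positions; slices exclude the end position
print(people.iloc[0])     # first row, returned as a Series
print(people.iloc[1:3])   # rows at positions 1 and 2

# .loc uses index labels; label slices include both ends
print(people.loc[1:2])          # rows labelled 1 and 2
print(people.loc[3, "Name"])    # 'Linda'
```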
Filtering data
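Filtering uses a boolean mask: a comparison on a column yields True/False per row, and indexing with that mask keeps only the True rows. The output listing only the Food rows (0, 1, 2, 7, 8) earlier in this section looks like the result of such a filter, for example df[df['Category'] == 'Food'], though the exact cell is not shown. A sketch on the Name/Age table:

```python
import pandas as pd

people = pd.DataFrame(
    {"Name": ["John", "Anna", "Peter", "Linda"], "Age": [28, 24, 35, 32]}
)

# A comparison produces a boolean Series (a "mask")
mask = people["Age"] > 30
print(people[mask])  # Peter and Linda

# Combine conditions with & (and) or | (or); parentheses are required
print(people[(people["Age"] > 25) & (people["Age"] < 33)])  # John and Linda

# .isin keeps rows whose value appears in a list
print(people[people["Name"].isin(["Anna", "Linda"])])
```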
7. Data Manipulation
Dropping columns
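Columns are removed with DataFrame.drop, which returns a new DataFrame and leaves the original unchanged. In this sketch the City column is a hypothetical extra column added only so there is something to drop.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["John", "Anna"],
    "Age": [28, 24],
    "City": ["Paris", "Delhi"],  # hypothetical column for illustration
})

# drop(columns=...) returns a new DataFrame without the listed columns
smaller = people.drop(columns=["City"])
print(smaller)

# Equivalent form: pass the label and axis=1 (the column axis)
smaller2 = people.drop("City", axis=1)
```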
Renaming columns
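Columns are renamed with DataFrame.rename and a dictionary mapping old names to new ones; like drop, it returns a new DataFrame. The new names here are arbitrary examples.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["John", "Anna"], "Age": [28, 24]})

# Map old column names to new ones; unlisted columns keep their names
renamed = people.rename(columns={"Name": "FullName", "Age": "Years"})
print(renamed.columns.tolist())  # ['FullName', 'Years']
```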
In [23]: print(df.isnull().sum())
PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64
In [24]: print(df.notnull().sum())
PassengerId 891
Survived 891
Pclass 891
Name 891
Sex 891
Age 714
SibSp 891
Parch 891
Ticket 891
Fare 891
Cabin 204
Embarked 889
dtype: int64
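isnull() marks each missing cell as True, so summing it counts the missing values per column; notnull() is its complement. For every column the two counts add up to the number of rows (for Age above: 177 + 714 = 891). isna() and notna() are equivalent aliases. A sketch on a small made-up frame:

```python
import numpy as np
import pandas as pd

# Made-up data with some missing values
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.93, 53.10],
})

missing = df.isnull().sum()   # missing values per column
present = df.notnull().sum()  # non-missing values per column
print(missing)
print(present)

# For every column, missing + present equals the number of rows
assert (missing + present == len(df)).all()
```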
In [26]: print(df2.isnull().sum())
PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 0
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 0
Embarked 0
dtype: int64
In [27]: print(df2.notnull().sum())
PassengerId 183
Survived 183
Pclass 183
Name 183
Sex 183
Age 183
SibSp 183
Parch 183
Ticket 183
Fare 183
Cabin 183
Embarked 183
dtype: int64
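df2 has no missing values but only 183 of the original 891 rows, which is consistent with df2 = df.dropna(): dropna removes every row that contains at least one missing value, and since Cabin alone is missing in 687 rows, few rows survive. The cell that created df2 is not shown, so this is an assumption. A sketch on made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [np.nan, "C85", "B42", np.nan],
})

# Drop every row that has a missing value in any column
df2 = df.dropna()
print(df2)
print(len(df2))  # 1

# subset= restricts the check to the listed columns
df_age = df.dropna(subset=["Age"])
print(len(df_age))  # 3
```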
In [29]: print(df3.isnull().sum())
PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 0
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 0
Embarked 0
dtype: int64
In [30]: print(df3.notnull().sum())
PassengerId 891
Survived 891
Pclass 891
Name 891
Sex 891
Age 891
SibSp 891
Parch 891
Ticket 891
Fare 891
Cabin 891
Embarked 891
dtype: int64
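df3 keeps all 891 rows and has no missing values, so its missing values were filled in rather than dropped, most likely with fillna; the fill values used in the original cell are not shown. One common approach, sketched below on made-up data, fills a numeric column with its median and a categorical column with its most frequent value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Embarked": ["S", "C", np.nan, "S"],
})

# A dict passed to fillna gives a separate fill value per column
df3 = df.fillna({
    "Age": df["Age"].median(),             # median of 22 and 26 -> 24.0
    "Embarked": df["Embarked"].mode()[0],  # most frequent value -> 'S'
})
print(df3)
print(df3.isnull().sum().sum())  # 0
```

A single value, such as df.fillna(0), fills every column the same way, which is simpler but rarely sensible for mixed data.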
Out[31]: Name
Age
24 Anna
28 John
32 Linda
35 Peter
Out[32]:
Age Name
24 Anna
28 John
32 Linda
35 Peter
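Out[31] and Out[32] show the Name/Age table with Age moved into the index and the rows sorted by age. The producing cells are missing; one way to get this result, assumed here, is set_index followed by sort_index. sort_values is the alternative when you want to keep the default index.

```python
import pandas as pd

people = pd.DataFrame(
    {"Name": ["John", "Anna", "Peter", "Linda"], "Age": [28, 24, 35, 32]}
)

# Move Age into the index, then sort the rows by that index
by_age = people.set_index("Age").sort_index()
print(by_age)

# Sort by a column while keeping the original integer index
print(people.sort_values("Age"))
```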
Merging DataFrames
0 B 2 5
1 D 4 6
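The merge output above keeps only the keys B and D, which is what an inner merge (the default for pd.merge) produces: only keys present in both frames survive. The column headers were lost, so the names key, value1 and value2, and the extra right-hand keys E and F, are hypothetical.

```python
import pandas as pd

# Hypothetical input frames; names and the keys E, F are assumptions
left = pd.DataFrame({"key": ["A", "B", "C", "D"], "value1": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["B", "D", "E", "F"], "value2": [5, 6, 7, 8]})

# Inner merge on the shared "key" column
inner = pd.merge(left, right, on="key")
print(inner)  # rows (B, 2, 5) and (D, 4, 6)
```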
Joining DataFrames
0 A 1 NaN
1 B 2 5.0
2 C 3 NaN
3 D 4 6.0
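The join output keeps all four left-hand keys and fills NaN where the right side has no match; the NaN also turns the integer column into floats (5.0, 6.0). That matches a left merge; the original cell is not shown, and the column names below are the same hypothetical ones as in a plain merge. DataFrame.join gives the same result once the right frame is indexed by the key.

```python
import pandas as pd

# Hypothetical input frames; names and the keys E, F are assumptions
left = pd.DataFrame({"key": ["A", "B", "C", "D"], "value1": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["B", "D", "E", "F"], "value2": [5, 6, 7, 8]})

# Left merge: keep every row of `left`; unmatched keys get NaN
joined = pd.merge(left, right, on="key", how="left")
print(joined)

# DataFrame.join matches the caller's "key" column against the other frame's index
joined2 = left.join(right.set_index("key"), on="key")
print(joined2)
```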