Pandas
Function
Module
Package
Library
SciPy: This library includes modules for linear algebra, integration, optimization, and statistics.
Pandas: Pandas is a library created to help developers work intuitively with "labeled" and
"relational" data.
Matplotlib: Matplotlib is a numerical plotting library that helps with data analysis.
Pillow: Pillow is a friendly fork of PIL (Python Imaging Library) that is easier to use.
Keras: Keras is a library for building and training neural network models.
PyTorch: PyTorch is a framework well suited to data scientists who want to perform deep learning
tasks.
TensorFlow: TensorFlow is a popular Python framework for machine learning and deep learning,
developed at Google Brain.
Department of Computer Engineering and Information Technology
College of Engineering Pune (COEP)
Forerunners in Technical Education
Why Use Pandas?
❖ Create Data - We begin by creating our own data set for analysis.
❖ Get Data - We will learn how to read in the text file.
❖ Prepare Data - Here we will simply take a look at the data and make
sure it is clean.
❖ Analyze Data - We will simply find the most popular name in a specific
year.
❖ Present Data - Through tabular data and a graph, clearly show the end
user the most popular name in a specific year.
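The five steps above can be sketched end to end on a tiny data set. This is only an illustration: the names and birth counts are made up, and an in-memory buffer stands in for the text file. Present Data is limited to a sorted table here; a graph would additionally need a plotting library such as Matplotlib.

```python
import io

import pandas as pd

# Create Data: a small, made-up set of names and birth counts for one year.
names = ["Bob", "Jessica", "Mary", "John", "Mel"]
births = [968, 155, 77, 578, 973]
created = pd.DataFrame({"Name": names, "Births": births})

# Get Data: write the data out as CSV text, then read it back in.
# The StringIO buffer is an assumption standing in for a file on disk.
buffer = io.StringIO()
created.to_csv(buffer, index=False)
buffer.seek(0)
df = pd.read_csv(buffer)

# Prepare Data: check the column types and count missing values.
print(df.dtypes)
print(df.isnull().sum())

# Analyze Data: pick the row with the highest birth count.
most_popular = df.loc[df["Births"].idxmax()]
print(most_popular["Name"], most_popular["Births"])

# Present Data: show the table ordered from most to least popular.
print(df.sort_values(by="Births", ascending=False))
```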
Installation of pandas
If Python and pip are already installed on the system, pandas can be installed
with a single command: pip install pandas
Import pandas
❖ Pandas is one of the most widely used Python libraries for data
analysis. It is conventionally imported under the alias pd.
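A quick way to confirm that the import works is to print the installed version. This is a minimal check, assuming pandas is already installed; the alias pd is a convention, not a requirement.

```python
import pandas as pd

# If the import succeeds, pandas is installed; the version string identifies the release.
print(pd.__version__)
```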
Series:
● Example:
import pandas as pd
a = pd.Series([2,7,4,1])
print(a)
● Example:
import pandas as pd
a = pd.Series(data=[2,7,4,1])
print(a)
● Output (identical for both examples):
0    2
1    7
2    4
3    1
dtype: int64
● Example:
import pandas as pd
a = pd.Series([2,7,4,1],["A","B","C","D"])
print(a)
● Example:
import pandas as pd
a = pd.Series(data=[2,7,4,1], index=["A","B","C","D"])
print(a)
● Output (identical for both examples):
A    2
B    7
C    4
D    1
dtype: int64
● Example:
import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)
● Output:
day1    420
day2    380
day3    390
dtype: int64
Features of DataFrame
● Columns can be of different types
● Size-mutable
● Labeled axes (rows and columns)
● Can perform arithmetic operations on rows and columns
We can think of it as an SQL table or a spreadsheet data representation.
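The features listed above can be seen in a short sketch. The column names, row labels, and values below are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Columns of different types: strings, integers, and floats side by side.
df = pd.DataFrame(
    {"Name": ["Tom", "Jack", "Steve"], "Age": [28, 34, 29], "Score": [7.5, 8.0, 6.5]},
    index=["r1", "r2", "r3"],  # hypothetical row labels
)
print(df.dtypes)

# Size-mutable: a column can be added after the frame is created.
df["Passed"] = df["Score"] >= 7.0
print(df.shape)

# Labeled axes: rows and columns both carry labels.
print(list(df.index))
print(list(df.columns))

# Arithmetic on a column, then totals per column and per row.
df["Age_next_year"] = df["Age"] + 1
column_total = df[["Age", "Score"]].sum(axis=0)
row_total = df[["Age", "Score"]].sum(axis=1)
print(column_total)
print(row_total)
```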
index: The row labels to use for the resulting frame. Optional; if no index is
passed, the rows are numbered 0 to n-1 (equivalent to np.arange(n)).
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)
Output:
   0
0  1
1  2
2  3
3  4
4  5
import pandas as pd
data = [1,2,3,4]
index=["A","B","C","D"]
df = pd.DataFrame(data,index)
print(df)
Output:
   0
A  1
B  2
C  3
D  4
import pandas as pd
data = [1,2,3,4]
index=["A","B","C","D"]
df = pd.DataFrame(data,index,columns=["Roll_no"])
print(df)
Output:
   Roll_no
A        1
B        2
C        3
D        4
import pandas as pd
data=[ ["Abc",10],["pqr",12],["xyz",14] ]
df=pd.DataFrame(data,columns=["Name","Age"])
print(df)
Output:
  Name  Age
0  Abc   10
1  pqr   12
2  xyz   14
import pandas as pd
data = { 'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age':[28,34,29,42] }
df = pd.DataFrame(data)
print(df)
Output:
    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
import pandas as pd
data = { 'Name':['Tom', 'Jack', 'Steve','asda'],
         'Age':[2,34,29,42],
         'Roll_no':[111,222,333,444] }
df = pd.DataFrame(data)
print(df)
Output:
    Name  Age  Roll_no
0    Tom    2      111
1   Jack   34      222
2  Steve   29      333
3   asda   42      444
print(df['Roll_no'])
Output:
0    111
1    222
2    333
3    444
Name: Roll_no, dtype: int64
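df["R1"] = [1,2,3,4]  # R1 is not defined in the extracted slide; these values are inferred from the sums below
print(df)
print(df["Roll_no"]+df["R1"])
Output (column sum only):
0    112
1    224
2    336
3    448
dtype: int64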
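Rows can be selected by passing a row label to the loc indexer. In the examples
below, df is a DataFrame with columns one and two whose first row holds 1 and 6.
With the default integer index:
print(df.loc[0])
Output:
one    1
two    6
Name: 0, dtype: int64
print(df.loc[0].values)
Output:
[1 6]
With row labels A1, B1, ...:
print(df.loc["A1"])
Output:
one    1
two    6
Name: A1, dtype: int64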
Rows can be selected by passing an integer position to the iloc indexer.
print(df.iloc[0])
Output:
one    1
two    6
Name: A1, dtype: int64
print(df.iloc[0:2])
Output:
    one  two
A1    1    6
B1    2    7
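The difference between loc (label-based) and iloc (position-based) can be checked with a self-contained sketch. The frame below is a hypothetical reconstruction that agrees with the output shown above; the third row is invented for illustration.

```python
import pandas as pd

# Hypothetical frame: rows A1 and B1 match the slide output, C1 is made up.
df = pd.DataFrame({"one": [1, 2, 3], "two": [6, 7, 8]}, index=["A1", "B1", "C1"])

# loc selects by row label, iloc by integer position; here both pick the first row.
by_label = df.loc["A1"]
by_position = df.iloc[0]
print(by_label.equals(by_position))

# Slices behave differently: loc includes the end label,
# while iloc excludes the end position.
print(df.loc["A1":"B1"])
print(df.iloc[0:2])
```

Both slices return the rows A1 and B1, even though the loc slice names its last row and the iloc slice stops before position 2.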
print(g.sort_index())
Output:
   Roll
A     2
L     7
M     5
Z     4
print(g)
Output:
   R1  R2  R3  R4
Z  12  45  56  78
M  14  21  33  53
L   1   2   3   4
A   9   7   5   2
print(g.sort_values(by=["R1"]))
Output:
   R1  R2  R3  R4
L   1   2   3   4
A   9   7   5   2
Z  12  45  56  78
M  14  21  33  53
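The two sorting calls can be reproduced with a self-contained sketch. The frame g is rebuilt here from the table shown above; the descending sort at the end is an added illustration, not part of the slides.

```python
import pandas as pd

# Rebuild g from the table above, with rows in the order Z, M, L, A.
g = pd.DataFrame(
    {
        "R1": [12, 14, 1, 9],
        "R2": [45, 21, 2, 7],
        "R3": [56, 33, 3, 5],
        "R4": [78, 53, 4, 2],
    },
    index=["Z", "M", "L", "A"],
)

# sort_index orders rows by their labels; sort_values orders them by column values.
by_label = g.sort_index()
by_r1 = g.sort_values(by=["R1"])
by_r1_desc = g.sort_values(by="R1", ascending=False)

print(by_label)
print(by_r1)
print(by_r1_desc)

# Both methods return a new frame; g itself keeps its original row order.
print(list(g.index))
```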