Pandas Notes
May 8, 2023
1 What is pandas?
Pandas is a Python library used for working with data sets. It has functions for analyzing, cleaning,
exploring, and manipulating data. It can read and write data in different formats like: xlsx, csv,
txt, JSON, etc.
Series = 1-D labeled array: pd.Series(data)
DataFrame = 2-D labeled array, much like a table: pd.DataFrame(data)
Panel = a 3-D container of data (deprecated and removed in recent pandas versions).
#Series and DataFrame are case sensitive; do not write them in lower case.
3 Series
[30]: import pandas as pd
var = pd.Series([1,3,45,678,90])
print(var)
print(type(var))
print()
0 1
1 3
2 45
3 678
4 90
dtype: int64
<class 'pandas.core.series.Series'>
678
A NaN
B NaN
C NaN
D NaN
E NaN
dtype: float64
0 1.0
1 3.0
2 45.0
3 678.0
4 90.0
dtype: float64
0 1
1 3
2 45
3 678
4 90
dtype: int64
A [1, 2, 34, 5, 6]
B [2, 23, 34, 56, 4]
dtype: object
0 34
dtype: int64
1 12
2 12
3 12
4 12
5 12
dtype: int64
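The outputs above, after the first one, come from further Series example cells whose code was lost in the export. A plausible reconstruction (values taken from the outputs; the dict key "x" is a placeholder):
[ ]: print(var[3])   # 678: access an element by its index label
# a dict whose keys are absent from the given index yields all NaN
print(pd.Series({"x": 1}, index=["A", "B", "C", "D", "E"]))
print(pd.Series([1, 3, 45, 678, 90], dtype="float64"))   # force float dtype
# a dict of lists gives an object-dtype Series
print(pd.Series({"A": [1, 2, 34, 5, 6], "B": [2, 23, 34, 56, 4]}))
print(pd.Series(34))                           # a single scalar
print(pd.Series(12, index=[1, 2, 3, 4, 5]))    # scalar broadcast over an index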
4 DataFrame
[33]: var = pd.DataFrame([[1,3,45,678,90,3],[2,3,45,67,8,3]])
print(var)
print(type(var))
0 1 2 3 4 5
0 1 3 45 678 90 3
1 2 3 45 67 8 3
<class 'pandas.core.frame.DataFrame'>
A B C
0 1 1 SD
1 23 2 DF
2 4 34 HK
3 5 5 KJ
If you pass a dictionary to DataFrame, make sure all the dict values have the same length,
otherwise it will give you an error:
error = ValueError: All arrays must be of the same length.
[669]: var ={"A" :[1,23,4,5] ,"B":[1,2,34,5],"C":['SD','DF','HK','KJ']}
a = pd.DataFrame(var, columns=["A","B"]) #get A column or multiple columns
print(a)
#getting a value using the index
# syntax: var[column_name][index_label]
print(var["A"][3]) #ans = 5
A B
0 1 1
1 23 2
2 4 34
3 5 5
Index(['A', 'B'], dtype='object')
5
# the setup line below was lost in the export; values taken from the output that follows
var = pd.DataFrame({'A':[1,2,34,56,7],'B':[12,3,456,7,23]})
#add
var["add"] = var["A"]+var["B"]
#subtract
var["subtract"] = var["A"]-var["B"]
#multiply
var["multiply"] = var["A"]*var["B"]
#division
var["division"] = var["A"]/var["B"]
#modulus
var["modulus"] = var["A"]%var["B"]
print(var)
print()
A B
0 1 12
1 2 3
2 34 456
3 56 7
4 7 23
print(var["A"]>4)
print()
print(var["B"]<12)
print()
print(var["A"]==7)
print()
print((var["A"] >7) & (var["B"] >9))
A B
0 1 12
1 2 3
2 34 456
3 56 7
4 7 23
0 False
1 False
2 True
3 True
4 True
Name: A, dtype: bool
0 False
1 True
2 False
3 True
4 False
Name: B, dtype: bool
0 False
1 False
2 False
3 False
4 True
Name: A, dtype: bool
0 False
1 False
2 True
3 False
4 False
dtype: bool
7 Insert in pandas
[95]: df =pd.DataFrame({'A':[1,2,34,56,7],'B':[12,3,456,7,23]})
print(df)
#creating a new column and getting limited data from column A
# (the next two lines were lost in the export; reconstructed from the output below)
df.insert(1, "C", df["A"])   # insert a copy of A at position 1
df["NEW"] = df["A"][:4]      # only the first 4 values; the last row becomes NaN
A B
0 1 12
1 2 3
2 34 456
3 56 7
4 7 23
[95]: A C B NEW
0 1 1 12 1.0
1 2 2 3 2.0
2 34 34 456 34.0
3 56 56 7 56.0
4 7 7 23 NaN
8 Delete in pandas
#pop() - removes the specified column from the DataFrame and returns it
[417]: df =pd.DataFrame({'A':[1,2,34,56,7],'B':[12,3,456,7,23]})
print(df)
print()
# syntax: dataframe.pop(label)
df.pop("B")   # reconstructed: the actual pop call was lost in the export
A B
0 1 12
1 2 3
2 34 456
3 56 7
4 7 23
[417]: B
0 12
1 3
2 456
3 7
4 23
# write csv
var =pd.DataFrame({"A":[1,4,74,23,63], "B":[34,987,23,23,52]})
print(var)
var.to_csv("test.csv",index=False, header=["name","id "])
print()
A B
0 1 34
1 4 987
2 74 23
3 23 23
4 63 52
# read the csv back (the read_csv calls were lost in the export; arguments assumed from the outputs below)
df = pd.read_csv("test.csv")
print(df)
df = pd.read_csv("test.csv", nrows=2)            # only the first 2 rows
print(df)
df = pd.read_csv("test.csv", usecols=["name"])   # only the name column
print(df)
df = pd.read_csv("test.csv", skiprows=[2, 3])    # skip selected rows
print(df)
name id
0 1 34
1 4 987
2 74 23
3 23 23
4 63 52
name id
0 1 34
1 4 987
name
0 1
1 4
2 74
3 23
4 63
name id
0 1 34
1 23 23
2 63 52
10 Read excel
[132]: df=pd.read_excel("C:\\Users\\sanram\\Videos\\Jupyter python practise\\book2.xlsx")
df
328 Saravanan, Vennila Netherland … FRPR3ANA05PR 12
329 Tapessur, Ray Netherland … FRPR3ANA05PR 12
[5 rows x 23 columns]
11 Pandas Functions
[133]: df =pd.read_csv("C:\\Users\\sanram\\Videos\\Jupyter python practise\\pandas case study\\blackfriday.csv")
df
550064 3 0 20
550065 4+ 1 20
550066 2 0 20
550067 4+ 1 20
df.index
[139]: df.describe()
# describe() gives summary statistics for the numeric columns:
# count, mean, std, min, the quartiles, and max
Product_Category_2 Product_Category_3 Purchase
count 376430.000000 166821.000000 550068.000000
mean 9.842329 12.668243 9263.968713
std 5.086590 4.125338 5023.065394
min 2.000000 3.000000 12.000000
25% 5.000000 9.000000 5823.000000
50% 9.000000 14.000000 8047.000000
75% 15.000000 16.000000 12054.000000
max 18.000000 18.000000 23961.000000
df[9:14]
13 1000005 P00145042 M 26-35 20.0 A
18 1000007 P00036842 M 36-45 1.0 B
19 1000008 P00249542 M 26-35 12.0 C
24 1000008 P00303442 M 26-35 12.0 C
13 Sort
We can use the sort_index() method to sort the object by labels.
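The descending output below comes from a cell whose code was lost in the export; presumably:
[ ]: df.sort_index(ascending=False)   # sort rows by index label, descending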
Stay_In_Current_City_Years Marital_Status Product_Category_1 \
550067 4+ 1 20
550066 2 0 20
550065 4+ 1 20
550064 3 0 20
550063 1 1 20
… … … …
4 4+ 0 8
3 2 0 12
2 2 0 12
1 2 0 1
0 2 0 3
14 Sort by values
Pandas provides the sort_values() method to sort by values.
It accepts a by argument, which takes the column name(s) of the
DataFrame by which the values are to be sorted.
[560]: # Let's try to sort the dataset using the Purchase column
df.sort_values(by=['Purchase']).head()
df.sort_values(by=['Age','Purchase']).head(20)
405177 1002288 P00173042 M 0-17 10 B
279302 1001084 P00164042 M 0-17 19 C
336891 1003843 P00003442 F 0-17 10 B
302525 1004541 P00003442 M 0-17 10 B
482756 1002288 P00030942 M 0-17 10 B
281904 1001434 P00003442 F 0-17 10 A
116604 1006006 P00053842 F 0-17 0 C
229649 1005420 P00187342 F 0-17 19 B
515747 1001434 P0096442 F 0-17 10 A
396449 1001054 P00053842 M 0-17 10 C
329590 1002810 P00187342 F 0-17 10 B
240346 1001096 P00173042 M 0-17 10 C
302525 5.0 8.0 700
482756 5.0 9.0 706
281904 5.0 8.0 713
116604 5.0 12.0 718
229649 5.0 15.0 731
515747 5.0 12.0 740
396449 5.0 12.0 745
329590 5.0 15.0 747
240346 15.0 16.0 748
187688 2.0 8.0 19497
187689 2.0 15.0 15737
490144 14.0 16.0 15596
370225 8.0 14.0 7119
15 loc[]
The loc property is used to get, or set, the value(s) of the specified labels.
Syntax: df.loc[row_label, "column_name"]
# Let's try to print all columns for the rows labeled 2 to 4 (loc slices are inclusive)
df.loc[2:4]
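The two-column output below comes from a second loc call that was lost in the export; a call that matches it:
[ ]: df.loc[[3, 5], ["User_ID", "Product_ID"]]   # selected rows and columns by label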
User_ID Product_ID
3 1000001 P00085442
5 1000003 P00193542
16 iloc[]
It is used to get or set the data of a particular cell by position.
Syntax : df.iloc[row_index, column_index]
#print last row using iloc
df.iloc[-1]
[550068 rows x 13 columns]
17 Drop
drop() - drops the specified rows/columns from the DataFrame.
Syntax: df.drop(labels, axis=1)
axis=0 refers to rows & axis=1 refers to columns
[454]: df =pd.read_csv("C:\\Users\\sanram\\Videos\\Jupyter python practise\\pandas case study\\blackfriday.csv")
df.head(4)
4 4+ 0 8
Purchase
0 8370
1 15200
2 1422
3 1057
4 7969
[447]: A B C
0 1 4 1
1 2 5 3
6 1000004 P00184942 M 46-50 7 B
7 1000004 P00346142 M 46-50 7 B
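The drop() calls themselves were lost in the export; a minimal sketch of typical usage (labels chosen for illustration):
[ ]: df.drop("Purchase", axis=1)   # drop a column
df.drop([0, 1], axis=0)       # drop rows by index label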
18 Check duplicate
The duplicated() method returns a Boolean value for each row.
[719]: data = {
"name": ["Sally", "Mary", "John", "Mary"],
"age": [50, 40, 30, 40],
"qualified": [True, False, False, False]
}
df = pd.DataFrame(data)
df.duplicated()   # the call producing the output below was lost in the export
[719]: 0 False
1 False
2 False
3 True
dtype: bool
[724]: #if you want to check duplicates in a particular column only:
df[df.duplicated("qualified")]
19 drop Duplicates
[181]: data = {
"name": ["Sally", "Mary", "John", "Mary"],
"age": [50, 40, 30, 40],
"qualified": [True, False, False, False]
}
df = pd.DataFrame(data)
newdf = df.drop_duplicates()
newdf
20 dropna
dropna() - The dropna() method removes the rows that contain NULL values.
The dropna() method returns a new DataFrame object unless the inplace parameter is set to True.
df
550067 1006039 P00371644 F 46-50 0 B
Stay_In_Current_City_Years Marital_Status Product_Category_1 \
1 2 0 1
6 2 1 1
13 1 1 1
14 1 0 5
16 1 0 2
… … … …
545902 4+ 1 3
545904 2 0 6
545907 2 0 2
545908 2 0 1
545914 2 0 1
13 2.0 5.0 15665
14 8.0 14.0 5378
[194]: # dropna(inplace=True) removes all rows containing NULL values from
# the original DataFrame itself; it returns None rather than a new DataFrame
df.dropna(inplace=True)
df
545914 2.0 11.0 11640
21 Fillna
The fillna() method replaces the NULL values with a specified value.
[215]: name id
0 sd NaN
1 NaN 987.0
2 fg 23.0
3 NaN NaN
4 gh 52.0
[313]: # if you want to fill all the missing values with a single value
df.fillna(222222)
[313]: name id
0 sd 222222.0
1 222222 987.0
2 fg 23.0
3 222222 222222.0
4 gh 52.0
[316]: #If you want to fill a particular column's null values
#with particular data, pass a dictionary
df.fillna({'name':'santosh','id': 34})
[316]: name id
0 sd NaN
1 santosh 987.0
2 fg 23.0
3 santosh NaN
4 gh 52.0
[314]: # bfill replaces the NULL values with the value from the next (following) row
df.fillna(method ="bfill")
[314]: name id
0 sd 987.0
1 fg 987.0
2 fg 23.0
3 gh 52.0
4 gh 52.0
[315]: # ffill() method replaces the NULL values with the value from the previous row
df.fillna(method ="ffill")
[315]: name id
0 sd NaN
1 sd 987.0
2 fg 23.0
3 fg 23.0
4 gh 52.0
22 Apply method
The apply() method allows you to apply a function along one of the axes of the DataFrame; the default
is 0, which is the index (row) axis.
Syntax : dataframe.apply(func, axis, raw, result_type, args, kwds)
[287]: data = {
"ID": [50, 40, 30],
"EMPID": [300, 1112, 42]
}
df=pd.DataFrame(data)
print("original dataframe \n",df)
print()
def fun(y):
    if y>1000:
        return "NEW EMP"
    else:
        return "OLD EMP"

df["type"] = df["EMPID"].apply(fun)   # reconstructed: the apply call was lost in the export
df
original dataframe
ID EMPID
0 50 300
1 40 1112
2 30 42
[287]: ID EMPID type
0 50 300 OLD EMP
1 40 1112 NEW EMP
2 30 42 OLD EMP
23 Replace
replace() - the method replaces the specified value with another specified value.
[727]: import numpy as np
df =pd.DataFrame({'name':['sd',np.nan,'d',np.nan,'df'],'id':[1,np.nan,234,4,np.nan]})
print(df)
name id
0 sd 1.0
1 NaN NaN
2 d 234.0
3 NaN 4.0
4 df NaN
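The outputs below come from several replace() variants whose code was lost in the export; calls along these lines reproduce most of them (values taken from the outputs):
[ ]: df.replace('sd', 'ram')                # replace a single value
df.replace(['sd', 'd'], 99)            # replace several values at once
df.replace(np.nan, df['id'].mean())    # replace NaN with the column mean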
[727]: name id
0 ram 1.0
1 NaN NaN
2 d 234.0
3 NaN 4.0
4 df NaN
[297]: name id
0 99 1.0
1 NaN NaN
2 99 234.0
3 NaN 4.0
4 df NaN
[302]: name id
0 santoshsantosh 1.0
1 NaN NaN
2 santosh 234.0
3 NaN 4.0
4 santoshsantosh NaN
[303]: name id
0 22222.0 1.0
1 NaN NaN
2 22222.0 234.0
3 NaN 4.0
4 22222.0 NaN
[742]: name id
0 sd 1.000000
1 NaN 79.666667
2 d 234.000000
3 NaN 4.000000
4 df 79.666667
[741]: name id
0 sd 1.000000
1 NaN 79.666667
2 d 234.000000
3 NaN 4.000000
4 df 79.666667
25 Interpolate
The interpolate() method replaces the NULL values based on a specified method.
It will fill numeric-value columns but will not fill any string values.
[308]: df=pd.read_csv("C:\\Users\\sanram\\Videos\\Jupyter python practise\\test.csv")
df
[308]: name id
0 sd NaN
1 NaN 987.0
2 fg 23.0
3 NaN NaN
4 gh 52.0
[318]: df.interpolate()
[318]: name id
0 sd 987.0
1 fg 987.0
2 fg 23.0
3 gh 52.0
4 gh 52.0
[658]: df.interpolate(method='ffill')
[658]: A B
0 1.0 NaN
1 1.0 1.0
2 2.0 23.0
3 2.0 3.0
4 3.0 3.0
26 Merge
The merge() method combines the content of two DataFrames by merging them together, using the
specified method(s).
It is necessary to have a common column to merge the data on.
[345]: df =pd.DataFrame({ "car":['MB','BMW','TATA','MS'],
"Model-1":[2012,2023,2012,2014]})
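The second frame, df1, was lost in the export; a plausible companion frame (values inferred from the concat output in section 27):
[ ]: df1 =pd.DataFrame({ "car":['MB','BMW','TATA','BMW'],
"Model-2":[2013,2013,2022,2021]})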
how - attribute that specifies how to merge the data,
e.g. how="left" works like a left join;
the same applies for "right", "inner", "cross", and "outer".
[339]: var =pd.merge(df,df1 ,how="right")
var
27 concat
concat() - The concat function is used to concatenate pandas objects.
When concatenating horizontally, both DataFrames should have an equal number of rows;
when concatenating vertically, an equal number of columns.
concat simply stacks the data; it does not match rows on keys like a join.
[342]: df =pd.DataFrame({ "car":['MB','BMW','TATA','MS'],
"Model-1":[2012,2023,2012,2014]})
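The concat call itself was lost in the export; the output below matches a horizontal concatenation (reusing the df1 sketched in section 26):
[ ]: pd.concat([df, df1], axis=1)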
[343]: car Model-1 car Model-2
0 MB 2012 MB 2013
1 BMW 2023 BMW 2013
2 TATA 2012 TATA 2022
3 MS 2014 BMW 2021
28 Join
The join() method inserts column(s) from another DataFrame, or Series.
var1 =[1,23,4,56,34]
var2 =[4,56,78,9,89]
var3 =[1,23,]
var4 =[11,23]
df = pd.DataFrame({"A":var1,"B":var2})
df2 = pd.DataFrame({"C":var3, "D": var4})
#join on the basis of an outer join; you can also use inner, left, right, etc.
data =df.join(df2, how ="outer")
data
[384]: A B C D
0 1 4 1.0 11.0
1 23 56 23.0 23.0
2 4 78 NaN NaN
3 56 9 NaN NaN
4 34 89 NaN NaN
29 append()
The append() method appends a DataFrame-like object at the end of the current DataFrame
(deprecated; use pd.concat instead, as the warning below shows).
[385]: var1 =[1,23,4,56,34]
var2 =[4,56,78,9,89]
var3 =[1,23,]
var4 =[11,23]
df1 = pd.DataFrame({"A":var1,"B":var2})
df2 = pd.DataFrame({"C":var3, "D": var4})
newdf = df1.append(df2)
newdf
C:\Users\sanram\AppData\Local\Temp\ipykernel_16444\1789055825.py:10:
FutureWarning: The frame.append method is deprecated and will be removed from
pandas in a future version. Use pandas.concat instead.
newdf = df1.append(df2)
[385]: A B C D
0 1.0 4.0 NaN NaN
1 23.0 56.0 NaN NaN
2 4.0 78.0 NaN NaN
3 56.0 9.0 NaN NaN
4 34.0 89.0 NaN NaN
0 NaN NaN 1.0 11.0
1 NaN NaN 23.0 23.0
30 GroupBy
groupby() is used to split the data into groups based on some criteria.
The groupby method groups the data according to categories & applies aggregate
functions to the categories, like max, min, sum, mean, etc.
[572]: var1= ['sd','fd','kj','sr','ram','sd']
var2=[12,34,45,1,99,67]
var3 = ['maths','physics','chem','economics','bio','physics']
df = pd.DataFrame({"name":var1,"marks":var2,"subject":var3})   # reconstructed: this line was lost in the export; column names for marks/subject assumed
[580]: #group by a particular column
var = df.groupby("name").groups
var
[580]: {'fd': [1], 'kj': [2], 'ram': [4], 'sd': [0, 5], 'sr': [3]}
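A short sketch of applying an aggregate over the groups, as described above (not from the original cells; the marks column name is assumed, as noted):
[ ]: df.groupby("name")["marks"].sum()   # total marks per name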
31 melt()
The melt() method reshapes the DataFrame into a long table, with one row for each column-value pair.
[386]: day_var =[1,2,3,4,5]
eng_marks =[30,65,45,678,91]
maths_marks =[55,34,56,70,66]
df =pd.DataFrame({"day":day_var,"english":eng_marks,"maths":maths_marks})
df
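The long-format output below comes from the melt call itself, which was lost in the export; presumably simply:
[ ]: df.melt()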
6 english 65
7 english 45
8 english 678
9 english 91
10 maths 55
11 maths 34
12 maths 56
13 maths 70
14 maths 66
32 Pivot
It helps us to reshape the DataFrame.
[390]: day_var =[1,2,3,4,5]
eng_marks =[30,65,45,678,91]
maths_marks =[55,34,56,70,66]
st_name =['sd','df','ram','sd','df']
df =pd.DataFrame({"day":day_var,"Stu_name":st_name, "english":eng_marks,"maths":maths_marks})
df
[391]: df.pivot(index="day",columns="Stu_name")
33 Date Range
pandas date_range() is useful for creating a range of dates or times.
It is mainly used for reindexing a datetime index.
Syntax : pd.date_range(start_time, end_time)
B - business day frequency
C - custom business day frequency
D - calendar day frequency
W - weekly frequency
M - month end frequency
SM - semi-month end frequency (15th and end of month)
BM - business month end frequency
H - hourly frequency
T / min - minutely frequency
S - secondly frequency
ms - milliseconds
3H - a 3-hour step
[435]: df = pd.date_range(start='2020-01-01', end='2020-01-02', freq='4T')
df
[429]: print(type(df))
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
date_range() can also take a periods argument instead of a frequency:
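The cell that produced the output below was lost in the export; a call that reproduces it (timestamps taken from the output):
[ ]: pd.date_range(start='2023-01-02', end='2023-01-02 00:24:00', periods=10)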
[439]: DatetimeIndex(['2023-01-02 00:00:00', '2023-01-02 00:02:40',
'2023-01-02 00:05:20', '2023-01-02 00:08:00',
'2023-01-02 00:10:40', '2023-01-02 00:13:20',
'2023-01-02 00:16:00', '2023-01-02 00:18:40',
'2023-01-02 00:21:20', '2023-01-02 00:24:00'],
dtype='datetime64[ns]', freq=None)
34 count()
count() returns the number of not-NULL values in each column (or each row, with axis=1):
[ ]: data = pd.DataFrame({
'name': ['sravan', 'ojsawi', 'bobby', 'rohith',
'gnanesh', 'sravan', 'sravan', 'ojaswi'],
'subjects': ['java', 'php', 'java', 'php', 'java',
'html/css', 'python', 'R'],
'marks': [98, 90, 78, 91, 87, 78, 89, 90],
'age': [11, 23, 23, 21, 21, 21, 23, 21]
})
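The counting call itself was lost in the export; presumably something like:
[ ]: data.count()          # not-NULL values per column
data.count(axis=1)    # not-NULL values per row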
35 isna() / isnull()
It helps us to detect NA values.
[473]: data = pd.DataFrame({
'name': ['sravan', 'ojsawi', 'bobby', 'rohith',
'gnanesh', 'sravan', 'sravan', 'ojaswi'],
'subjects': ['java', 'php', 'java', 'php', 'java',
'html/css', 'python', 'R'],
'marks': [98, 90, 78, 91, 87, 78, 89, 90],
'age': [11, 23, 23, 21, 21, 21, 23, 21]
})
print(data.isna()) # or print(data.isnull()); both are the same
df.isnull()
3 False False False
4 False False False
… … … …
550063 False False False
550064 False False False
550065 False False False
550066 False False False
550067 False False False
[481]: df.isnull().sum()
[481]: User_ID 0
Product_ID 0
Gender 0
Age 0
Occupation 0
City_Category 0
Stay_In_Current_City_Years 0
Marital_Status 0
Product_Category_1 0
Product_Category_2 173638
Product_Category_3 383247
Purchase 0
dtype: int64
[495]: df.dropna(inplace=True)
df
14 1000006 P00231342 F 51-55 9 A
16 1000006 P0096642 F 51-55 9 A
… … … … … … …
545902 1006039 P00064042 F 46-50 0 B
545904 1006040 P00081142 M 26-35 6 B
545907 1006040 P00277642 M 26-35 6 B
545908 1006040 P00127642 M 26-35 6 B
545914 1006040 P00217442 M 26-35 6 B
[508]: df.isnull().sum()
[508]: User_ID 0
Product_ID 0
Gender 0
Age 0
Occupation 0
City_Category 0
Stay_In_Current_City_Years 0
Marital_Status 0
Product_Category_1 0
Product_Category_2 0
Product_Category_3 0
Purchase 0
dtype: int64
36 Filter
It helps us to access the group of rows and columns whose labels match a pattern.
[510]: #filter in columns
df.filter(like ="Product", axis=1)
545099 16.0 17.0 8093
545299 6.0 8.0 15256
545399 13.0 14.0 6950
545499 9.0 12.0 1392
37 Copy()
The copy() method returns a copy of the DataFrame. By default, the copy is a "deep copy",
meaning that any changes made in the original DataFrame will NOT be reflected in the copy.
[542]: df1 = df.copy()
df1
[ ]: # shallow copy: df.copy(deep=False)
# with a shallow copy, changes made in the new DataFrame
# are reflected in the original DataFrame
19 1000008 P00249542 M 26-35 12.0 C
24 1000008 P00303442 M 26-35 12.0 C
38 Data Cleaning
[582]: df =pd.read_csv("C:\\Users\\sanram\\Videos\\Jupyter python practise\\pandas case study\\blackfriday.csv")
df
550066 2 0 20
550067 4+ 1 20
432173 1000543 P00205642 M 26-35 5 B
389912 1006004 P00184242 F 26-35 15 C
172140 1002624 P00182242 F 36-45 0 A
218182 1003661 P00113342 M 36-45 12 C
101932 1003752 P00220342 F 18-25 1 B
438737 1001545 P00178642 M 26-35 20 A
534915 1004351 P00340642 M 26-35 12 C
549347 1004992 P00371644 F 26-35 2 B
[586]: df.isnull().sum()
[586]: User_ID 0
Product_ID 0
Gender 0
Age 0
Occupation 0
City_Category 0
Stay_In_Current_City_Years 0
Marital_Status 0
Product_Category_1 0
Product_Category_2 173638
Product_Category_3 383247
Purchase 0
dtype: int64
[591]: ['User_ID',
'Product_ID',
'Gender',
'Age',
'Occupation',
'City_Category',
'Stay_In_Current_City_Years',
'Marital_Status',
'Product_Category_1',
'Product_Category_2',
'Product_Category_3',
'Purchase']
[609]: User_ID 0
Product_ID 0
Gender 0
Age 0
Occupation 69638
City_Category 0
Stay_In_Current_City_Years 0
Marital_Status 324731
Product_Category_1 0
Product_Category_2 173638
Product_Category_3 383247
Purchase 0
dtype: int64
df_list = list(df.columns)   # reconstructed: this line was lost in the export; inferred from the column list printed above
df[df_list[0:12]] = df[df_list[0:12]].replace(0,np.nan)
df
550063 1006033 P00372445 M 51-55 13.0 B
550064 1006035 P00375436 F 26-35 1.0 C
550065 1006036 P00375436 F 26-35 15.0 B
550066 1006038 P00375436 F 55+ 1.0 C
550067 1006039 P00371644 F 46-50 NaN B
[621]: df
19 1000008 P00249542 M 26-35 12.0 C
24 1000008 P00303442 M 26-35 12.0 C
… … … … … … …
545885 1006036 P00207342 F 26-35 15.0 B
545887 1006036 P00127742 F 26-35 15.0 B
545888 1006036 P00196042 F 26-35 15.0 B
545889 1006036 P00129342 F 26-35 15.0 B
545890 1006036 P00244142 F 26-35 15.0 B
[622]: df.isnull().mean()
Product_Category_1 0.0
Product_Category_2 0.0
Product_Category_3 0.0
Purchase 0.0
dtype: float64
df
550067 NaN NaN 490
[0, 1]
df["Marital_Status"] = df["Marital_Status"].apply(a)
[710]: df
1 6.0 14.0 15200
2 NaN NaN 1422
3 14.0 NaN 1057
4 NaN NaN 7969
… … … …
550063 NaN NaN 368
550064 NaN NaN 371
550065 NaN NaN 137
550066 NaN NaN 365
550067 NaN NaN 490
C:\Users\sanram\Anaconda3\lib\site-packages\seaborn\_decorators.py:36:
FutureWarning: Pass the following variable as a keyword arg: x. From version
0.12, the only valid positional argument will be `data`, and passing other
arguments without an explicit keyword will result in an error or
misinterpretation.
warnings.warn(
52
39 Pandas - Data Correlations
The corr() method calculates the pairwise correlation between the numeric columns in your data set.
[6]: df.corr()
40 astype() Method
The astype() method returns a new DataFrame where
the data types have been changed to the specified type.
[8]: #converting int datatypes to float
df.astype(dtype='float64')
3 45.0 109.0 175.0 282.4
4 45.0 117.0 148.0 406.0
.. … … … …
164 60.0 105.0 140.0 290.8
165 60.0 110.0 145.0 300.0
166 60.0 115.0 145.0 310.2
167 75.0 120.0 150.0 320.4
168 75.0 125.0 150.0 330.4