
Dataframes-I (Create & Selection)

Pandas dataframes allow for storing and manipulating tabular data. Dataframes can be created from lists, dictionaries, and other structures. They have rows and columns to organize data similar to tables. Operations can then be performed on the data, such as selecting specific rows or columns using labels, indexes, or slices. Both row and column selection is supported to access parts of the dataframe.

Uploaded by

Ayush Kumar

PANDAS DATA FRAME

A dataframe is a two-dimensional object that represents data in rows and columns. It is similar
to a MySQL table. Once data is stored in this format, we can perform various operations
that are useful in analyzing and understanding the data. It can contain heterogeneous data,
i.e. different columns can hold different data types.
Both the size and the data of a dataframe are mutable, i.e. they can change.
Note: A dataframe can also contain just a single column of data.
Example of a dataframe (the first line holds the column names, each line below it is a row):
Admno   Name       Class   Section   Marks
100     Sushmita   12      A         78
101     Sarika     12      A         84
102     Aman       12      B         90
103     Kartavya   12      C         70

A dataframe has both a row index and a column index.
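The points above (mixed data types, mutable data and size) can be sketched with the student table. This is only an illustration: the variable name students and the added Grade column are assumptions, not part of the original notes.

```python
import pandas as pd

# Build the student table (numbers and strings mixed in one dataframe).
students = pd.DataFrame({
    'Admno': [100, 101, 102, 103],
    'Name': ['Sushmita', 'Sarika', 'Aman', 'Kartavya'],
    'Class': [12, 12, 12, 12],
    'Section': ['A', 'A', 'B', 'C'],
    'Marks': [78, 84, 90, 70],
})
print(students.dtypes)   # each column has its own data type

# Data is mutable: change one value.
students.loc[0, 'Marks'] = 80

# Size is mutable: add a column (hypothetical 'Grade' column).
students['Grade'] = ['B', 'A', 'A', 'B']
print(students.shape)
```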
A dataframe can be created using any of the following:
1. Lists
2. Dictionary
3. Numpy 2D array
4. Series
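Lists and dictionaries are worked through in the sections that follow. For the other two sources in the list, a minimal sketch is given here; the sample values and the names arr, df_arr, s and df_ser are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# From a NumPy 2D array: each inner row of the array becomes a dataframe row.
arr = np.array([[12, 78], [12, 84], [12, 90]])
df_arr = pd.DataFrame(arr, columns=['Class', 'Marks'])
print(df_arr)

# From a Series: the Series becomes a single column and keeps its index.
s = pd.Series([78, 84, 90], index=[100, 101, 102], name='Marks')
df_ser = pd.DataFrame(s)
print(df_ser)
```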

CREATION OF DATAFRAMES
Ways to create a dataframe:
a) Creating an empty dataframe
import pandas as pd
df=pd.DataFrame()
print(df)

Output:
Empty DataFrame
Columns: []
Index: []
b) Creating a dataframe using list
Example 1:
import pandas as pd
d=[12,13,14,15,16]
df=pd.DataFrame(d)
print(df)
Output:
    0      <- default column name
0  12
1  13
2  14
3  15
4  16

Example 2: (using sublists)


import pandas as pd
data=[['aman', 45],['vishal', 56],['soniya', 67]]
df=pd.DataFrame(data,columns=['name','age'])
print(df)

Output:
name age
0 aman 45
1 vishal 56
2 soniya 67

Example 3: (using dtype)


import pandas as pd
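data=[['aman', 45],['vishal', 56],['soniya', 67]]
df=pd.DataFrame(data,columns=['name','age'], dtype=float)
print(df)

Output:
name age
0 aman 45.0
1 vishal 56.0
2 soniya 67.0
Note: This output is what older pandas versions print. Newer versions may instead raise an error, because dtype=float is applied to every column and the text in 'name' cannot be converted to float.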
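A more targeted way to get a float column is to convert only that column after the dataframe is built. The following is a minimal sketch using astype on the age column. It is offered as an alternative, not as the method used in these notes.

```python
import pandas as pd

data = [['aman', 45], ['vishal', 56], ['soniya', 67]]
df = pd.DataFrame(data, columns=['name', 'age'])

# Convert only the numeric column; 'name' is left untouched.
df['age'] = df['age'].astype(float)

print(df)
print(df.dtypes)   # shows the data type of every column
```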

c) Creating a dataframe using dictionary


import pandas as pd
dict1={ 'aman':45,'vishal':56, 'soniya':67,'parth':78}
df=pd.DataFrame(dict1)
print(df)
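This will result in an error (ValueError: If using all scalar values, you must pass an index), because every value in the dictionary is a single scalar rather than a list or a Series.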
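Two common ways around this error are sketched below, under the assumption that the dictionary maps names to marks. The column name 'marks' and the variables df_row and df_col are hypothetical.

```python
import pandas as pd

dict1 = {'aman': 45, 'vishal': 56, 'soniya': 67, 'parth': 78}

# The plain call fails because all the values are scalars.
try:
    pd.DataFrame(dict1)
except ValueError as err:
    print("Error:", err)

# Option 1: pass an index, giving one row with one column per key.
df_row = pd.DataFrame(dict1, index=[0])
print(df_row)

# Option 2: turn the dictionary into a Series first, giving one row per key.
df_col = pd.Series(dict1).to_frame(name='marks')   # 'marks' is a hypothetical name
print(df_col)
```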
Creating dataframe from dictionary of lists

Example 1. (with default index)


import pandas as pd
dict1={'names':['aman','vishal','soniya','parth'],'marks':[45,56,67,78]}
df=pd.DataFrame(dict1)
print(df)
Output:
names marks
0 aman 45
1 vishal 56
2 soniya 67
3 parth 78

Example 2. (with specific index)


import pandas as pd
dict1={'names':['aman','vishal','soniya','parth'],'marks':[45,56,67,78]}
df=pd.DataFrame(dict1, index=[100,101,102,103])
print(df)
Output:
names marks
100 aman 45
101 vishal 56
102 soniya 67
103 parth 78
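The values in the dictionary can also be pandas Series instead of lists. In that case the rows are matched by the index labels of each Series, and a label missing from one Series gives NaN in that column. A minimal sketch with made-up index labels:

```python
import pandas as pd

names = pd.Series(['aman', 'vishal', 'soniya'], index=[100, 101, 102])
marks = pd.Series([45, 56, 78], index=[100, 101, 103])   # no entry for 102

# Rows are aligned on the index labels of the two Series.
df = pd.DataFrame({'names': names, 'marks': marks})
print(df)
```

Because of the missing values, the marks column is stored as float rather than int.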

Creating dataframe from list of dictionaries


import pandas as pd
list1=[{'name':'sushmita', 'surname':'Ghosh'},
{'name':'lakshay', 'surname':'Mehta'},
{'name':'Amir', 'surname':'khan'},
{'name':'Kapil', 'surname':'Dev'}]
df=pd.DataFrame(list1)
print(df)
Output:
name surname
0 sushmita Ghosh
1 lakshay Mehta
2 Amir khan
3 Kapil Dev
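The dictionaries in the list do not need to have identical keys. As a sketch, assuming one record lacks a surname and another carries an extra (hypothetical) age key, pandas builds a column for every key it sees and fills the gaps with NaN:

```python
import pandas as pd

list1 = [{'name': 'sushmita', 'surname': 'Ghosh'},
         {'name': 'lakshay'},                              # no 'surname' key
         {'name': 'Amir', 'surname': 'khan', 'age': 17}]   # extra hypothetical key
df = pd.DataFrame(list1)
print(df)
```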
Consider the following code, which creates a dataframe named df. It is used as the
reference for all the dataframe operations shown below.
import pandas as pd
dict1={'names':['aman','vishal','soniya','parth','sushant','Umang'] ,
'marks':[45,56,67,78,80,89] ,
's_class':[11,12,12,12,10,10] ,
'sec':['a','a','e','d','c','d']
}
df=pd.DataFrame(dict1, index=[100,101,102,103,104,105])
print(df)

Output:
names marks s_class sec
100 aman 45 11 a
101 vishal 56 12 a
102 soniya 67 12 e
103 parth 78 12 d
104 sushant 80 10 c
105 Umang 89 10 d
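Before selecting data it can help to look at the row index, the column index and the size of the dataframe. A short sketch on the same reference data:

```python
import pandas as pd

dict1 = {'names': ['aman', 'vishal', 'soniya', 'parth', 'sushant', 'Umang'],
         'marks': [45, 56, 67, 78, 80, 89],
         's_class': [11, 12, 12, 12, 10, 10],
         'sec': ['a', 'a', 'e', 'd', 'c', 'd']}
df = pd.DataFrame(dict1, index=[100, 101, 102, 103, 104, 105])

print(df.index.tolist())     # row labels
print(df.columns.tolist())   # column labels
print(df.shape)              # (number of rows, number of columns)
```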
SELECTION OF DATA FROM A DATAFRAME
I. ROWS SELECTION
a) Selection by label: Rows can be selected by passing a row label to
the .loc indexer (written with square brackets, not called like a function).
Example 1: selecting single row label
>>> print(df.loc[101])
Output:
names vishal
marks 56
s_class 12
sec a
Name: 101, dtype: object

Example 2: selecting multiple row labels


>>> print(df.loc[[103,104,105]])

Output:
names marks s_class sec
103 parth 78 12 d
104 sushant 80 10 c
105 Umang 89 10 d
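.loc also accepts a range of labels written with a colon. Unlike ordinary Python slices, a label slice includes both the start and the stop label. A minimal sketch on a shortened copy of the reference data:

```python
import pandas as pd

df = pd.DataFrame({'names': ['aman', 'vishal', 'soniya', 'parth', 'sushant', 'Umang'],
                   'marks': [45, 56, 67, 78, 80, 89]},
                  index=[100, 101, 102, 103, 104, 105])

# Label slice: rows 101, 102 and 103 are all returned.
by_label = df.loc[101:103]
print(by_label)
```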

b) Selection by integer location


Rows can be selected by passing an integer position (counted from 0) to the .iloc indexer.
Example 1: Selecting a single row by position
>>> print(df.iloc[2])
Output:
names soniya
marks 67
s_class 12
sec e
Name: 102, dtype: object

Example 2: Selecting multiple rows by position


>>> print(df.iloc[[2,4,5]])
Output:
names marks s_class sec
102 soniya 67 12 e
104 sushant 80 10 c
105 Umang 89 10 d

c) Slice Rows
Multiple rows can be selected using the ':' (slice) operator. The slice works on row positions, and the stop position is not included.
Example 1:
>>> print(df[2:4])
Output:
names marks s_class sec
102 soniya 67 12 e
103 parth 78 12 d

Example 2: (use of step value)


>>> print(df[2:6:2])
Output:
names marks s_class sec
102 soniya 67 12 e
104 sushant 80 10 c
Example 3: Multiple rows can also be selected by using a slice with .iloc
>>> print(df.iloc[2:6:2])
Output:
names marks s_class sec
102 soniya 67 12 e
104 sushant 80 10 c
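The slicing examples above work on positions, so the stop value is left out, and negative positions count from the end. The sketch below illustrates this on a shortened copy of the reference data. The last line compares the plain slice with the .iloc slice and is expected to agree on current pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'names': ['aman', 'vishal', 'soniya', 'parth', 'sushant', 'Umang'],
                   'marks': [45, 56, 67, 78, 80, 89]},
                  index=[100, 101, 102, 103, 104, 105])

middle = df.iloc[2:4]     # positions 2 and 3; position 4 is excluded
last_two = df.iloc[-2:]   # negative positions count from the end
print(middle)
print(last_two)

# Compare the plain slice df[2:4] with the .iloc slice.
print(df[2:4].equals(middle))
```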

d) head() and tail()


head() returns the first n rows (observe the index values). The default
number of rows returned is five, but you may pass a custom number.
Example:
>>> print(df.head(3))
Output
names marks s_class sec
100 aman 45 11 a
101 vishal 56 12 a
102 soniya 67 12 e

tail() returns the last n rows (observe the index values). The default
number of rows returned is five, but you may pass a custom number.
Example:
>>> print(df.tail(4))
Output
names marks s_class sec
102 soniya 67 12 e
103 parth 78 12 d
104 sushant 80 10 c
105 Umang 89 10 d
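The default of five rows, and what happens when n is larger than the number of rows, can be checked with a small sketch on the six-row reference data:

```python
import pandas as pd

df = pd.DataFrame({'names': ['aman', 'vishal', 'soniya', 'parth', 'sushant', 'Umang'],
                   'marks': [45, 56, 67, 78, 80, 89]},
                  index=[100, 101, 102, 103, 104, 105])

first = df.head()     # no argument: first five rows
last = df.tail()      # no argument: last five rows
whole = df.head(10)   # n larger than the dataframe: all rows are returned

print(first.index.tolist())
print(last.index.tolist())
print(len(whole))
```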
II. COLUMN SELECTION:
a) To display the contents of a particular column from the
DataFrame we write:
df['colname']
OR
df.colname

Example 1:
print(df['names'])
Output:
100 aman
101 vishal
102 soniya
103 parth
104 sushant
105 Umang
Name: names, dtype: object

Example 2:
print(df.sec)
Output:
100 a
101 a
102 e
103 d
104 c
105 d
Name: sec, dtype: object
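The two forms are not fully interchangeable. The dot form only works when the column name is a valid Python name and does not clash with an existing dataframe attribute or method, while the bracket form works for any column name. A sketch with two hypothetical columns chosen to show the difference:

```python
import pandas as pd

df = pd.DataFrame({'names': ['aman', 'vishal'],
                   'total marks': [45, 56],   # hypothetical column with a space in its name
                   'count': [1, 2]})          # hypothetical column named like a method

print(df['total marks'])    # brackets work for any column name
print(df['count'])          # brackets return the column
print(callable(df.count))   # df.count is the DataFrame method, not the column
```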
b) To access multiple columns we can write as:
df[['col1', 'col2', ...]]
Example:
print(df[['marks','sec']])
Output:
marks sec
100 45 a
101 56 a
102 67 e
103 78 d
104 80 c
105 89 d

III. SELECTING ROWS AND COLUMNS SIMULTANEOUSLY


USING .LOC
Example:
>>> print(df.loc[[101,102],['names','sec']])
Output:
names sec
101 vishal a
102 soniya e
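The same kind of selection can be written with positions using .iloc, and .loc also accepts single labels and label slices for both rows and columns. The following is a sketch on the reference data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'names': ['aman', 'vishal', 'soniya', 'parth', 'sushant', 'Umang'],
                   'marks': [45, 56, 67, 78, 80, 89],
                   's_class': [11, 12, 12, 12, 10, 10],
                   'sec': ['a', 'a', 'e', 'd', 'c', 'd']},
                  index=[100, 101, 102, 103, 104, 105])

by_label = df.loc[[101, 102], ['names', 'sec']]   # rows and columns by label
by_position = df.iloc[[1, 2], [0, 3]]             # the same cells by position
print(by_label.equals(by_position))

cell = df.loc[103, 'marks']                # a single value
block = df.loc[101:103, 'names':'marks']   # label slices include both ends
print(cell)
print(block)
```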
