DataFrames
1. DataFrame:
A DataFrame is a two-dimensional data structure, i.e., data is arranged in a tabular fashion in rows
and columns.
In a DataFrame, both the data and the size are mutable.
2. Features of DataFrame
Columns can be of different data types
Size is mutable
Labeled axes (rows and columns)
Arithmetic operations can be performed on rows and columns
3. Structure of DataFrame
Let us assume we have a DataFrame called 'student' whose structure is as follows.
DataFrameName=pandas.DataFrame(data,index,columns,dtype)
Where,
'data' takes various forms like ndarray, Series, lists, dict, constants and also another
DataFrame.
'index' represents row labels.
'columns' represents column labels.
'dtype' represents the data type of values.
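As a minimal sketch of the constructor syntax above, the following builds a small 'student' DataFrame. The marks, row labels and column labels are made-up values used only for illustration.

```python
import pandas as pd

# Hypothetical data: marks of three students in two subjects.
data = [[78, 85], [62, 90], [91, 70]]

student = pd.DataFrame(
    data,                          # the values, here a list of lists
    index=['s1', 's2', 's3'],      # row labels
    columns=['maths', 'science'],  # column labels
    dtype=float,                   # store every value as a float
)
print(student)
```

If index or columns is left out, pandas numbers that axis 0, 1, 2, ... by default.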
INFORMATICS PRACTICES (NEW) - XI BY G SIVA PRASAD (PGT)
4. DataFrame Creation
A DataFrame is mainly created in the following 4 ways.
(a) DataFrame creation using list of lists
(b) DataFrame creation using list of dictionaries
(c) DataFrame creation using dictionary of lists
(d) DataFrame creation using dictionary of Series
Note: As per our syllabus, we have only (b) and (d).
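The creation methods listed above can be sketched as follows for (a), (b) and (d). The product numbers, product names and row labels are hypothetical sample values.

```python
import pandas as pd

# (a) List of lists: each inner list is one row; column labels are given separately.
df_a = pd.DataFrame([['p1', 'Pen'], ['p2', 'Pencil']], columns=['Pno', 'Pname'])

# (b) List of dictionaries: each dictionary is one row; its keys become column labels.
df_b = pd.DataFrame([{'Pno': 'p1', 'Pname': 'Pen'},
                     {'Pno': 'p2', 'Pname': 'Pencil'}])

# (d) Dictionary of Series: each Series is one column; the Series index becomes the row labels.
df_d = pd.DataFrame({
    'Pno': pd.Series(['p1', 'p2'], index=['r1', 'r2']),
    'Pname': pd.Series(['Pen', 'Pencil'], index=['r1', 'r2']),
})

print(df_a)
print(df_b)
print(df_d)
```

Here (a) and (b) produce the same table with default row labels 0 and 1, while (d) takes its row labels from the Series.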
(c) DataFrame creation using dictionary of lists
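A minimal sketch of this method, assuming a hypothetical product table: each dictionary key becomes a column label and each list supplies that column's values, so all lists must have the same length.

```python
import pandas as pd

# Hypothetical product data; keys are column labels, lists are column values.
d = {
    'Pno': ['p1', 'p2', 'p3'],
    'Pname': ['Pen', 'Pencil', 'Eraser'],
    'Price': [10.0, 5.0, 3.0],
}

# index is optional; without it the rows are labelled 0, 1, 2.
df_c = pd.DataFrame(d, index=['r1', 'r2', 'r3'])
print(df_c)
```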
5. DataFrame Operations
We can perform the following operations on DataFrame.
(a) Obtaining single column.
(b) Obtaining multiple columns.
(c) Obtaining single row.
(d) Obtaining multiple rows.
(e) Obtaining sequence of rows.
(f) Obtaining specific rows and columns.
(g) Adding a column.
(h) Modifying a column.
(i) Deleting a column.
(j) Adding a row.
(k) Modifying a row.
(l) Deleting a row.
(m) Accessing individual data item.
(n) Changing individual data item.
Eg (1): Write a Python statement to select a column called 'Pname' from df1 DataFrame.
Ans: df1['Pname']
Syntax to select multiple columns
DataFrameName[[columnname1,columnname2,…]]
Eg (2): Write a Python statement to select the columns 'Pname' and 'Pno' from df1
DataFrame.
Ans: df1[['Pname','Pno']]
Syntax to select a single row
DataFrameName.loc[rowname]
(or)
DataFrameName.iloc[rowindex]
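The column and single-row selections above can be tried on a small table. The df1 built here is a hypothetical stand-in, since the original df1 is not reproduced in these notes.

```python
import pandas as pd

# Hypothetical df1 with row labels r1..r3.
df1 = pd.DataFrame(
    {'Pno': ['p1', 'p2', 'p3'],
     'Pname': ['Pen', 'Pencil', 'Eraser'],
     'Price': [10.0, 5.0, 3.0]},
    index=['r1', 'r2', 'r3'],
)

one_col = df1['Pname']            # single column, returned as a Series
two_cols = df1[['Pname', 'Pno']]  # list of columns, returned as a DataFrame
row_by_label = df1.loc['r1']      # row selected by its label
row_by_pos = df1.iloc[0]          # row selected by its integer position

print(one_col)
print(two_cols)
print(row_by_label)
```

Note the double square brackets for multiple columns: the inner pair is an ordinary Python list of column labels.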
Eg(1): Write a Python statement to select a row whose label is r1 from the following df1.
Ans: df1.loc['r1']
Eg(2): Write a Python statement to select a row whose integer index is 0 from the above df1.
Ans: df1.iloc[0]
Eg(1): Write a Python statement to display 'r1' and 'r3' rows from above DataFrame df.
Ans: df.loc[['r1','r3']]
Syntax to select a sequence of rows
DataFrameName.loc[startingrowname:endingrowname]
(or)
DataFrameName.iloc[startingintindex:endingintindex]
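A short sketch of selecting several rows, using a hypothetical df with row labels r1 to r4. One detail worth noticing: a label slice with loc includes the ending label, whereas an integer slice with iloc stops one position before the ending index.

```python
import pandas as pd

# Hypothetical df with four labelled rows.
df = pd.DataFrame(
    {'Pno': ['p1', 'p2', 'p3', 'p4'],
     'Pname': ['Pen', 'Pencil', 'Eraser', 'Scale']},
    index=['r1', 'r2', 'r3', 'r4'],
)

picked = df.loc[['r1', 'r3']]  # non-adjacent rows: pass a list of labels
by_label = df.loc['r2':'r4']   # label slice: 'r4' is included
by_pos = df.iloc[1:4]          # position slice: position 4 is excluded

print(picked)
print(by_label)
print(by_pos)
```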
Eg(1): Write a Python code to display the 2nd, 3rd and 4th rows from DataFrame df.
Ans-1: df.iloc[1:4]
Ans-2: df.loc['r2':'r4']
Syntax to select specific rows and columns
DataFrameName.loc[startingrowname:endingrowname,startingcolname:endingcolname]
(or)
DataFrameName.iloc[rowstartingintindex:rowendingintindex,columnstartingintindex:columnendingintindex]
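The row-and-column syntax above can be sketched like this. The df is a hypothetical example; the two statements pick the same block of cells, once by labels and once by integer positions.

```python
import pandas as pd

# Hypothetical df with row labels r1..r4 and three columns.
df = pd.DataFrame(
    {'Pno': ['p1', 'p2', 'p3', 'p4'],
     'Pname': ['Pen', 'Pencil', 'Eraser', 'Scale'],
     'Price': [10.0, 5.0, 3.0, 8.0]},
    index=['r1', 'r2', 'r3', 'r4'],
)

# Rows r1 to r3 and columns Pno to Pname (both ends included with loc).
block_by_label = df.loc['r1':'r3', 'Pno':'Pname']

# Rows 0 to 2 and columns 0 to 1 (the ending positions 3 and 2 are excluded with iloc).
block_by_pos = df.iloc[0:3, 0:2]

print(block_by_label)
```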
Eg: Write a python code to display the following output from above DataFrame df.
Ans:
Eg(1): Write a Python code to add a new column called qty with [15,16,17,18] as values to the above
DataFrame df.
Ans: df['qty']=[15,16,17,18]
Eg (2): In this example, we first create a DataFrame from a dictionary, and then add a new
column called 'address' to the DataFrame 'DF2'.
import pandas as pd
import numpy as np
d1={'name':['ram','peter','Faiz'],'Rno':[1,2,3],'phno':[123,456,np.nan]}
DF2=pd.DataFrame(d1)
print(DF2)
DF2['address']='jpnagar'   # Here, we added a new column 'address' to DF2.
print('after adding a column the DF2 dataFrame is as follows')
print(DF2)
Output:
    name  Rno   phno
0    ram    1  123.0
1  peter    2  456.0
2   Faiz    3    NaN
after adding a column the DF2 dataFrame is as follows
    name  Rno   phno  address
0    ram    1  123.0  jpnagar
1  peter    2  456.0  jpnagar
2   Faiz    3    NaN  jpnagar
Note: When a column is deleted with drop(), axis=1 represents columns. The default value of axis
is 0 (zero), which represents rows.
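Operations (h) and (i), modifying and deleting a column, can be sketched as below. The DataFrame and its values are hypothetical; drop() with axis=1 is one common way to delete a column, shown here to match the note on axis.

```python
import pandas as pd

# Hypothetical df with a 'qty' column that will be deleted.
df = pd.DataFrame(
    {'Pname': ['Pen', 'Pencil'], 'Price': [10.0, 5.0], 'qty': [15, 16]},
    index=['r1', 'r2'],
)

# Modify a column: assign new values to an existing column label.
df['Price'] = [12.0, 6.0]

# Delete a column: axis=1 tells drop() that 'qty' is a column label, not a row label.
df.drop('qty', axis=1, inplace=True)

print(df)
```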
(j) Adding a row.
The loc indexer is used to add/modify a row in a DataFrame.
Eg(1): Write a Python code to add a new row with 'r5' as row label and
['p5','Notebook',7.00] as values.
Ans: df.loc['r5']=['p5','Notebook',7.00]
Eg(2): In the following example, 'hi' is common for all the columns in that row.
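A minimal sketch of adding and modifying rows with loc, on a hypothetical two-row df. Assigning to a label that does not exist adds a row; assigning to an existing label modifies that row; a single value such as 'hi' is repeated in every column.

```python
import pandas as pd

# Hypothetical starting table.
df = pd.DataFrame(
    {'Pno': ['p1', 'p2'], 'Pname': ['Pen', 'Pencil'], 'Price': [10.0, 5.0]},
    index=['r1', 'r2'],
)

df.loc['r3'] = ['p3', 'Notebook', 7.00]  # 'r3' is a new label, so a row is added
df.loc['r1'] = ['p1', 'Gel Pen', 12.0]   # 'r1' already exists, so the row is modified
df.loc['r4'] = 'hi'                      # one value is placed in all columns of row 'r4'

print(df)
```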
(l) Deleting a row.
DataFrameName.drop('rowindex',inplace=True)
(or)
DataFrameName.drop('rowindex',axis=0,inplace=True)
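The drop() syntax above can be tried on a hypothetical three-row df. The sketch also shows what happens without inplace=True: the original DataFrame stays unchanged and a new one is returned.

```python
import pandas as pd

# Hypothetical df with three labelled rows.
df = pd.DataFrame({'Pname': ['Pen', 'Pencil', 'Eraser']}, index=['r1', 'r2', 'r3'])

# axis defaults to 0, so 'r2' is treated as a row label and removed from df itself.
df.drop('r2', inplace=True)

# Without inplace=True, df is left as it is and the result is a new DataFrame.
smaller = df.drop('r3', axis=0)

print(df)
print(smaller)
```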
(m) Accessing individual data item.
DataFrameName.columnname[rowname]
(or)
DataFrameName.at[rowname,columnname]
(or)
DataFrameName.iat[rowindex,columnindex]
(or)
DataFrameName.loc[rowname,columnname]
(or)
DataFrameName.iloc[rowindex,columnindex]
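All five forms above reach the same single cell. In this sketch the df, its labels and the position of 'Shampoo' are hypothetical.

```python
import pandas as pd

# Hypothetical df; 'Shampoo' sits in row 'r2' (position 1), column 'Pname' (position 1).
df = pd.DataFrame(
    {'Pno': ['p1', 'p2', 'p3'],
     'Pname': ['Soap', 'Shampoo', 'Brush'],
     'Price': [20.0, 90.0, 35.0]},
    index=['r1', 'r2', 'r3'],
)

v1 = df.Pname['r2']         # column as attribute, then row label
v2 = df.at['r2', 'Pname']   # label based, single value
v3 = df.iat[1, 1]           # position based, single value
v4 = df.loc['r2', 'Pname']  # label based
v5 = df.iloc[1, 1]          # position based

print(v1, v2, v3, v4, v5)
```

at and iat accept only a single row and a single column, while loc and iloc also accept lists and slices.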
Eg: Write a python code to display only shampoo from the following DataFrame df.
Ans:
(n) Changing individual data item.
DataFrameName.at[rowname,columnname]=newvalue
(or)
DataFrameName.iat[rowindex,columnindex]=newvalue
(or)
DataFrameName.loc[rowname,columnname]=newvalue
(or)
DataFrameName.iloc[rowindex,columnindex]=newvalue
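A short sketch of changing single values with the four forms above, on a hypothetical df. Each statement changes a different cell so that the effect of every form can be seen in the printed result.

```python
import pandas as pd

# Hypothetical df.
df = pd.DataFrame(
    {'Pno': ['p1', 'p2', 'p3'],
     'Pname': ['Soap', 'Shampoo', 'Brush'],
     'Price': [20.0, 90.0, 35.0]},
    index=['r1', 'r2', 'r3'],
)

df.at['r1', 'Price'] = 22.0        # by labels
df.iat[1, 2] = 95.0                # by positions: row 1, column 2 (Price of r2)
df.loc['r3', 'Pname'] = 'Comb'     # by labels
df.iloc[0, 1] = 'Bath Soap'        # by positions: row 0, column 1 (Pname of r1)

print(df)
```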
Ans:
Eg (1):
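6.b. iteritems()
Iterates over the columns as (key, value) pairs, with the column label as key and the column values
as a Series object.
iteritems() returns output in the form of (column_label, Series).
Note: iteritems() was removed in pandas 2.0. The items() method gives the same (column_label, Series)
pairs and should be used in newer versions.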
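Column-wise iteration as described above can be sketched with items(), which yields the same (column label, Series) pairs and is available in both older and newer pandas versions. iterrows(), its row-wise counterpart, is shown alongside for comparison. The df is a hypothetical example.

```python
import pandas as pd

# Hypothetical df.
df = pd.DataFrame({'Pname': ['Pen', 'Pencil'], 'Price': [10.0, 5.0]}, index=['r1', 'r2'])

# Column-wise: each step gives (column label, that column as a Series).
column_labels = []
for label, column in df.items():
    column_labels.append(label)
    print(label, column.tolist())

# Row-wise: each step gives (row label, that row as a Series).
row_labels = []
for label, row in df.iterrows():
    row_labels.append(label)
    print(label, row['Pname'], row['Price'])
```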
Eg (1):
7. DataFrame Attributes:
Some common attributes of DataFrame objects are:
Attribute            Meaning
DataFrame.index      The index (row labels) of the DataFrame
DataFrame.columns    Column labels of the DataFrame
DataFrame.axes       Returns a list holding the row index and the column labels
DataFrame.dtypes     Returns the data type of each column
DataFrame.shape      Returns a tuple (number of rows, number of columns)
DataFrame.ndim       Returns the number of dimensions (2 for a DataFrame)
DataFrame.size       Returns the number of elements (rows x columns)
DataFrame.empty      Returns True if the DataFrame is empty, otherwise False
Note: nbytes (number of bytes occupied by the data) and hasnans (True if there are NaN values,
otherwise False) are attributes of a Series, not of a DataFrame. They can be used on a single
column, e.g. DataFrameName[columnname].nbytes and DataFrameName[columnname].hasnans.
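The attributes in the table can be checked on a small hypothetical df. The hasnans attribute is read from one column, since it belongs to Series objects.

```python
import pandas as pd

# Hypothetical df; None in a numeric column is stored as NaN.
df = pd.DataFrame(
    {'Pname': ['Pen', 'Pencil', 'Eraser'], 'Price': [10.0, 5.0, None]},
    index=['r1', 'r2', 'r3'],
)

print(df.index)             # row labels
print(df.columns)           # column labels
print(df.axes)              # list of [row labels, column labels]
print(df.dtypes)            # data type of each column
print(df.shape)             # (3, 2)
print(df.ndim)              # 2
print(df.size)              # 6
print(df.empty)             # False
print(df['Price'].hasnans)  # True, because one Price is NaN
```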
Eg (1): Consider the following DataFrame df.
Eg(1):
Eg(2):
Eg(3):
Eg(4):
Note: In the above Eg(4), all rows except the last three come as output.
tail(n) function is used to get the last 'n' rows from a DataFrame.
tail(-n) function returns all rows except the first n rows.
If we don't supply a parameter to the tail() function, it returns the last 5 rows of the DataFrame.
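The behaviour of head() and tail() described above can be sketched on a hypothetical df of eight rows, where the column 'rollno' simply holds 1 to 8 so that the selected rows are easy to recognise.

```python
import pandas as pd

# Hypothetical df with 8 rows: rollno 1, 2, ..., 8.
df = pd.DataFrame({'rollno': list(range(1, 9))})

first_five = df.head()                # no argument: first 5 rows
first_three = df.head(3)              # first 3 rows
all_but_last_three = df.head(-3)      # every row except the last 3

last_five = df.tail()                 # no argument: last 5 rows
last_three = df.tail(3)               # last 3 rows
all_but_first_three = df.tail(-3)     # every row except the first 3

print(all_but_last_three)
print(all_but_first_three)
```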
Eg(1):
Eg(2):
Eg(3):
9) Boolean Indexing:
Boolean indexing means selecting rows of a DataFrame with boolean values (True/False): either the
DataFrame itself has boolean values as its row index, or a boolean mask is applied to it.
Boolean indexing is an important feature of NumPy, which is frequently used in pandas.
It lets us filter rows based on the actual values of the data in the DataFrame.
We can filter the data with boolean indexing in different ways, which are as follows:
Accessing a DataFrame with a boolean index
Applying a boolean mask to a DataFrame
Masking data based on column value
Masking data based on an index value
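The four ways listed above can be sketched as follows. The names, marks and row labels are hypothetical sample values.

```python
import pandas as pd

# Hypothetical data.
data = {'name': ['ram', 'peter', 'Faiz'], 'marks': [72, 45, 88]}

# 1. Accessing a DataFrame with a boolean index: the row labels themselves are True/False.
df_bool = pd.DataFrame(data, index=[True, False, True])
true_rows = df_bool.loc[True]          # all rows whose label is True

# 2. Applying a boolean mask: one True/False per row; rows marked True are kept.
df = pd.DataFrame(data, index=['r1', 'r2', 'r3'])
masked = df[[True, False, True]]

# 3. Masking data based on a column value: the comparison builds the mask.
passed = df[df['marks'] > 50]

# 4. Masking data based on an index value.
not_r2 = df[df.index != 'r2']

print(true_rows)
print(passed)
```

In ways 3 and 4 the expression inside the square brackets is itself a sequence of True/False values, which is why it can be used as a mask.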
Eg:
Output:
dfname.rename(index={oldindex:newindex,…..},columns={oldname:newname,…..},inplace=True)
Eg: Consider the following DataFrame df and rename row index r1 as row1 and column label pname
as prodname.
Ans: df.rename(index={'r1':'row1'},columns={'pname':'prodname'},inplace=True)
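The rename() syntax above can be tried on a hypothetical df that has a row labelled r1 and a column labelled pname. The sketch shows both forms: without inplace=True a renamed copy is returned, with inplace=True the DataFrame itself is changed.

```python
import pandas as pd

# Hypothetical df.
df = pd.DataFrame({'pno': ['p1', 'p2'], 'pname': ['Pen', 'Pencil']}, index=['r1', 'r2'])

# Returns a renamed copy; df itself is not changed here.
renamed = df.rename(index={'r1': 'row1'}, columns={'pname': 'prodname'})

# Changes df itself.
df.rename(index={'r1': 'row1'}, columns={'pname': 'prodname'}, inplace=True)

print(df)
```

Labels that are not mentioned in the dictionaries, such as r2 and pno, are left as they are.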
11) DATA TRANSFER BETWEEN CSV FILES AND DATAFRAME OBJECTS
1. Transferring data from .csv files to DataFrames
CSV stands for Comma Separated Values.
CSV is a simple file format used to store tabular data, such as a spreadsheet or database.
Files in the CSV format can be imported to and exported from programs that store data in tables,
such as Microsoft Excel or OpenOffice Calc.
Advantages of .csv files
• Simple and compact for data storage.
• A common format for data interchange.
• It can be opened in spreadsheet programs.
By using the read_csv() function, we can read a .csv file and convert it into a DataFrame.
Syntax to transfer .csv file to DataFrame
DataFrame_Name=pandas.read_csv("path\\filename.csv")
Eg(1): Write a Python code to convert "simple.csv" which is in "D" drive "IP" folder into a DataFrame
called "DF1".
Ans:
import pandas as pd
DF1=pd.read_csv("D:\\IP\\simple.csv")
2. Transferring data from DataFrames to .csv files
Syntax to transfer DataFrame to .csv file
DataFrame_Name.to_csv("path\\filename.csv")
The DataFrame will be converted and stored in the specified path with the given name.
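A minimal round-trip sketch of to_csv() and read_csv(). To keep it runnable on any computer, it writes to a temporary folder instead of a fixed drive path; the DataFrame contents and the file name are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical DataFrame to be saved.
df = pd.DataFrame(
    {'Pname': ['Pen', 'Pencil'], 'Price': [10.0, 5.0]},
    index=['r1', 'r2'],
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'simple.csv')

    # Write: the row labels are stored as the first column of the file.
    df.to_csv(path)

    # Read: index_col=0 turns that first column back into row labels.
    back = pd.read_csv(path, index_col=0)

print(back)
```

Without index_col=0, read_csv() would treat the saved row labels as an ordinary data column and add a fresh 0, 1, 2, ... index.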
Eg:
SAMPLE QUESTIONS
1. Given a DataFrame DF1 shown below.
import pandas as pd
d1={'name':['ram','peter'],'empno':[1,2],'sal':[100,200]}
DF2=pd.DataFrame(d1,index=['emp1','emp2'])
a) print(DF2)
b) print(DF2.loc['emp1','name'])
c) print(DF2.iloc[0,0])
d) print(DF2.name['emp1'])
e) What do you observe from parts (b) to (d)?
f) print(DF2['sal']>150)
g) print(DF2.iloc[0:1,0:1])
h) print(DF2.loc['emp1':'emp2','name':'empno'])
5. Write a Python code to create the following DataFrame called 'DF3'.
(a) print(DF1+DF2)
(b) print(DF1.add(DF2))
(c) print(DF1.sub(DF2))
(d) print(DF1.rsub(DF2))
(e) print(DF1.mul(DF2))
(f) print(DF1.div(DF2))
12. Write a Python code to create a DataFrame with appropriate column headings from the list given
below:
[[1001,'IND-AUS','2022-10-17'], [1002,'IND-PAK','2022-10-23'], [1003,'IND-SA','2022-10-30'],
[1004,'IND-NZ','2022-11-18']]
Write commands to:
i. Add a new column 'Stream' to the DataFrame with values (Science, Commerce, Arts, Science).
ii. Add a new row with values (5, Mridula, X, F, 9.8, Science).
16. Consider the following DataFrame “HOSPITAL”
19. In pandas, which of the following DataFrame attributes can be used to know the number of rows and
columns in a DataFrame?
a. size b. index c. count d. shape
20. Carefully observe the following code:
import pandas as pd
L=[['S101','Anushree',65],['S102','Anubha',56],['S104','Vishnu',67],['S105','Kritika',45]]
df=pd.DataFrame (L, columns=['ID','Name','Marks'])
print(df)
i. What is the shape of the data frame df?
ii. Name the index and column names of dataframe df
21. Write a Python code to create a DataFrame 'Df' using dictionary of lists for the following data.
23. Mr. Kapoor, a data analyst, has designed the DataFrame DF that contains data about attendance and
number of classes of a week as shown below. Answer the following questions:
27. Assertion (A): DataFrame has both a row and column index.
Reasoning (R): .loc() is a label-based data selection method used to select the specific row(s) or
column(s) which we want.
28. Assertion (A): When a DataFrame is created by using a dictionary, the keys of the dictionary are
set as columns of the DataFrame.
Reasoning (R): Boolean indexing helps us to select the data from DataFrames using a boolean vector.
29. Assertion (A): While creating a DataFrame with a nested or 2D dictionary, Python interprets the
outer dict keys as the columns and the inner keys as the row indices.
Reasoning (R): A column can be deleted using the remove command.