INFORMATICS PRACTICES (065) - XII BY G SIVA PRASAD

DataFrames
1. DataFrame:
• A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.
• In a DataFrame, both the data and the size are mutable.

2. Features of DataFrame
• Columns can be of different types.
• Size is mutable.
• Axes are labeled (rows and columns).
• Arithmetic operations can be performed on rows and columns.

3. Structure of DataFrame
Let us assume we have a DataFrame called 'student' whose structure is as follows.

4. Creating and displaying DataFrame


• Syntax to create a DataFrame:

DataFrameName=pandas.DataFrame(data,index,columns,dtype)
Where,
• ‘data’ takes various forms like ndarray, Series, lists, dict, constants and also another DataFrame.
• ‘index’ represents row labels.
• ‘columns’ represents column labels.
• ‘dtype’ represents the data type of the values.
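A minimal sketch of this syntax, assuming pandas is installed; the labels A, B, r1, r2 and the values are illustrative only. Every argument except data is optional, and dtype applies one type to all the values.

```python
import pandas as pd

# data: a list of lists, one inner list per row
data = [[1, 2], [3, 4]]

# index -> row labels, columns -> column labels, dtype -> type applied to every value
df = pd.DataFrame(data, index=['r1', 'r2'], columns=['A', 'B'], dtype=float)
print(df)
```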

• A DataFrame is mainly created in the following 4 ways.
(a) DataFrame creation using list of lists
(b) DataFrame creation using list of dictionaries
(c) DataFrame creation using dictionary of lists
(d) DataFrame creation using dictionary of Series
Note: As per our syllabus, we have only (b) and (d).

Eg: Write a Python code to create the following DataFrame df1.

     Pno  Pname    Price
r1   p1   Soap     5.0
r2   p2   Shampoo  5.5
r3   p3   Pen      6.0
r4   p4   Pencil   6.5
(a) DataFrame creation using list of lists
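One possible answer for df1: each inner list holds one row, and the row labels and column labels are passed separately.

```python
import pandas as pd

# Each inner list is one row: [Pno, Pname, Price]
data = [['p1', 'Soap', 5.0],
        ['p2', 'Shampoo', 5.5],
        ['p3', 'Pen', 6.0],
        ['p4', 'Pencil', 6.5]]
df1 = pd.DataFrame(data, index=['r1', 'r2', 'r3', 'r4'],
                   columns=['Pno', 'Pname', 'Price'])
print(df1)
```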

(b) DataFrame creation using list of dictionaries
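One possible answer for df1: each dictionary is one row, and its keys become the column labels.

```python
import pandas as pd

# One dictionary per row; the keys become the column labels
data = [{'Pno': 'p1', 'Pname': 'Soap', 'Price': 5.0},
        {'Pno': 'p2', 'Pname': 'Shampoo', 'Price': 5.5},
        {'Pno': 'p3', 'Pname': 'Pen', 'Price': 6.0},
        {'Pno': 'p4', 'Pname': 'Pencil', 'Price': 6.5}]
df1 = pd.DataFrame(data, index=['r1', 'r2', 'r3', 'r4'])
print(df1)
```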

(c) DataFrame creation using dictionary of lists
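One possible answer for df1: each key is a column label, and each list holds that column's values from top to bottom.

```python
import pandas as pd

# One key per column; each list gives that column's values in row order
data = {'Pno': ['p1', 'p2', 'p3', 'p4'],
        'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
        'Price': [5.0, 5.5, 6.0, 6.5]}
df1 = pd.DataFrame(data, index=['r1', 'r2', 'r3', 'r4'])
print(df1)
```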

(d) DataFrame creation using dictionary of Series
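One possible answer for df1: each Series already carries the row labels, so pandas lines the columns up on those labels and no separate index argument is needed.

```python
import pandas as pd

rows = ['r1', 'r2', 'r3', 'r4']

# Each Series is one column and already has the row labels
data = {'Pno': pd.Series(['p1', 'p2', 'p3', 'p4'], index=rows),
        'Pname': pd.Series(['Soap', 'Shampoo', 'Pen', 'Pencil'], index=rows),
        'Price': pd.Series([5.0, 5.5, 6.0, 6.5], index=rows)}
df1 = pd.DataFrame(data)
print(df1)
```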

5. DataFrame Operations
• We can perform the following operations on a DataFrame.
(a) Obtaining single column.
(b) Obtaining multiple columns.
(c) Obtaining single row.
(d) Obtaining multiple rows.
(e) Obtaining sequence of rows.
(f) Obtaining specific rows and columns.
(g) Adding a column.
(h) Modifying a column.
(i) Deleting a column.
(j) Adding a row.
(k) Modifying a row.
(l) Deleting a row.
(m) Accessing individual data item.
(n) Changing individual data item.


(a) Obtaining single column.


• Syntax to select a column from an existing DataFrame:
DataFrameName[columnname]
(or)
DataFrameName.columnname

• Eg (1): Write a Python statement to select the column 'Pname' from DataFrame df1.
Ans:
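A possible answer, rebuilding df1 from the table above so the snippet runs on its own. Both forms return the column as a Series.

```python
import pandas as pd

df1 = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                    'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                    'Price': [5.0, 5.5, 6.0, 6.5]},
                   index=['r1', 'r2', 'r3', 'r4'])

print(df1['Pname'])   # bracket notation works for any column label
print(df1.Pname)      # dot notation works only if the label is a valid Python name
```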

(b) Obtaining multiple columns.

• Syntax to select multiple columns from an existing DataFrame:

DataFrameName[[columnname1,columnname2,…]]

• Eg (2): Write a Python statement to select the columns 'Pname' and 'Pno' from DataFrame df1.
Ans:
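A possible answer with df1 rebuilt from the table. The inner list sets both which columns appear and their order, and the result is a DataFrame.

```python
import pandas as pd

df1 = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                    'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                    'Price': [5.0, 5.5, 6.0, 6.5]},
                   index=['r1', 'r2', 'r3', 'r4'])

# Double brackets: the inner list names the columns to keep
result = df1[['Pname', 'Pno']]
print(result)
```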

(c) Obtaining single row.


• Syntax:

DataFrameName.loc[rowname]
(or)
DataFrameName.iloc[rowindex]

• To get rows, we use either the loc or the iloc indexer.

• loc selects row(s) by their labels (for example 'r1').
• iloc selects row(s) by their integer position (0, 1, 2, …), whatever the row labels are.
Note-1: loc also works when a DataFrame has integer row labels; the integers are then treated as labels, not as positions.
Note-2: A single-row result is displayed in the form of a Series.

Eg(1): Write a Python statement to select the row whose label is r1 from df1.

Ans:
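A possible answer with df1 rebuilt from the table; the single row comes back as a Series whose index is the column labels.

```python
import pandas as pd

df1 = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                    'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                    'Price': [5.0, 5.5, 6.0, 6.5]},
                   index=['r1', 'r2', 'r3', 'r4'])

row = df1.loc['r1']   # select by row label
print(row)
```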

Eg(2): Write a Python statement to select the row whose integer index (position) is 0 from df1.
Ans:
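A possible answer with df1 rebuilt from the table. Position 0 is the first row, which here has the label r1.

```python
import pandas as pd

df1 = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                    'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                    'Price': [5.0, 5.5, 6.0, 6.5]},
                   index=['r1', 'r2', 'r3', 'r4'])

row = df1.iloc[0]     # select by integer position
print(row)
```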

(d) Obtaining multiple rows.


• Syntax:
DataFrameName.loc[[rowname1,rowname2,…]]
(or)
DataFrameName.iloc[[rowindex1,rowindex2,…]]

• Eg1: Write a Python statement to display the 'r1' and 'r3' rows from the above DataFrame df.
Ans:
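A possible answer, assuming df holds the same data as df1. The list inside the brackets picks the rows; loc uses labels and iloc uses positions.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])

print(df.loc[['r1', 'r3']])   # by row labels
print(df.iloc[[0, 2]])        # by positions: r1 is 0, r3 is 2
```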

(e) Obtaining sequence of rows.


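• Syntax:

DataFrameName.loc[startingrowname:endingrowname]
(or)
DataFrameName.iloc[startingindex:endingindex]

Note: With loc, the ending row label is included in the result; with iloc, the row at the ending position is excluded (as in ordinary Python slicing).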

Eg(1): Write a Python code to display the 2nd, 3rd and 4th rows from DataFrame df.
Ans-1:
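A label-based answer, assuming df holds the same data as df1. Because loc includes the ending label, 'r2':'r4' returns three rows.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])

print(df.loc['r2':'r4'])   # r2, r3 and r4 (end label included)
```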

Ans-2:
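A position-based answer on the same assumed df. The 2nd to 4th rows sit at positions 1, 2 and 3, and iloc stops before the ending position, so the slice is 1:4.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])

print(df.iloc[1:4])   # positions 1, 2, 3 (position 4 excluded)
```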

(f) Obtaining specific rows and columns.


• Syntax:

DataFrameName.loc[startingrowname:endingrowname,startingcolname:endingcolname]
(or)
DataFrameName.iloc[rowstartindex:rowendindex,colstartindex:colendindex]

• Eg: Write a Python code to display the following output from the above DataFrame df.

Ans:
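The expected output table for this example is not recoverable, so the slice below (rows r2 to r3, columns Pname to Price) is only an illustrative choice, assuming df holds the same data as df1. It shows the same block selected both ways.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])

# Labels: ending row and ending column are both included
print(df.loc['r2':'r3', 'Pname':'Price'])
# Positions: ending row and ending column are both excluded
print(df.iloc[1:3, 1:3])
```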

(g) Adding a column to DataFrame


• Syntax to add a column to an existing DataFrame:
DataFrameName[columnname]=Value

Eg(1): Write a Python code to add a new column called qty with [15,16,17,18] as values to the above
DataFrame df.
Ans:
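A possible answer, assuming df holds the same data as df1. The list must have one value per row; a new column label is appended at the right.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])

df['qty'] = [15, 16, 17, 18]   # one value for each of the 4 rows
print(df)
```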


Eg (2): In this example, we create a DataFrame from a dictionary and then add a new column called 'address' to the DataFrame DF2.
import pandas as pd
import numpy as np
d1={'name':['ram','peter','Faiz'],'Rno':[1,2,3],'phno':[123,456,np.nan]}
DF2=pd.DataFrame(d1)
print(DF2)
DF2['address']='jpnagar'   # adds the new column 'address'; the single value is repeated in every row
print('after adding a column the DF2 DataFrame is as follows')
print(DF2)

Output:
    name  Rno   phno
0    ram    1  123.0
1  peter    2  456.0
2   Faiz    3    NaN

after adding a column the DF2 DataFrame is as follows
    name  Rno   phno  address
0    ram    1  123.0  jpnagar
1  peter    2  456.0  jpnagar
2   Faiz    3    NaN  jpnagar

(h) Modifying a column.


Eg: Write a Python code to modify the column Price with [7.00,7.50,8.00,8.50] as values.
Ans:
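A possible answer, assuming df holds the same data as df1. Assigning to an existing column label replaces its values instead of adding a new column.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])

df['Price'] = [7.00, 7.50, 8.00, 8.50]   # 'Price' already exists, so it is overwritten
print(df)
```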

(i) Deleting a column from DataFrame


• Syntax to delete a column from an existing DataFrame:
del DataFrameName[columnname]
(or)
DataFrameName.drop(columnname,axis=1,inplace=True)
Eg (1): Write a Python statement to delete the column 'qty' from DataFrame df.
Ans:
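A sketch of both ways, assuming df holds the df1 data plus the qty column from the earlier example. A copy (the name df_copy is only for illustration) lets the second way be shown on its own DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])
df['qty'] = [15, 16, 17, 18]     # column added in the earlier example
df_copy = df.copy()              # second copy, used to show the other way

del df['qty']                                # way 1
df_copy.drop('qty', axis=1, inplace=True)    # way 2: axis=1 means "column"
print(df)
```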

Note: In the above example, axis=1 represents columns. The default value of axis is 0 (zero), which represents rows.
(j) Adding a row.
• The loc indexer is used to add or modify a row in a DataFrame.
• Eg(1): Write a Python code to add a new row with 'r5' as the row label and
['p5','Notebook',7.00] as values.
Ans:
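A possible answer, assuming df holds the same data as df1. Assigning through loc to a label that does not exist yet appends a new row; the list needs one value per column.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])

df.loc['r5'] = ['p5', 'Notebook', 7.00]   # 'r5' is new, so a row is added
print(df)
```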

• Eg(2): In the following example, 'hi' is the common value for all the columns in the new row.
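The original example is not recoverable; this sketch assumes the new row's label is 'r6' (the label used in the next two examples) and that df holds the df1 data. A single value assigned through loc is repeated in every column of that row.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])

df.loc['r6'] = 'hi'   # assumed label 'r6'; every column of the new row gets 'hi'
print(df)
```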

(k) Modifying a row.


Eg(1): In the following example, we modify the row whose label is 'r6'.
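The original example is not recoverable; this sketch assumes row 'r6' was added with 'hi' as above, and the new values 'p6', 'Eraser' and 3.0 are hypothetical. Because 'r6' already exists, the same loc assignment now overwrites the row instead of adding one.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])
df.loc['r6'] = 'hi'                       # assumed earlier step

df.loc['r6'] = ['p6', 'Eraser', 3.0]      # hypothetical new values for the existing row r6
print(df)
```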


(l) Deleting a row.


• Row(s) can be deleted by using the drop() function.
• Syntax:

DataFrameName.drop('rowlabel',inplace=True)
(or)
DataFrameName.drop('rowlabel',axis=0,inplace=True)

• Write a Python code to delete the row whose label is 'r6'.


Ans:
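A possible answer, assuming df holds the df1 data plus a row 'r6' from the previous examples. Since axis=0 (rows) is the default, both forms of the syntax do the same thing.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])
df.loc['r6'] = 'hi'                  # assumed: row r6 exists from earlier

df.drop('r6', inplace=True)          # same as df.drop('r6', axis=0, inplace=True)
print(df)
```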

(m) Accessing individual data item.


Syntax to select an individual data item from an existing DataFrame:

DataFrameName.columnname[rowname]
(or)
DataFrameName.at[rowname,columnname]
(or)
DataFrameName.iat[rowindex,columnindex]
(or)
DataFrameName.loc[rowname,columnname]
(or)
DataFrameName.iloc[rowindex,columnindex]

• Eg: Write a Python code to display only 'Shampoo' from DataFrame df.

Ans:
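A possible answer, assuming df holds the same data as df1. 'Shampoo' is in row r2 (position 1) and column Pname (position 1), so all five forms of the syntax return it.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])

print(df.Pname['r2'])         # column first, then row label
print(df.at['r2', 'Pname'])   # label-based, single value
print(df.iat[1, 1])           # position-based, single value
print(df.loc['r2', 'Pname'])
print(df.iloc[1, 1])
```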

(n) Changing individual data item.


Syntax to change an individual data item in an existing DataFrame:

DataFrameName.at[rowname,columnname]=newvalue
(or)
DataFrameName.iat[rowindex,columnindex]=newvalue
(or)
DataFrameName.loc[rowname,columnname]=newvalue
(or)
DataFrameName.iloc[rowindex,columnindex]=newvalue

Eg: Write a Python code to change Shampoo to Conditioner.

Ans:
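A possible answer, assuming df holds the same data as df1; any one of the four statements is enough.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])

df.at['r2', 'Pname'] = 'Conditioner'
# Equivalent alternatives:
# df.iat[1, 1] = 'Conditioner'
# df.loc['r2', 'Pname'] = 'Conditioner'
# df.iloc[1, 1] = 'Conditioner'
print(df)
```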


6) Iterating over a DataFrame


The following methods iterate over the rows or columns of a DataFrame:
(a) iterrows()
(b) iteritems()
6.a. iterrows()
iterrows() returns an iterator that yields each row's index label together with a Series containing the data in that row.

• iterrows() returns its output in the form (row_label, Series).

Eg (1):
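A minimal sketch, assuming df holds the same data as df1. Each step of the loop gives a row label and that row as a Series indexed by the column labels.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])

for row_label, row in df.iterrows():
    # row is a Series; its index is the column labels
    print(row_label, row['Pname'], row['Price'])
```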

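6.b. iteritems()
• Iterates over each column as a (key, value) pair, with the column label as the key and the column values as a Series object.
• iteritems() returns its output in the form (column_label, Series).
• Note: iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0; items() works the same way and is available in both older and newer versions.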
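A minimal sketch using items() (so it also runs on pandas 2.x), assuming df holds the same data as df1. Each step gives a column label and that column as a Series indexed by the row labels.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])

for col_label, column in df.items():
    # column is a Series; its index is the row labels
    print(col_label)
    print(column)
```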


Eg (4): This example illustrates the mul() operation.
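The original example is not recoverable; this sketch uses two small hypothetical DataFrames df_a and df_b with the same labels. mul() multiplies values that share the same row and column label, which gives the same result as the * operator.

```python
import pandas as pd

# Hypothetical data with matching row and column labels
df_a = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df_b = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})

product = df_a.mul(df_b)   # same as df_a * df_b
print(product)
```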

Eg (1):

7. DataFrame Attributes:
Some common attributes of DataFrame objects are:
Attribute            Meaning
DataFrame.index      The index (row labels) of the DataFrame
DataFrame.columns    Column labels of the DataFrame
DataFrame.axes       Returns a list with both the row index and the column labels
DataFrame.dtypes     Returns the data type of each column
DataFrame.shape      Returns a tuple (number of rows, number of columns)
Series.nbytes        Returns the number of bytes occupied by the data (a DataFrame has no nbytes; DataFrame.memory_usage() reports bytes per column)
DataFrame.ndim       Returns the number of dimensions (2 for a DataFrame)
DataFrame.size       Returns the number of elements (rows × columns)
Series.hasnans       True if the Series has NaN values, otherwise False (a DataFrame has no hasnans; use DataFrame.isna().values.any())
DataFrame.empty      True if the DataFrame is empty, otherwise False

Eg (1): Consider the following DataFrame df.
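The DataFrame shown in the original is not recoverable; this sketch assumes df holds the same data as df1 (4 rows, 3 columns) and prints the attributes from the table.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])

print(df.index)     # row labels
print(df.columns)   # column labels
print(df.axes)      # [row index, column index]
print(df.dtypes)    # data type of each column
print(df.shape)     # (4, 3)
print(df.ndim)      # 2
print(df.size)      # 12
print(df.empty)     # False
```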

8) head() and tail() functions

• head(n) function is used to get the first n rows from a DataFrame.


• head(-n) function returns all rows except the last n rows.
• If we don't supply a parameter to the head() function, it returns the first 5 rows of the DataFrame.
• Consider the following DataFrame df.

Eg(1):

Eg(2):

Eg(3):

Eg(4):

Note: In Eg(4) above, all rows except the last three are returned as output.
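The original DataFrame and Eg(1) to Eg(4) are not recoverable; this sketch uses a hypothetical 10-row DataFrame with one column n, and takes Eg(4) to be head(-3) as the note describes.

```python
import pandas as pd

df = pd.DataFrame({'n': list(range(1, 11))})   # hypothetical: 10 rows, values 1 to 10

print(df.head())      # first 5 rows (default)
print(df.head(3))     # first 3 rows
print(df.head(-3))    # all rows except the last 3
```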
• tail(n) function is used to get the last n rows from a DataFrame.
• tail(-n) function returns all rows except the first n rows.
• If we don't supply a parameter to the tail() function, it returns the last 5 rows of the DataFrame.

Eg(1):

Eg(2):

Eg(3):
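The original examples are not recoverable; this sketch uses the same kind of hypothetical 10-row DataFrame with one column n to show the three forms of tail().

```python
import pandas as pd

df = pd.DataFrame({'n': list(range(1, 11))})   # hypothetical: 10 rows, values 1 to 10

print(df.tail())      # last 5 rows (default)
print(df.tail(2))     # last 2 rows
print(df.tail(-3))    # all rows except the first 3
```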

9) Boolean Indexing:
• Boolean indexing means selecting data from a DataFrame with boolean values (True/False), usually produced by a condition on the data; a DataFrame can also be given boolean values (True/False/1/0) as its row index.
• Boolean indexing is an important feature of NumPy that is also used frequently in pandas.
• It selects data based on the actual values in the DataFrame rather than on row/column labels or integer positions.
• We can filter data with boolean indexing in different ways, as follows:
  • Accessing a DataFrame with a boolean index
  • Applying a boolean mask to a DataFrame
  • Masking data based on column value
  • Masking data based on an index value


Eg:

Output:
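The original example is not recoverable; this sketch of masking on a column value assumes df holds the same data as df1. The condition produces one True/False per row, and only the rows marked True are kept.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])

mask = df['Price'] > 5.5          # boolean Series: False, False, True, True
print(mask)
print(df[mask])                   # rows where mask is True
print(df.loc[df['Price'] > 5.5, ['Pname', 'Price']])   # mask for rows, labels for columns
```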

10) renaming row labels and column labels


• rename() function is used to rename row labels, column labels or both.
• Syntax:

dfname.rename(index={oldindex:newindex,…},columns={oldname:newname,…},inplace=True)

• Eg: Consider the following DataFrame df; rename the row index r1 as row1 and the column label pname
as prodname.

Ans:
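A possible answer, assuming df holds the same data as df1, where the column is spelled 'Pname'. rename() matches labels exactly (case-sensitive) and by default silently ignores a label it cannot find, so the key must match the actual column name.

```python
import pandas as pd

df = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                   'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                   'Price': [5.0, 5.5, 6.0, 6.5]},
                  index=['r1', 'r2', 'r3', 'r4'])

df.rename(index={'r1': 'row1'}, columns={'Pname': 'prodname'}, inplace=True)
print(df)
```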

11) DATA TRANSFER BETWEEN CSV FILES AND DATAFRAME OBJECTS
1. Transferring data from .csv files to DataFrames
• CSV stands for Comma Separated Values.
• CSV is a simple file format used to store tabular data, such as a spreadsheet or database.
• Files in the CSV format can be imported to and exported from programs that store data in tables, such as Microsoft Excel or OpenOffice Calc.
• Advantages of .csv files:
  • Simple and compact for data storage.
  • A common format for data interchange.
  • They can be opened in spreadsheet programs.
• The read_csv() function reads a .csv file and converts it into a DataFrame.
• Syntax to transfer a .csv file to a DataFrame:

DataFrame_Name=pandas.read_csv("path\\filename.csv")

Eg(1): Write a Python code to convert "simple.csv", which is in the "IP" folder of the "D" drive, into a DataFrame
called "DF1".
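The statement the example asks for is DF1=pandas.read_csv("D:\\IP\\simple.csv"). In this sketch it is wrapped in a small helper, read_simple (a hypothetical name), so the path can be swapped on a machine without that file. The doubled backslash is needed because a single backslash starts an escape sequence in a Python string.

```python
import pandas as pd

# simple.csv in the IP folder of the D drive (Windows path)
CSV_PATH = "D:\\IP\\simple.csv"   # r"D:\IP\simple.csv" would also work

def read_simple(path=CSV_PATH):
    """Read the .csv file at path and return it as a DataFrame."""
    return pd.read_csv(path)

# On the machine that has the file:
# DF1 = read_simple()
# print(DF1)
```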

2. Transferring data from DataFrames to .csv files


• The to_csv() function converts a DataFrame into a .csv file.
Syntax:

DataFrame_Name.to_csv("path\\filename.csv")

• The DataFrame is converted and stored at the specified path with the given name.

Eg:
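The original example is not recoverable; this sketch writes df1 with a hypothetical helper save_csv. By default to_csv() also writes the row labels as the first column, so reading the file back with index_col=0 restores them; passing index=False leaves the labels out.

```python
import pandas as pd

df1 = pd.DataFrame({'Pno': ['p1', 'p2', 'p3', 'p4'],
                    'Pname': ['Soap', 'Shampoo', 'Pen', 'Pencil'],
                    'Price': [5.0, 5.5, 6.0, 6.5]},
                   index=['r1', 'r2', 'r3', 'r4'])

def save_csv(df, path):
    """Write df to a .csv file; row labels become the first column."""
    df.to_csv(path)   # df.to_csv(path, index=False) would omit the row labels

# Example call with a hypothetical Windows path:
# save_csv(df1, "D:\\IP\\products.csv")
```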

SAMPLE QUESTIONS
1. Given a DataFrame DF1 shown below.

What will be the result of the following statements?


(a) DF1.Sub['row1']=99.99 (b) DF1['Sub4']=[100,200,300]
(c) DF1.head(2) (d) DF1.tail(2)
2. Write a Python statement/code to do the following w.r.t. DataFrame 'DF1':
a) To insert a new column called 'address'
b) To access all values of the column called 'address'
c) To access an individual data item, which is in 'row3' and 'column3'
d) To drop all NaN values from DF1
e) To fill all NaN values in DF1 with 999
f) To display all the rows from 'row1' to 'row5'
3. Why does the following code cause an error?
import pandas as pd
DF1=pd.DataFrame([2,3,4])
DF2=pd.DataFrame([[2,3,4]])
print(DF1==DF2)

4. Answer the following questions from (a) to (h) based on the given DF2.

import pandas as pd
d1={'name':['ram','peter'],'empno':[1,2],'sal':[100,200]}
DF2=pd.DataFrame(d1,index=['emp1','emp2'])

a) print(DF2)
b) print(DF2.loc['emp1','name'])
c) print(DF2.iloc[0,0])
d) print(DF2.name['emp1'])
e) What have you observed from Q. Nos. (b) to (d)?
f) print(DF2['sal']>150)
g) print(DF2.iloc[0:1,0:1])
h) print(DF2.loc['emp1':'emp2','name':'empno'])
5. Write a Python code to create the following DataFrame called 'DF3'.

6. Based on the following DF1 and DF2, answer from (a) to (f).

(a) print(DF1+DF2)
(b) print(DF1.add(DF2))
(c) print(DF1.sub(DF2))
(d) print(DF1.rsub(DF2))
(e) print(DF1.mul(DF2))
(f) print(DF1.div(DF2))

7. Write a Python program that replaces all negative values in a DataFrame 'DF1' with 0 (zero).
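One possible answer, with hypothetical data because the question does not give DF1's values. DF1 < 0 is a DataFrame of True/False, and assigning through it changes only the cells marked True. This assumes every column is numeric; comparing a text column with 0 raises a TypeError.

```python
import pandas as pd

# Hypothetical numeric data
DF1 = pd.DataFrame({'A': [5, -2, 7], 'B': [-1, 4, -8]})

DF1[DF1 < 0] = 0     # only the negative cells are replaced
print(DF1)
```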


8. Write a Python program that finds the three largest values in a DataFrame.
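One possible answer, assuming "three largest values" means the three largest numbers anywhere in an all-numeric DataFrame; the data is hypothetical. The values are flattened into one Series, and nlargest(3) picks the three largest.

```python
import pandas as pd

# Hypothetical numeric data
DF1 = pd.DataFrame({'A': [5, 12, 7], 'B': [9, 4, 15]})

all_values = pd.Series(DF1.to_numpy().ravel())   # every value in one Series
top3 = all_values.nlargest(3)
print(top3.tolist())
```

If the question instead means the three rows with the largest values in one column, DF1.nlargest(3, 'A') returns those rows.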
9. Carefully observe the following code:
>>> import pandas as pd
>>> xiic = {'amit':34, 'kajal':27, 'ramesh':37}
>>> xiid = {'kajal':34, 'lalta':33, 'prakash':38}
>>> result = {'PT1':xiic, 'PT2':xiid}
>>> df = pd.DataFrame(result)
>>> print(df)
Answer the following:
i) List the index of the DataFrame df.
ii) Find the output of the following code: print(df.loc['kajal':'ramesh'])
10. Given a pandas Series called HEAD, the command which will display the first 3 rows is ______.
(A) print(HEAD.head(3)) (B) print(HEAD.Heads(3))
(C) print(HEAD.heads(3)) (D) print(head.HEAD(3))
11. We can create dataframe from:
(A) Series (B) Numpy arrays
(C) List of Dictionaries (D) All of the above

12. Write a Python code to create a DataFrame with appropriate column headings from the list given below:
[[1001,'IND-AUS','2022-10-17'], [1002,'IND-PAK','2022-10-23'], [1003,'IND-SA','2022-10-30'], [1004,'IND-NZ','2022-11-18']]

13. Consider the given DataFrame 'Items':

   Name       Price  Quantity
0  CPU        7750   15
1  Watch      475    50
2  Key Board  225    25
3  Mouse      150    20
Write suitable Python statements for the following:
i) Add a column called Sale_Price, which is Price decreased by 10%.
ii) Add a new item named "Printer" having Price 8000 and Quantity 10.
iii) Remove the column Quantity
14. Mr. Summit, a data analyst, has designed the DataFrame df that contains data about computer
infrastructure with 'S01', 'S02', 'S03', 'S04', 'S05', 'S06' as indexes, shown below. Answer the following
questions:

i) Predict the output of the following Python statements:


A) df.shape
B) df[2:4]
ii) Write a Python statement to display the data of the working column for indexes S03 to S05.
OR (Option for part ii only)
Write a Python statement to compute and display the difference between the data of the computers column and the
working column of the above DataFrame.
15. Consider the following DataFrame, DF

Write commands to :
i. Add a new column 'Stream' to the DataFrame with values (Science, Commerce, Arts, Science).
ii. Add a new row with values (5, Mridula, X, F, 9.8, Science).

16. Consider the following DataFrame “HOSPITAL”

17. DataFrame 'STU_DF':

   rollno  name   marks
0  115     Pavni  97.5
1  236     Rishi  98.0
2  307     Preet  98.5
3  422     Paul   98.0
Perform the following operations on the DataFrame STU_DF:
i. Add a new row in DataFrame STU_DF with values [444,'karan',88.0]
ii. Print the number of rows and columns in DataFrame STU_DF
iii. Delete the row for rollno 307.
18. Consider the following DataFrame, ClassDF with row index St1,St2,St3,St4

Based on the above DataFrame, answer the following:

A. Predict the output
i. ClassDF.T
ii. ClassDF[::-2]
B. Write a Python statement to print Name, class and CGPA for students St2 and St3
OR
Write a Python statement to print the name and class of students having CGPA more than 9.0

19. In pandas, which of the following DataFrame attributes can be used to know the number of rows and columns in a DataFrame?
a. size b. index c. count d. shape
20. Carefully observe the following code:
import pandas as pd
L=[['S101','Anushree',65],['S102','Anubha',56],['S104','Vishnu',67],['S105','Kritika',45]]
df=pd.DataFrame (L, columns=['ID','Name','Marks'])
print(df)
i. What is the shape of the data frame df?
ii. Name the index and column names of dataframe df
21. Write a Python code to create a DataFrame 'Df' using a dictionary of lists for the following data.

22. Consider the following dataframe ndf as shown below :

What will be the output produced by the following statements?

a. print(ndf.loc[:, 'Col3':])
b. print( ndf.iloc[2 : , : 3] )
c. print( ndf.iloc [ 1:3 , 2:3 ])

23. Mr. Kapoor, a data analyst, has designed the DataFrame DF that contains data about attendance and number of
classes in a week, as shown below. Answer the following questions:

A. Predict the output of the following Python statements:


a. print(DF[3: ])
b. print(DF.index)
B. Write a Python statement to display the data of the 'No_of_classes' column for indexes 'Tuesday' to
'Thursday'.
OR (for option B only)
Write a Python statement to calculate No_of_classes * Atten and display it as the total attendance in a
day.
24. What are the advantages of .csv files?
25. What function do we need to use to convert a .csv file to a DataFrame?
26. What function do we need to use to convert a DataFrame to a .csv file?

ASSERTION AND REASONING based questions. Mark the correct choice as


(A) Both A and R are true and R is the correct explanation for A
(B) Both A and R are true and R is not the correct explanation for A
(C) A is True but R is False
(D) A is false but R is True

27. Assertion (A): A DataFrame has both a row and a column index.
Reasoning (R): .loc[] is a label-based data selection method used to select specific row(s) or column(s).

28. Assertion (A): When a DataFrame is created using a dictionary, the keys of the dictionary are set as the
columns of the DataFrame.
Reasoning (R): Boolean indexing helps us to select data from DataFrames using a boolean vector.

29. Assertion (A): While creating a DataFrame with a nested or 2D dictionary, Python interprets the
outer dict keys as the columns and the inner keys as the row indices.
Reasoning (R): A column can be deleted using the remove command.
