
Chapter 2 Python Pandas

The document provides an overview of Python Pandas, focusing on quantiles, functions for handling missing values, and various operations on DataFrames. It includes multiple-choice questions, true/false statements, and fill-in-the-blank exercises related to DataFrame manipulation and statistical functions. Additionally, it discusses assertions and reasons regarding data handling in Pandas, along with solved problems for practical understanding.

Uploaded by

familyirctc123

Chapter 2
PYTHON PANDAS

Quantiles are points in a distribution of data that relate to a share of values in that distribution. The quantile of a value is the fraction of observations less than or equal to the value.

The quantile of the median is 0.5, by definition.
You can apply functions to a whole dataframe or to subsets of dataframes.
Function isnull() checks if a DataFrame has missing values.
Function fillna() fills the missing values.
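
The definition of a quantile given above can be checked directly. The following is a minimal sketch, assuming a small made-up Series named marks (it is not taken from the chapter): it computes the quantile of a value as a fraction of observations, and then asks Pandas for the value at quantile 0.5.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
marks = pd.Series([10, 20, 30, 40])

# quantile(q) returns the value found at quantile q; 0.5 gives the median.
median_value = marks.quantile(0.5)

# Quantile of a value: fraction of observations less than or equal to it.
fraction_at_median = (marks <= median_value).mean()

print(median_value)        # 25.0
print(fraction_at_median)  # 0.5
```

With an even number of distinct observations, exactly half of them lie at or below the median, which is why the fraction comes out as 0.5.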

Objective Type Questions

OTQs
MULTIPLE CHOICE QUESTIONS

1. To iterate over horizontal subsets of dataframe, ______ function may be used.
(a) iterate( ) (b) iterrows( ) (c) itercols( ) (d) iteritems( )
2. To iterate over vertical subsets of a dataframe, ______ function may be used.
(a) iterate( ) (b) iterrows( ) (c) itercols( ) (d) iteritems( )
3. To add two dataframes' values, ______ function may be used.
(a) plus (b) rplus (c) add (d) radd
4. To subtract the values of two dataframes, ______ function may be used.
(a) sub (b) difference (c) minus (d) rsub
5. To divide the values of two dataframes, ______ function may be used.
(a) divide (b) div (c) rdiv (d) division
6. Which of the following would give the same output as DF/DF1 where DF and DF1 are DataFrames ? [CBSE 22 (Term I)]
(a) DF.div(DF1) (b) DF1.div(DF) (c) Divide(DF, DF1) (d) Div(DF, DF1)
7. To skip NaN values in a calculation, you can specify ______ attribute.
(a) NaN (b) NA (c) skipna (d) all of these
8. Which of the following is not a valid function that can be used with dataframes ?
(a) count( ) (b) sum( ) (c) length( ) (d) mad( )
9. The technique that divides total distribution of data into a given number of equal proportions is called a ______.
(a) quartile (b) tercile (c) median (d) quantile
10. ______ divides the total distribution in four equal parts.
(a) quartile (b) tercile (c) median (d) quantile
11. To divide total distribution of given data in two equal parts, ______ function is used.
(a) median( ) (b) quartile( ) (c) quantile( ) (d) all of these
12. To divide total distribution of given data in four equal parts, ______ function is used.
(a) median( ) (b) quartile( ) (c) quantile( ) (d) all of these
13. To divide total distribution of given data in eight equal parts, ______ function is used.
(a) median( ) (b) quartile( ) (c) quantile( ) (d) all of these
14. Which of the following is not a descriptive statistics function ?
(a) count( ) (b) add( ) (c) sum( ) (d) max( )
15. Which function calculates descriptive statistical details for a dataframe ?
(a) info( ) (b) describe( ) (c) show( ) (d) list( )
16. To calculate cumulative sum of a column of a dataframe, you may use ______ function.
(a) sum( ) (b) sum(cumulative = True) (c) cumsum( ) (d) none of these
17. The function to get the index of maximum value in a column of dataframe is ______.
(a) max( ) (b) index( ) (c) idxmax( ) (d) maxidx( )
18. To get top 5 rows of a dataframe, you may use ______ function.
(a) head( ) (b) head(5) (c) top( ) (d) top(5)
19. To get bottom 3 rows of a dataframe, you may use ______ function.
(a) tail( ) (b) tail(3) (c) bottom( ) (d) bottom(3)
20. Function ______ can be used to drop missing values.
(a) fillna( ) (b) isnull( ) (c) dropna( ) (d) delna( )
21. Which of the following methods of combining two dataframes is a patching method ?
(a) concat( ) (b) merge( ) (c) join( ) (d) none of these
22. Boolean indexing in Pandas DataFrame can be used for [CBSE SP]
(a) Creating a new DataFrame
(b) Sorting data based on index labels
(c) Joining data using labels
(d) Filtering data based on condition

FILL IN THE BLANKS


1. ______ and ______ functions help you to iterate over a dataframe.
2. To add two dataframes, you may use functions ______ or ______.
3. To concatenate two string columns of a dataframe, ______ operator is used.
4. The ______ function returns the maximum repeating value.
5. The ______ function returns the average of given data.
6. The ______ function returns the halfway point in a given data.
7. Using ______ function you can calculate terciles.
8. Using ______ function you can calculate quartiles.
9. Using ______ function you can calculate octiles.
10. Using ______ method, you can check if there are any missing values in dataframe.
11. The ______ function combines two dataframes such that two rows with some common value are merged together in the final result.

TRUE/FALSE QUESTIONS
1. The iteritems( ) iterates over the rows of a dataframe.
2. The iteritems( ) iterates over the columns of a dataframe.
3. The result produced by the functions sub( ) and rsub( ) is the same.
4. The result produced by the functions add( ) and radd( ) is the same.
5. The result produced by the functions div( ) and rdiv( ) is the same.
6. The info( ) and describe( ) list the same information about a dataframe.
7. Function add( ) and operator + give the same result.
8. Function rsub( ) and operator - give the same result.
9. The minus - operator's result is same as sub( ) and rsub( ).
10. Python integer datatype can store NaN values.
11. Functions sum( ) and cumsum( ) produce the same result.
12. The fillna( ) can also fill individual missing values for different columns.
13. To drop missing values from a dataframe, the function used is delna( ).

ASSERTIONS AND REASONS

Directions
In the following questions, a statement of Assertion (A) is followed by a statement of Reason (R).
Mark the correct choice as :
(a) Both A and R are true and R is the correct explanation of A.
(b) Both A and R are true but R is not the correct explanation of A.
(c) A is true but R is false (or partly true).
(d) A is false (or partly true) but R is true.
(e) Both A and R are false or not fully true.

1. Assertion. A quantile refers to equally distributed portion of a data set.
Reason. A median divides a distribution in 2 quantiles while a quartile divides a distribution in 4 quantiles.
2. Assertion. Data aggregation produces a summary statistics of a dataset.
Reason. Data aggregation summarises data using statistical aggregation functions.
3. Assertion. In a dataset, there can be missing values that cannot contribute to any computation.
Reason. In a dataset, NULL, NaN or None are considered the missing values.
4. Assertion (A). The output of addition of two series will be NaN, if one of the elements or both the elements have no value(s).
Reason (R). While performing mathematical operations on a series, by default all missing values are filled in with 0. [CBSE D 23]
5. Assertion (A). A Series is a one-dimensional array and a DataFrame is a two-dimensional array containing sequence of values of any data type (int, float, list, string, etc.).
Reason (R). Both Series and DataFrames have by default numeric indexes starting from zero. [CBSE D 24]

NOTE : Answers for OTQs are given at the end of the book.

Solved Problems
1. Name the functions you can use to iterate over dataframes.
Solution. iterrows( ) and iteritems( )
2. What is the basic difference between iterrows( ) and iteritems( ) ?
Solution. <DF>.iteritems( ) iterates over vertical subsets in the form of (col-index, Series) pairs, and <DF>.iterrows( ) iterates over horizontal subsets in the form of (row-index, Series) pairs.
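
The two kinds of iteration can be sketched as follows, assuming a small made-up dataframe df (not from the chapter). The loop over columns uses items( ), which yields the same (col-index, Series) pairs as iteritems( ); iteritems( ) itself was removed in pandas 2.0, so it only runs on older versions.

```python
import pandas as pd

# Hypothetical dataframe used only for illustration.
df = pd.DataFrame({"Name": ["Asha", "Ravi"], "Marks": [81, 67]})

# Horizontal subsets: one (row-index, Series) pair per row.
row_labels = []
for row_index, row_series in df.iterrows():
    row_labels.append(row_index)
    print(row_index, row_series["Name"], row_series["Marks"])

# Vertical subsets: one (col-index, Series) pair per column.
col_labels = []
for col_index, col_series in df.items():
    col_labels.append(col_index)
    print(col_index, list(col_series))
```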
3. In binary addition, if a column in a DataFrame contains a NaN and the corresponding column in the other DataFrame is a numeric value, then what would be returned as a result ? Why ?
Solution. The result of addition of a NaN value and a number is a NaN always. This is because NaN means not-a-number, and addition is defined only for two numbers, not for a number and a NaN.
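
A minimal sketch of this behaviour, assuming two made-up one-column dataframes df_a and df_b. It also shows the fill_value argument of add( ), which replaces a missing value before the addition when a NaN result is not wanted.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframes used only for illustration.
df_a = pd.DataFrame({"x": [1.0, np.nan]})
df_b = pd.DataFrame({"x": [10.0, 20.0]})

plain_sum = df_a + df_b                     # NaN + 20.0 stays NaN
filled_sum = df_a.add(df_b, fill_value=0)   # the NaN is treated as 0 before adding

print(plain_sum)
print(filled_sum)
```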
4. Given two dataframes df3 and df4 as shown below :

>>> df3
     A    B    C
0  100  200  300
1  400  500  600

>>> df4
      A     B
0  1000  2000
1  4000  5000
2  7000  8000

>>> df3 + df4
        A       B   C
0  1100.0  2200.0 NaN
1  4400.0  5500.0 NaN
2     NaN     NaN NaN

Both these dataframes store integer values but when they are added as df3 + df4, the values in the resultant object automatically change to floating point (as shown above), contrary to the fact that two integers when added will result in an integer only. Can you specify the reason ?
Solution. The reason behind the conversion to floating point type is that the two dataframes have different indexes and columns. For the non-matching row indexes and columns, Python will add NaN values to the corresponding value from the other dataframe.
Python stores NaN values in a non-integer suitable data type. Thus, the moment NaN is added or present in any column, the datatype of the entire column is changed. Thus, all the values are represented as floating point values because of the presence of NaN values in their column.
5. Given two DataFrames One and Two as shown here :

>>> One
  name  value
0    p    1.0
1    q    2.0
2    r    NaN

>>> Two
  name  value
0    p    1.0
1    q    NaN
2    r    3.0
3    s    4.0

What will be the result of the following ?
(a) One.radd(Two)
(b) One + Two
(c) One.rsub(Two)

Solution.
For both (a) and (b) the output will be like :

>>> One + Two
  name  value
0   pp    2.0
1   qq    NaN
2   rr    NaN
3  NaN    NaN

For (c), Python will raise an error as string values cannot be subtracted.
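
The type change discussed in problem 4 can be reproduced with a short sketch. The values below are the ones printed for df3 and df4; the name total is only an illustrative choice.

```python
import pandas as pd

# Integer dataframes with different shapes, as in problem 4.
df3 = pd.DataFrame({"A": [100, 400], "B": [200, 500], "C": [300, 600]})
df4 = pd.DataFrame({"A": [1000, 4000, 7000], "B": [2000, 5000, 8000]})

total = df3 + df4

print(df3.dtypes)    # integer columns
print(total.dtypes)  # float64: every column now holds at least one NaN
print(total)
```

Row 2 exists only in df4 and column C exists only in df3, so those positions have no partner value and become NaN, which in turn forces each column to a floating point type.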

6. Write equivalent functions for the following operations on two DataFrames A and B :
(i) A+B (ii) B+A (iii) A-B (iv) B-A (v) B*A (vi) B/A (vii) A/B
Solution.
(i) A.add(B) (ii) B.add(A) (iii) A.sub(B) (iv) B.sub(A) or A.rsub(B)
(v) B.mul(A) (vi) A.rdiv(B) or B.div(A) (vii) B.rdiv(A) or A.div(B)
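
These equivalences can be verified with a short sketch, assuming two small made-up dataframes A and B with matching indexes and columns. Each comparison uses equals( ), which is True only when both results hold the same values and types.

```python
import pandas as pd

# Hypothetical dataframes used only for illustration.
A = pd.DataFrame({"x": [10, 20], "y": [30, 40]})
B = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

checks = {
    "A.add(B)  vs A + B": A.add(B).equals(A + B),
    "A.rsub(B) vs B - A": A.rsub(B).equals(B - A),
    "B.mul(A)  vs B * A": B.mul(A).equals(B * A),
    "A.rdiv(B) vs B / A": A.rdiv(B).equals(B / A),
    "B.rdiv(A) vs A / B": B.rdiv(A).equals(A / B),
}

for label, same in checks.items():
    print(label, same)
```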
7. Given a dataframe namely wdf as shown below :

    minTemp  maxTemp  Rainfall  Evaporation
0         1      8.0      24.3          0.0
1         2     14.0      26.9          3.6
2         3     13.7      23.4          3.6
3         4     13.3      15.5         39.8
4         5      7.6      16.1          2.8
5         6      6.2      16.9          0.0
6         7      6.1      18.2          0.2
7         8      8.3      17.0          0.0
8         9      8.8      19.5          0.0
9        10      8.4      22.8         16.2
10       11      9.1      25.2          0.0
11       12      8.5      27.3          0.2
12       13     10.1      27.9          0.0
13       14     12.1      30.9          0.0
14       15     10.1      31.2          0.0
15       16     12.4      32.1          0.0
16       17     13.8      31.2          0.0
17       18     11.7      30.0          1.2
18       19     12.4      32.3          0.6
19       20     15.6      33.4          0.0
20       21     15.3      33.4          0.0

(i) Write command to compute sum of every column of the dataframe.
(ii) Write command to compute mean of column Rainfall.
(iii) Write command to compute sum of every row of the dataframe.
(iv) Write command to compute average of all the columns for last 10 rows only.
(v) Write command to compute average maxTemp, Rainfall for first 10 rows.
Solution.
(i) wdf.sum()
(ii) wdf['Rainfall'].mean()
(iii) wdf.sum(axis = 1)
(iv) wdf.loc[11:, ].mean()
(v) wdf.loc[:9, 'maxTemp':'Rainfall'].mean()
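
When counting rows for loc, note that a label slice includes both of its end points. The following is a minimal sketch on a made-up four-row dataframe df with the default index starting at 0 (the column names p and q are hypothetical).

```python
import pandas as pd

# Hypothetical 4-row dataframe with the default 0-based index.
df = pd.DataFrame({"p": [1, 2, 3, 4], "q": [10, 20, 30, 40]})

col_sums = df.sum()          # one total per column
row_sums = df.sum(axis=1)    # one total per row

# loc slices by label and includes both end points,
# so :1 selects the rows labelled 0 and 1 (two rows).
first_two_mean = df.loc[:1, "p":"q"].mean()

# 2: selects every row from label 2 to the end (here, the last two rows).
last_two_mean = df.loc[2:].mean()

print(col_sums, row_sums, first_two_mean, last_two_mean, sep="\n")
```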
8. What is a quartile ? How is it related to quantile ? How do you generate it in Pandas ?
Solution. Quartiles Q1, Q2 and Q3 are three points that divide a distribution into 4 equal parts, each containing 25% of the entire distribution. The 4-quantiles are called quartiles.
A quantile refers to an equal share in an equally divided distribution, e.g., the median divides a distribution into 2 equal parts and each is a 50% quantile.
Quartile, on the other hand, refers to an equal share when a distribution is divided into four quantiles, each containing 25% of the values.
In Pandas, we generate these with function quantile().
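
A minimal sketch of generating quartiles, and by the same approach terciles and octiles, with quantile( ). The Series values is made-up data (the numbers 1 to 100), and the results rely on the default linear interpolation that quantile( ) uses between neighbouring observations.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the integers 1 to 100.
values = pd.Series(range(1, 101))

# Three cut points split the data into four equal parts.
quartiles = values.quantile([0.25, 0.5, 0.75])

# Two cut points give terciles, seven cut points give octiles.
terciles = values.quantile([1 / 3, 2 / 3])
octiles = values.quantile(np.arange(1, 8) / 8)

print(quartiles)
print(terciles)
print(octiles)
```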
9. Consider the following DataFrame 'mdf'. [CBSE D 23]

   Rollno     Name  English  Hindi  Maths
0       1   Aditya       23     20     28
1       2  Balwant       18      1     25
2       3   Chirag       27     23     30
3       4   Deepak       11      3      7
4       5      Eva       17     21     24

(a) Write Python statements for the DataFrame 'mdf' :
(i) To display the records of the students having roll numbers 2 and 3.
(ii) To increase the marks of subject Maths by 4, for all students.
(b) Write Python statement to display the Rollno and Name of all students who secured less than 10 marks in Maths.
(c) Write Python statement to display the total marks, i.e., sum of marks secured in English, Hindi and Maths for all students.
Solution.
(a) (i) print(mdf.loc[[1, 2]])
(ii) mdf['Maths'] += 4
(b) print(mdf[mdf.Maths < 10][['Rollno', 'Name']])
(c) print(mdf.English + mdf.Hindi + mdf.Maths)
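
The statements of problem 9 can be tried on a dataframe built from the values printed in the table. This is a sketch only; the variable names by_label, by_condition, low_maths and total are illustrative, and the isin( ) selection is an alternative that does not depend on knowing the index labels.

```python
import pandas as pd

mdf = pd.DataFrame({
    "Rollno": [1, 2, 3, 4, 5],
    "Name": ["Aditya", "Balwant", "Chirag", "Deepak", "Eva"],
    "English": [23, 18, 27, 11, 17],
    "Hindi": [20, 1, 23, 3, 21],
    "Maths": [28, 25, 30, 7, 24],
})

# Roll numbers 2 and 3 sit at index labels 1 and 2; a list of labels selects rows.
by_label = mdf.loc[[1, 2]]

# The same rows selected with a boolean condition on the Rollno column.
by_condition = mdf[mdf["Rollno"].isin([2, 3])]

# Boolean indexing followed by a choice of columns.
low_maths = mdf[mdf["Maths"] < 10][["Rollno", "Name"]]

# Element-wise addition of three columns gives one total per student.
total = mdf["English"] + mdf["Hindi"] + mdf["Maths"]

print(by_label)
print(low_maths)
print(total)
```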
10. What is descriptive statistics ? Name the functions commonly used for calculating this.
Solution. A descriptive statistic is a summary statistic that quantitatively describes or summarizes features of a collection of information.
Commonly used functions for descriptive statistics are :
count( ), sum( ), mean( ), max( ), min( ), std( ), quartiles etc.
11. What is missing data ? Why is it considered a problem ?
Solution. Missing Data means when no information is provided for one or more items or for a whole unit. Missing Data can also be referred to as NA (Not Available) values in Pandas.
Pandas puts a NaN in place of missing data in dataframes.
Missing Data is a very big problem in real life scenarios. This is because the presence of NaN hampers calculations : NaN cannot be used in arithmetic, and any arithmetic operation that involves it gives NaN as its result.
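
The effect of NaN on calculations can be sketched as follows, assuming a made-up Series s with one missing value. The sketch also shows the skipna argument: the statistical functions of Pandas skip NaN by default, while plain arithmetic does not.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
s = pd.Series([4.0, np.nan, 6.0])

doubled = s * 2                    # element-wise arithmetic: NaN stays NaN
default_sum = s.sum()              # skipna=True by default, so NaN is ignored
strict_sum = s.sum(skipna=False)   # NaN is included, so the total is NaN
plain_python = sum(s.tolist())     # Python's built-in sum also gives NaN

print(doubled)
print(default_sum, strict_sum, plain_python)
```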

12. Write command to print cumulative sum of columns Rainfall and Evaporation in the dataframe wdf used above.
Solution.
wdf[['Rainfall', 'Evaporation']].apply(np.cumsum)

>>> wdf[['Rainfall', 'Evaporation']].apply(np.cumsum)
    Rainfall  Evaporation
0       24.3          0.0
1       51.2          3.6
2       74.6          7.2
3       90.1         47.0
4      106.2         49.8
5      123.1         49.8
6      141.3         50.0
7      158.3         50.0
8      177.8         50.0
9      200.6         66.2
10     225.8         66.2
11     253.1         66.4
12     281.0         66.4
13     311.9         66.4
14     343.1         66.4
15     375.2         66.4
16     406.4         66.4
17     436.4         67.6
18     468.7         68.2
19     502.1         68.2
20     535.5         68.2
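
The same running totals can also be obtained with the cumsum( ) method of a dataframe. A minimal sketch, assuming a dataframe named part that holds only the first four rows of the two columns:

```python
import numpy as np
import pandas as pd

# First four rows of Rainfall and Evaporation from wdf.
part = pd.DataFrame({
    "Rainfall": [24.3, 26.9, 23.4, 15.5],
    "Evaporation": [0.0, 3.6, 3.6, 39.8],
})

via_apply = part.apply(np.cumsum)   # NumPy function applied column by column
via_method = part.cumsum()          # built-in dataframe method

print(via_method)
```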

13. Is there any one function that performs descriptive statistics on a dataframe ?
Solution. Yes, Pandas provides describe( ) function that calculates most descriptive statistics information for a DataFrame along with 25%, 50% and 75% percentile values (quartiles), e.g.,

>>> df.describe()
             Age   Projects     Budget
count   6.000000   6.000000   6.000000
mean   31.500000  16.833333  23.500000
std     4.636809   3.188521  14.237275
min    27.000000  13.000000  10.000000
25%    28.500000  14.500000  14.000000
50%    31.000000  16.500000  19.000000
75%    32.000000  19.250000  29.250000
max    40.000000  21.000000  48.000000

14. What functions does Pandas provide to handle missing data ?
Solution. Most common functions provided by Pandas to handle missing data are :
isnull(), dropna(), fillna()
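
A minimal sketch of the three functions on a made-up dataframe df. The column names and the fill values are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with one missing value in each column.
df = pd.DataFrame({"Score": [12.5, np.nan, 16.5], "Attempts": [1, 3, np.nan]})

missing_mask = df.isnull()               # True wherever a value is missing
any_missing = df.isnull().values.any()   # True if at least one value is missing

dropped = df.dropna()                    # keeps only the rows with no missing value

# A dictionary gives a different fill value for each column.
filled = df.fillna({"Score": 0, "Attempts": 1})

print(missing_mask)
print(dropped)
print(filled)
```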
15. Ms. Ritika conducted an online assessment and stored the details in a DataFrame result as given below : [CBSE D 24]

     Name  Score  Attempts Qualify
a  Atulya   12.5         1     yes
b   Disha    9.0         3      no
c  Kavita   16.5         2     yes
d    John   15.0         1      no

Answer the following questions :
(i) Predict the output of the following Python statement :
print(result.loc[:, 'Attempts'] > 1)
(ii) Write the Python statement to display the last three records.
(iii) Write the Python statement to display records of 'a' and 'd' row labels.
Or (Option for part (iii) only)
(iii) Write suitable Python statement to retrieve the data stored in the file 'registration.csv' into a DataFrame 'regis'.
Solution.
(i) a    False
    b     True
    c     True
    d    False
    Name: Attempts, dtype: bool
(ii) result.tail(3)
(iii) result.loc[["a", "d"]]
Or
(iii) regis = pd.read_csv("registration.csv")
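
The answers to problem 15 can be checked by building the dataframe from the values given in the question. In this sketch the names more_than_one, last_three and rows_a_d are illustrative only; the read_csv( ) part is left out because it needs the file registration.csv.

```python
import pandas as pd

result = pd.DataFrame(
    {
        "Name": ["Atulya", "Disha", "Kavita", "John"],
        "Score": [12.5, 9.0, 16.5, 15.0],
        "Attempts": [1, 3, 2, 1],
        "Qualify": ["yes", "no", "yes", "no"],
    },
    index=["a", "b", "c", "d"],
)

# Comparing a column with a number gives a boolean Series, one entry per row.
more_than_one = result.loc[:, "Attempts"] > 1

last_three = result.tail(3)          # rows b, c and d
rows_a_d = result.loc[["a", "d"]]    # rows selected by a list of labels

print(more_than_one)
print(last_three)
print(rows_a_d)
```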

Practical Questions
16. Write a program to iterate over a dataframe containing names and marks, which then calculates grades as per marks (as per guidelines below) and adds them to the grade column.
Marks >= 90 grade A+ ;  Marks 70-90 grade A ;
Marks 60-70 grade B ;   Marks 50-60 grade C ;
Marks 40-50 grade D ;   Marks < 40 grade F
Solution.
import pandas as pd
import numpy as np
names = pd.Series(['Rohan', 'Misha', 'Mike', 'Simran'])
marks = pd.Series([76.0, 56.0, 91.0, 67.0])
Stud = {'Name': names, 'Marks': marks}
df1 = pd.DataFrame(Stud, columns = ['Name', 'Marks'])
df1['Grade'] = np.nan                      # This will add NaN values to complete column Grade
print("Initial values in dataframe")
print(df1)
for (col, colSeries) in df1.iteritems():   # in pandas 2.0 and later, use df1.items()
    length = len(colSeries)                # number of entries in colSeries
    if col == 'Marks':
        lstMrks = []                       # initialize empty list lstMrks
        for row in range(length):
            mrks = colSeries[row]
            if mrks >= 90:
                lstMrks.append('A+')       # grade appended to list lstMrks
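
One way to carry out the task of problem 16 is sketched below. It is not the book's listing: it iterates over rows with iterrows( ) instead of over columns, the helper grade_for( ) is a hypothetical name, and it assumes that a boundary value such as 70 belongs to the higher band.

```python
import pandas as pd


def grade_for(marks):
    """Return the grade for one marks value, using the bands listed in the question."""
    if marks >= 90:
        return "A+"
    elif marks >= 70:
        return "A"
    elif marks >= 60:
        return "B"
    elif marks >= 50:
        return "C"
    elif marks >= 40:
        return "D"
    return "F"


df1 = pd.DataFrame({
    "Name": ["Rohan", "Misha", "Mike", "Simran"],
    "Marks": [76.0, 56.0, 91.0, 67.0],
})

# Iterate over the rows and collect one grade per row.
grades = []
for row_index, row in df1.iterrows():
    grades.append(grade_for(row["Marks"]))

# Add the collected grades as a new column.
df1["Grade"] = grades
print(df1)
```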
