Chapter 2 Python Pandas
Quantiles are points in a distribution of data that relate to a share of values in that distribution. The quantile of a value is the fraction of observations less than or equal to the value. The quantile of the median is 0.5, by definition.
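As a rough illustration of the definition above, the following sketch uses a small made-up Series (the values are an assumption, chosen only for this example) to check that the median sits at quantile 0.5:

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
values = pd.Series([3, 7, 8, 12])

median = values.median()           # middle of the sorted data
q_half = values.quantile(0.5)      # value located at quantile 0.5

# Fraction of observations less than or equal to the median.
share_le_median = (values <= median).mean()

print(median, q_half, share_le_median)
```

With an even number of observations, as here, exactly half of the values lie at or below the median.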
You can apply functions to whole dataframe or subsets of dataframes.
Function isnull() checks if a DataFrame has missing values.
Function fillna() fills the missing values.
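A minimal sketch of these two functions, assuming a small made-up dataframe (the column names Maths and Science and the fill values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column.
df = pd.DataFrame({"Maths": [76.0, np.nan, 91.0],
                   "Science": [np.nan, 56.0, 67.0]})

missing_mask = df.isnull()              # True wherever a value is missing
missing_per_column = df.isnull().sum()  # count of missing values per column

# Fill every missing value with one constant ...
filled_all = df.fillna(0)
# ... or give each column its own fill value through a dictionary.
filled_by_column = df.fillna({"Maths": 0, "Science": 50})

print(missing_per_column)
print(filled_by_column)
```

Passing a dictionary to fillna() is one way to fill different columns with different values.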
merged together in the final result. such that twO rows Some common
with
TRUE/FALSE QUESTIONS
1. The iteritems( ) iterates over the rows of a dataframe.
2. The iteritems( ) iterates over the columns of a dataframe.
3. The result produced by the functions sub( ) and rsub( ) is the same.
4. The result produced by the functions add( ) and radd( ) is the same.
5. The result produced by the functions div( ) and rdiv( ) is the same.
6. The info( ) and describe( ) list the same information about a dataframe.
7. Function add( ) and operator + give the same result.
8. Function rsub( ) and operator - give the same result.
9. The minus - operator's result is same as sub( ) and rsub( ).
10. Python integer datatype can store NaN values.
11. Functions sum( ) and cumsum( ) produce the same result.
12. The fillna( ) can also fill individual missing values for different columns.
13. To drop missing values from a dataframe, the function used is delna( ).
Directions
In the following questions, a statement of Assertion (A) is followed by a statement of Reason (R).
Mark the correct choice as :
(a) Both A and R are true and R is the correct explanation of A.
(b) Both A and R are true but R is not the correct explanation of A.
(c) A is true but R is false (or partly true).
(d) A is false (or partly true) but R is true.
(e) Both A and R are false or not fully true.
1. Assertion. A quantile refers to equally distributed portion of a data set.
Reason. A median divides a distribution in 2 quantiles while a quartile divides a distribution in 4 quantiles.
2. Assertion. Data aggregation produces a summary statistics of a dataset.
Reason. Data aggregation summarises data using statistical aggregation functions.
3. Assertion. In a dataset, there can be missing values that cannot contribute to any computation.
Reason. In a dataset, NULL, NaN or None are considered the missing values.
4. Assertion (A). The output of addition of two series will be NaN, if one of the elements or both the elements have no value(s).
Reason (R). While performing mathematical operations on a series, by default all missing values are filled in with 0. [CBSE D 23]
5. Assertion (A). A Series is a one-dimensional array and a DataFrame is a two-dimensional array containing sequence of values of any data type (int, float, list, string, etc.).
Reason (R). Both Series and DataFrames have by default numeric indexes starting from zero. [CBSE D 24]
NOTE : Answers for OTQs are given at the end of the book.
INFORMATICS PRACTICES
Solved Problems
1. Name the functions you can use to iterate over dataframes.
Solution. iterrows( ) and iteritems( )
2. What is the basic difference between iterrows( ) and iteritems( ) ?
Solution. <DF>.iteritems( ) iterates over vertical subsets in the form of (col-index, Series) pairs, and <DF>.iterrows( ) iterates over horizontal subsets in the form of (row-index, Series) pairs.
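The two iteration styles can be sketched as below, assuming a small made-up dataframe. One caveat: iteritems( ) was removed in pandas 2.0, so this sketch uses items( ), which yields the same (col-index, Series) pairs; on older pandas versions iteritems( ) can be used in its place.

```python
import pandas as pd

# Hypothetical two-row frame, for illustration only.
df = pd.DataFrame({"Name": ["Rohan", "Misha"], "Marks": [76.0, 56.0]})

# Column-wise iteration: (column label, Series holding that column).
column_labels = []
for col, col_series in df.items():
    column_labels.append(col)
    print("Column:", col, "->", list(col_series))

# Row-wise iteration: (row label, Series holding that row).
row_labels = []
for idx, row_series in df.iterrows():
    row_labels.append(idx)
    print("Row:", idx, "->", list(row_series))
```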
3. In Binary addition, if a column in a dataframe contains a NaN and the corresponding column in other DataFrame is a numeric value, then what would be returned as a result? Why?
Solution. The result of addition of a NaN value and a number is always a NaN. This is because NaN means not-a-number, and addition is defined only for two numbers, not for a number and a NaN.
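A short sketch of this behaviour, using two made-up Series. It also shows the fill_value argument of add( ), which replaces a missing operand before adding; missing values are not filled by default.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second element of a is missing.
a = pd.Series([10.0, np.nan, 30.0])
b = pd.Series([1.0, 2.0, 3.0])

plain_sum = a + b                    # NaN + 2.0 stays NaN
filled_sum = a.add(b, fill_value=0)  # the NaN is treated as 0 before adding

print(plain_sum)
print(filled_sum)
```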
4. Given two dataframes df3 and df4 as shown below:
     df3                 df4               df3 + df4
     A    B    C         A     B           A       B       C
0  100  200  300    0  1000  2000    0  1100.0  2200.0  NaN
1  400  500  600    1  4000  5000    1  4400.0  5500.0  NaN
                    2  7000  8000    2     NaN     NaN  NaN
Both these dataframes store integer values, but when they are added as df3 + df4, the values in the resultant object automatically change to floating point (as shown above, on the right), contrary to the fact that two integers when added will result in an integer only. Can you specify the reason ?
Solution. The reason behind the conversion to floating point type is that the two dataframes have different indexes and columns. For the non-matching row indexes and columns, Python will add NaN values to the corresponding value from another dataframe.
Python stores NaN values in a non-integer suitable data type. Thus, the moment NaN is present in any column, the data type of that column changes to a floating point type.
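The conversion can be checked with a short sketch that builds df3 and df4 with the values given in the question and inspects the data types before and after the addition:

```python
import pandas as pd

df3 = pd.DataFrame({"A": [100, 400], "B": [200, 500], "C": [300, 600]})
df4 = pd.DataFrame({"A": [1000, 4000, 7000], "B": [2000, 5000, 8000]})

total = df3 + df4

print(df3.dtypes["A"], df4.dtypes["A"])  # both operands hold integers
print(total.dtypes["A"])                 # float64, because row 2 holds NaN
print(total)
```

Row 2 exists only in df4 and column C exists only in df3, so those positions have no partner value and become NaN in the result.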
added or
Solution.
For (), Python will raise an error as string values cannot be subtracted.
6. Write equivalent function for the following operations on two DataFrames A and B:
(i) A+B (ii) B+A (iii) A-B (iv) B-A (v) B*A (vi) B/A (vii) A/B
Solution.
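One possible set of equivalents, written from the point of view of dataframe A, can be sketched as follows. The frames A and B here are made-up examples, and other answers are equally valid (for instance, B.add(A) in place of A.radd(B)).

```python
import pandas as pd

# Hypothetical frames with matching indexes and columns.
A = pd.DataFrame({"x": [10, 20], "y": [30, 40]})
B = pd.DataFrame({"x": [2, 4], "y": [5, 8]})

# Each entry pairs the operator form with a function form.
equivalents = {
    "A + B": (A + B, A.add(B)),
    "B + A": (B + A, A.radd(B)),
    "A - B": (A - B, A.sub(B)),
    "B - A": (B - A, A.rsub(B)),
    "B * A": (B * A, A.rmul(B)),
    "B / A": (B / A, A.rdiv(B)),
    "A / B": (A / B, A.div(B)),
}

for label, (by_operator, by_function) in equivalents.items():
    print(label, by_operator.equals(by_function))
```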
information.
Commonly used functions for descriptive statistics are : count( ), sum( ), mean( ), max( ), min( ), std( ), quartiles etc.
11. What is missing data ? Why is it considered a problem ?
Solution. Missing Data means when no information is provided for one or more items or for a whole unit. Missing Data can also refer to as NA (Not Available) values in Pandas. Pandas puts a NaN in place of missing data in dataframes.
Missing Data is a very big problem in real life scenarios. This is because the presence of NaN hampers calculations, as NaN cannot be used in calculations and, in fact, it makes the whole calculation result as NaN.
12. Write command to print cumulative sum of columns Rainfall and Evaporation in the dataframe wdf used above.
Solution.
wdf[['Rainfall', 'Evaporation']].apply(np.cumsum)
>>> wdf[['Rainfall', 'Evaporation']].apply(np.cumsum)
    Rainfall  Evaporation
0       24.3          0.0
1       51.2          3.6
2       74.6          7.2
3       90.1         47.0
4      106.2         49.8
5      123.1         49.8
6      141.3         50.0
7      158.3         50.0
8      177.8         50.0
9      200.6         66.2
10     225.8         66.2
11     253.1         66.4
12     281.0         66.4
13     311.9         66.4
14     343.1         66.4
15     375.2         66.4
16     406.4         66.4
17     436.4         67.6
18     468.7         68.2
19     502.1         68.2
20     535.5         68.2
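For comparison, a sketch on a small stand-in dataframe (the three rows below are hypothetical, not the full wdf) suggests that apply(np.cumsum) matches the built-in cumsum( ), and that the last cumulative value equals what sum( ) returns:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for wdf.
wdf = pd.DataFrame({"Rainfall": [24.3, 26.9, 23.4],
                    "Evaporation": [0.0, 3.6, 3.6]})

cols = ["Rainfall", "Evaporation"]
via_apply = wdf[cols].apply(np.cumsum)  # running total, column by column
via_method = wdf[cols].cumsum()         # same result with the DataFrame method
totals = wdf[cols].sum()                # one grand total per column

print(via_apply)
print(totals)
```

Here sum( ) gives a single value per column, whereas cumsum( ) gives one running total per row.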
13. Is there any one function that performs descriptive statistics on a dataframe ?
Solution. Yes, Pandas provides describe( ) function that calculates most descriptive statistics information for a DataFrame along with 25%, 50% and 75% percentile values (quartiles), e.g.,
>>> df.describe()
             Age   Projects     Budget
count   6.000000   6.000000   6.000000
mean   31.500000  16.833333  23.500000
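A self-contained sketch of describe( ), assuming a small made-up dataframe (its values are not those of the df shown above), lists the rows it reports and checks its 50% row against the median:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"Age": [25, 30, 35, 40],
                   "Projects": [10, 20, 30, 40]})

summary = df.describe()
quartiles = df["Age"].quantile([0.25, 0.5, 0.75])

print(summary)
print(list(summary.index))   # the statistics that describe() reports
print(quartiles)
```

The 25%, 50% and 75% rows of the summary correspond to the three quartiles, and the 50% row is the median.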
Practical Questions
16. Write a program to iterate over a dataframe containing names and marks, which then calculates grades as per marks (as per guidelines below) and adds them to the grade column.
Marks >= 90 grade A+ ;
Marks 70-90 grade A; Marks 60-70 grade B;
Marks 50-60 grade C; Marks 40-50 grade D;
Marks <40 grade F
Solution.
import pandas as pd
import numpy as np
names = pd.Series(['Rohan', 'Misha', 'Mike', 'Simran'])
marks = pd.Series([76.0, 56.0, 91.0, 67.0])
Stud = {'Name': names, 'Marks': marks}
df1 = pd.DataFrame(Stud, columns=['Name', 'Marks'])
df1['Grade'] = np.nan                  # This will add NaN values to complete column Grade
print("Initial values in dataframe")
print(df1)
for (col, colSeries) in df1.iteritems():   # use df1.items() with pandas 2.0 or later
    length = len(colSeries)            # number of entries in colSeries
    if col == 'Marks':
        lstMrks = []                   # initialize empty list
        for row in range(length):
            mrks = colSeries[row]
            if mrks >= 90: