Unit-1: Data Handling - (DH) : PANDAS
UNIT-1: DATA HANDLING –(DH)
Pandas (name derived from "PANel DAta")
Pandas is an open-source Python library that provides high-performance data manipulation and
analysis tools built on its powerful data structures.
Note: A data structure is a way of storing and organizing data efficiently.
Advantages of Pandas:
High performance merging and joining of data.
Reshaping and pivoting of data sets.
Label-based slicing, indexing and subsetting of large data sets.
Columns from a data structure can be deleted or inserted.
Group by data for aggregation and transformations.
Data alignment and integrated handling of missing data.
1) Series
Series is a one-dimensional labeled array capable of holding homogeneous data.
In the syntax pandas.Series(data, index, dtype),
‘data’ takes various forms such as a list, an ndarray or a dictionary.
‘index’ represents the labels of the individual data items.
‘dtype’ represents the data type of the values. If ‘dtype’ is not given, pandas infers it from the data.
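The three parameters described above can be tried out with a minimal sketch like the following. The variable names and values are hypothetical, and the general form shown in the comment is an assumption based on the parameter names in these notes. The dtype is passed explicitly for the empty Series because the default reported for an empty Series differs between pandas versions.

```python
import pandas as pd

# Assumed general form: pd.Series(data=None, index=None, dtype=None)

# An empty Series; dtype is given explicitly so the result does not
# depend on the pandas version in use.
empty = pd.Series(dtype='float64')
print(empty)

# data as a list, index as our own labels, dtype given explicitly.
marks = pd.Series(data=[11, 22, 33], index=['a', 'b', 'c'], dtype='float64')
print(marks)
```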
INFORMATICS PRACTICES (REVISED) - XII BY G SIVA PRASAD
(PGT)
Output:
Series([], dtype: float64)
Note: In the above example (1), we have created an empty Series object with the default data type ‘float64’
(pandas 2.0 and later report ‘object’ for an empty Series instead).
Eg (2): Creating Series object from list with default index values.
#File Name: Series_default_index_from_list.py
import pandas as pd
obj1=pd.Series([11,22,33,44])
print(obj1)
Output:
0 11
1 22
2 33
3 44
dtype: int64
Note: In the above example (2), we have created a Series object (obj1) with default indexes (i.e. 0, 1, 2, …),
and the dtype is ‘int64’ instead of ‘float64’ (because the list we provided contains integers).
Eg (3): Creating Series object from list with our own index values.
#File Name: Series_our_own_index_from_list.py
import pandas as pd
obj1=pd.Series([11,22,33,44],['a','b','c','d'])
print(obj1)
Note: This can also be written as follows.
obj1=pd.Series(data=[11,22,33,44],index=['a','b','c','d'])
Output:
a 11
b 22
c 33
d 44
dtype: int64
Note: In the above example (3), we have created a Series object (obj1) with our own indexes (i.e.
‘a’, ‘b’, ‘c’, ‘d’).
Eg (4): Creating Series object from dictionary.
import pandas as pd
d1={'name':'john','rno':1,'phno':456} # here, 'd1' is a dictionary
obj1=pd.Series(d1) # here, 'obj1' is created from 'd1'
print(obj1)
Output:
name john
rno 1
phno 456
dtype: object
Note: In the above example (4), we have created a Series object (obj1) from the dictionary ‘d1’. In this case,
the dictionary keys are used to construct the index.
Note-1: In the above example, we have printed the Series objects ‘obj1’ (with default index values) and ‘obj2’
(with our own index values).
Note-2: NumPy’s ‘arange()’ function is similar to ‘range()’, but ‘arange()’ returns an ndarray, whereas ‘range()’
returns a range object in Python 3 (a list in Python 2).
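A minimal sketch of creating Series objects from an ndarray is given here for reference. The values passed to ‘arange()’ and the index labels are hypothetical; the names ‘obj1’ and ‘obj2’ follow the notes above.

```python
import numpy as np
import pandas as pd

arr = np.arange(10, 50, 10)      # ndarray holding 10, 20, 30, 40

obj1 = pd.Series(arr)                               # default index 0, 1, 2, 3
obj2 = pd.Series(arr, index=['a', 'b', 'c', 'd'])   # our own index

print(obj1)
print(obj2)

# arange() gives an ndarray; in Python 3, range() gives a range object.
print(type(arr), type(range(4)))
```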
Note: In this case, the default datatype is used, i.e. float64.
Note: In this case, the datatype has been changed to int64 using ‘dtype=numpy.int64’.
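The effect of the ‘dtype’ argument can be sketched as follows. The data values are hypothetical (the original example is not reproduced here); they are whole-number floats so that the conversion to int64 loses nothing.

```python
import numpy as np
import pandas as pd

values = [1.0, 2.0, 3.0]                    # hypothetical float data

obj1 = pd.Series(values)                    # dtype inferred from data: float64
obj2 = pd.Series(values, dtype=np.int64)    # dtype changed to int64

print(obj1.dtype)
print(obj2.dtype)
```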
1.2. Series Attributes:
Some common attributes of Series objects are:
Attribute         Meaning
Series.index      The index (labels) of the Series
Series.values     Returns the Series data as an ndarray
Series.dtype      Returns the data type of the data
Series.shape      Returns a tuple of the shape
Series.nbytes     Returns the number of bytes occupied by the data
Series.ndim       Returns the number of dimensions
Series.size       Returns the number of elements
Series.hasnans    Returns True if the Series has NaN values, otherwise False
Series.empty      Returns True if the Series object is empty, otherwise False
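The attributes listed above can be inspected on a small Series, as in the following sketch. The data and index labels are hypothetical; one NaN value is included so that ‘hasnans’ has something to report.

```python
import numpy as np
import pandas as pd

obj1 = pd.Series([11, 22, np.nan, 44], index=['a', 'b', 'c', 'd'])

print(obj1.index)     # the index labels
print(obj1.values)    # the data as an ndarray
print(obj1.dtype)     # float64 (NaN forces a float type)
print(obj1.shape)     # (4,)
print(obj1.nbytes)    # 32, i.e. 4 elements x 8 bytes each
print(obj1.ndim)      # 1
print(obj1.size)      # 4
print(obj1.hasnans)   # True
print(obj1.empty)     # False
```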
The tail(n) function is used to get the last ‘n’ rows of a pandas object.
If no argument is passed to tail(), it returns the last 5 rows of the Series object.
Eg (1):
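A minimal sketch of tail() on a hypothetical Series of seven values (the data is assumed for illustration only):

```python
import pandas as pd

obj1 = pd.Series([10, 20, 30, 40, 50, 60, 70])

print(obj1.tail(3))   # last 3 rows: indexes 4, 5, 6
print(obj1.tail())    # no argument: last 5 rows, indexes 2 to 6
```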
1.7. Arithmetic Operations:
Arithmetic operations between two Series objects are performed on matching indexes (labels), not on positions.
Where an index is present in only one of the objects, the result is NaN (Not a Number).
Let us consider the following two Series objects, obj1 and obj2.
Eg (1):
Note: In the above example (1), the indexes 1 and 2 are present in both obj1 and obj2, so addition took
place on those two values. The remaining indexes do not match, so ‘NaN’ appears in the result.
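The index-matching behaviour can be reproduced with a sketch like the one below. The values are hypothetical and chosen only so that indexes 1 and 2 are common to both objects; they are not the values from the original example.

```python
import pandas as pd

obj1 = pd.Series([10, 20, 30], index=[0, 1, 2])
obj2 = pd.Series([1, 2, 3], index=[1, 2, 3])

result = obj1 + obj2   # aligned on index labels, not on positions
print(result)
# Indexes 1 and 2 exist in both objects, so they are added (21.0 and 32.0).
# Indexes 0 and 3 exist in only one object, so the result there is NaN.
```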
Eg (2):
SAMPLE QUESTIONS
1. What is the significance of pandas?
2. Name some data structures in the pandas library.
3. Given the following Series objects:
S1            S2
0    3        1    11
1    5        2    22
2    6        4    33
3    5        5    44
What will be the result of
(i) S1+S2
(ii) S1-S2
(iii) S1.tail(2)
(iv) S2.head(3)