Unit-1: Data Handling (DH): PANDAS

Pandas is an open-source Python library that provides powerful data structures and data analysis tools. It allows high-performance manipulation of numerical tables and time series. The two main data structures in Pandas are Series and DataFrame. A Series is a one-dimensional labeled array capable of holding any data type. A DataFrame is a two-dimensional data structure with labeled axes. Pandas can be used for data cleaning and transformation, merging and joining data, and data visualization.

Uploaded by Abhinav Khanna

INFORMATICS PRACTICES (REVISED) - XII BY G SIVA PRASAD (PGT)
UNIT-1: DATA HANDLING (DH)
Pandas (the name is derived from “PANel DAta”)
Pandas is an open-source Python library that provides high-performance data manipulation and
analysis tools through its powerful data structures.

Note: A data structure is a way of storing and organizing data so that it can be used efficiently.

• Pandas deals with the following data structures.


1) Series
2) DataFrame
3) Panel

Note: The “Panel” data structure is not in our syllabus. (Panel has also been removed from recent versions of pandas.)

Advantages of Pandas:
• High-performance merging and joining of data.
• Reshaping and pivoting of data sets.
• Label-based slicing, indexing and subsetting of large data sets.
• Columns from a data structure can be deleted or inserted.
• Grouping of data for aggregation and transformations.
• Data alignment and integrated handling of missing data.

1) Series
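• A Series is a one-dimensional labeled array capable of holding homogeneous data (all its values share one data type; values of mixed types are stored with the generic dtype ‘object’, as in Eg (4)).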

1.1 Syntax to create series

Obj_name=pandas.Series(data, index, dtype)

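• In the above syntax,
   • ‘data’ can take various forms, such as a list, a dictionary or an ndarray.
   • ‘index’ gives the index (label) of each data item. If it is omitted, the default index 0,1,2… is used.
   • ‘dtype’ gives the data type of the values. If it is omitted, pandas infers the data type from the data (for example, ‘int64’ for a list of integers).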
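The following minimal sketch shows how the ‘dtype’ parameter behaves. The values and the names obj1 and obj2 are only illustrative. When dtype is omitted, pandas infers it from the data. When dtype is passed, that type is used instead.

```python
import pandas as pd

# dtype omitted: pandas infers it from the data
obj1 = pd.Series([11, 22, 33], index=['a', 'b', 'c'])
print(obj1.dtype)   # int64 (inferred from the list of integers)

# dtype given explicitly: the same integers are stored as float64
obj2 = pd.Series([11, 22, 33], index=['a', 'b', 'c'], dtype='float64')
print(obj2.dtype)   # float64
```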

Eg (1): Creating empty Series object.


#File Name: Series_empty.py
import pandas as pd
obj1=pd.Series()
print(obj1)

SERIES 116
Output:
Series([], dtype: float64)
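Note: In the above example (1), we have created an empty Series object. In older versions of pandas its default data type is ‘float64’ (as shown). From pandas 2.0 onwards, an empty Series gets the data type ‘object’ unless a dtype is given.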

Eg (2): Creating Series object from list with default index values.
#File Name: Series_default_index_from_list.py

import pandas as pd
obj1=pd.Series([11,22,33,44])
print(obj1)

Output:
0 11
1 22
2 33
3 44
dtype: int64
Note: In the above example (2), we have created the Series object (obj1) with default indexes (i.e. 0,1,2…).
Its dtype is ‘int64’ because the list we provided contains integers.

Eg (3): Creating Series object from list with our own index values.
#File Name: Series_our_own_index_from_list.py

import pandas as pd
obj1=pd.Series([11,22,33,44],['a','b','c','d'])
print(obj1)

Note: This can also be written as follows.
obj1=pd.Series(data=[11,22,33,44],index=['a','b','c','d'])

Output:
a 11
b 22
c 33
d 44
dtype: int64
Note: In the above example (3), we have created the Series object (obj1) with our own indexes (i.e.
‘a’, ‘b’, ‘c’, ‘d’).


Eg (4): Creating Series object from dictionary.


#File Name: Series_from_dictionary.py

import pandas as pd
d1={'name':'john','rno':1,'phno':456} #here, ‘d1’ is dictionary
obj1=pd.Series(d1) # here, ‘obj1’ created from ‘d1’
print(obj1)

Output:
name john
rno 1
phno 456
dtype: object

Note: In the above example (4), we have created the Series object (obj1) from the dictionary ‘d1’. In this case,
the dictionary keys are used to construct the index. Because the values have mixed types, the dtype is ‘object’.

Creating Series object from ndarray (numpy arrays).

NumPy (Numerical Python) arrays


• NumPy arrays are similar to Python lists, but there are a few differences between lists and
NumPy arrays.
• Differences:
   • Once a NumPy array is created, you cannot increase its size, but the size of a Python list can vary.
   • A Python list can store values of any type, but a NumPy array stores values of only one type.
   • You cannot perform vector operations on lists, but you can on NumPy arrays.
   • NumPy arrays occupy less space than Python lists.
• To create NumPy arrays, we have to import the ‘numpy’ module as follows.
import numpy
(or)
import numpy as np
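Here is a short sketch of two of the differences listed above: vector operations and a single data type. The array values are only illustrative.

```python
import numpy as np

lst = [1, 2, 3]
arr = np.array([1, 2, 3])

# On a list, "* 2" repeats the list; it is not element-wise arithmetic
print(lst * 2)    # [1, 2, 3, 1, 2, 3]

# On a NumPy array, "* 2" is a vector operation applied to every element
print(arr * 2)    # [2 4 6]

# A NumPy array holds one type only: mixing ints and a float gives floats
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64
```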


Eg (5): Creating Series object from ndarray (using arange())

#File Name: Series_from_array_arange.py

import pandas as pd
import numpy as np
obj1=pd.Series(np.arange(10,20))
print(obj1)

Output:
0 10
1 11
2 12
3 13
4 14
5 15
6 16
7 17
8 18
9 19
dtype: int32

#File Name: Series_from_array_arange2.py

import pandas as pd
import numpy as np
obj2=pd.Series(np.arange(10,20),index=['a','b','c','d','e','f','g','h','i','j'])
print(obj2)

Output:
a 10
b 11
c 12
d 13
e 14
f 15
g 16
h 17
i 18
j 19
dtype: int32

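Note-1: In the above example, we have printed the Series objects ‘obj1’ (with default index values) and ‘obj2’
(with our own index values). The dtype shown (int32) depends on the platform and NumPy version; on many systems it is int64.
Note-2: The ‘arange()’ function is similar to ‘range()’, but ‘arange()’ returns an ndarray, whereas ‘range()’ returns a range object (in Python 2 it returned a list).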
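The following sketch compares the two functions from Note-2. The arguments are only illustrative.

```python
import numpy as np

r = range(10, 20, 2)        # Python built-in: a range object
a = np.arange(10, 20, 2)    # NumPy: an ndarray

print(type(r))    # <class 'range'>
print(type(a))    # <class 'numpy.ndarray'>
print(list(r))    # [10, 12, 14, 16, 18]
print(a)          # [10 12 14 16 18]

# Unlike range(), arange() also accepts a fractional step
print(np.arange(0, 1, 0.25))   # four values: 0.0, 0.25, 0.5, 0.75
```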


Eg (6): Creating Series object from ndarray (using linspace())

Note: The linspace() function returns the given number of evenly spaced values in a range. In the following examples,
we print five numbers between 100 and 200 (including both).

#File Name: Series_from_ndarray_linspace.py

import pandas as pd
import numpy as np
obj1=pd.Series(np.linspace(100,200,5),index=['a','b','c','d','e'])
print(obj1)

Output:
a 100.0
b 125.0
c 150.0
d 175.0
e 200.0
dtype: float64

Note: Here the default data type, i.e. float64, is used.

#File Name: Series_from_ndarray_linspace2.py

import pandas as pd
import numpy as np
obj1=pd.Series(np.linspace(100,200,5),index=['a','b','c','d','e'],dtype=np.int32)
print(obj1)

Output:
a 100
b 125
c 150
d 175
e 200
dtype: int32

Note: Here the data type has been changed to int32 using ‘dtype=numpy.int32’.

1.2. Series Attributes:
Some common attributes of Series objects are:
Attribute         Meaning
Series.index      The index (labels) of the Series
Series.values     Return the Series data as an ndarray
Series.dtype      Return the data type of the data
Series.shape      Return a tuple of the shape of the data
Series.nbytes     Return the number of bytes occupied by the data
Series.ndim       Return the number of dimensions (1 for a Series)
Series.size       Return the number of elements
Series.hasnans    Return True if the Series has NaN values, otherwise False
Series.empty      Return True if the Series object is empty, otherwise False


Eg (1): A simple python program that illustrates all Series attributes.
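The program listing for this example is not reproduced in this copy. The following is a minimal sketch with assumed sample values. It includes one missing value (np.nan) so that hasnans returns True.

```python
import numpy as np
import pandas as pd

# Sample Series (assumed values); np.nan marks a missing value
obj1 = pd.Series([10.5, 20.5, np.nan], index=['a', 'b', 'c'])

print(obj1.index)     # the index labels 'a', 'b', 'c'
print(obj1.values)    # the data as a NumPy ndarray
print(obj1.dtype)     # float64
print(obj1.shape)     # (3,)
print(obj1.nbytes)    # 24 (3 values x 8 bytes each)
print(obj1.ndim)      # 1
print(obj1.size)      # 3
print(obj1.hasnans)   # True
print(obj1.empty)     # False
```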

1.3. Accessing Series objects and its elements:


• The individual elements of a Series object are accessed by using their index values.
Eg (1):
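The original listing is not reproduced in this copy. A minimal sketch with assumed values: pass one label, or a list of labels, inside []. The .iloc accessor does the same by position.

```python
import pandas as pd

obj1 = pd.Series([11, 22, 33, 44], index=['a', 'b', 'c', 'd'])

print(obj1['b'])          # 22 -> one element through its index label
print(obj1[['a', 'c']])   # several elements: pass a list of labels
print(obj1.iloc[1])       # 22 -> the same element through its position (0-based)
```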


1.4. Slicing from Series objects:


• Slicing a Series with integer positions (e.g. obj1[1:3]) takes place position-wise, not index-wise, and the element at the end position is excluded.
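Below is a minimal sketch of position-wise slicing with assumed values. For contrast, it also shows a slice of index labels through .loc, which includes the end label.

```python
import pandas as pd

obj1 = pd.Series([11, 22, 33, 44, 55], index=['a', 'b', 'c', 'd', 'e'])

# Positions 1 and 2 (labels 'b' and 'c'); the end position 3 is excluded
print(obj1[1:3])
# A step of -1 returns the whole Series in reverse order
print(obj1[::-1])
# For comparison: slicing by labels through .loc includes the end label
print(obj1.loc['b':'d'])   # values 22, 33, 44
```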


1.5. Modifying elements of Series objects:


• The data values of a Series object can be easily modified through item assignment.
Eg (1):
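The original listing is not reproduced in this copy. Below is a minimal sketch of item assignment with assumed values.

```python
import pandas as pd

obj1 = pd.Series([11, 22, 33, 44], index=['a', 'b', 'c', 'd'])

obj1['b'] = 200          # change one element through its index label
obj1[['c', 'd']] = 0     # change several elements with one assignment
obj1.iloc[0] = 100       # change an element through its position
print(obj1)              # values now: a 100, b 200, c 0, d 0
```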

1.6. head() and tail() functions:


• The head(n) function is used to get the first ‘n’ rows from a pandas object.
• If no argument is supplied to head(), it returns the first 5 rows of the Series object.
Eg (1):
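The original listing is not reproduced in this copy. Below is a minimal sketch with an assumed ten-element Series.

```python
import numpy as np
import pandas as pd

obj1 = pd.Series(np.arange(10, 20))   # ten elements: 10, 11, ..., 19

print(obj1.head())     # no argument: first 5 rows (values 10 to 14)
print(obj1.head(3))    # first 3 rows (values 10, 11, 12)
```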

• The tail(n) function is used to get the last ‘n’ rows from a pandas object.
• If no argument is supplied to tail(), it returns the last 5 rows of the Series object.

Eg (1):
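The original listing is not reproduced in this copy. Below is a minimal sketch with the same assumed ten-element Series.

```python
import numpy as np
import pandas as pd

obj1 = pd.Series(np.arange(10, 20))   # ten elements: 10, 11, ..., 19

print(obj1.tail())     # no argument: last 5 rows (values 15 to 19)
print(obj1.tail(2))    # last 2 rows (values 18, 19)
```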

1.7. Arithmetic Operations:
• Arithmetic operations are performed element-wise on matching indexes.
• For indexes that do not match, the result is NaN (Not a Number).
• Let us consider the following two Series objects, obj1 and obj2.

Eg (1):
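The listing for this example is not reproduced in this copy. The following is a minimal sketch with assumed values. They were chosen so that only the indexes 1 and 2 appear in both objects, matching the note that follows.

```python
import pandas as pd

obj1 = pd.Series([10, 20, 30], index=[0, 1, 2])   # assumed values
obj2 = pd.Series([1, 2, 3], index=[1, 2, 3])      # assumed values

result = obj1 + obj2
print(result)
# Expected values: index 0 -> NaN (only in obj1), 1 -> 21.0 (20 + 1),
# 2 -> 32.0 (30 + 2), 3 -> NaN (only in obj2); the dtype becomes float64
```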

Note: In the above example (1), the indexes 1 and 2 are present in both obj1 and obj2. That is why the addition took
place on those two values. The remaining indexes do not match, so ‘NaN’ appears in the result.

Eg (2):
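The listing for Eg (2) is also not reproduced in this copy. The following sketch, with assumed values, illustrates the same rule. When the two objects have exactly the same indexes, every element is matched and no NaN appears. The other arithmetic operators follow the same rule.

```python
import pandas as pd

obj1 = pd.Series([10, 20, 30])   # default indexes 0, 1, 2 (assumed values)
obj2 = pd.Series([1, 2, 3])      # same default indexes

print(obj1 + obj2)   # values 11, 22, 33
print(obj1 - obj2)   # values 9, 18, 27
print(obj1 * obj2)   # values 10, 40, 90
print(obj1 / obj2)   # values 10.0, 10.0, 10.0 (division gives floats)
```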


SAMPLE QUESTIONS
1. What is the significance of pandas?
2. Name some data structures in the pandas library.
3. Given the following Series objects:
   S1          S2
0    3      1    11
1    5      2    22
2    6      4    33
3    5      5    44
What will be the result of
(i) S1+S2
(ii) S1-S2
(iii) S1.tail(2)
(iv) S2.head(3)

4. Why does the following code cause an error?

import pandas
obj1=pandas.Series([11,22,33],index='abc')
5. What will be the output produced by the following code?
import pandas as pd
obj1=pd.Series([11,22,33,44],index=[1,2,3,4])
(a) print(obj1[:2])
(b) print(obj1[2:])
(c) print(obj1.index)
(d) print(obj1.shape)
