
Class 12th IP Chapter 2nd

The document discusses pandas and data handling using pandas. It explains that pandas is a Python library used for data analysis and manipulation. It describes pandas core data structures - Series and DataFrame. Series is a one-dimensional array and DataFrame is a two-dimensional structure like a spreadsheet. The document provides examples of how to create, access and manipulate data in Series and DataFrames.

Uploaded by

Harshit Gupta
Copyright: © All Rights Reserved

Class 12th

Subject :IP
Chapter 2
Data Handling using pandas

1) NumPy, Pandas and Matplotlib are Python libraries for scientific and analytical use.
2) pip install pandas is the command to install Pandas library.
3) A data structure is a collection of data values and the operations that can be applied to
that data. It enables efficient storage, retrieval and modification to the data.
4) Two main data structures in Pandas library are Series and DataFrame. To use these data
structures, we first need to import the Pandas library.
5) A Series is a one-dimensional array containing a sequence of values. Each value has a
data label associated with it also called its index.
6) The two common ways of accessing the elements of a series are Indexing and Slicing.
7) There are two types of indexes: positional index and labelled index. Positional index takes
an integer value that corresponds to its position in the series starting from 0, whereas
labelled index takes any user-defined label as index.
8) When positional indices are used for slicing, the value at end index position is excluded,
i.e., only (end - start) number of data values of the series are extracted. However with
labelled indexes the value at the end index label is also included in the output.
9) All basic mathematical operations can be performed on Series either by using the operator
or by using appropriate methods of the Series object.
10) While performing mathematical operations index matching is implemented and if no
matching indexes are found during alignment, Pandas returns NaN so that the operation
does not fail.
11) A DataFrame is a two-dimensional labeled data structure like a spreadsheet. It contains
rows and columns and therefore has both a row and column index.
12) When using a dictionary to create a DataFrame, keys of the Dictionary become the column
labels of the DataFrame. A DataFrame can be thought of as a dictionary of lists/ Series (all
Series/columns sharing the same index label for a row).
13) Data can be loaded in a DataFrame from a file on the disk by using Pandas read_csv
function.
14) Data in a DataFrame can be written to a text file on disk by using the
pandas.DataFrame.to_csv() function.
15) DataFrame.T gives the transpose of a DataFrame.
16) Pandas has a number of methods that support label-based indexing, but every label
asked for must be in the index, or a KeyError will be raised.
17) DataFrame.loc[ ] is used for label based indexing of rows in DataFrames.
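18) The DataFrame.append() method was used to add the rows of one DataFrame to the end of
another. It was removed in pandas 2.0; pandas.concat() does the same job in current versions.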
19) Pandas supports non-unique index values. Only if a particular operation that does not
support duplicate index values is attempted, an exception is raised at that time.
20) The basic difference between Pandas Series and NumPy ndarray is that operations
between Series automatically align the data based on labels. Thus, we can write
computations without considering whether all Series involved have the same labels or not,
whereas ndarrays are combined by position and raise an error when their shapes are incompatible.
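Points 8 and 9 can be illustrated with a minimal sketch. The student names and marks below are made-up sample data, not taken from the chapter; iloc and loc are used to make the positional and label-based slices explicit.

```python
import pandas as pd

# Hypothetical sample data: marks labelled by student name
marks = pd.Series([55, 72, 68, 90], index=["amit", "bina", "chand", "dev"])

# Positional slice: the value at the end position (3) is excluded -> 2 values
by_position = marks.iloc[1:3]

# Label slice: the value at the end label ("dev") is included -> 3 values
by_label = marks.loc["bina":"dev"]

# The same addition written with the operator and with the Series method
bonus = pd.Series([5, 5, 5, 5], index=marks.index)
with_operator = marks + bonus
with_method = marks.add(bonus)

print(by_position)
print(by_label)
print(with_operator.equals(with_method))
```

The positional slice returns (end - start) values, while the label slice also returns the row for the end label.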
Very short answer type questions:

1. What is python?
Python is a very popular and easy to learn programming language, created by Guido van
Rossum in 1991. It is used in a variety of fields, including software development, web
development, scientific computing, big data and Artificial Intelligence. The programs given in this
book are written using Python.
1. What is program?

Set of instructions or commands to be executed by a computer is called a program.

2. What is Software?

Software is a set of programs, which is designed to perform a well-defined function. A program is a
sequence of instructions written to solve a particular problem.

There are two types of software −

• System Software
• Application Software

3. What is programming language?

The language used to specify a set of instructions to the computer is called a programming
language. For example, Python, C, C++, Java, etc.

4. What is Function?
A function is a block of code which only runs when it is called. You can pass data, known as
parameters, into a function. A function can return data as a result.
5. What is Variable ?
A variable is a reserved memory location to store a value. When you create a variable, you reserve
some space in memory.
6. What is Array?
An array is a special variable, which can hold more than one value at a time. Arrays are used to
store multiple values in one single variable:
7. What is numpy Array?
NumPy stands for Numerical Python. It is a Python package for the computation and
processing of single-dimensional and multidimensional array elements.
8. What is ndarray?
ndarray is the n-dimensional array object defined in NumPy, which stores a collection of
elements of the same type.
9. What are the index and axes attributes?
The index attribute of a DataFrame holds its row labels. The axes attribute contains both the
row axis index and the column axis index. The ndim attribute returns the number of dimensions,
which is 2 for a DataFrame instance. The shape attribute gives the shape of the 2-dimensional
DataFrame as a tuple (rows, columns).
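A short sketch of these attributes on a small DataFrame; the fruit data is sample data chosen only for illustration.

```python
import pandas as pd

# Sample data for illustration only
df = pd.DataFrame({"Fruits": ["Apples", "Mangos", "Bananas"],
                   "Quantity": [10, 12, 13]})

print(df.index)    # row labels
print(df.columns)  # column labels
print(df.axes)     # list holding the row index and the column index
print(df.ndim)     # number of dimensions
print(df.shape)    # (number of rows, number of columns)
```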
10. What is re-indexing?
Reindexing in Pandas is used to change the row labels and column labels of a DataFrame (or the
index of a Series) so that they conform to a new set of labels. The data is rearranged to match
the new labels, and any new label that was not present in the original index is filled with NaN.
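A minimal sketch of reindexing with the reindex() method; the labels and values are invented for the example.

```python
import pandas as pd

# Series: reorder the labels and ask for a label ('d') that does not exist
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
reordered = s.reindex(["c", "a", "d"])   # 'd' is filled with NaN

# DataFrame: rows and columns can be reindexed in one call
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])
df2 = df.reindex(index=["r2", "r1"], columns=["y", "x"])

print(reordered)
print(df2)
```

reindex() returns a new object; the original Series and DataFrame are left unchanged.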
11. What is CSV (Comma Separated Values) file?
A CSV (Comma Separated Values) format is one of the most simple and common ways to store
tabular data. Each line of the file is one row, with the values separated by commas. A CSV file
is conventionally saved with the .csv file extension.
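As a rough sketch, the round trip between a DataFrame and CSV text can be tried without touching the disk. Here io.StringIO stands in for a file, which is an assumption of this example; with a real file, a path such as a .csv filename would be passed instead.

```python
import io
import pandas as pd

# Sample data for illustration only
df = pd.DataFrame({"Fruits": ["Apples", "Mangos"], "Quantity": [10, 12]})

# to_csv() returns the CSV text when no file path is given
csv_text = df.to_csv(index=False)
print(csv_text)

# read_csv() accepts a file path or any file-like object
df_back = pd.read_csv(io.StringIO(csv_text))
print(df_back)
```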
12. Write the parameters of series in pandas.
A pandas Series is a one-dimensional labelled data structure which can hold data such as
strings, integers and even other Python objects. It is built on top of the NumPy array and is the
primary data structure to hold one-dimensional data in pandas.
o data: It can be a list, ndarray, dictionary, or scalar value.
o index: Values must be hashable and of the same length as data; duplicate labels are allowed.
Defaults to 0, 1, ..., n-1 (as in np.arange(n)) if no index is passed.
o dtype: It refers to the data type of the series.
o copy: It is used for copying the input data.
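A small sketch of how these parameters are passed to the Series constructor; the values and labels are arbitrary examples.

```python
import pandas as pd

# data, index and dtype given explicitly
s = pd.Series(data=[1, 2, 3], index=["a", "b", "c"], dtype="float64")

# No index passed: labels default to 0, 1, 2
default_index = pd.Series([1, 2, 3])

print(s)
print(default_index)
```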
13. How to install pandas using pip?
Here is how to install Pandas on Windows:
1. Install Python.
2. Check that pip is available by typing "pip --version" in the command prompt (pip is bundled
with recent Python installers).
3. Once finished, type the following:
pip install pandas

14. What is a DataFrame and how is it different from a 2-D array?

A DataFrame is a two-dimensional labelled data structure like a table in MySQL. It contains rows
and columns, and therefore has both a row and column index. Each column can hold a different
data type.
A 2-dimensional array also has rows ("axis 0") and columns ("axis 1"), but its elements are all of
the same data type and are accessed by integer position only, not by labels.
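The contrast can be sketched as follows; the names, scores and row labels are hypothetical sample data.

```python
import numpy as np
import pandas as pd

# 2-D NumPy array: a single data type, accessed by integer position
arr = np.array([[1, 2], [3, 4]])

# DataFrame: labelled rows and columns, each column with its own data type
df = pd.DataFrame({"name": ["Asha", "Ravi"], "score": [91.5, 78.0]},
                  index=["r1", "r2"])

print(arr[1, 0])               # element at row position 1, column position 0
print(df.loc["r2", "score"])   # element at row label 'r2', column label 'score'
print(df.dtypes)               # data type of each column
```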

15. How are DataFrames related to Series?

DataFrame and Series are both data structures from the Pandas library. A Series is a
one-dimensional structure whereas a DataFrame is a two-dimensional structure. Each column of a
DataFrame is a Series, and all the columns share the same row index.

16. What do you understand by the size of (i) a Series, (ii) a DataFrame?

Size of series : the number of values in the Series object

Size of dataframe : the number of values in the DataFrame object, i.e., number of rows × number of columns
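Both sizes can be read from the size attribute, as in this small sketch with arbitrary values:

```python
import pandas as pd

s = pd.Series([4, 8, 15, 16])
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

print(s.size)    # 4 values
print(df.size)   # 3 rows x 2 columns = 6 values
```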

17. What is a Series and how is it different from a 1-D array, a list and a dictionary?

A Series is a one-dimensional array containing a sequence of values of any data type (int,
float, list, string, etc.) which by default have numeric data labels starting from zero.
A Pandas Series is a bit like a specialization of a Python dictionary. A dictionary is a structure
that maps arbitrary keys to a set of arbitrary values, and a Series is a structure which maps
typed keys to a set of typed values.
A Series is a 1D data structure designed for a particular use case which is quite different from
a list, yet both are 1D, ordered data structures. In a Series we can change the index labels, but
a list always uses the positions 0, 1, 2, ... as its index. Unlike a 1-D NumPy array, which is
accessed only by integer position, a Series also carries labels.

Short answer type questions:

Q.1 What is Python the programming language?

Python is a high-level, general-purpose and very popular programming language. The Python
programming language is used in web development, Machine Learning applications, along
with many cutting-edge technologies in the software industry. Python is very well
suited for beginners, and also for experienced programmers coming from other programming
languages like C++ and Java.
Below are some facts about Python Programming Language:
1. Python is currently one of the most widely used multi-purpose, high-level programming languages.
2. Python allows programming in Object-Oriented and Procedural paradigms.
3. Python programs generally are smaller than those in other programming languages like Java.
Programmers have to type relatively less, and the indentation requirement of the language
keeps programs readable.
4. Python language is being used by almost all tech-giant companies like – Google, Amazon,
Facebook, Instagram, Dropbox, Uber… etc.
5. The biggest strength of Python is its large standard library, together with a huge collection
of third-party packages, which can be used for the following:
• Machine Learning
• GUI Applications (like Kivy, Tkinter, PyQt etc. )
• Web frameworks like Django (used by YouTube, Instagram, Dropbox)
• Image processing (like OpenCV, Pillow)
• Web scraping (like Scrapy, BeautifulSoup, Selenium)
• Test frameworks
• Multimedia
• Scientific computing
• Text processing and many more..

Q.2 Give the name of any five python application.


1. Web Development
2. Game Development
3. Scientific and Numeric Applications
4. Artificial Intelligence and Machine Learning
5. Desktop GUI
Q.3 Write the name of different python libraries.
A Python library is a collection of related modules. It contains bundles of code that can be used
repeatedly in different programs. It makes Python programming simpler and more convenient for the
programmer, as we don’t need to write the same code again and again for different programs.
Python libraries play a very vital role in fields of Machine Learning, Data Science, Data
Visualization, etc.
The Python Standard Library is distributed along with Python. It contains built-in
modules that provide access to basic system functionality like I/O and some other core
modules. Many of these modules are written in the C programming language. The Python
standard library consists of more than 200 core modules. Other than this, there
are several other libraries in Python that make a programmer’s life easier. Let’s have a look at
some of the commonly used libraries:
1. TensorFlow: This library was developed by the Google Brain team. It is
an open-source library used for numerical computation, especially in machine learning
and deep learning algorithms. It contains a large number of tensor operations. Researchers
also use this Python library to solve complex computations in Mathematics and Physics.
2. Matplotlib: This library is responsible for plotting numerical data, which is why it is used in
data analysis. It is also an open-source library and plots high-quality figures like pie charts,
histograms, scatterplots, graphs, etc.
3. Pandas: Pandas is an important library for data scientists. It is an open-source data
analysis library that provides flexible high-level data structures and a variety of analysis tools.
It eases data analysis, data manipulation, and cleaning of data. Pandas supports operations
like Sorting, Re-indexing, Iteration, Concatenation, Conversion of data, Visualizations,
Aggregations, etc.
4. Numpy: The name “Numpy” stands for “Numerical Python”. It is a commonly used
numerical computing library that supports large matrices and multi-dimensional
data. It consists of in-built mathematical functions for easy computations. Even libraries like
TensorFlow use Numpy internally to perform several operations on tensors. Array Interface is
one of the key features of this library.
5. SciPy: The name “SciPy” stands for “Scientific Python”. It is an open-source library used for
high-level scientific computations. This library is built on top of Numpy and works
with Numpy to handle complex computations. While Numpy provides the array type along with
sorting and indexing of array data, the numerical routines (such as optimization, integration
and statistics) are provided by SciPy. It is also widely used by application
developers and engineers.
6. Scrapy: It is an open-source library that is used for extracting data from websites. It provides
very fast web crawling and high-level screen scraping. It can also be used for data mining
and automated testing of data.
7. Scikit-learn: It is a well-known Python library for working with complex data. Scikit-learn is an
open-source library that supports machine learning. It supports various supervised and
unsupervised algorithms like linear regression, classification, clustering, etc. This library
works in association with Numpy and SciPy.
8. PyGame: This library provides an easy interface to the Simple DirectMedia Layer (SDL)
platform-independent graphics, audio, and input libraries. It is used for developing video
games using computer graphics and audio libraries along with the Python programming
language.
9. PyTorch: PyTorch is a widely used machine learning library that optimizes tensor computations.
It has rich APIs to perform tensor computations with strong GPU acceleration. It also helps to
solve application issues related to neural networks.
10. PyBrain: The name “PyBrain” stands for Python Based Reinforcement Learning, Artificial
Intelligence, and Neural Networks library. It is an open-source library built for beginners in
the field of Machine Learning. It provides fast and easy-to-use algorithms for machine
learning tasks. It is flexible and easy to understand, which makes it helpful for
developers who are new to research fields.

Q.4 What is pandas?

Pandas is an open source Python package that is most widely used for data science/data
analysis and machine learning tasks. It is built on top of another package named Numpy, which
provides support for multi-dimensional arrays. As one of the most popular data wrangling
packages, Pandas works well with many other data science modules inside the Python
ecosystem. It is not part of the Python standard library, but it is included in many scientific
Python distributions, including commercial vendor distributions like ActiveState’s ActivePython.
Pandas is fast and offers high performance and productivity for users.

Q.5 What is series?

The Pandas Series can be defined as a one-dimensional array that is capable of storing various
data types. We can easily convert a list, tuple, or dictionary into a series using the Series()
constructor. The row labels of a series are called the index. A Series cannot contain multiple
columns. It has the following parameters:
o data: It can be a list, ndarray, dictionary, or scalar value.
o index: The values of the index must be hashable and of the same length as data; duplicate
labels are allowed. If we do not pass any index, the default 0, 1, ..., n-1 (as in np.arange(n)) will be used.
o dtype: It refers to the data type of the series.
o copy: It is used for copying the input data.
1) Creating a Series:
We can create a Series in two ways:
1. Create an empty Series
2. Create a Series using inputs.
2) Create an Empty Series:
We can easily create an empty series in Pandas which means it will not have any value.
The syntax that is used for creating an Empty Series:
<series object> = pandas.Series()
The below example creates an empty Series object that has no values. The dtype is passed
explicitly as float64, because the default datatype of an empty Series differs between pandas
versions (float64 in older versions, object from pandas 2.0).
Example
import pandas as pd
x = pd.Series([], dtype='float64')
print(x)
Output : Series([], dtype: float64)
Creating a Series using inputs:
We can create Series by using various inputs:
o Array
o Dict
o Scalar value
3) Creating Series from Array:
Before creating a Series from an array, we first have to import the numpy module and then use the
array() function in the program. If the data is an ndarray, then the passed index must be of the
same length. If we do not pass an index, then by default an index of range(n) is used, where n is
the length of the array, i.e., [0, 1, 2, ..., len(array)-1].
import pandas as pd
import numpy as np
info = np.array(['P','a','n','d','a','s'])
a = pd.Series(info)
print(a)
Output :
0 P
1 a
2 n
3 d
4 a
5 s
dtype: object
4) Create a Series from dict
We can also create a Series from dict. If the dictionary object is being passed as an input
and the index is not specified, then the dictionary keys are used to construct the index, in the
order in which they were inserted (older pandas versions sorted the keys).
If index is passed, then the values corresponding to each label in the index will be extracted from
the dictionary.
import pandas as pd
import numpy as np
info = {'x' : 0., 'y' : 1., 'z' : 2.}
a = pd.Series(info)
print (a)
Output:
x 0.0
y 1.0
z 2.0
dtype: float64
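The case where an index is passed along with the dictionary can be sketched like this. The label 'w' is a made-up label that is not a key of the dictionary, so its value comes out as NaN.

```python
import pandas as pd

info = {'x': 0., 'y': 1., 'z': 2.}

# Only the labels listed in index are taken from the dictionary;
# 'w' is not a key of info, so it is filled with NaN
a = pd.Series(info, index=['y', 'z', 'w'])
print(a)
```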
5) Create a Series using Scalar:
If we take a scalar value, then an index should be provided. The scalar value will be repeated
to match the length of the index.
# import pandas library
import pandas as pd
x = pd.Series(4, index=[0, 1, 2, 3])
print(x)
Output:
0 4
1 4
2 4
3 4
dtype: int64

Q.6 What is Head() and Tail() ?

Head function : The head function in pandas displays the first five rows of the dataframe by
default. It takes in a single parameter: the number of rows. We can use this parameter to display
the number of rows of our choice.
Syntax
The head function is defined as follows:
dataframe.head(N)
N refers to the number of rows. If no parameter is passed, the first five rows are returned.
The head function also supports negative values of N. In that case, all rows except the last N
rows are returned.
Example
The code snippet below shows how the head function is used in pandas:

import pandas as pd
# Creating a dataframe
df = pd.DataFrame({'Days': ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',
                            'Sunday']})
print(df) # Whole dataframe
print('\n ************************************')
print(df.head()) # By default, first 5 rows
print('\n ************************************')
print(df.head(3)) # Printing first 3 rows
print('\n ************************************')
print(df.head(-2)) # Printing all except the last 2 rows
Output
Days
0 Monday
1 Tuesday
2 Wednesday
3 Thursday
4 Friday
5 Saturday
6 Sunday

************************************
Days
0 Monday
1 Tuesday
2 Wednesday
3 Thursday
4 Friday

************************************
Days
0 Monday
1 Tuesday
2 Wednesday

************************************
Days
0 Monday
1 Tuesday
2 Wednesday
3 Thursday
4 Friday

Tail function : The tail function in pandas displays the last five rows of the dataframe by
default. It takes in a single parameter: the number of rows. We can use this parameter to display
the number of rows of our choice.
Syntax
The tail function is defined as follows:
dataframe.tail(N)
N refers to the number of rows. If no parameter is passed, the last five rows are returned.
The tail function also supports negative values of N. In that case, all rows except the first N rows
are returned.
Example
The code snippet below shows how the tail function is used in pandas:
import pandas as pd
# Creating a dataframe
df = pd.DataFrame({'Days': ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',
'Sunday']})
print(df) # By default
print('\n ************************************')
print(df.tail()) # By default
print('\n ************************************')
print(df.tail(3)) # Printing last 3 rows
print('\n ************************************')
print(df.tail(-2)) # Printing all except the first 2 rows
Output
Days
0 Monday
1 Tuesday
2 Wednesday
3 Thursday
4 Friday
5 Saturday
6 Sunday

************************************
Days
2 Wednesday
3 Thursday
4 Friday
5 Saturday
6 Sunday

************************************
Days
4 Friday
5 Saturday
6 Sunday

************************************
Days
2 Wednesday
3 Thursday
4 Friday
5 Saturday
6 Sunday

The illustration below summarizes head and tail functions in pandas:


Q.7. What is Data Frame?

A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data
structure with labeled axes (rows and columns). In a DataFrame,
data is aligned in a tabular fashion in rows and columns. A Pandas DataFrame consists of three
principal components: the data, the rows, and the columns.
Features of DataFrame
• Columns can be of different types
• Size – Mutable
• Labeled axes (rows and columns)
• Can perform arithmetic operations on rows and columns
pandas.DataFrame
A pandas DataFrame can be created using the following constructor −
pandas.DataFrame( data, index, columns, dtype, copy)

The parameters of the constructor are as follows −


Sr.No Parameter & Description

1 data
data takes various forms like ndarray, Series, map, lists,
dict, constants and also another DataFrame.

2 index
Row labels to be used for the resulting frame. Optional;
defaults to np.arange(n) if no index is passed.

3 columns
Column labels to be used for the resulting frame. Optional;
defaults to np.arange(n) if no column labels are passed.

4 dtype
Data type to force on the columns; if not given, it is inferred.

5 copy
Whether to copy the data from the input. Optional; the
default behaviour depends on the type of input.
Create DataFrame
A pandas DataFrame can be created using various inputs like −
• Lists
• dict
• Series
• Numpy ndarrays
• Another DataFrame
In the subsequent sections of this chapter, we will see how to create a DataFrame using these
inputs.
1) Create an Empty DataFrame
A basic DataFrame, which can be created is an Empty Dataframe.
Example
#import the pandas library and aliasing as pd
import pandas as pd
df = pd.DataFrame()
print (df)

output is as follows −
Empty DataFrame
Columns: []
Index: []

2) Create a DataFrame from Lists


The DataFrame can be created using a single list or a list of lists.
Example 1
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print (df)
output is as follows −
0
0 1
1 2
2 3
3 4
4 5
Example 2
import pandas as pd
data = [['Apples',10],['Mangos',12],['Bananas',13]]
df = pd.DataFrame(data,columns=['Fruits','Quantity'])
print (df)

output is as follows −

Fruits Quantity
0 Apples 10
1 Mangos 12
2 Bananas 13

3) Create a DataFrame from Dict of ndarrays / Lists

All the ndarrays must be of same length. If index is passed, then the length of the index should
equal to the length of the arrays.
If no index is passed, then by default, index will be range(n), where n is the array length.
Example 1
import pandas as pd
data = {'Fruits':['Apples', 'Mangos', 'Bananas', 'Ricky'],'Quantity':[28,34,29,42]}
df = pd.DataFrame(data)
print (df)

Fruits Quantity
0 Apples 28
1 Mangos 34
2 Bananas 29
3 Ricky 42
Example 2
Let us now create an indexed DataFrame using arrays.
import pandas as pd
data = {'Fruits':['Apples', 'Mangos', 'Bananas', 'Ricky'],'Quantity':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print (df)
output is as follows –
Fruits Quantity
rank1 Apples 28
rank2 Mangos 34
rank3 Bananas 29
rank4 Ricky 42

4) Create a DataFrame from List of Dicts

List of Dictionaries can be passed as input data to create a DataFrame. The dictionary keys are
by default taken as column names.
Example 1
The following example shows how to create a DataFrame by passing a list of dictionaries.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print (df)
Its output is as follows −
a b c
0 1 2 NaN
1 5 10 20.0

5) Create a DataFrame from Dict of Series

Dictionary of Series can be passed to form a DataFrame. The resultant index is the union of all
the series indexes passed.
Example

import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

print (df)
output is as follows −
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
Let us now understand column selection, addition, and deletion through examples.
1) Column Selection
We will understand this by selecting a column from the DataFrame.
Example
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df['one'])
Its output is as follows −
a 1.0
b 2.0
c 3.0
d NaN
Name: one, dtype: float64
2) Column Addition
We will understand this by adding a new column to an existing data frame.
Example
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

# Adding a new column to an existing DataFrame object with column label by passing new series
print("Adding a new column by passing as Series:")
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(df)

# Adding a new column computed from columns that already exist
print("Adding a new column using the existing columns in DataFrame:")
df['four'] = df['one'] + df['three']
print(df)
Its output is as follows −
Adding a new column by passing as Series:
one two three
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN

Adding a new column using the existing columns in DataFrame:


one two three four
a 1.0 1 10.0 11.0
b 2.0 2 20.0 22.0
c 3.0 3 30.0 33.0
d NaN 4 NaN NaN
3) Column Deletion
Columns can be deleted or popped; let us take an example to understand how.
Example
# Using the previous DataFrame, we will delete a column
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three' : pd.Series([10, 20, 30], index=['a', 'b', 'c'])}

df = pd.DataFrame(d)
print("Our dataframe is:")
print(df)

# using del statement
print("Deleting the first column using DEL function:")
del df['one']
print(df)

# using pop function
print("Deleting another column using POP function:")
df.pop('two')
print(df)
Its output is as follows −
Our dataframe is:
one two three
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN
Deleting the first column using DEL function:
two three
a 1 10.0
b 2 20.0
c 3 30.0
d 4 NaN

Deleting another column using POP function:
three
a 10.0
b 20.0
c 30.0
d NaN
4) Row Selection, Addition, and Deletion
We will now understand row selection, addition and deletion through examples. Let us begin with
the concept of selection.
Selection by Label
Rows can be selected by passing a row label to the loc indexer.
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df.loc['b'])
Its output is as follows −
one 2.0
two 2.0
Name: b, dtype: float64
The result is a series with labels as column names of the DataFrame. And, the Name of the
series is the label with which it is retrieved.
Selection by integer location
Rows can be selected by passing an integer location to the iloc indexer.
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df.iloc[2])
Its output is as follows −
one 3.0
two 3.0
Name: c, dtype: float64
Slice Rows
Multiple rows can be selected using the ‘ : ’ (slice) operator.
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df[2:4])
Its output is as follows −
one two
c 3.0 3
d NaN 4
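Row slices can also be written explicitly with iloc and loc. The sketch below reuses the same sample DataFrame and shows that a positional slice leaves out the end position, while a label slice includes the end label.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# Positional slice: rows at positions 1 and 2 (position 3 is excluded)
by_position = df.iloc[1:3]

# Label slice: rows 'b', 'c' and 'd' (the end label is included)
by_label = df.loc['b':'d']

print(by_position)
print(by_label)
```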
5) Addition of Rows
Add new rows to a DataFrame using the pandas.concat() function (older pandas versions also
offered DataFrame.append(), which was removed in pandas 2.0). The rows are added at
the end.
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

df = pd.concat([df, df2])
print(df)
Its output is as follows −
a b
0 1 2
1 3 4
0 5 6
1 7 8
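In the output above the row labels 0 and 1 appear twice, because each DataFrame keeps its own labels. If fresh labels are wanted, one option is pandas.concat() with ignore_index=True, sketched here on the same two DataFrames:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# ignore_index=True discards the old row labels and numbers the rows 0..n-1
combined = pd.concat([df, df2], ignore_index=True)
print(combined)
```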
6) Deletion of Rows
Use index label to delete or drop rows from a DataFrame. If label is duplicated, then multiple
rows will be dropped.
If you observe, in the above example, the labels are duplicate. Let us drop a label and will see
how many rows will get dropped.
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

# Join the two DataFrames; both keep their row labels 0 and 1
df = pd.concat([df, df2])

# Drop rows with label 0
df = df.drop(0)

print(df)
Its output is as follows −
a b
1 3 4
1 7 8

Q.8. Write the syntax of pandas Data Frame.

pandas.DataFrame
A pandas DataFrame can be created using the following constructor −
pandas.DataFrame( data, index, columns, dtype, copy)
The parameters of the constructor are as follows −
Sr.No Parameter & Description

1 data
data takes various forms like ndarray, Series, map, lists,
dict, constants and also another DataFrame.

2 index
Row labels to be used for the resulting frame. Optional;
defaults to np.arange(n) if no index is passed.

3 columns
Column labels to be used for the resulting frame. Optional;
defaults to np.arange(n) if no column labels are passed.

4 dtype
Data type to force on the columns; if not given, it is inferred.

5 copy
Whether to copy the data from the input. Optional; the
default behaviour depends on the type of input.
Q.9. What is use of lower() and upper()?

To convert a Python string to uppercase, use the built-in upper() method of a string. To convert a
Python string to lowercase, use the built-in lower() method.
The upper() method converts all of the characters of a string to uppercase, whereas the
lower() method converts all of the characters to lowercase.

Example : Convert a string to uppercase

message = 'python is fun'

# convert message to uppercase


print(message.upper())

# Output: PYTHON IS FUN

Example 1: Convert a string to lowercase


# example string
string = "THIS SHOULD BE LOWERCASE!"
print(string.lower())

# string with numbers
# all alphabets should be lowercase
string = "Th!s Sh0uLd B3 L0w3rCas3!"
print(string.lower())
Output

this should be lowercase!


th!s sh0uld b3 l0w3rcas3!

Q.10. What is iteration?

The behavior of basic iteration over Pandas objects depends on the type. When iterating over a
Series, it is regarded as array-like, and basic iteration produces the values. Other data structures,
like DataFrame (and Panel, which existed in older pandas versions and was removed in pandas 0.25),
follow the dict-like convention of iterating over the keys of the objects.
In short, basic iteration (for i in object) produces −
• Series − values
• DataFrame − column labels
• Panel (older pandas versions only) − item labels
Iterating a DataFrame
Iterating a DataFrame gives column names. Let us consider the following example to understand
the same.
import pandas as pd
import numpy as np

N=20
df = pd.DataFrame({
    'A': pd.date_range(start='2016-01-01',periods=N,freq='D'),
    'x': np.linspace(0,stop=N-1,num=N),
    'y': np.random.rand(N),
    'C': np.random.choice(['Low','Medium','High'],N).tolist(),
    'D': np.random.normal(100, 10, size=(N)).tolist()
})

# Iterating over a DataFrame yields its column labels
for col in df:
    print(col)
Its output is as follows −
A
C
D
x
y
To iterate over the columns or the rows of a DataFrame, we can use the following functions:
• items() (called iteritems() before pandas 2.0): iterate over the columns as (key, value) pairs
• iterrows(): iterate over the rows as (index, Series) pairs
• itertuples(): iterate over the rows as namedtuples
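items() (formerly iteritems())
Iterates over each column as a (key, value) pair, with the column label as the key and the
column values as a Series object. The older name iteritems() was removed in pandas 2.0.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3),columns=['col1','col2','col3'])
# key is the column label, value is that column as a Series
for key,value in df.items():
    print (key,value)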
Its output is as follows −
col1 0 0.802390
1 0.324060
2 0.256811
3 0.839186
Name: col1, dtype: float64

col2 0 1.624313
1 -1.033582
2 1.796663
3 1.856277
Name: col2, dtype: float64

col3 0 -0.022142
1 -0.230820
2 1.160691
3 -0.830279
Name: col3, dtype: float64
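The two row-wise functions, iterrows() and itertuples(), can be sketched in the same way. The small DataFrame below uses sample values assumed for illustration.

```python
import pandas as pd

# Sample data (assumed for illustration).
df = pd.DataFrame({'Name': ['Arnab', 'Kritika'], 'Eco': [18, 23]})

# iterrows() gives (index, Series) pairs; each row is a Series
# whose labels are the column names.
row_labels = []
for index, row in df.iterrows():
    row_labels.append(index)
    print(index, row['Name'], row['Eco'])

# itertuples() gives one namedtuple per row; the row label is in
# the field Index and each column becomes a field of the tuple.
names = []
for record in df.itertuples():
    names.append(record.Name)
    print(record.Index, record.Name, record.Eco)
```

For large DataFrames, itertuples() is generally faster than iterrows() because it does not build a Series for every row.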

Q. 11 Difference between Pandas Series and NumPy Arrays

Sr.No Point of difference

1 Index
Pandas Series: We can define our own labelled index to access the elements. The labels can be
numbers or letters.
NumPy array: Elements are accessed by their integer position only.

2 Order of index
Pandas Series: The elements can also be indexed in descending order.
NumPy array: Indexing starts with zero for the first element and the index is fixed.

3 Alignment
Pandas Series: If two Series are not aligned, NaN (missing values) are generated for the labels
that do not match.
NumPy array: There is no alignment on labels. If the arrays do not have matching shapes, the
operation fails.

4 Memory
Pandas Series: Requires more memory.
NumPy array: Occupies less memory.
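The difference in alignment can be seen in a small sketch. The labels and values are sample data assumed for illustration.

```python
import numpy as np
import pandas as pd

# Two Series whose labels only partly match (sample values).
s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

# Addition is done label by label; 'a' and 'd' have no partner,
# so the result holds NaN for those labels.
total = s1 + s2
print(total)

# NumPy arrays are added position by position.
a1 = np.array([10, 20, 30])
a2 = np.array([1, 2, 3])
array_sum = a1 + a2
print(array_sum)

# Arrays with shapes that do not match cannot be added.
try:
    a1 + np.array([1, 2])
except ValueError as err:
    print('Shape mismatch:', err)
```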

Q. 12 Write the difference between Series and DataFrame in pandas.

Sr.No Property

1 Dimensions
Series: 1-dimensional
DataFrame: 2-dimensional

2 Type of data
Series: Homogeneous, i.e., all the elements must be of the same data type in a Series object.
DataFrame: Heterogeneous, i.e., a DataFrame object can have elements of different data types.

3 Mutability
Series: Value-mutable, i.e., the values of its elements can change, but size-immutable.
DataFrame: Value-mutable and size-mutable.
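A few of these properties can be checked directly. This is only a sketch with sample values assumed for illustration.

```python
import pandas as pd

# A Series is 1-dimensional and holds values of one data type.
marks = pd.Series([18, 23, 51], index=['Arnab', 'Kritika', 'Divyam'])
marks['Arnab'] = 20          # value-mutable: an existing value is changed

# A DataFrame is 2-dimensional and its columns can have different types.
df = pd.DataFrame({'Name': ['Arnab', 'Kritika'], 'Eco': [18, 23]})
df['Maths'] = [57, 45]       # size-mutable: a new column is added in place

print(marks.ndim, df.ndim)   # number of dimensions of each object
print(df.dtypes)             # one data type per column
```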
Q. 12 Importing and Exporting Data between CSV Files and DataFrames.

We can create a DataFrame by importing data from a CSV file, in which the values are separated by
commas. Similarly, we can store or export the data of a DataFrame as a .csv file.
Importing a CSV file to a DataFrame
Let us assume that we have the following data in a csv file named ResultData.csv stored in the
folder C:/NCERT. To practise the code as we progress, create this csv file using a spreadsheet
and save it on your computer.
RollNo Name Eco Maths
1 Arnab 18 57
2 Kritika 23 45
3 Divyam 51 37
4 Vivaan 40 60
5 Aaroosh 18 27
We can load the data from the ResultData.csv file into a DataFrame, say marks, using the Pandas
read_csv() function as shown below:
>>> marks = pd.read_csv("C:/NCERT/ResultData.csv", sep=",", header=0)
>>> marks
   RollNo     Name  Eco  Maths
0       1    Arnab   18     57
1       2  Kritika   23     45
2       3   Divyam   51     37
3       4   Vivaan   40     60
4       5  Aaroosh   18     27
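Exporting works in the opposite direction with the DataFrame method to_csv(). The sketch below uses an in-memory text buffer in place of a real file so that it runs on any computer; in practice a file path such as the one used with read_csv() would be passed instead. The two rows of data are taken from the table above.

```python
import io
import pandas as pd

# CSV text standing in for the contents of a file (first two rows only).
csv_text = "RollNo,Name,Eco,Maths\n1,Arnab,18,57\n2,Kritika,23,45\n"
marks = pd.read_csv(io.StringIO(csv_text), sep=",", header=0)

# index=False leaves the row labels 0, 1, 2, ... out of the exported data.
buffer = io.StringIO()
marks.to_csv(buffer, index=False)
exported = buffer.getvalue()
print(exported)
```

Without index=False, the row labels are written as an extra first column of the exported data.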

Q. 13 Explain concat() function with syntax.

Concatenating means joining two or more pandas objects (Series or DataFrames) along an axis to
obtain a single new object. By default, concat() stacks the objects one below the other (axis=0).
With axis=1, it places them side by side. The original objects are not changed.
Syntax: pd.concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort,
copy)
Example:
import pandas as pd
# three DataFrames with the same columns but different row labels
df1=pd.DataFrame({'A':['A1','A2','A3'],'B':['B1','B2','B3']},index=[0,1,2])
df2=pd.DataFrame({'A':['A4','A5','A6'],'B':['B4','B5','B6']},index=[3,4,5])
df3=pd.DataFrame({'A':['A7','A8','A9'],'B':['B7','B8','B9']},index=[6,7,8])
dfram=[df1,df2,df3]
# stack the three DataFrames one below the other
result=pd.concat(dfram)
print('First DataFrame\n',df1)
print('Second DataFrame\n',df2)
print('Third DataFrame\n',df3)
print('Concatenated DataFrame\n',result)
Output :
First DataFrame
A B
0 A1 B1
1 A2 B2
2 A3 B3
Second DataFrame
A B
3 A4 B4
4 A5 B5
5 A6 B6
Third DataFrame
A B
6 A7 B7
7 A8 B8
8 A9 B9
Concatenated DataFrame
A B
0 A1 B1
1 A2 B2
2 A3 B3
3 A4 B4
4 A5 B5
5 A6 B6
6 A7 B7
7 A8 B8
8 A9 B9
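Two of the parameters listed in the syntax, ignore_index and axis, can be illustrated with a short sketch. The DataFrames here are small samples assumed for illustration.

```python
import pandas as pd

# Both DataFrames use the default row labels 0 and 1.
df1 = pd.DataFrame({'A': ['A1', 'A2'], 'B': ['B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4'], 'B': ['B3', 'B4']})

# ignore_index=True discards the old row labels and numbers
# the rows of the result afresh from 0.
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)

# axis=1 joins the DataFrames side by side, matching rows by label.
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side)
```

Without ignore_index=True, the stacked result would keep the row labels 0, 1, 0, 1 from the two original DataFrames.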
