Lab 9

The document provides an overview of using Pandas DataFrames in Python, detailing their structure, creation methods, and features such as handling missing data. It includes examples of creating DataFrames from various data types, adding and deleting columns, and iterating over rows and columns. Additionally, it covers techniques for managing missing data using functions like isnull(), fillna(), and dropna().

Uploaded by

Prince Yadav
Copyright
© All Rights Reserved
EXPERIMENT NO.

Program Name: Playing with Data-frames using Pandas in Python Programming Language

Implementation:
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular
fashion in rows and columns. A pandas DataFrame is a two-dimensional, size-mutable,
potentially heterogeneous tabular data structure with labeled axes (rows and
columns). It consists of three principal components: the data, the rows, and the
columns.
Features of DataFrame

• Columns can be of different types
• Size-mutable
• Labeled axes (rows and columns)
• Can perform arithmetic operations on rows and columns
Structure
Let us assume that we are creating a data frame with student’s data.

You can think of it as an SQL table or a spreadsheet data representation.
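
As a minimal sketch of such a structure, the snippet below builds a small table of student records. The column names and values (name, age, grade) are made up for illustration and are not taken from the original figure.

```python
import pandas as pd

# Hypothetical student records; names and values are illustrative only
students = pd.DataFrame(
    {
        "name": ["Asha", "Ravi", "Meena"],
        "age": [20, 21, 19],
        "grade": ["A", "B", "A"],
    }
)

print(students)          # rows are labeled 0, 1, 2 by default
print(students.shape)    # (number of rows, number of columns)
print(students.dtypes)   # each column has its own data type
```

Each column holds one attribute and each row holds one student, which is the same layout as a spreadsheet or an SQL table.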


pandas.DataFrame
A pandas DataFrame can be created using the following constructor −

Prince Yadav IT3 2100270130133


pandas.DataFrame(data, index, columns, dtype, copy)
The parameters of the constructor are as follows −

Sr.No   Parameter & Description

1 data
data takes various forms such as ndarray, Series, map, lists, dict, constants, and
also another DataFrame.

2 index
Row labels to use for the resulting frame. Optional; defaults to np.arange(n) if
no index is passed.

3 columns
Column labels to use for the resulting frame. Optional; defaults to np.arange(n)
if no column labels are passed.

4 dtype
Data type to force on the data. Only a single dtype is allowed; if it is omitted,
the type of each column is inferred.

5 copy
Boolean flag that controls whether the input data is copied. It is listed here
with a default of False; in recent pandas releases the default is None and the
effective behaviour depends on the type of data.
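
The following is a small sketch, not part of the original lab, that passes all five parameters explicitly. The labels r1, r2, x, y and z are hypothetical names chosen for the example.

```python
import numpy as np
import pandas as pd

# A 2 x 3 block of integers used as the data argument
values = np.array([[1, 2, 3], [4, 5, 6]])

df = pd.DataFrame(
    data=values,
    index=["r1", "r2"],          # row labels (hypothetical names)
    columns=["x", "y", "z"],     # column labels (hypothetical names)
    dtype=float,                 # force every column to float
    copy=True,                   # copy the array so df does not share memory with it
)

values[0, 0] = 99                # changing the source array ...
print(df)                        # ... does not change df, because the data was copied
print(df.dtypes)
```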

Create DataFrame
A pandas DataFrame can be created using various inputs like −

• Lists
• dict
• Series
• NumPy ndarrays
• Another DataFrame
In the subsequent sections of this chapter, we will see how to create a DataFrame
using these inputs.

Create an Empty DataFrame
The most basic DataFrame that can be created is an empty DataFrame.
Example
# import the pandas library under the alias pd
import pandas as pd
df = pd.DataFrame()
print(df)

Create a DataFrame from Lists


The DataFrame can be created using a single list or a list of lists.
Example 1
import pandas as pd
data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)
print(df)

Example 2
import pandas as pd
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)

Create a DataFrame from Dictionary


Example 1
The following example shows how to create a DataFrame by passing a list of
dictionaries.
import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)

Example 2
The following example shows how to create a DataFrame by passing a list of
dictionaries and the row indices.
import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)

Example 3
The following example shows how to create a DataFrame with a list of dictionaries,
row indices, and column indices.
import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

# With two column labels that are the same as the dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])

# With two column labels, one of which ('b1') is not a dictionary key
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)
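
The examples above rely on how pandas fills gaps when dictionary keys and column labels do not match. The short sketch below, which reuses the same data, makes that behaviour explicit: a key that is missing from one dictionary, or a column label that matches no key, produces NaN.

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

# Keys become column labels; the first dictionary has no 'c', so that cell is NaN
df = pd.DataFrame(data, index=['first', 'second'])
print(df)
print(pd.isna(df.loc['first', 'c']))

# Asking for a column label that matches no key ('b1') gives a column of NaN
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df2['b1'].isna().all())
```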

Column Addition
We will understand this by adding a new column to an existing data frame.
Example
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

# Add a new column to an existing DataFrame by assigning a Series to a new column label
print("Adding a new column by passing as Series:")
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(df)

print("Adding a new column using the existing columns in DataFrame:")
df['four'] = df['one'] + df['three']
print(df)
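
One detail of column addition is worth seeing in isolation: an assigned Series is aligned on the row index, so rows without a matching label receive NaN. The sketch below is an illustration under that assumption; the column names sum and flag are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'two': [1, 2, 3, 4]}, index=['a', 'b', 'c', 'd'])

# The new Series has no label 'd', so that row is filled with NaN
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Arithmetic on columns is element-wise; NaN propagates into the result
df['sum'] = df['two'] + df['three']
print(df)

# A plain list is assigned by position, so its length must match the number of rows
df['flag'] = [True, False, True, False]
print(df)
```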

Column Deletion
Columns can be deleted or popped; let us take an example to understand how.
Example
# Using the previous DataFrame, we will delete columns
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}

df = pd.DataFrame(d)
print("Our dataframe is:")
print(df)

# using the del statement
print("Deleting the first column using the del statement:")
del df['one']
print(df)

# using the pop() method
print("Deleting another column using the pop() method:")
df.pop('two')
print(df)
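
Both del and pop() change the DataFrame in place. As a brief sketch of the difference between in-place and non-destructive removal, the example below also uses drop(), which the lab text does not cover and which returns a new DataFrame instead.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6], 'three': [7, 8, 9]})

# pop() removes the column in place and returns it as a Series
popped = df.pop('two')
print(popped.tolist())

# drop() returns a new DataFrame and leaves the original unchanged
trimmed = df.drop(columns=['three'])
print(list(df.columns))
print(list(trimmed.columns))
```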

Working with Missing Data


Missing data occurs when no information is provided for one or more items, or for a
whole unit. It is a common problem in real-world datasets. In pandas, missing values
are also referred to as NA (Not Available) values.
Checking for missing values using isnull() and notnull() :
To check for missing values in a pandas DataFrame, we use the functions isnull()
and notnull(). Both report, element by element, whether a value is NaN: isnull()
returns True where a value is missing, and notnull() returns True where a value is
present. These functions can also be used on a pandas Series to find null values.
# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named scores to avoid shadowing the built-in dict)
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}

# creating a dataframe from the dictionary
df = pd.DataFrame(scores)

# using isnull() function
print(df.isnull())
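
The text mentions notnull() but the example only calls isnull(). The sketch below, built on the same scores, shows notnull() along with two common follow-ups: counting missing values per column and filtering rows with a boolean mask.

```python
import numpy as np
import pandas as pd

scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(scores)

# notnull() is the element-wise opposite of isnull()
print(df.notnull())

# Summing the boolean mask counts the missing values in each column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Boolean indexing keeps only the rows where 'First Score' is present
present_first = df[df['First Score'].notnull()]
print(present_first)
```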

Filling missing values using fillna(), replace() and interpolate() :


To fill null values in a dataset, we use the fillna(), replace() and interpolate()
functions, which replace NaN values with other values. fillna() and replace()
substitute a value that you specify, whereas interpolate() estimates each missing
value from neighbouring values using an interpolation technique, rather than
hard-coding it.
# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named scores to avoid shadowing the built-in dict)
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

# filling missing values with 0 using fillna()
print(df.fillna(0))
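
Only fillna() is demonstrated above. As a hedged sketch of the other two functions named in this section, the example below applies replace() and interpolate() to the same scores. The placeholder value -99 and the choice of linear interpolation are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# replace() swaps one value for another; here NaN becomes -99
replaced = df.replace(to_replace=np.nan, value=-99)
print(replaced)

# interpolate() with the linear method estimates a gap from its neighbours,
# so the gap between 90 and 95 in 'First Score' becomes 92.5.
# A leading NaN has no earlier neighbour and is left as NaN by default.
interpolated = df.interpolate(method='linear')
print(interpolated)
```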

Dropping missing values using dropna() :


To drop null values from a DataFrame, we use the dropna() function, which drops
rows or columns that contain null values in different ways.
# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named scores to avoid shadowing the built-in dict)
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

print(df)

Now we drop rows with at least one NaN value (null value).
# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named scores to avoid shadowing the built-in dict)
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

# using dropna() function
print(df.dropna())
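
The phrase "in different ways" refers to the options dropna() accepts. The sketch below, using the same scores, compares the default with the how, axis and thresh parameters.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, np.nan, 45, 56],
                   'Third Score': [52, 40, 80, 98],
                   'Fourth Score': [np.nan, np.nan, np.nan, 65]})

# Default: drop every row that contains at least one NaN
rows_any = df.dropna()

# how='all' drops a row only if every value in it is NaN
rows_all = df.dropna(how='all')

# axis=1 drops columns instead of rows
cols_any = df.dropna(axis=1)

# thresh=3 keeps rows that have at least three non-NaN values
rows_thresh = df.dropna(thresh=3)

print(rows_any)
print(rows_all)
print(cols_any)
print(rows_thresh)
```

With this data only the last row is complete, no row is entirely empty, and 'Third Score' is the only column without gaps.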

Iterating over rows and columns


Iteration is a general term for taking each item of something, one after another.
A pandas DataFrame consists of rows and columns, so we iterate over a DataFrame
much as we would over a dictionary.
Iterating over rows :
To iterate over rows, we can use the functions iterrows() and itertuples().
The iteritems() function, which older tutorials list alongside them, iterates over
columns rather than rows and was removed in pandas 2.0 in favour of items().
# importing pandas as pd
import pandas as pd

# dictionary of lists (named records to avoid shadowing the built-in dict)
records = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
           'degree': ["MBA", "BCA", "M.Tech", "MBA"],
           'score': [90, 40, 80, 98]}

# creating a dataframe from a dictionary
df = pd.DataFrame(records)

print(df)

Now we apply the iterrows() function to get each row in turn.

# importing pandas as pd
import pandas as pd

# dictionary of lists (named records to avoid shadowing the built-in dict)
records = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
           'degree': ["MBA", "BCA", "M.Tech", "MBA"],
           'score': [90, 40, 80, 98]}

# creating a dataframe from a dictionary
df = pd.DataFrame(records)

# iterating over rows using iterrows(); each step yields (index label, row as a Series)
for index, row in df.iterrows():
    print(index, row)
    print()
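
The other row iterator named above, itertuples(), is not demonstrated in the lab. A minimal sketch on the same data follows; the variable names names_over_50 and first are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'name': ["aparna", "pankaj", "sudhir", "Geeku"],
                   'degree': ["MBA", "BCA", "M.Tech", "MBA"],
                   'score': [90, 40, 80, 98]})

# itertuples() yields one namedtuple per row; fields are Index plus the column names
names_over_50 = []
for row in df.itertuples():
    if row.score > 50:
        names_over_50.append(row.name)
print(names_over_50)

# index=False leaves the row label out of each tuple
first = next(df.itertuples(index=False))
print(first)
```

Because each row arrives as a lightweight tuple rather than a Series, itertuples() is usually the faster choice when only the values are needed.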

Iterating over Columns :


To iterate over columns, we create a list of the DataFrame's column labels and
then iterate through that list to pull out each column.
# importing pandas as pd
import pandas as pd

# dictionary of lists (named records to avoid shadowing the built-in dict)
records = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
           'degree': ["MBA", "BCA", "M.Tech", "MBA"],
           'score': [90, 40, 80, 98]}

# creating a dataframe from a dictionary
df = pd.DataFrame(records)

print(df)

Now we iterate through the columns: we first create a list of the DataFrame's
column labels and then iterate through that list.
# creating a list of dataframe columns
columns = list(df)

for i in columns:
    # printing the third element of the column
    print(df[i][2])
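
An alternative to building the list of labels first is the items() method, which yields each column label together with its data. The sketch below is an assumption-level illustration on the same data; third_values is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({'name': ["aparna", "pankaj", "sudhir", "Geeku"],
                   'degree': ["MBA", "BCA", "M.Tech", "MBA"],
                   'score': [90, 40, 80, 98]})

# items() yields (column label, column as a Series) pairs
third_values = {}
for label, column in df.items():
    third_values[label] = column.iloc[2]   # third element by position
print(third_values)
```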
