Python Pandas Dataframe: Parameter & Description

Python Pandas DataFrame

A Pandas DataFrame is a widely used two-dimensional data structure with labeled
axes (rows and columns). It is a standard way to store data that has two different
indexes, i.e., a row index and a column index. It has the following properties:

o The columns can be of heterogeneous types such as int, bool, and so on.
o It can be seen as a dictionary of Series structures where both the rows and
columns are indexed. The column labels are denoted as "columns" and the row
labels as "index".

Parameter & Description:


data: The input data. It can take different forms such as an ndarray, Series, map (dict), constant, or list.

index: The row labels. If no index is passed, a default integer index from 0 to n-1 (like np.arange(n)) is used.

columns: The column labels. If no column labels are passed, a default integer index from 0 to n-1
(like np.arange(n)) is used.

dtype: The data type to force on the columns. If it is not given, the type of each column is inferred from the data.

copy: A boolean that controls whether the input data is copied.
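
To see how these parameters fit together, here is a minimal sketch of a call to the DataFrame constructor. The array values and the labels r1, r2, a, b, and c are made up for illustration and are not part of the original tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a 2x3 NumPy array
values = np.array([[1, 2, 3], [4, 5, 6]])

frame = pd.DataFrame(
    data=values,              # the input data
    index=['r1', 'r2'],       # row labels
    columns=['a', 'b', 'c'],  # column labels
    dtype='float64',          # force every column to float64
    copy=True,                # copy the input data
)
print(frame)

# Without index and columns, the labels default to 0..n-1
default_frame = pd.DataFrame(values)
print(list(default_frame.index), list(default_frame.columns))
```

When index and columns are left out, the second DataFrame gets the default integer labels described above.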


Create a DataFrame
We can create a DataFrame in the following ways:

o dict
o Lists
o NumPy ndarrays
o Series

Create an empty DataFrame

The below code shows how to create an empty DataFrame in Pandas:

# importing the pandas library
import pandas as pd
df = pd.DataFrame()
print(df)

Output
Empty DataFrame
Columns: []
Index: []

Explanation: In the above code, we first import the pandas library with the alias pd
and then define a variable named df that holds an empty DataFrame. Finally, we print
it by passing df to print().

Create a DataFrame using List:


We can easily create a DataFrame in Pandas from a list.

# importing the pandas library
import pandas as pd
# a list of strings
x = ['Python', 'Pandas']

# Calling DataFrame constructor on list
df = pd.DataFrame(x)
print(df)

Output

        0
0  Python
1  Pandas

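Explanation: In the above code, we have defined a variable named "x" that consists of
string values. The DataFrame constructor is called on the list, and the resulting
DataFrame is printed. Since no labels are given, the single column and the rows get
default integer labels starting at 0.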

Create a DataFrame from Dict of ndarrays/ Lists

# importing the pandas library
import pandas as pd
info = {'ID': [101, 102, 103], 'Department': ['B.Sc', 'B.Tech', 'M.Tech']}
df = pd.DataFrame(info)
print(df)

Output

    ID Department
0  101       B.Sc
1  102     B.Tech
2  103     M.Tech

Explanation: In the above code, we have defined a dictionary named "info" that
consists of a list of IDs and a list of Departments. The dictionary keys become the
column labels. We pass the info dictionary to the DataFrame constructor, store the
result in a variable called df, and pass it as an argument to print().
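
The same approach works when the dictionary values are NumPy ndarrays instead of lists. The sketch below assumes a hypothetical "Marks" column and the row labels s1, s2, and s3, none of which appear in the original example. All arrays must have the same length; otherwise pandas raises a ValueError.

```python
import numpy as np
import pandas as pd

# Hypothetical data: both arrays have the same length
info = {
    'ID': np.array([101, 102, 103]),
    'Marks': np.array([71.5, 80.0, 65.25]),
}

# An optional index replaces the default 0..n-1 row labels
df = pd.DataFrame(info, index=['s1', 's2', 's3'])
print(df)
```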

Create a DataFrame from Dict of Series:

# importing the pandas library
import pandas as pd

info = {'one': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
        'two': pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])}

d1 = pd.DataFrame(info)
print(d1)

Output

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  4.0    4
e  5.0    5
f  6.0    6
g  NaN    7
h  NaN    8

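Explanation: In the above code, a dictionary named "info" consists of two Series, each
with its own index. We pass the info dictionary to the DataFrame constructor, store the
result in a variable called d1, and pass it as an argument to print(). The row index is
the union of the two Series indexes. The labels g and h are missing from the "one"
Series, so those cells hold NaN and the column is converted to float.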
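
The index alignment can be checked directly. The following is a small sketch with shorter, made-up Series than the ones above; it inspects the resulting index, the column types, and the number of missing values.

```python
import pandas as pd

info = {'one': pd.Series([1, 2], index=['a', 'b']),
        'two': pd.Series([1, 2, 3], index=['a', 'b', 'c'])}
d1 = pd.DataFrame(info)

# The row index is the union of both Series indexes
print(list(d1.index))

# 'one' has no value for label 'c', so it holds NaN there and becomes float
print(d1.dtypes)

# Count the missing values in the 'one' column
print(d1['one'].isna().sum())
```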

Column Selection
We can select any column from the DataFrame. Here is the code that demonstrates
how to select a column from the DataFrame.

# importing the pandas library
import pandas as pd

info = {'one': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),
        'two': pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])}

d1 = pd.DataFrame(info)
print(d1['one'])

Output

a 1.0
b 2.0
c 3.0
d 4.0
e 5.0
f 6.0
g NaN
h NaN
Name: one, dtype: float64

Explanation: In the above code, a dictionary named "info" consists of two Series, each
with its own index. Later, we have created the DataFrame d1 from the info dictionary,
selected the "one" column, which is returned as a Series, and passed it to print().
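
More than one column can be selected by passing a list of labels. This is a short sketch with made-up data: a single label gives a Series, while a list of labels gives a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6], 'three': [7, 8, 9]})

single = df['one']             # one label -> Series
subset = df[['one', 'three']]  # list of labels -> DataFrame

print(type(single).__name__)
print(type(subset).__name__)
print(subset)
```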

Column Addition
We can also add any new column to an existing DataFrame. The below code
demonstrates how to add any new column to an existing DataFrame:

# importing the pandas library
import pandas as pd

info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
        'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}

df = pd.DataFrame(info)

# Add a new column to an existing DataFrame object

print("Add new column by passing series")
df['three'] = pd.Series([20, 40, 60], index=['a', 'b', 'c'])
print(df)

print("Add new column using existing DataFrame columns")
df['four'] = df['one'] + df['three']

print(df)

Output

Add new column by passing series
   one  two  three
a  1.0    1   20.0
b  2.0    2   40.0
c  3.0    3   60.0
d  4.0    4    NaN
e  5.0    5    NaN
f  NaN    6    NaN

Add new column using existing DataFrame columns
   one  two  three  four
a  1.0    1   20.0  21.0
b  2.0    2   40.0  42.0
c  3.0    3   60.0  63.0
d  4.0    4    NaN   NaN
e  5.0    5    NaN   NaN
f  NaN    6    NaN   NaN

Explanation: In the above code, a dictionary named info consists of two Series, each
with its own index. Later, we have created the DataFrame df from the info dictionary.

To add a new column to an existing DataFrame object, we have assigned a new Series to
the column label "three". Its values are matched to the rows by index label, so the rows
d, e, and f, which are missing from the new Series, hold NaN. The result is printed
using print().

We can also add new columns using the existing columns of the DataFrame. The "four"
column has been added to store the result of the addition of the two columns, i.e., one
and three. Wherever either column holds NaN, the sum is NaN as well.
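
Assigning to df['four'] changes the DataFrame in place. If the original should stay untouched, the assign() method can be used instead, since it returns a new DataFrame. The following is a minimal sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'three': [20, 40, 60]})

# assign() returns a new DataFrame; df itself is not modified
df2 = df.assign(four=df['one'] + df['three'])

print(df2)
print(list(df.columns))
```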

Column Deletion:
We can also delete any column from the existing DataFrame. This code helps to
demonstrate how the column can be deleted from an existing DataFrame:

# importing the pandas library
import pandas as pd

info = {'one': pd.Series([1, 2], index=['a', 'b']),
        'two': pd.Series([1, 2, 3], index=['a', 'b', 'c'])}

df = pd.DataFrame(info)
print("The DataFrame:")
print(df)

# using the del statement
print("Delete the first column:")
del df['one']
print(df)
# using the pop() method
print("Delete another column:")
df.pop('two')
print(df)

Output

The DataFrame:
   one  two
a  1.0    1
b  2.0    2
c  NaN    3

Delete the first column:
   two
a    1
b    2
c    3

Delete another column:
Empty DataFrame
Columns: []
Index: [a, b, c]

Explanation:

In the above code, the df variable holds the DataFrame created from the info dictionary,
and all of its values are printed. We can use the del statement or the pop() method to
delete columns from the DataFrame.

In the first case, we have used the del statement to delete the "one" column from the
DataFrame, whereas in the second case, we have used the pop() method to remove the
"two" column from the DataFrame. After both columns are removed, the DataFrame is
empty but still keeps its row index.

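Two details about column deletion are worth a short sketch (the data below is made up for illustration): pop() returns the removed column as a Series, and the drop() method removes columns without changing the original DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})

# pop() removes the column in place and returns it as a Series
removed = df.pop('two')
print(removed.tolist())

# drop() returns a new DataFrame and leaves df unchanged
trimmed = df.drop(columns=['three'])
print(list(trimmed.columns))
print(list(df.columns))
```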
Row Selection, Addition, and Deletion


Row Selection:
We can easily select, add, or delete any row at any time. First of all, we will look at
row selection. A row can be selected in the following ways:

Selection by Label:

We can select any row by passing the row label to the loc indexer.

# importing the pandas library
import pandas as pd

info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
        'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}

df = pd.DataFrame(info)
print(df.loc['b'])

Output

one 2.0
two 2.0
Name: b, dtype: float64

Explanation: In the above code, a dictionary named info consists of two Series, each
with its own index.

For selecting a row, we have passed the row label to the loc indexer. The result is a
Series whose index holds the column labels.
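
The loc indexer also accepts a list of row labels, or a row label together with a column label. This is a brief sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]}, index=['a', 'b', 'c'])

rows = df.loc[['a', 'c']]  # list of labels -> DataFrame with those rows
cell = df.loc['b', 'two']  # row label and column label -> single value

print(rows)
print(cell)
```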

Selection by integer location:

The rows can also be selected by passing the integer location to the iloc indexer.

# importing the pandas library
import pandas as pd
info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
        'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print(df.iloc[3])

Output

one 4.0
two 4.0
Name: d, dtype: float64

Explanation: In the above code, we have defined a dictionary named info that consists
of two Series, each with its own index.

For selecting a row, we have passed the integer location to the iloc indexer. Positions
start at 0, so iloc[3] returns the fourth row, which has the label d.

Slice Rows

Multiple rows can also be selected by slicing with the ':' operator.

# importing the pandas library
import pandas as pd
info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
        'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print(df[2:5])

Output

   one  two
c  3.0    3
d  4.0    4
e  5.0    5

Explanation: In the above code, we have defined the range 2:5 for the selection of
rows. The slice is based on position and excludes the end, so the rows at positions
2, 3, and 4 (labels c, d, and e) are printed on the console.
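
Slices can also be written explicitly with iloc (by position) or loc (by label). One difference is easy to miss: a position slice excludes its end, while a label slice includes it. The sketch below uses a made-up single-column DataFrame to show this.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3, 4, 5]}, index=['a', 'b', 'c', 'd', 'e'])

by_position = df.iloc[2:5]  # positions 2, 3, 4 (end excluded)
by_label = df.loc['c':'e']  # labels c, d, e (end included)

print(by_position)
print(by_label)

# The same start and stop rows give different lengths
print(len(df.iloc[1:3]), len(df.loc['b':'d']))
```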

Addition of rows:

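We can add new rows to a DataFrame by concatenating it with another DataFrame using
the pd.concat() function. The new rows are added at the end. The older
DataFrame.append() method was deprecated in pandas 1.4 and removed in pandas 2.0.

# importing the pandas library
import pandas as pd
d = pd.DataFrame([[7, 8], [9, 10]], columns=['x', 'y'])
d2 = pd.DataFrame([[11, 12], [13, 14]], columns=['x', 'y'])
# pd.concat() replaces the removed DataFrame.append() method
d = pd.concat([d, d2])
print(d)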

Output
    x   y
0   7   8
1   9  10
0  11  12
1  13  14

Explanation: In the above code, we have defined two DataFrames, each with two rows and
the columns x and y. The rows of the second DataFrame are added to the end of the
first, and the result is displayed on the console. The original row labels are kept, so
the labels 0 and 1 each appear twice.
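
If the repeated row labels in the output are not wanted, the rows can be renumbered while they are combined. This is a minimal sketch using pd.concat() with ignore_index=True on the same two DataFrames:

```python
import pandas as pd

d = pd.DataFrame([[7, 8], [9, 10]], columns=['x', 'y'])
d2 = pd.DataFrame([[11, 12], [13, 14]], columns=['x', 'y'])

# ignore_index=True discards the old labels and numbers the rows 0..n-1
combined = pd.concat([d, d2], ignore_index=True)
print(combined)
```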

Deletion of rows:

We can delete or drop any rows from a DataFrame using the index label. If the label is
duplicated, all rows with that label are deleted.

# importing the pandas library
import pandas as pd

a_info = pd.DataFrame([[4, 5], [6, 7]], columns=['x', 'y'])
b_info = pd.DataFrame([[8, 9], [10, 11]], columns=['x', 'y'])

# pd.concat() replaces DataFrame.append(), which was removed in pandas 2.0
a_info = pd.concat([a_info, b_info])

# Drop rows with label 0
a_info = a_info.drop(0)
print(a_info)

Output

    x   y
1   6   7
1  10  11

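Explanation: In the above code, we have defined two DataFrames, each with two rows and
the columns x and y, and combined them into one.

Here, we have passed the index label of the rows that need to be deleted from the
DataFrame. Since the label 0 appears twice, both of those rows are removed.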
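
To delete only one of the rows that share a label, the index can be renumbered first. The following sketch assumes that resetting the index is acceptable for the data at hand; the variable names both, renumbered, and dropped_one are chosen for illustration.

```python
import pandas as pd

a_info = pd.DataFrame([[4, 5], [6, 7]], columns=['x', 'y'])
b_info = pd.DataFrame([[8, 9], [10, 11]], columns=['x', 'y'])
both = pd.concat([a_info, b_info])        # row labels: 0, 1, 0, 1

dropped_by_label = both.drop(0)           # removes both rows labeled 0
renumbered = both.reset_index(drop=True)  # row labels: 0, 1, 2, 3
dropped_one = renumbered.drop(0)          # removes only the first row

print(dropped_by_label)
print(dropped_one)
```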

DataFrame Functions
Some commonly used DataFrame functions are as follows:

Functions                            Description

Pandas DataFrame.append()            Add the rows of another DataFrame to the end of the given
                                     DataFrame. Removed in pandas 2.0; use pd.concat() instead.

Pandas DataFrame.apply()             Apply a function along an axis of the DataFrame, i.e., to
                                     each column or each row.

Pandas DataFrame.assign()            Add new columns to a DataFrame and return a new object.

Pandas DataFrame.astype()            Cast the pandas object to a specified dtype.

Pandas concat()                      Concatenate pandas objects along an axis. It is a top-level
                                     function, called as pd.concat().

Pandas DataFrame.count()             Count the number of non-NA cells for each column or row.

Pandas DataFrame.describe()          Calculate summary statistics such as percentiles, mean, and
                                     std of the numerical values of the Series or DataFrame.

Pandas DataFrame.drop_duplicates()   Remove duplicate rows from the DataFrame.

Pandas DataFrame.groupby()           Split the data into groups.

Pandas DataFrame.head()              Return the first n rows of the object based on position.

Pandas DataFrame.hist()              Plot a histogram that divides the values of each numerical
                                     column into "bins".

Pandas DataFrame.iterrows()          Iterate over the rows as (index, Series) pairs.

Pandas DataFrame.mean()              Return the mean of the values for the requested axis.

Pandas DataFrame.melt()              Unpivot the DataFrame from a wide format to a long format.

Pandas DataFrame.merge()             Merge two DataFrames into one with a database-style join.

Pandas DataFrame.pivot_table()       Aggregate data with calculations such as sum, count, mean,
                                     max, and min.

Pandas DataFrame.query()             Filter the rows of the DataFrame with a boolean expression.

Pandas DataFrame.sample()            Return a random sample of rows or columns from the DataFrame.

Pandas DataFrame.shift()             Shift the values by a given number of periods, e.g., to
                                     compare each value with the value in the previous row.

Pandas DataFrame.sort_values()       Sort the DataFrame by its values. The old DataFrame.sort()
                                     method no longer exists.

Pandas DataFrame.sum()               Return the sum of the values for the requested axis.

Pandas DataFrame.to_excel()          Export the DataFrame to an Excel file.

Pandas DataFrame.transpose()         Transpose the index and columns of the DataFrame.

Pandas DataFrame.where()             Keep the values where a condition is True and replace the
                                     others (with NaN by default).
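
As a brief illustration of a few of these functions, here is a sketch on a small made-up DataFrame. The "Marks" column and its values are assumptions for the example, and sorting is done with sort_values().

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['B.Sc', 'B.Tech', 'B.Tech', 'M.Tech'],
    'Marks': [60, 75, 85, 90],
})

first_two = df.head(2)                                # first two rows
top = df.sort_values('Marks', ascending=False)        # highest marks first
high = df.query('Marks > 70')                         # rows with Marks above 70
means = df.groupby('Department')['Marks'].mean()      # mean marks per department

print(first_two)
print(top)
print(high)
print(means)
```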
