A DataFrame is a two-dimensional data structure in which data is stored in tabular form, as rows and columns. It can be visualized as an SQL table or an Excel sheet.
It can be created using the following constructor −
pd.DataFrame(data, index, columns, dtype, copy)
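As a rough illustration of how these parameters fit together, the sketch below passes each one by keyword. The sample values and the variable names (rows, df) are assumptions made for this illustration only.

```python
import pandas as pd

# Hypothetical sample data: two rows with two values each.
rows = [[1, 56], [8, 78]]

df = pd.DataFrame(
    data=rows,               # the values, given here as a list of rows
    index=['a', 'b'],        # row labels
    columns=['ab', 'ef'],    # column labels
    dtype='float64',         # data type to force on the values
    copy=True,               # copy the input data instead of referencing it
)
print(df)
```

Only data is usually required. When index and columns are left out, pandas falls back to default integer labels.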
We previously saw a method in which a new column was created as a Series data structure. The Series was assigned to the dataframe under a new column label, which added it to the dataframe.
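The earlier method is not reproduced here. A minimal sketch of what such an assignment might look like follows; the column name 'ij' and the sample values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'ab': [1, 8, 7]}, index=['a', 'b', 'c'])

# Assign a Series under a new column label.
# pandas aligns the Series on the index labels, not on position,
# so the row 'b' (absent from the Series) receives NaN.
df['ij'] = pd.Series([10, 20], index=['a', 'c'])
print(df)
```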
Let us see how we can create a column from the columns already present in the dataframe. This is useful when we need to perform some computation on existing columns and store the result in a new column −
Example
import pandas as pd

my_data = {
    'ab': pd.Series([1, 8, 7], index=['a', 'b', 'c']),
    'cd': pd.Series([1, 2, 0, 9], index=['a', 'b', 'c', 'd']),
    'ef': pd.Series([56, 78, 32], index=['a', 'b', 'c']),
}
my_df = pd.DataFrame(my_data)
print("The dataframe is :")
print(my_df)
my_df['gh'] = my_df['ab'] + my_df['ef']
print("After adding column 0 and 2 to the dataframe, :")
print(my_df)
Output
The dataframe is :
    ab  cd    ef
a  1.0   1  56.0
b  8.0   2  78.0
c  7.0   0  32.0
d  NaN   9   NaN
After adding column 0 and 2 to the dataframe, :
    ab  cd    ef    gh
a  1.0   1  56.0  57.0
b  8.0   2  78.0  86.0
c  7.0   0  32.0  39.0
d  NaN   9   NaN   NaN
Explanation
The pandas library is imported under the alias ‘pd’ for ease of use.
A dictionary is created in which each key is a column name and each value is a Series data structure.
Three such key-value pairs are defined, and the Series do not all share the same index labels.
This dictionary is passed as the data parameter to the ‘DataFrame’ constructor of the ‘pandas’ library, which creates the dataframe.
The values are aligned on their index labels, so the dataframe has one row for every label that appears in any of the Series.
A new column ‘gh’ is assigned to the dataframe; its values are the element-wise sum of the columns at positions 0 and 2 (‘ab’ and ‘ef’).
The dataframe is printed on the console before and after the new column is added.
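The example selects the two columns by label. Since the explanation refers to them by position, the sketch below shows one way to check that positional selection with iloc gives the same sum. The variable names by_label and by_position are introduced here for illustration only.

```python
import pandas as pd

my_df = pd.DataFrame({
    'ab': pd.Series([1, 8, 7], index=['a', 'b', 'c']),
    'cd': pd.Series([1, 2, 0, 9], index=['a', 'b', 'c', 'd']),
    'ef': pd.Series([56, 78, 32], index=['a', 'b', 'c']),
})

# Select the columns by label, as in the example.
by_label = my_df['ab'] + my_df['ef']

# Select the same columns by integer position (0 and 2).
by_position = my_df.iloc[:, 0] + my_df.iloc[:, 2]

# equals() treats NaN values in the same location as equal.
print(by_label.equals(by_position))
```

Label-based selection is usually the safer choice, because it keeps working if the column order changes.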
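Note − ‘NaN’ stands for ‘Not a Number’ and marks a missing value at that [row, col] position. Here, ‘ab’ and ‘ef’ have no entry for the index label ‘d’, so their sum in ‘gh’ is also NaN. Because NaN is a floating-point value, these columns are displayed as floats (1.0 rather than 1).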
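To see how a missing label leads to NaN, and one possible way to avoid it, the sketch below adds two of the Series from the example directly. Using Series.add with fill_value=0 is an option chosen for this illustration, not something the example above does.

```python
import pandas as pd

ab = pd.Series([1, 8, 7], index=['a', 'b', 'c'])
cd = pd.Series([1, 2, 0, 9], index=['a', 'b', 'c', 'd'])

# 'd' is missing from ab, so the plain sum is NaN for that label.
plain = ab + cd

# fill_value=0 substitutes 0 for a value missing on one side only.
filled = ab.add(cd, fill_value=0)

print(plain.isna())   # True only for the label 'd'
print(filled)
```

If a label is missing from both Series, the result for that label stays NaN even when fill_value is given.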