Data Structures in Pandas

Last Updated : 27 May, 2025

Pandas is an open-source Python library for working with relational or labeled data in an easy and intuitive way. It provides powerful data structures and a wide range of operations for manipulating numerical data and time series, along with tools for cleaning, processing and analyzing data efficiently. It is one of the most popular libraries for data analysis in Python and is built around two core data structures: Series and DataFrame.

Series

A Series is a one-dimensional labeled array that can hold any data type, such as integers, floats, strings or arbitrary Python objects. Each value is paired with a label, and the labels together form the index.

Syntax

pandas.Series(data=None, index=None, dtype=None, name=None, copy=False)

Parameters:

data: Array-like, dict or scalar – Input data.
index (Optional): Labels for the axis.
dtype (Optional): Data type of the Series.
name (Optional): Name of the Series.
copy (Optional): Boolean; copies the input data if True.

Returns: A pandas.Series object containing the provided data with an associated index.
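As a rough sketch of how these parameters combine, the snippet below builds a Series with custom labels, an explicit dtype and a name. The values, labels and the variable name scores are invented for illustration.

```python
import pandas as pd

# Hypothetical values and labels, chosen only for illustration
scores = pd.Series(
    data=[90, 85, 78],
    index=['a', 'b', 'c'],   # custom labels instead of the default 0, 1, 2
    dtype='float64',         # store the integers as floats
    name='score',            # name attached to the Series
)

print(scores)
print(scores['b'])           # look up a value by its label
```

Passing index replaces the default integer labels, so values can be looked up by 'a', 'b' or 'c'.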

Example 1: Series holding string data.

Python

import pandas as pd
a = ['g', 'e', 'e', 'k', 's']

res = pd.Series(a)
print(res)

Explanation: We pass the list a into pd.Series(a), which converts it into a Series (a column-like structure) where each item gets a default index starting from 0, automatically assigned by Pandas.

Example 2: Series holding integer data.

Python

import pandas as pd
a = [1, 2, 3, 4, 5]

res = pd.Series(a)
print(res)

Explanation: We pass the list a into pd.Series(a), which converts it into a Series (a column-like structure) where each number gets a default index starting from 0, automatically assigned by Pandas.
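The dtype and the default index can be inspected directly. The following is a small sketch, with the variable names ints and mixed chosen here for illustration, showing that Pandas infers an integer dtype for a list of integers and falls back to floats when integers and floats are mixed.

```python
import pandas as pd

ints = pd.Series([1, 2, 3, 4, 5])
mixed = pd.Series([1, 2.5, 3])   # an int/float mix is stored as floats

print(ints.dtype)                # an integer dtype
print(mixed.dtype)               # a float dtype
print(list(ints.index))          # default labels start at 0
```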

Example 3: Series created from a dictionary.

Python

import pandas as pd
a = {'Id': 1013, 'Name': 'Mohe', 'State': 'Manipur', 'Age': 24}

res = pd.Series(a)
print(res)

Explanation: We pass the dictionary a into pd.Series(a), converting keys into index labels and values into data, creating a labeled Series for easy access.
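Because the dictionary keys become index labels, values can be retrieved by key. A minimal sketch, using a shortened version of the dictionary and the illustrative variable name person:

```python
import pandas as pd

# Shortened version of the dictionary from the example; values are illustrative
person = pd.Series({'Id': 1013, 'Name': 'Mohe', 'Age': 24})

print(person['Name'])        # look up a value by its label
print(list(person.index))    # the dictionary keys, in insertion order
print('Age' in person)       # membership tests check the index labels
```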

DataFrame

A DataFrame is a two-dimensional, size-mutable and heterogeneous tabular data structure with labeled rows and columns, similar to a spreadsheet or SQL table. Each column in a DataFrame is a Pandas Series, allowing you to work with multiple types of data in one table.
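The properties named above can be checked on a small table. This is only a sketch: the column names and values are made up, and the added 'Passed' column is a hypothetical example of size mutability.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick'],     # text column
    'Age': [20, 21],             # integer column
    'Score': [88.5, 92.0],       # float column
})

print(type(df['Age']))           # a single column is a Series
print(df.dtypes)                 # each column keeps its own dtype

# Size-mutable: a new column can be added to the existing table
df['Passed'] = df['Score'] > 90
print(df.shape)
```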

Syntax:

pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

Parameters:

data: Various forms of input data (e.g., lists, dict, ndarray, Series, another DataFrame).
index (Optional): Labels for rows.
columns (Optional): Labels for columns.
dtype (Optional): Data type to force for all columns.
copy (Optional): Boolean; whether to copy the input data.

Returns: A pandas.DataFrame object representing a 2D labeled data structure.
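To illustrate the index and columns parameters together, here is a short sketch that labels both axes of a nested list. The row labels and values are assumptions made for the example.

```python
import pandas as pd

# Each inner list is one row; values are illustrative
rows = [[20, 'Delhi'], [21, 'Pune']]

df = pd.DataFrame(
    data=rows,
    index=['tom', 'nick'],       # row labels
    columns=['Age', 'City'],     # column labels
)

print(df)
print(df.loc['nick', 'City'])    # look up one cell by row and column label
```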

Example 1: Creating a DataFrame from a list

Python

import pandas as pd
a = ['Python', 'Pandas', 'Numpy']

df = pd.DataFrame(a, columns=['Tech'])
print(df)

Explanation: We pass the list a into pd.DataFrame(a, columns=['Tech']), which converts it into a DataFrame with a single column named 'Tech'. Each item becomes a row and Pandas automatically assigns a default integer index starting from 0.

Example 2: Creating a DataFrame from a dictionary

Python

import pandas as pd
a = {
    'Name': ['Tom', 'Nick', 'Krish', 'Jack'],
    'Age': [20, 21, 19, 18]
}
res = pd.DataFrame(a)
print(res)

Explanation: We pass the dictionary a into pd.DataFrame(a), which converts it into a DataFrame where the dictionary keys become column names and the values (lists) become the column data. Pandas assigns a default integer index starting from 0 for the rows.

Example 3: Selecting columns in a DataFrame

Python

import pandas as pd

a = {
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
    'Age': [27, 24, 22, 32],
    'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd']
}
df = pd.DataFrame(a)
print(df[['Name', 'Qualification']])

Explanation: We create a DataFrame df from the dictionary a, then select and print only the columns 'Name' and 'Qualification' by passing their names in a list to df[]. This returns a new DataFrame with just those two columns.

Accessing columns and rows in a DataFrame

A DataFrame in Pandas is a 2D tabular structure where you can easily access and manipulate data by selecting specific columns or rows. You can extract one or more columns using column names and filter rows using labels or conditions.

Example 1: We can access one or more columns in a DataFrame using square brackets.

Python

import pandas as pd
a = {
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
    'Age': [27, 24, 22, 32],
    'City': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj']
}
df = pd.DataFrame(a)

print(df['Name'])            # single column
print(df[['Name', 'City']])  # multiple columns

Explanation:

df['Name'] returns a Series containing values from the 'Name' column.
df[['Name', 'City']] returns a new DataFrame containing only the specified columns.
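The difference between single and double brackets holds even for one column. A minimal sketch, with the variable names single and wrapped chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi'], 'City': ['Delhi', 'Kanpur']})

single = df['Name']      # single brackets: a Series
wrapped = df[['Name']]   # a list of one name: a one-column DataFrame

print(type(single).__name__, single.shape)
print(type(wrapped).__name__, wrapped.shape)
```

Use the double-bracket form when later code expects a two-dimensional table rather than a Series.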

Example 2: We can use .loc[] to access rows by index or filter them using conditions.

Python

import pandas as pd

a = {
    'Name': ['Mohe', 'Shyni', 'Parul', 'Sam'],
    'ID': [12, 43, 54, 32],
    'City': ['Delhi', 'Kochi', 'Pune', 'Patna']
}

df = pd.DataFrame(a)
# Keep only the rows where the 'Name' column equals 'Mohe'
res = df.loc[df['Name'] == 'Mohe']
print(res)
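The example above filters with a condition. As a complementary sketch on the same data, .loc[] can also select rows by their index label, take a label slice, or combine a condition with a column selection. The variable names first, block and big_ids are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Mohe', 'Shyni', 'Parul', 'Sam'],
    'ID': [12, 43, 54, 32],
    'City': ['Delhi', 'Kochi', 'Pune', 'Patna'],
})

first = df.loc[0]        # one row by its label, returned as a Series
block = df.loc[1:2]      # label slice; both endpoints are included

# Boolean filter on rows combined with a column selection
big_ids = df.loc[df['ID'] > 40, ['Name', 'ID']]

print(first)
print(block)
print(big_ids)
```

Unlike ordinary Python slicing, a .loc[] slice includes its end label, so df.loc[1:2] returns two rows here.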