Data Structures in Pandas
Last Updated: 27 May, 2025
Pandas is an open-source Python library for working with relational or labeled data in an easy and intuitive way. It provides powerful data structures and a wide range of operations for manipulating numerical data and time series, along with tools for cleaning, processing and analyzing data efficiently. It is one of the most popular libraries for data analysis in Python and is built around two core data structures: Series and DataFrame.
Series
A Series is a one-dimensional array-like object that can store any data type such as integers, strings, floats, or even Python objects. It comes with labels (called an index).
Syntax
pandas.Series(data=None, index=None, dtype=None, name=None, copy=None)
Parameters:
- data: Array-like, dict or scalar – Input data.
- index (Optional): Labels for the axis.
- dtype (Optional): Data type of the Series.
- name (Optional): Name of the Series.
- copy (Optional): Boolean; whether to copy the input data.
Returns: A pandas.Series object containing the provided data with an associated index.
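As a minimal sketch of how the optional parameters fit together, the snippet below builds a Series with explicit labels, a dtype and a name. The variable name temps and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: three readings labeled by day
temps = pd.Series(
    data=[21.5, 23.0, 19.8],
    index=["mon", "tue", "wed"],   # custom labels instead of 0, 1, 2
    dtype="float64",               # force the element type
    name="temperature",            # name shown when the Series is printed
)

print(temps)
print(temps["tue"])                # look up a value by its label
print(temps.index.tolist())        # the labels
print(temps.dtype, temps.name)
```

If index is omitted, Pandas falls back to the default integer labels 0, 1, 2, as in the examples that follow.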
Example 1: Series holding string data.
Python
import pandas as pd
a = ['g', 'e', 'e', 'k', 's']
res = pd.Series(a)
print(res)
Explanation: We pass the list a to pd.Series(a), which converts it into a Series (a column-like structure). Each item gets a default integer index starting from 0, assigned automatically by Pandas.
Example 2: Series holding integer data.
Python
import pandas as pd
a = [1, 2, 3, 4, 5]
res = pd.Series(a)
print(res)
Explanation: We pass the list a to pd.Series(a), which converts it into a Series (a column-like structure). Each number gets a default integer index starting from 0, assigned automatically by Pandas.
Example 3: Series created from a dictionary.
Python
import pandas as pd
a = {'Id': 1013, 'Name': 'Mohe', 'State': 'Manipur', 'Age': 24}
res = pd.Series(a)
print(res)
Explanation: We pass the dictionary a to pd.Series(a). The keys become the index labels and the values become the data, giving a labeled Series whose values can be looked up by key.
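To make the idea of labeled access concrete, here is a short sketch of reading values back out of a Series built from a dictionary. The variable name record and its values are illustrative only.

```python
import pandas as pd

# Illustrative record; keys become index labels
record = pd.Series({"Id": 1013, "Name": "Mohe", "Age": 24})

print(record["Name"])      # access by label
print(record.loc["Age"])   # explicit label-based access
print(record.iloc[0])      # access by position (first element)
print("Id" in record)      # membership checks the index labels, not the values
print(list(record.index))  # the labels, in insertion order
```

Using .loc for labels and .iloc for positions keeps the intent explicit, which helps when the labels themselves are integers.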
DataFrame
A DataFrame is a two-dimensional, size-mutable and heterogeneous tabular data structure with labeled rows and columns, similar to a spreadsheet or SQL table. Each column in a DataFrame is a Pandas Series, allowing you to work with multiple types of data in one table.
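The claim that each column is its own Series, with its own data type, can be checked directly. The following is a small sketch with made-up data; the column names are assumptions for illustration.

```python
import pandas as pd

# Made-up table mixing text, integer and floating-point columns
df = pd.DataFrame({
    "Name": ["Tom", "Nick"],
    "Age": [20, 21],
    "Score": [88.5, 92.0],
})

age = df["Age"]
print(type(age).__name__)   # a single column is a Series
print(df.dtypes)            # one dtype per column
print(df.shape)             # (rows, columns)
```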
Syntax
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
Parameters:
- data: Input data in various forms (e.g., lists, dict, ndarray, Series, another DataFrame).
- index (Optional): Labels for the rows.
- columns (Optional): Labels for the columns.
- dtype (Optional): Data type to force for all columns; inferred if omitted.
- copy (Optional): Boolean; whether to copy the input data.
Returns: A pandas.DataFrame object representing a 2D labeled data structure.
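A brief sketch of the index and columns parameters in use, assuming row-oriented input (a list of lists). The row labels r1 to r3 are hypothetical.

```python
import pandas as pd

# Each inner list is one row; labels below are made up for illustration
rows = [["Tom", 20], ["Nick", 21], ["Krish", 19]]
df = pd.DataFrame(
    rows,
    index=["r1", "r2", "r3"],     # row labels
    columns=["Name", "Age"],      # column labels
)

print(df)
print(df.loc["r2", "Age"])        # value at row label 'r2', column 'Age'
```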
Example 1: Creating a DataFrame from a list
Python
import pandas as pd
a = ['Python', 'Pandas', 'Numpy']
df = pd.DataFrame(a, columns=['Tech'])
print(df)
Explanation: We pass the list a to pd.DataFrame(a, columns=['Tech']), which converts it into a DataFrame with a single column named 'Tech'. Each item becomes a row, and Pandas automatically assigns a default integer index starting from 0.
Example 2: Creating a DataFrame from a dictionary
Python
import pandas as pd
a = {
    'Name': ['Tom', 'Nick', 'Krish', 'Jack'],
    'Age': [20, 21, 19, 18]
}
res = pd.DataFrame(a)
print(res)
Explanation: We pass the dictionary a to pd.DataFrame(a), which converts it into a DataFrame where the dictionary keys become column names and the values (lists) become the column data. Pandas assigns a default integer index starting from 0 to the rows.
Example 3: Selecting columns in a DataFrame
Python
import pandas as pd
a = {
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
    'Age': [27, 24, 22, 32],
    'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd']
}
df = pd.DataFrame(a)
print(df[['Name', 'Qualification']])
Explanation: We create a DataFrame df from the dictionary a, then select and print only the 'Name' and 'Qualification' columns by passing their names as a list inside df[]. This returns a new DataFrame with just those two columns.
Accessing columns and rows in a DataFrame
A DataFrame in Pandas is a 2D tabular structure where you can easily access and manipulate data by selecting specific columns or rows. You can extract one or more columns using column names and filter rows using labels or conditions.
Example 1: We can access one or more columns in a DataFrame using square brackets.
Python
import pandas as pd
a = {
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
    'Age': [27, 24, 22, 32],
    'City': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj']
}
df = pd.DataFrame(a)
print(df['Name'])            # single column
print(df[['Name', 'City']])  # multiple columns
Explanation:
- df['Name'] returns a Series containing values from the 'Name' column.
- df[['Name', 'City']] returns a new DataFrame containing only the specified columns.
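The difference between the two bracket forms is easiest to see by checking the return types. This is a small sketch with a two-row frame; the variable names as_series and as_frame are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi"], "Age": [27, 24]})

as_series = df["Name"]     # a string key returns a Series
as_frame = df[["Name"]]    # a list of keys returns a DataFrame, even for one column

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (2, 1)
```

The distinction matters when later code expects a specific type, for example a method that exists only on DataFrame.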
Example 2: We can use .loc[] to access rows by index label or to filter them using a boolean condition.
Python
import pandas as pd
a = {
    'Name': ['Mohe', 'Shyni', 'Parul', 'Sam'],
    'ID': [12, 43, 54, 32],
    'City': ['Delhi', 'Kochi', 'Pune', 'Patna']
}
df = pd.DataFrame(a)
res = df.loc[df['Name'] == 'Mohe']
print(res)
Explanation: df['Name'] == 'Mohe' produces a boolean Series, and df.loc[...] returns only the row(s) where it is True, that is, where the 'Name' column has the value 'Mohe'.
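Beyond a single equality test, rows can also be picked by label, by position, or by combining conditions. The sketch below reuses the same sample data; the variable names and the threshold of 30 are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Mohe", "Shyni", "Parul", "Sam"],
    "ID": [12, 43, 54, 32],
    "City": ["Delhi", "Kochi", "Pune", "Patna"],
})

first_row = df.loc[0]     # row whose index label is 0, returned as a Series
first_two = df.iloc[:2]   # first two rows by position

# Combine conditions with & (and) or | (or); each condition needs parentheses.
# The second argument to .loc restricts the result to the listed columns.
big_ids = df.loc[(df["ID"] > 30) & (df["City"] != "Pune"), ["Name", "ID"]]

print(first_row["Name"])
print(first_two)
print(big_ids)
```

Here .loc works with labels and boolean masks, while .iloc works purely with integer positions; with the default index the two happen to coincide for row 0.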