Getting Started with Pandas

The document provides an introduction to Pandas, a data manipulation library in Python, detailing its primary data structures: Series and DataFrame. It covers creating Series and DataFrames, data manipulation techniques such as filtering, sorting, and grouping, as well as handling missing data, merging DataFrames, and performing descriptive statistics. Additionally, it includes examples of reading from and writing to various file formats and string manipulation methods.

Uploaded by

abhinav.mishra

UNIVERSITY OF STEEL TECHNOLOGY AND MANAGEMENT

Introduction to Data Science and Data Analytics
Presented by:

Dr. Ravindra Singh Saluja

OP Jindal University, Raigarh


Introduction to Pandas

• Pandas is primarily used for working with structured data. It provides two main data structures:
• Series: One-dimensional labeled array capable of holding any data type.
• DataFrame: Two-dimensional labeled data structure with columns of potentially different types.
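
As a minimal sketch of how the two structures relate (the variable names `ages`, `people`, and `age_column` are illustrative only), selecting a single column from a DataFrame gives back a Series:

```python
import pandas as pd

# A Series: one dimension, one label per value
ages = pd.Series([25, 30, 35], name="Age")

# A DataFrame: two dimensions, columns may hold different types
people = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

print(ages.ndim)    # 1
print(people.ndim)  # 2

# Selecting one column of a DataFrame returns a Series
age_column = people["Age"]
print(type(age_column).__name__)  # Series
```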

Creating a Series
• From a List:
import pandas as pd
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series)
• From a Dictionary:
import pandas as pd
data = {"a": 1, "b": 2, "c": 3}
series = pd.Series(data)
print(series)
• With Custom Index:
import pandas as pd
data = [10, 20, 30, 40, 50]
series = pd.Series(data, index=["a", "b", "c", "d", "e"])
print(series)
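
Once a Series has an index, values can be looked up by label or by position. The sketch below reuses the custom-index Series from this slide; the variable names are illustrative:

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

by_label = series["c"]               # label-based lookup
by_position = series.iloc[0]         # position-based lookup
subset = series[["a", "e"]]          # several labels at once
label_slice = series.loc["b":"d"]    # label slices include both endpoints

print(by_label)              # 30
print(by_position)           # 10
print(subset.tolist())       # [10, 50]
print(label_slice.tolist())  # [20, 30, 40]
```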
Creating a DataFrame

import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
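
After building a DataFrame it is usually worth checking its size, column names, and column types. A short sketch using the same data as this slide:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

print(df.shape)             # (3, 3): three rows, three columns
print(df.columns.tolist())  # ['Name', 'Age', 'City']
print(df.dtypes)            # one data type per column
print(df.head(2))           # first two rows only
```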

Data Manipulation
Pandas provides a wide range of methods to manipulate data, such as filtering, sorting, and grouping.
• Filtering: Select rows based on conditions.
# Filtering rows where Age is greater than 28
filtered_df = df[df['Age'] > 28]
print(filtered_df)

• Sorting: Sort the DataFrame by a specific column.
# Sorting by Age in descending order
sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)
• Grouping: Group data and perform aggregate functions.
# Grouping by City and calculating the mean age
grouped_df = df.groupby('City')['Age'].mean()
print(grouped_df)
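
The three operations above can also be combined or extended. The following is a sketch only: the fourth row ("Dana") and the changed cities are assumptions added here so that each city group has more than one member.

```python
import pandas as pd

# Illustrative data: "Dana" and the city assignments are not from the slides
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, 30, 35, 40],
    "City": ["New York", "Chicago", "Chicago", "New York"],
})

# Filtering on two conditions: wrap each in parentheses and join with &
over_28_in_chicago = df[(df["Age"] > 28) & (df["City"] == "Chicago")]

# Sorting by City ascending, then by Age descending within each city
by_city_then_age = df.sort_values(by=["City", "Age"], ascending=[True, False])

# Grouping with several aggregate functions at once
summary = df.groupby("City")["Age"].agg(["mean", "max", "count"])

print(over_28_in_chicago["Name"].tolist())  # ['Bob', 'Charlie']
print(by_city_then_age["Name"].tolist())    # ['Charlie', 'Bob', 'Dana', 'Alice']
print(summary)
```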
Handling Missing Data

Pandas makes it easy to handle missing data with methods like fillna() and dropna().
# Option 1: fill missing values with a default value
df.fillna(0, inplace=True)

# Option 2: drop rows with any missing values
# (use this instead of fillna(); once the gaps are filled, nothing is left to drop)
df.dropna(inplace=True)
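
The DataFrame used so far has no gaps, so the effect of these methods is easier to see on data that actually contains missing values. The sketch below uses a small made-up DataFrame (the missing entries are assumptions for illustration) and returns new objects instead of modifying `df` in place:

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing name and one missing age
df = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Age": [25, np.nan, 35],
})

# Count the missing values in each column
missing_per_column = df.isna().sum()

# Fill each column with its own default: a placeholder name, the mean age
filled = df.fillna({"Name": "Unknown", "Age": df["Age"].mean()})

# Alternatively, keep only the rows that have no missing values
dropped = df.dropna()

print(missing_per_column)
print(filled)
print(dropped)
```

Without `inplace=True`, both `fillna()` and `dropna()` leave the original `df` unchanged, which makes it easy to compare the two approaches side by side.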
Merging and Joining DataFrames

# Merging two DataFrames on a common column
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [25, 30, 40]})

# how='inner' keeps only the IDs present in both DataFrames (1 and 2)
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)
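
The `how` argument controls which IDs survive the merge. As a sketch using the same two DataFrames, a left merge keeps every row of `df1`, and an outer merge keeps every ID from either side; `indicator=True` adds a `_merge` column showing where each row came from:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"ID": [1, 2, 4], "Age": [25, 30, 40]})

# Left merge: all rows of df1; Age is missing (NaN) for ID 3
left_df = pd.merge(df1, df2, on="ID", how="left")

# Outer merge: every ID from either DataFrame, with the source of each row
outer_df = pd.merge(df1, df2, on="ID", how="outer", indicator=True)

print(left_df)
print(outer_df)
```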
Reading and Writing Data

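Pandas can read from and write to many data sources, including CSV files, Excel workbooks, and SQL databases.
# Reading from a CSV file (assumes data.csv exists in the working directory)
df = pd.read_csv('data.csv')

# Writing to a CSV file; index=False leaves the row index out of the file
df.to_csv('output.csv', index=False)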
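
To try the round trip without creating files on disk, the sketch below writes to an in-memory text buffer and reads it back. Using `io.StringIO` in place of a file path is a choice made here for illustration; the two-row DataFrame is made up.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Write CSV text into an in-memory buffer instead of a file
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read the same text back into a new DataFrame
restored = pd.read_csv(io.StringIO(csv_text))

# Without index=False, the row index is written as an unnamed first column
with_index_text = df.to_csv()

print(csv_text.splitlines()[0])         # Name,Age
print(with_index_text.splitlines()[0])  # ,Name,Age
print(restored)
```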
String Manipulation

df['Name'] = df['Name'].str.upper()  # Convert names to uppercase
df['Name_Length'] = df['Name'].str.len()  # Find length of names
df['Name'] = df['Name'].str.replace('A', '@', regex=False)  # Replace every 'A' with '@'
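
Applied to the three names used earlier in the slides, these steps behave as sketched below. Because the names are uppercased first, the replacement affects every "A" in each name; the final `str.contains()` check is an extra step added here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"]})

df["Name"] = df["Name"].str.upper()                         # ALICE, BOB, CHARLIE
df["Name_Length"] = df["Name"].str.len()                    # 5, 3, 7
df["Name"] = df["Name"].str.replace("A", "@", regex=False)  # literal replacement

# Boolean mask: which names now contain '@'
has_at = df["Name"].str.contains("@", regex=False)

print(df["Name"].tolist())         # ['@LICE', 'BOB', 'CH@RLIE']
print(df["Name_Length"].tolist())  # [5, 3, 7]
print(has_at.tolist())             # [True, False, True]
```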

Descriptive Statistics

• Basic Statistical Measures:
# Creating a sample DataFrame
data = {'A': [1, 2, 3, 4, 5], 'B': [5, 10, 15, 20, 25]}
df = pd.DataFrame(data)
# Descriptive statistics
print(df.describe())

• Calculating Specific Statistics:
# Mean
mean = df['A'].mean()
print('Mean:', mean)

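# Standard Deviation (sample standard deviation, ddof=1, by default)
std = df['A'].std()
print('Standard Deviation:', std)

# Correlation matrix between all numeric columns (Pearson by default)
correlation = df.corr()
print('Correlation:\n', correlation)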
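
For the sample DataFrame above, the values can be checked by hand. The sketch below also contrasts the default sample standard deviation (`ddof=1`) with the population version (`ddof=0`), and computes the correlation between just two columns; the variable names are illustrative.

```python
import math
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [5, 10, 15, 20, 25]})

mean_a = df["A"].mean()               # (1+2+3+4+5) / 5 = 3.0
sample_std = df["A"].std()            # sqrt(10 / 4), divides by n - 1
population_std = df["A"].std(ddof=0)  # sqrt(10 / 5), divides by n

# B is exactly 5 * A, so the two columns are perfectly correlated
corr_ab = df["A"].corr(df["B"])

print(mean_a)                    # 3.0
print(round(sample_std, 4))      # 1.5811
print(round(population_std, 4))  # 1.4142
print(round(corr_ab, 4))         # 1.0
```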
