Battle of the Data Tools: Pandas vs SQL

POOJA T
Why Compare Pandas and SQL?
Both are popular tools in data analysis with unique strengths.
Understanding their syntax and capabilities helps in selecting the right tool.
1. Loading Data
Pandas

import pandas as pd
# Load data from a CSV file
df = pd.read_csv('data.csv')

SQL

-- Load data from a database table
SELECT * FROM data_table;
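
The two loading styles can be tried side by side without a real file or database server. The sketch below is only an illustration: the CSV text, the column names, and the in-memory SQLite database are assumptions, not part of the original slides.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'
csv_text = "id,name,age\n1,Ana,34\n2,Ben,28\n3,Cy,41\n"
df = pd.read_csv(io.StringIO(csv_text))

# Copy the same rows into an in-memory SQLite table called data_table
conn = sqlite3.connect(":memory:")
df.to_sql("data_table", conn, index=False)

# Run the SQL version and load its result back into a DataFrame
sql_df = pd.read_sql_query("SELECT * FROM data_table", conn)
conn.close()

print(sql_df)
```

Here `pd.read_sql_query` is the bridge between the two tools: the query runs in the database and the result arrives as a DataFrame.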
2. Filtering Data
Pandas

# Filter rows where 'age' is greater than 30
filtered_df = df[df['age'] > 30]

SQL

-- Filter rows where 'age' is greater than 30
SELECT * FROM data_table
WHERE age > 30;
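
To check that both filters pick the same rows, here is a small sketch on made-up data. The names and ages are hypothetical, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"name": ["Ana", "Ben", "Cy"], "age": [34, 28, 41]})

# Pandas: a boolean mask keeps only the rows where the condition is True
filtered_df = df[df["age"] > 30]

# SQL: the WHERE clause does the same job inside the database
conn = sqlite3.connect(":memory:")
df.to_sql("data_table", conn, index=False)
rows = conn.execute("SELECT name, age FROM data_table WHERE age > 30").fetchall()
conn.close()

print(filtered_df)
print(rows)
```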
3. Aggregating Data
Pandas

# Calculate the average age for each gender
avg_age = df.groupby('gender')['age'].mean()

SQL

-- Calculate the average age for each gender
SELECT gender, AVG(age)
FROM data_table
GROUP BY gender;
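
A quick way to see that `groupby(...).mean()` and `GROUP BY` with `AVG` agree is to run both on the same toy table. The data below is invented for the example, and the `avg_age` alias in the query is an addition for readability.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"gender": ["F", "F", "M", "M"], "age": [30, 40, 20, 30]})

# Pandas: the result is a Series indexed by gender
avg_age = df.groupby("gender")["age"].mean()

# SQL: one result row per gender
conn = sqlite3.connect(":memory:")
df.to_sql("data_table", conn, index=False)
rows = conn.execute(
    "SELECT gender, AVG(age) AS avg_age FROM data_table GROUP BY gender"
).fetchall()
conn.close()

sql_avg = dict(rows)
print(avg_age.to_dict())
print(sql_avg)
```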
4. Joining Data
Pandas

# Merge two DataFrames on 'id'
merged_df = pd.merge(df1, df2, on='id')

SQL

-- Join two tables on 'id'
SELECT * FROM table1
JOIN table2 ON table1.id = table2.id;
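
Both snippets perform an inner join: `pd.merge` defaults to `how='inner'`, and a plain `JOIN` is an inner join too, so rows without a match on either side are dropped. A minimal sketch with invented data (the `name` and `score` columns are assumptions) shows this, along with one small difference in the output columns.

```python
import sqlite3

import pandas as pd

# Hypothetical tables: id 1 appears only in df1, id 4 only in df2
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "score": [85, 90, 70]})

# Pandas: inner join by default, with a single 'id' column in the result
merged_df = pd.merge(df1, df2, on="id")

conn = sqlite3.connect(":memory:")
df1.to_sql("table1", conn, index=False)
df2.to_sql("table2", conn, index=False)

join_sql = "FROM table1 JOIN table2 ON table1.id = table2.id"
rows = conn.execute(f"SELECT table1.id, name, score {join_sql}").fetchall()

# SELECT * returns the join key once per table, so 'id' shows up twice
star_cols = [col[0] for col in conn.execute(f"SELECT * {join_sql}").description]
conn.close()

print(merged_df)
print(rows)
print(star_cols)
```

Listing the columns explicitly in the SQL query avoids the duplicated `id` column that `SELECT *` produces.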
5. Data Transformation
Pandas

# Create a new column 'total' as the sum of 'price' and 'tax'
df['total'] = df['price'] + df['tax']

SQL

-- Compute a new column 'total' as the sum of 'price' and 'tax'
SELECT price, tax, (price + tax) AS total
FROM data_table;
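
One difference worth noticing: the Pandas assignment changes the DataFrame itself, while the SQL `SELECT` only computes `total` in the query result and leaves the stored table as it was. The sketch below illustrates this with made-up prices in an in-memory SQLite database.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"price": [10.0, 20.0], "tax": [1.0, 2.5]})

conn = sqlite3.connect(":memory:")
df.to_sql("data_table", conn, index=False)  # stored before 'total' exists

# Pandas: the DataFrame now has a third column
df["total"] = df["price"] + df["tax"]

# SQL: 'total' exists only in the result of this query
rows = conn.execute(
    "SELECT price, tax, (price + tax) AS total FROM data_table"
).fetchall()

# The table itself still has just the two original columns
table_columns = [col[0] for col in conn.execute("SELECT * FROM data_table").description]
conn.close()

print(df)
print(rows)
print(table_columns)
```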
6. Sorting Data
Pandas

# Sort DataFrame by 'age' in descending order
sorted_df = df.sort_values(by='age', ascending=False)

SQL

-- Sort table by 'age' in descending order
SELECT * FROM data_table
ORDER BY age DESC;
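
A small sketch of both sorts on invented data. Note that `sort_values` keeps the original index labels attached to each row; the example adds `reset_index(drop=True)` as an optional extra step if you want a fresh 0, 1, 2 index afterwards.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"name": ["Ana", "Ben", "Cy"], "age": [34, 28, 41]})

# Pandas: rows are reordered, index labels travel with them
sorted_df = df.sort_values(by="age", ascending=False)
renumbered_df = sorted_df.reset_index(drop=True)

# SQL: ORDER BY ... DESC returns the rows from oldest to youngest
conn = sqlite3.connect(":memory:")
df.to_sql("data_table", conn, index=False)
rows = conn.execute("SELECT name, age FROM data_table ORDER BY age DESC").fetchall()
conn.close()

print(sorted_df)
print(rows)
```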
7. Handling Missing Data

Pandas

# Fill missing values in 'column_name' with the mean
# (assigning the result back avoids inplace=True on a selected column,
# which newer pandas versions warn about)
df['column_name'] = df['column_name'].fillna(df['column_name'].mean())

SQL

-- Handle missing values by replacing them with the mean (using COALESCE)
SELECT COALESCE(column_name, AVG(column_name) OVER ()) AS column_name
FROM data_table;
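
Here is a minimal sketch of both approaches on a column with one missing value. The numbers are made up, and the SQL side assumes a SQLite build with window function support (version 3.25 or later), since the query uses `AVG(...) OVER ()`.

```python
import sqlite3

import pandas as pd

# Hypothetical column with one missing value
df = pd.DataFrame({"column_name": [10.0, None, 30.0]})

conn = sqlite3.connect(":memory:")
df.to_sql("data_table", conn, index=False)  # the missing value is stored as NULL

# Pandas: mean() skips missing values, so the fill value is 20.0
filled = df["column_name"].fillna(df["column_name"].mean())

# SQL: AVG ignores NULLs, and COALESCE substitutes the average where needed
rows = conn.execute(
    "SELECT COALESCE(column_name, AVG(column_name) OVER ()) AS column_name "
    "FROM data_table"
).fetchall()
conn.close()

sql_values = sorted(value for (value,) in rows)
print(filled.tolist())
print(sql_values)
```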
Which One Should You Use?
Use Pandas for flexibility, ease of use, and integration with Python.
Use SQL for efficient querying and handling large datasets in databases.
Both can be used together for a powerful data analysis workflow.
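
One way the two might be combined, sketched under assumptions: SQL filters the rows inside the database so less data is transferred, and Pandas takes over for the rest of the analysis. The table contents are invented, and SQLite is only a stand-in for a real database.

```python
import sqlite3

import pandas as pd

# Hypothetical database table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_table (gender TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO data_table VALUES (?, ?)",
    [("F", 25), ("F", 35), ("M", 45), ("M", 31), ("M", 29)],
)

# Step 1 (SQL): filter in the database, passing the threshold as a parameter
df = pd.read_sql_query(
    "SELECT gender, age FROM data_table WHERE age > ?", conn, params=(30,)
)
conn.close()

# Step 2 (Pandas): aggregate the already-filtered rows in Python
avg_age = df.groupby("gender")["age"].mean()
print(avg_age)
```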
I hope this information serves you well.

Follow for more tips and tutorials on data analysis.