0% found this document useful (0 votes)
36 views

Pandas Vs SQL

Uploaded by

ucdavispikachu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
36 views

Pandas Vs SQL

Uploaded by

ucdavispikachu
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 50

Data Engineering 101

Pandas vs
SQL

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Select all rows

Pandas

df

SQL

SELECT * FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Select specific
columns

Pandas

df[['col1', 'col2']]

SQL

SELECT col1, col2


FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Filter rows based on a


condition

Pandas

df[df['col1'] > 5]

SQL

SELECT *
FROM table
WHERE col1 > 5;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Filter rows based on


multiple conditions

Pandas

df[(df['col1'] > 5) & (df['col2'] < 10)]

SQL

SELECT *
FROM table
WHERE col1 > 5 AND
col2 < 10;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Group by a column
and count

Pandas

df.groupby('col1').size()

SQL

SELECT col1, COUNT(*)


FROM table
GROUP BY col1;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Group by a column
and sum another
column
Pandas

df.groupby('col1')['col2'].sum()

SQL

SELECT col1, SUM(col2)


FROM table
GROUP BY col1;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Order by a column

Pandas

df.sort_values(by='col1')

SQL

SELECT *
FROM table
ORDER BY col1;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Order by a column
descending

Pandas

df.sort_values(by='col1',
ascending=False)

SQL

SELECT * FROM table


ORDER BY col1 DESC;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Join two tables

Pandas

pd.merge(df1, df2, on='id')

SQL

SELECT * FROM table1


JOIN table2
ON table1.id = table2.id;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Left join two tables

Pandas

pd.merge(df1, df2, on='id', how='left')

SQL

SELECT * FROM table1


LEFT JOIN table2
ON table1.id = table2.id;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Right join two tables

Pandas

pd.merge(df1, df2, on='id', how='right')

SQL

SELECT * FROM table1


RIGHT JOIN table2
ON table1.id = table2.id;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Full outer join two


tables

Pandas

pd.merge(df1, df2, on='id', how='outer')

SQL

SELECT * FROM table1


FULL OUTER JOIN table2
ON table1.id = table2.id;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Calculate the average


of a column

Pandas

df['col1'].mean()

SQL

SELECT AVG(col1)
FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Calculate the sum of a


column

Pandas

df['col1'].sum()

SQL

SELECT SUM(col1)
FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Calculate the
maximum of a column

Pandas

df['col1'].max()

SQL

SELECT MAX(col1)
FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Calculate the
minimum of a column

Pandas

df['col1'].min()

SQL

SELECT MIN(col1)
FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Count distinct values


in a column

Pandas

df['col1'].nunique()

SQL

SELECT
COUNT(DISTINCT col1)
FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Rename a column

Pandas

df.rename(columns=
{'old_name': 'new_name'})

SQL

ALTER TABLE table


RENAME COLUMN
old_name TO new_name;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Add a new column

Pandas

df['new_col'] = value

SQL

ALTER TABLE table


ADD COLUMN new_col INT;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Drop a column

Pandas

df.drop(columns=['col1'])

SQL

ALTER TABLE table


DROP COLUMN col1;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Replace null values

Pandas

df['col1'].fillna(0)

SQL

SELECT
COALESCE(col1, 0)
FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Calculate the
difference between
two columns
Pandas

df['col1'] - df['col2']

SQL

SELECT col1 - col2


FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Concatenate two
columns

Pandas

df['col1'] + df['col2']

SQL

SELECT CONCAT(col1, col2)


FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Extract year from date

Pandas

df['date_col'].dt.year

SQL

SELECT YEAR(date_col)
FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Extract month from


date

Pandas

df['date_col'].dt.month

SQL

SELECT
MONTH(date_col) FROM
table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Extract day from date

Pandas

df['date_col'].dt.day

SQL

SELECT DAY(date_col)
FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Filter rows based on


string matching

Pandas

df[df['col1'].str.contains('pattern')]

SQL

SELECT * FROM table


WHERE col1 LIKE '%pattern%';

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Aggregate functions
with group by

Pandas

df.groupby('col1')['col2'].mean()

SQL

SELECT col1, AVG(col2)


FROM table
GROUP BY col1;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Pivot table

Pandas

df.pivot_table(values='col1',
index='col2', columns='val1')

SQL

SELECT * FROM table


PIVOT (SUM(col1)
FOR col2 IN (val1, val2));

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Unpivot table

Pandas

df.melt(id_vars=['id'],
value_vars=['col1', 'col2'])

SQL

SELECT col1, col2


FROM table
UNPIVOT (col FOR val
IN (col1, col2));
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Calculate cumulative
sum

Pandas

df['col2'].cumsum()

SQL

SELECT col1, SUM(col2)


OVER (ORDER BY col1)
FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Calculate moving
average

Pandas

df['col2'].rolling(window=3).mean()

SQL

SELECT col1, AVG(col2) OVER


(ORDER BY col1 ROWS
BETWEEN 2 PRECEDING
AND CURRENT ROW)
FROM table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Find the row with


maximum value in a
column
Pandas

df.loc[df['col1'].idxmax()]

SQL

SELECT * FROM table


ORDER BY col1 DESC
LIMIT 1;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Find the row with


minimum value in a
column
Pandas

df.loc[df['col1'].idxmin()]

SQL

SELECT * FROM table


ORDER BY col1 ASC LIMIT 1;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Drop duplicate rows

Pandas

df.drop_duplicates()

SQL

DELETE FROM table WHERE rowid


NOT IN (SELECT MIN(rowid)
FROM table GROUP BY col1, col2);

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Calculate the length


of a string

Pandas

df['col1'].str.len()

SQL

SELECT LENGTH(col1)
FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Convert string to
uppercase

Pandas

df['col1'].str.upper()

SQL

SELECT UPPER(col1)
FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Convert string to
lowercase

Pandas

df['col1'].str.lower()

SQL

SELECT LOWER(col1)
FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Strip whitespace from


string

Pandas

df['col1'].str.strip()

SQL

SELECT TRIM(col1)
FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Replace a substring

Pandas

df['col1'].str.replace('old',
'new')

SQL

SELECT REPLACE(col1,
'old', 'new')
FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Check for null values

Pandas

df[df['col1'].isnull()]

SQL

SELECT * FROM table


WHERE col1 IS NULL;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Check for non-null


values

Pandas

df[df['col1'].notnull()]

SQL

SELECT * FROM table


WHERE col1 IS NOT
NULL;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Conditional column
based on other
columns
Pandas

df['col3'] =
df['col2'].apply(lambda x:
'High' if x > 10 else 'Low')

SQL

SELECT col1, CASE


WHEN col2 > 10 THEN
'High' ELSE 'Low' END AS
col3 FROM table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Drop rows with null


values

Pandas

df.dropna()

SQL

DELETE FROM table


WHERE col1 IS NULL;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Create new column


from existing columns

Pandas

df['col3'] = df['col1'] +
df['col2']

SQL

SELECT col1, col2, (col1


+ col2) AS col3
FROM table;

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Calculate percentage
of a column

Pandas

df['percentage'] =
df['col1'] /
df['col1'].sum() * 100

SQL

SELECT col1, (col1 /


SUM(col1) OVER()) * 100
AS percentage
FROM table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Insert a new row

Pandas

df.loc[len(df)] = [val1, val2]

SQL

INSERT INTO table (col1, col2)


VALUES (val1, val2);

Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas

Update a row

Pandas

df.loc[df['col2'] == val2, 'col1'] = val1

SQL

UPDATE table
SET col1 = val1 WHERE
col2 = val2;

Shwetank Singh
GritSetGrow - GSGLearn.com

You might also like