
Python Cheatsheet For Data Scientists

This document is a Python cheatsheet for data scientists, covering core Python syntax, NumPy, Pandas, Matplotlib, Seaborn, and Scikit-learn basics. It includes examples of data manipulation, visualization techniques, and machine learning model training. Additionally, it lists essential libraries for data handling, visualization, machine learning, and deep learning.

Uploaded by sundarksp
Copyright © All Rights Reserved


Core Python for Data Science

x = 10 # int
y = 3.14 # float
name = "AI" # str
flag = True # bool
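To confirm which type each literal produces, `type(...).__name__` is enough. This sketch reuses the four values above; note that `bool` is a subclass of `int`, so `True` also passes an `int` check.

```python
x = 10          # int
y = 3.14        # float
name = "AI"     # str
flag = True     # bool

# Name of each value's type
type_names = [type(v).__name__ for v in (x, y, name, flag)]
print(type_names)

# bool is a subclass of int, so True also counts as an int
print(isinstance(flag, int))
```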

lst = [1, 2, 3]
tpl = (1, 2, 3)
dct = {"a": 1, "b": 2}
st = {1, 2, 3}
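The four containers differ mainly in mutability and uniqueness. A small sketch using the same literals: lists and dicts can be changed in place, sets ignore duplicates, and tuples reject item assignment.

```python
lst = [1, 2, 3]
tpl = (1, 2, 3)
dct = {"a": 1, "b": 2}
st = {1, 2, 3}

lst.append(4)       # lists are ordered and mutable
dct["c"] = 3        # assigning to a new key adds it to the dict
st.add(2)           # 2 is already present, so the set is unchanged

tuple_is_immutable = False
try:
    tpl[0] = 99     # tuples cannot be modified in place
except TypeError:
    tuple_is_immutable = True

print(lst, dct, st, tuple_is_immutable)
```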

squares = [x**2 for x in range(10)]

def square(x):
    return x**2

f = lambda x: x**2
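The comprehension, the named function, and the lambda all compute the same squares. A quick check, plus a comprehension with an `if` filter:

```python
def square(x):
    """Return x squared."""
    return x**2

f = lambda x: x**2  # anonymous equivalent of square

squares = [x**2 for x in range(10)]
via_def = [square(x) for x in range(10)]
via_lambda = list(map(f, range(10)))

# A comprehension can also filter: keep only squares of even numbers
even_squares = [x**2 for x in range(10) if x % 2 == 0]

print(squares)
print(even_squares)
```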

NumPy

import numpy as np

a = np.array([1, 2, 3])
b = np.zeros((2, 3))
c = np.ones(5)
d = np.eye(3)
e = np.linspace(0, 1, 5)
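Each constructor above fixes a shape and a fill pattern. Printing `shape` and `dtype` is a quick sanity check; the integer dtype of `a` is platform-dependent, while `zeros`, `ones`, `eye` and `linspace` return floats by default.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.zeros((2, 3))        # 2 rows x 3 columns of 0.0
c = np.ones(5)              # 1-D array of five 1.0 values
d = np.eye(3)               # 3x3 identity matrix
e = np.linspace(0, 1, 5)    # 5 evenly spaced points, both endpoints included

for label, arr in [("a", a), ("b", b), ("c", c), ("d", d), ("e", e)]:
    print(label, arr.shape, arr.dtype)
```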

a.mean(), a.std(), a.sum()

a.reshape(3, 1)
np.dot(a, a)
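For `a = [1, 2, 3]`, the aggregates and products can be checked by hand. Note that NumPy's `std` defaults to the population formula (`ddof=0`), whereas pandas' `Series.std` defaults to the sample formula (`ddof=1`). The broadcasting line is an extra illustration of why `reshape(3, 1)` is useful.

```python
import numpy as np

a = np.array([1, 2, 3])

mean, total = a.mean(), a.sum()   # 2.0 and 6
pop_std = a.std()                 # sqrt(2/3), population std (ddof=0)
sample_std = a.std(ddof=1)        # 1.0, sample std (pandas' default)

col = a.reshape(3, 1)             # column vector with shape (3, 1)
inner = np.dot(a, a)              # 1*1 + 2*2 + 3*3 = 14; a @ a is equivalent
outer = col * a                   # broadcasting (3, 1) with (3,) gives a 3x3 outer product

print(mean, total, pop_std, sample_std, inner)
print(outer)
```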

Pandas

import pandas as pd

df = pd.read_csv("data.csv")
df.head(), df.info(), df.describe()
df["col"], df[["col1", "col2"]]
df[df["col"] > 5]
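Since `data.csv` is not included, this sketch feeds `pd.read_csv` a hypothetical in-memory CSV through `io.StringIO` (the contents are made up; column names reuse the placeholders above). It shows that single brackets return a Series, double brackets a DataFrame, and a boolean mask keeps matching rows.

```python
import io
import pandas as pd

# Hypothetical CSV standing in for data.csv; the empty col1 field becomes NaN
csv_text = """col,col1,col2,group_col
3,1.0,10,a
7,,20,a
9,3.0,30,b
"""
df = pd.read_csv(io.StringIO(csv_text))

single = df["col"]              # Series
subset = df[["col1", "col2"]]   # DataFrame with two columns
filtered = df[df["col"] > 5]    # rows where col > 5

print(filtered)
```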
df.groupby("group_col").mean(numeric_only=True)  # numeric_only avoids errors on text columns (pandas >= 2.0)
df.isnull().sum()
df.fillna(0), df.dropna()
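A small sketch of the grouping and missing-value calls on made-up data that includes a text column and one `NaN`. Group means skip `NaN`, `fillna` and `dropna` return new DataFrames rather than modifying `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one text column and one missing score
df = pd.DataFrame({
    "group_col": ["a", "a", "b"],
    "col": [3, 7, 9],
    "label": ["x", "y", "z"],
    "score": [1.0, np.nan, 3.0],
})

means = df.groupby("group_col").mean(numeric_only=True)  # text column "label" is skipped
missing_per_col = df.isnull().sum()                       # count of NaN per column
filled = df.fillna(0)                                     # NaN replaced with 0
dropped = df.dropna()                                     # rows containing any NaN removed

print(means)
print(missing_per_col)
```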

Matplotlib & Seaborn

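import matplotlib.pyplot as plt
import seaborn as sns

plt.plot([1, 2, 3], [4, 5, 6])  # line plot
plt.figure()                    # new figure so the histogram is not drawn over the line plot
plt.hist([1, 2, 2, 3])          # histogram of the values
plt.show()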
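The same two plots with Matplotlib's object-oriented API, side by side on one figure. This sketch assumes a headless environment, so it selects the non-interactive `Agg` backend and saves to an in-memory PNG instead of calling `plt.show()`.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders without a display
import matplotlib.pyplot as plt

fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

ax_line.plot([1, 2, 3], [4, 5, 6])
ax_line.set_title("Line plot")

# hist returns the bin counts, the bin edges and the drawn patches
counts, bin_edges, patches = ax_hist.hist([1, 2, 2, 3], bins=3)
ax_hist.set_title("Histogram")

fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png")  # in a script or notebook, plt.show() would display it instead
plt.close(fig)
```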

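sns.boxplot(x="col", data=df)
plt.figure()
sns.heatmap(df.corr(numeric_only=True), annot=True)  # numeric_only avoids errors on text columns (pandas >= 2.0)
plt.show()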
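The heatmap only colors a matrix; the interesting part is its input, `df.corr(...)`. A pandas-only sketch with made-up columns: in pandas 2.x, `df.corr()` raises on a text column unless `numeric_only=True` is passed, in which case the text column is simply left out.

```python
import pandas as pd

# Hypothetical data: y rises with x, z falls with x, label is text
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "z": [4, 3, 2, 1],
    "label": ["a", "b", "c", "d"],
})

# Pearson correlation of the numeric columns only; this is the matrix the heatmap draws
corr = df.corr(numeric_only=True)
print(corr.round(2))
```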

Scikit-learn (ML Basics)

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X = df[["feature1", "feature2"]]
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # 80/20 split; random_state makes it reproducible

model = LinearRegression()
model.fit(X_train, y_train)

preds = model.predict(X_test)
mse = mean_squared_error(y_test, preds)
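An end-to-end run of the same workflow on synthetic data, since `df` is not defined here. The sketch assumes a hypothetical noise-free target `3*feature1 - 2*feature2 + 1`, so the fitted coefficients should recover 3, -2 and 1 and the test MSE should be essentially zero.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical synthetic dataset with an exact linear relationship
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature1": rng.uniform(0, 10, size=100),
    "feature2": rng.uniform(0, 10, size=100),
})
df["target"] = 3 * df["feature1"] - 2 * df["feature2"] + 1

X = df[["feature1", "feature2"]]
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
preds = model.predict(X_test)
mse = mean_squared_error(y_test, preds)

print(model.coef_, model.intercept_, mse)
```

With real, noisy data the MSE will not be near zero; comparing it against a simple baseline (for example, always predicting the training mean) is a common sanity check.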

Common Data Science Tasks

pd.get_dummies(df["category"])
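One-hot encoding creates one indicator column per distinct category, in sorted order. A sketch on a made-up `category` column; `dtype=int` is passed because recent pandas versions return boolean columns by default.

```python
import pandas as pd

# Hypothetical categorical data
df = pd.DataFrame({"category": ["red", "green", "red", "blue"]})

# One 0/1 column per category value
dummies = pd.get_dummies(df["category"], dtype=int)
print(dummies)
```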

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
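`StandardScaler` rescales each column to zero mean and unit variance using the population standard deviation. A sketch on a small made-up matrix, checked against the same formula written with NumPy:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: 3 samples, 2 features on different scales
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Same result by hand: subtract each column's mean, divide by its std (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled)
```

In a train/test workflow, the scaler is usually fitted on the training split only and then applied to the test split with `transform`, so test-set statistics do not leak into preprocessing.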

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier().fit(X, y)  # y must hold class labels; use RandomForestRegressor for a continuous target
importances = rf.feature_importances_
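A self-contained sketch on a synthetic classification problem from `make_classification` (the sizes are arbitrary choices). The impurity-based importances are non-negative and sum to 1, so sorting them gives a quick feature ranking.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 5 features, only 2 of them informative
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=2, n_redundant=0, random_state=0
)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = rf.feature_importances_

# Feature indices from most to least important
ranking = importances.argsort()[::-1]
print(importances.round(3), ranking)
```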

Bonus: Libraries to Know

- numpy, pandas: Data handling
- matplotlib, seaborn, plotly: Visualization
- scikit-learn: Machine learning
- xgboost, lightgbm: Gradient boosting
- statsmodels: Statistical modeling
- tensorflow, pytorch: Deep learning
