Data Analysis With Python
• Datasets
o Binning
o Grouping data
▪ groupby
▪ pivot
▪ Heatmap
o Correlation
o Correlation - Statistics
▪ Pearson Correlation
▪ Correlation Heatmap
• Model Development
▪ Regression Plot
▪ Residual Plot
▪ Distribution Plots
o Polynomial Regression and Pipelines
▪ R-squared
o Function cross_val_score()
o Function cross_val_predict()
o Ridge Regression
o Grid Search
Datasets
Understanding Datasets
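• Things to check:
o data types
▪ df.dtypes
o data distribution
▪ df.describe() (summary statistics; numeric columns by default)
▪ unique: number of distinct values (object columns, shown with include="all")
▪ top: most frequent value
▪ freq: how often the top value occurs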
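A minimal pandas sketch of the checks above. The small DataFrame and its column names ("make", "horsepower", "price") are hypothetical values made up for illustration. It uses include="all" so that the unique, top and freq rows appear for the text column next to the numeric statistics.

```python
import pandas as pd

# Hypothetical example data (illustrative values only)
df = pd.DataFrame({
    "make": ["audi", "bmw", "audi", "toyota"],
    "horsepower": [110, 150, 102, 70],
    "price": [13950.0, 16430.0, 13495.0, 5348.0],
})

# Data type of each column
print(df.dtypes)

# Default describe(): count, mean, std, min, quartiles, max for numeric columns
numeric_summary = df.describe()
print(numeric_summary)

# include="all": also adds unique, top and freq for the text column "make"
full_summary = df.describe(include="all")
print(full_summary)
```

For "make", top is "audi" because it occurs most often, and freq is 2.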
• Data formatting
• Data binning
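Data binning groups a continuous variable into a small number of intervals. The sketch below uses pandas.cut with three equal-width bins, which is one common approach; the horsepower values and the "Low" / "Medium" / "High" labels are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical horsepower values (illustrative only)
df = pd.DataFrame({"horsepower": [48, 70, 95, 110, 150, 200, 262]})

# Three equal-width bins over the observed range need four edges
bins = np.linspace(df["horsepower"].min(), df["horsepower"].max(), 4)
labels = ["Low", "Medium", "High"]

# include_lowest=True keeps the minimum value inside the first bin
df["horsepower_binned"] = pd.cut(
    df["horsepower"], bins, labels=labels, include_lowest=True
)

counts = df["horsepower_binned"].value_counts()
print(df)
print(counts)
```

Equal-width bins are simple to explain but can be unbalanced when the data are skewed; here most cars land in "Low".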
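Drop the rows (axis=0) whose "price" value is missing:
df.dropna(subset=["price"], axis=0, inplace=True)
is equivalent to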
df = df.dropna(subset=["price"], axis=0)
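A small check of the in-place and reassignment forms of dropna. The data are hypothetical. The sketch compares the two results and shows that inplace=True returns None.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing price
data = {"make": ["audi", "bmw", "toyota"], "price": [13950.0, np.nan, 5348.0]}

# Form 1: modify the DataFrame in place (the call itself returns None)
df1 = pd.DataFrame(data)
result = df1.dropna(subset=["price"], axis=0, inplace=True)

# Form 2: keep the returned DataFrame by reassigning it
df2 = pd.DataFrame(data)
df2 = df2.dropna(subset=["price"], axis=0)

print(result)            # None
print(df1.equals(df2))   # True
```

Because the in-place form returns None, combining the two styles as df = df.dropna(..., inplace=True) would leave df set to None.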
Non-formatted data is:
• confusing
• hard to aggregate
• hard to compare
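A sketch of common formatting fixes that address these problems. It assumes three typical issues: inconsistent spellings of the same value, units that need converting, and numbers stored as text. The column names and values are hypothetical. The mpg to L/100 km factor 235 is an approximation (about 235.2).

```python
import pandas as pd

# Hypothetical raw data: mixed city spellings, price stored as text
df = pd.DataFrame({
    "city": ["New York", "N.Y.", "NY", "Boston"],
    "city-mpg": [21, 19, 24, 30],
    "price": ["13950", "16500", "13495", "5348"],
})

# 1. Standardize inconsistent spellings to a single label (easier to compare)
df["city"] = df["city"].replace({"N.Y.": "New York", "NY": "New York"})

# 2. Convert units: miles per gallon -> litres per 100 km (approx. 235 / mpg)
df["city-L/100km"] = 235 / df["city-mpg"]

# 3. Fix the data type so the column can be aggregated numerically
df["price"] = df["price"].astype("float64")

mean_price = df.groupby("city")["price"].mean()
print(mean_price)
```

Without step 1, the groupby would split New York into three groups. Without step 3, computing a numeric mean of the text prices would not work.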