Python for Data Analysis - Complete Notes
1. Introduction to Python for Data Analysis
Python is a powerful, widely used language for data analysis because of its simple syntax and rich ecosystem of libraries. It handles data manipulation, visualization, and machine learning tasks efficiently.
2. Key Libraries for Data Analysis
- NumPy: For numerical operations
- Pandas: For data manipulation and analysis
- Matplotlib: For basic data visualization
- Seaborn: For statistical plots
- Plotly: For interactive visualizations
- Scikit-learn: For machine learning
- Statsmodels: For statistical modeling
3. NumPy Basics
NumPy provides the n-dimensional array (ndarray) and vectorized functions that apply mathematical operations to whole arrays at once, without explicit Python loops.
Example:
import numpy as np
arr = np.array([1, 2, 3])
print(arr.mean()) # Output: 2.0
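The example above uses a 1-D array. The short sketch below, with hypothetical values, shows the other core NumPy ideas: element-wise arithmetic, broadcasting a 1-D row across a 2-D array, and aggregating along an axis (axis=0 is per column, axis=1 is per row).

```python
import numpy as np

# A 2-D array: 2 rows, 3 columns (hypothetical values)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Element-wise arithmetic runs on the whole array at once
doubled = matrix * 2                        # [[2, 4, 6], [8, 10, 12]]

# Broadcasting: the 1-D row is added to every row of the matrix
offset = matrix + np.array([10, 20, 30])    # [[11, 22, 33], [14, 25, 36]]

# Aggregations along an axis
col_means = matrix.mean(axis=0)   # per column: [2.5, 3.5, 4.5]
row_sums = matrix.sum(axis=1)     # per row: [6, 15]

print(matrix.shape)   # (2, 3)
print(col_means)
print(row_sums)
```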
4. Pandas Essentials
Pandas handles tabular data through two core structures: the DataFrame (a 2-D labeled table) and the Series (a single labeled column).
Example:
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head())
print(df.describe())
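The snippet above assumes a file named data.csv exists. A self-contained sketch follows. It reads a small hypothetical CSV from a string via io.StringIO, so it runs without a file, and shows that selecting one column of a DataFrame yields a Series.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'data.csv' so the example runs without a file
csv_text = """category,region,sales
A,North,100
B,South,250
A,South,150
C,North,300
"""
df = pd.read_csv(io.StringIO(csv_text))

# A DataFrame is a table; selecting one column gives a Series
sales = df['sales']
print(type(df).__name__, type(sales).__name__)   # DataFrame Series

print(df.head())        # first 5 rows (here, all 4)
print(df.describe())    # count, mean, std, min, quartiles, max of numeric columns
```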
5. Data Cleaning with Pandas
- Handling missing values: df.dropna(), df.fillna()
- Filtering data: df[df['column'] > value]
- Renaming columns: df.rename(columns={'old': 'new'})
- Changing data types: df['col'] = df['col'].astype(int)
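The cleaning steps listed above can be combined on a small hypothetical DataFrame. This sketch fills a missing price with the column median (one common choice; dropna() would drop the row instead), renames a column, converts text quantities to integers, and filters rows with a boolean mask.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: a missing price and quantities stored as text
raw = pd.DataFrame({
    'item': ['pen', 'book', 'bag', 'lamp'],
    'Price': [1.5, np.nan, 20.0, 35.0],
    'qty': ['10', '2', '1', '3'],
})

# Fill the missing price with the median of the non-missing prices
clean = raw.fillna({'Price': raw['Price'].median()})

# Rename a column and convert the text quantities to integers
clean = clean.rename(columns={'Price': 'price'})
clean['qty'] = clean['qty'].astype(int)

# Boolean filtering: keep rows whose price exceeds 10
expensive = clean[clean['price'] > 10]
print(expensive)
```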
6. Data Aggregation & Grouping
Grouping splits rows by the values of a key column and aggregates each group separately:
df.groupby('column')['sales'].sum()
df.pivot_table(index='category', values='sales', aggfunc='mean')
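A runnable version of the grouping calls above, using hypothetical sales records. It also shows agg() for several aggregates at once and a pivot_table with a columns argument, which spreads one key across columns and leaves NaN where a combination has no data.

```python
import pandas as pd

# Hypothetical sales records
df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'C'],
    'region':   ['North', 'North', 'South', 'South', 'North'],
    'sales':    [100, 200, 300, 400, 500],
})

# Split by category, then sum sales within each group (Series indexed by category)
total_by_cat = df.groupby('category')['sales'].sum()

# Several aggregates at once
summary = df.groupby('category')['sales'].agg(['sum', 'mean', 'count'])

# Rows = category, columns = region, cells = mean sales (NaN where no data)
pivot = df.pivot_table(index='category', columns='region',
                       values='sales', aggfunc='mean')

print(total_by_cat)
print(summary)
print(pivot)
```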
7. Data Visualization with Matplotlib & Seaborn
Matplotlib:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])
plt.show()
Seaborn:
import seaborn as sns
sns.barplot(x='category', y='sales', data=df)  # bar height = mean sales per category by default
plt.show()
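Seaborn draws onto Matplotlib axes, so Matplotlib's object-oriented API is also useful to know. The sketch below creates a figure and axes explicitly, labels them, and saves the chart to a file instead of opening a window. The data and the output filename are hypothetical, and the non-interactive Agg backend is assumed so it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: total sales per category
df = pd.DataFrame({'category': ['A', 'B', 'C'], 'sales': [400, 600, 500]})

# Create the figure and axes explicitly, then draw on the axes
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(df['category'], df['sales'], color='steelblue')
ax.set_xlabel('Category')
ax.set_ylabel('Sales')
ax.set_title('Sales by category')
fig.tight_layout()
fig.savefig('sales_by_category.png')   # hypothetical output path
plt.close(fig)                         # free the figure's memory
```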
8. Handling Time Series Data
Pandas supports datetime operations:
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
df['value'].resample('M').mean()  # monthly mean; pandas 2.2+ deprecates 'M' in favor of 'ME' (month end)
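The time-series steps above, applied to 60 days of hypothetical daily readings covering January and February 2024. To avoid the 'M'/'ME' version difference, this sketch uses the 'MS' (month start) alias, which labels each monthly bin by its first day. It also shows partial-string selection on a DatetimeIndex.

```python
import pandas as pd

# Hypothetical daily readings for Jan-Feb 2024 (31 + 29 = 60 days), dates as strings
df = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=60, freq='D').strftime('%Y-%m-%d'),
    'value': range(60),
})

# Parse the strings into datetimes and use them as the index
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

# Downsample daily values to monthly means, labeled by month start
monthly = df['value'].resample('MS').mean()
print(monthly)

# A DatetimeIndex also supports partial-string selection
feb = df.loc['2024-02']
```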
9. Basic Statistics with Python
You can compute basic statistics with Pandas or NumPy:
df['column'].mean(), df['column'].median(), df['column'].std()
df['column'].var(), df['column'].value_counts()
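These methods are shown below on a small hypothetical Series. One detail worth knowing: pandas std() and var() default to the sample formulas (ddof=1), while np.std and np.var default to the population formulas (ddof=0), so the two libraries can return different numbers for the same data.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])   # hypothetical sample

print(s.mean())          # 5.0
print(s.median())        # 4.5
print(s.var())           # sample variance (ddof=1): 32/7, about 4.571
print(s.std())           # sample standard deviation (ddof=1)
print(s.value_counts())  # frequency of each value, most common first

# NumPy uses the population formula (ddof=0) unless told otherwise
print(np.std(s.to_numpy()))          # 2.0
print(np.std(s.to_numpy(), ddof=1))  # matches s.std()
```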
10. Intro to Scikit-learn
Scikit-learn provides machine learning models that share a consistent fit/predict interface:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
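The snippet above assumes X_train, y_train, and X_test already exist. The sketch below builds them end to end: it generates hypothetical noiseless data following y = 3x + 2, splits it with train_test_split, fits LinearRegression, and checks that the learned coefficient and intercept recover 3 and 2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data following y = 3x + 2 exactly (no noise)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))   # scikit-learn expects a 2-D feature matrix
y = 3 * X[:, 0] + 2

# Hold out 20% of rows to evaluate on data the model has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print(model.coef_, model.intercept_)   # close to [3.] and 2.0
print(model.score(X_test, y_test))     # R^2 on the test set, about 1.0 here
```

With real, noisy data the coefficients will only approximate the underlying relationship, and the test-set score is the more honest measure of fit.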
11. Exploratory Data Analysis (EDA)
- Understand data shape and types
- Check missing values
- Use df.describe(), df.info(), df.nunique()
- Visualize with histograms, boxplots, correlation heatmaps
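A first EDA pass over a small hypothetical dataset might look like the sketch below. It covers shape, column types, missing values per column, distinct counts, and the numeric correlation matrix that a heatmap would visualize. The numeric_only argument to corr() assumes pandas 1.5 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with one missing value and a categorical column
df = pd.DataFrame({
    'age':    [25, 32, 47, np.nan, 51],
    'income': [30000, 42000, 61000, 38000, 70000],
    'city':   ['Oslo', 'Rome', 'Oslo', 'Lima', 'Rome'],
})

print(df.shape)            # (rows, columns)
print(df.dtypes)           # column types ('age' is float because of the NaN)
missing = df.isna().sum()  # missing values per column
print(missing)
print(df.nunique())        # distinct values per column (NaN not counted)

# Correlation between numeric columns; the basis of a correlation heatmap
corr = df.corr(numeric_only=True)
print(corr)
```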
12. Interview Tips
- Be confident in Pandas and NumPy
- Know how to clean and filter data
- Practice basic visualizations
- Understand simple ML concepts like linear regression
- Be ready to write logic for real-world scenarios