
Deep Python for Data Analysis

This document provides comprehensive notes on using Python for data analysis, covering key libraries such as NumPy, Pandas, Matplotlib, Seaborn, and Scikit-learn. It includes essential operations for data manipulation, cleaning, visualization, and machine learning, along with practical examples. The document also offers tips for mastering data analysis skills and preparing for interviews.

Python for Data Analysis - Complete Notes

1. Introduction to Python for Data Analysis

Python is a high-level, general-purpose programming language well suited to data analysis because of its
readable syntax and large ecosystem of libraries. It supports tasks including data cleaning, transformation,
statistical modeling, and visualization.

2. NumPy - Numerical Python

NumPy provides efficient array structures and mathematical functions.

Key Features:
- ndarray: Multidimensional array object
- Broadcasting: Arithmetic operations between arrays of different but compatible shapes
- Mathematical functions: mean, std, dot, etc.

Example:
import numpy as np
arr = np.array([[1, 2], [3, 4]])
print(np.mean(arr)) # Output: 2.5
print(arr.shape) # Output: (2, 2)
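
The broadcasting and dot features listed above can be sketched as follows. This is a minimal illustration: the array values and the names row, shifted, centered, and product are made up for the example.

```python
import numpy as np

# A 2x2 matrix and a 1-D vector of length 2 (illustrative values).
arr = np.array([[1, 2], [3, 4]])
row = np.array([10, 20])

# Broadcasting: `row` is applied to every row of `arr`,
# because the trailing dimensions (2 and 2) match.
shifted = arr + row            # [[11, 22], [13, 24]]

# Subtracting the per-column mean centers each column on zero.
col_means = arr.mean(axis=0)   # [2.0, 3.0]
centered = arr - col_means     # [[-1.0, -1.0], [1.0, 1.0]]

# np.dot on two 2-D arrays computes the matrix product.
product = np.dot(arr, arr)     # [[7, 10], [15, 22]]

print(shifted)
print(centered)
print(product)
```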

3. Pandas - Data Manipulation and Analysis

Pandas introduces two main data structures:

- Series: 1D labeled array
- DataFrame: 2D labeled data structure

Key Operations:
- Reading data: pd.read_csv(), pd.read_excel()
- Inspecting data: df.head(), df.info()
- Filtering: df[df['Age'] > 25]
- Sorting: df.sort_values(by='Salary')

Example:
import pandas as pd
df = pd.DataFrame({'Name': ['A', 'B'], 'Age': [22, 28]})
print(df[df['Age'] > 25])
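
Filtering and sorting are often combined. The sketch below assumes a small made-up table with a Salary column, which the example above does not have; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Age': [22, 28, 35, 41],
    'Salary': [30000, 52000, 48000, 61000],
})

# Combine conditions with & (and) or | (or).
# Each condition needs its own parentheses.
over_25_high_pay = df[(df['Age'] > 25) & (df['Salary'] > 50000)]

# sort_values returns a new DataFrame; the original is left unchanged.
by_salary_desc = df.sort_values(by='Salary', ascending=False)

print(over_25_high_pay)
print(by_salary_desc)
```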

4. Data Cleaning in Pandas

- Handling Missing Data:
df.isnull().sum()
df.dropna(), df.fillna(value)  # both return a new object; assign the result to keep it
- Renaming Columns:
df.rename(columns={'old': 'new'})
- Changing Data Types:
df['col'] = df['col'].astype('int')  # raises an error if the column still contains NaN

Example:
df['Age'] = df['Age'].fillna(df['Age'].mean())
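
The cleaning operations above can be chained into one short pass. The sketch below uses a made-up three-row table; the column names 'old' and 'new' follow the rename example, and the other names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing Age value.
raw = pd.DataFrame({
    'old': ['x', 'y', 'z'],
    'Age': [22.0, np.nan, 28.0],
})

# Count missing values per column before cleaning.
missing_per_column = raw.isnull().sum()

# rename returns a new DataFrame; `raw` keeps its original column names.
clean = raw.rename(columns={'old': 'new'})

# Fill the gap with the column mean (NaN is skipped when computing the mean),
# then convert to integers, which is only safe once no NaN remains.
clean['Age'] = clean['Age'].fillna(clean['Age'].mean())
clean['Age'] = clean['Age'].astype('int')

print(missing_per_column)
print(clean)
```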

5. Grouping and Aggregation

- Grouping: df.groupby('Department')['Salary'].mean()
- Aggregation: df.agg({'Age': ['mean', 'max'], 'Salary': 'sum'})
- Pivot Tables:
df.pivot_table(index='Department', values='Salary', aggfunc='mean')
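
A minimal sketch of the three operations on a made-up four-row table. The per-group summary uses named aggregation, which is one of several ways to call agg and is a choice made here, not something the notes prescribe.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT'],
    'Age': [30, 40, 25, 35],
    'Salary': [40000, 50000, 60000, 80000],
})

# Mean salary per department, returned as a Series indexed by Department.
mean_salary = df.groupby('Department')['Salary'].mean()

# Several aggregations per group, each with an explicit output column name.
summary = df.groupby('Department').agg(
    mean_age=('Age', 'mean'),
    max_age=('Age', 'max'),
    total_salary=('Salary', 'sum'),
)

# The same mean salary, expressed as a pivot table (a DataFrame).
pivot = df.pivot_table(index='Department', values='Salary', aggfunc='mean')

print(mean_salary)
print(summary)
print(pivot)
```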

6. Matplotlib - Basic Visualization

Matplotlib is used to create static, animated, and interactive plots.

Example:
import matplotlib.pyplot as plt
x = [1, 2, 3]
y = [10, 20, 30]
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Plot')
plt.show()
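
The same plot can also be built with Matplotlib's object-oriented interface, which is easier to extend to multiple subplots. This sketch assumes a non-interactive backend ('Agg') and writes the image to an in-memory buffer instead of opening a window; the marker and label are illustrative additions.

```python
import io

import matplotlib
matplotlib.use('Agg')  # assumption: render without a display
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [10, 20, 30]

# fig is the whole figure; ax is the single set of axes inside it.
fig, ax = plt.subplots()
ax.plot(x, y, marker='o', label='y = 10x')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_title('Line Plot')
ax.legend()

# Save as PNG into a buffer; pass a file path instead to write to disk.
buf = io.BytesIO()
fig.savefig(buf, format='png')
```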

7. Seaborn - Statistical Visualization

Seaborn is built on top of Matplotlib and provides a high-level interface for statistical graphics.



Example:
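import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style='darkgrid')
tips = sns.load_dataset('tips')  # fetches the sample dataset; needs internet on first use
sns.barplot(x='day', y='total_bill', data=tips)  # bar height = mean total_bill per day
plt.show()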

8. Time Series Analysis with Pandas

Time series data is a sequence of observations indexed by timestamps. Pandas supports time-based indexing, slicing, and resampling.

Example:
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
monthly_avg = df['sales'].resample('M').mean()  # 'M' = month-end; pandas 2.2+ prefers 'ME'
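
A self-contained version of the example, using four made-up sales records. It resamples with 'MS' (month start), an alias chosen here because it behaves the same across recent Pandas versions; the month-end alias would label each bucket with the last day of the month instead.

```python
import pandas as pd

# Hypothetical sales records for illustration only.
df = pd.DataFrame({
    'date': ['2024-01-05', '2024-01-20', '2024-02-10', '2024-02-25'],
    'sales': [100, 200, 300, 500],
})

# Parse the strings into timestamps and use them as the index.
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

# Bucket rows by calendar month and average each bucket.
monthly_avg = df['sales'].resample('MS').mean()

# Partial-string indexing: select every row in January 2024.
jan = df.loc['2024-01']

print(monthly_avg)
print(jan)
```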

9. Statistics with Pandas and NumPy

- Descriptive Stats: df.describe()
- Correlation: df.corr(numeric_only=True)
- Value Counts: df['Category'].value_counts()
- Standard Deviation: df['Salary'].std()

NumPy Examples:
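np.mean(data), np.median(data), np.std(data)
Note: np.std() computes the population standard deviation (ddof=0) by default, while the Pandas .std() method computes the sample standard deviation (ddof=1).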
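
NumPy and Pandas use different defaults for standard deviation, which is a common source of mismatched results. The sketch below shows the difference on a made-up list of eight numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; their mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
s = pd.Series(data)

mean_value = np.mean(data)
median_value = np.median(data)

# NumPy default: population standard deviation (divide by n, ddof=0).
population_std = np.std(data)

# Sample standard deviation (divide by n - 1), the default for Series.std().
sample_std_np = np.std(data, ddof=1)
sample_std_pd = s.std()

print(mean_value, median_value)
print(population_std, sample_std_np, sample_std_pd)
```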

10. Plotly - Interactive Visualization

Plotly is a graphing library for interactive charts.

Example:
import plotly.express as px
df = px.data.gapminder().query("year == 2007")
fig = px.scatter(df, x="gdpPercap", y="lifeExp", size="pop", color="continent")
fig.show()

11. Scikit-learn - Machine Learning Library

Scikit-learn provides simple tools for predictive data analysis.

Steps:
- Load dataset
- Split data: train_test_split()
- Train model: model.fit()
- Predict: model.predict()

Example:
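from sklearn.linear_model import LinearRegression
# X_train, X_test, and y_train are the splits returned by train_test_split()
model = LinearRegression()
model.fit(X_train, y_train)  # learn coefficients from the training split
preds = model.predict(X_test)  # predict targets for the held-out rows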
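
The four steps can be run end to end as sketched below. There is no dataset in these notes, so the example generates synthetic data (y is roughly 3x + 2 plus noise); the seed, split ratio, and variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Step 1: "load" a dataset. Here it is synthetic: y = 3x + 2 plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=100)

# Step 2: hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 3: fit the model on the training split only.
model = LinearRegression()
model.fit(X_train, y_train)

# Step 4: predict on unseen rows and score the fit (R^2).
preds = model.predict(X_test)
r2 = model.score(X_test, y_test)

print(model.coef_, model.intercept_, r2)
```

Scoring on the held-out split rather than the training split gives a fairer estimate of how the model behaves on new data.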

12. Summary & Tips for Interviews

- Master Pandas and NumPy first
- Practice on real datasets (Kaggle, UCI, etc.)
- Know how to visualize and clean data
- Understand ML workflow: EDA -> Preprocessing -> Model
- Practice SQL + Python-based case studies
