Deep Python for Data Analysis
Python is a high-level, general-purpose programming language that suits data analysis because of its readable
syntax and its ecosystem of libraries. It supports tasks including data cleaning, transformation, statistical
modeling, and visualization.
Key Features of NumPy:
- ndarray: Multidimensional array object
- Broadcasting: Arithmetic operations on arrays of different shapes
- Mathematical functions: mean, std, dot, etc.
Example:
import numpy as np
arr = np.array([[1, 2], [3, 4]])
print(np.mean(arr)) # Output: 2.5
print(arr.shape) # Output: (2, 2)
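Broadcasting and dot, both listed among the key features above, are easier to see with concrete shapes. The following is a minimal sketch; the array values and the names matrix, row, shifted, centered, and product are chosen only for illustration.

```python
import numpy as np

# A (2, 2) matrix and a (2,) vector: different shapes, but compatible for broadcasting.
matrix = np.array([[1, 2], [3, 4]])
row = np.array([10, 20])

# The vector is applied to every row of the matrix without an explicit loop.
shifted = matrix + row
print(shifted)  # rows become [11, 22] and [13, 24]

# Centering columns: the (2,) vector of column means broadcasts against the (2, 2) matrix.
col_means = matrix.mean(axis=0)
centered = matrix - col_means
print(centered)

# Matrix-vector product with dot.
product = np.dot(matrix, row)
print(product)  # [50, 110]
```

Broadcasting compares shapes from the trailing dimension backward, so a (2,) vector lines up with the columns of a (2, 2) matrix.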
Key Operations in Pandas:
- Reading data: pd.read_csv(), pd.read_excel()
- Inspecting data: df.head(), df.info()
- Filtering: df[df['Age'] > 25]
- Sorting: df.sort_values(by='Salary')
Example:
import pandas as pd
df = pd.DataFrame({'Name': ['A', 'B'], 'Age': [22, 28]})
print(df[df['Age'] > 25])
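The reading, inspecting, and sorting operations listed above can be tried together in one short sketch. The CSV text, the Salary values, and the names people, older, and by_salary are hypothetical; an in-memory buffer stands in for a file path so the example runs on its own.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice pd.read_csv() would usually receive a file path.
csv_text = """Name,Age,Salary
A,22,30000
B,28,45000
C,35,40000
"""
people = pd.read_csv(io.StringIO(csv_text))

print(people.head())  # first rows of the table
people.info()         # column dtypes and non-null counts

# Boolean-mask filtering keeps only rows where the condition is True.
older = people[people['Age'] > 25]
print(older)

# sort_values returns a new DataFrame sorted in ascending order by default.
by_salary = people.sort_values(by='Salary')
print(by_salary)
```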
Python for Data Analysis - Complete Notes
Example (filling missing values with the column mean):
df['Age'] = df['Age'].fillna(df['Age'].mean())
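To see what the fillna call above does, here is a small self-contained sketch. The DataFrame ages and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age.
ages = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [22.0, np.nan, 28.0]})

# Count missing values before filling.
missing_before = int(ages['Age'].isna().sum())
print(missing_before)  # 1

# mean() skips NaN by default, so the fill value is the mean of the observed ages.
ages['Age'] = ages['Age'].fillna(ages['Age'].mean())
print(ages)
```

Mean imputation is only one option; whether it is appropriate depends on the data.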
- Grouping: df.groupby('Department')['Salary'].mean()
- Aggregation: df.agg({'Age': ['mean', 'max'], 'Salary': 'sum'})
- Pivot Tables: df.pivot_table(index='Department', values='Salary', aggfunc='mean')
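The grouping, aggregation, and pivot-table calls above can be run against a small table to compare their outputs. This is a sketch under assumed data: the staff DataFrame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical staff table.
staff = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT'],
    'Age': [30, 40, 25, 35],
    'Salary': [40000, 50000, 60000, 80000],
})

# Grouping: one mean salary per department, returned as a Series.
dept_mean = staff.groupby('Department')['Salary'].mean()
print(dept_mean)

# Aggregation: different functions per column; cells with no requested function are NaN.
summary = staff.agg({'Age': ['mean', 'max'], 'Salary': 'sum'})
print(summary)

# Pivot table: same numbers as the groupby above, returned as a DataFrame.
pivot = staff.pivot_table(index='Department', values='Salary', aggfunc='mean')
print(pivot)
```

For a single value column, groupby and pivot_table give the same means; pivot_table becomes more useful once a columns argument is added.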
Example:
import matplotlib.pyplot as plt
x = [1, 2, 3]
y = [10, 20, 30]
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Plot')
plt.show()
Example:
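import matplotlib.pyplot as plt  # needed for plt.show()
import seaborn as sns
sns.set_theme(style='darkgrid')  # sns.set() is an older alias for set_theme()
tips = sns.load_dataset('tips')  # fetches the sample dataset on first use
sns.barplot(x='day', y='total_bill', data=tips)  # bar height is the mean total_bill per day
plt.show()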
Time series data is indexed by timestamps. Once a date column is parsed and set as the index, Pandas supports time-based selection and resampling.
Example:
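df['date'] = pd.to_datetime(df['date'])  # parse strings into datetime values
df = df.set_index('date')  # resample() below relies on the resulting DatetimeIndex
monthly_avg = df['sales'].resample('ME').mean()  # 'ME' = month end; use 'M' on pandas older than 2.2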
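The snippet above assumes an existing df with date and sales columns. A self-contained sketch of the same idea follows, with made-up data in a DataFrame named sales. It uses the 'MS' (month start) alias rather than a month-end alias, purely because 'MS' is spelled the same across pandas versions; the monthly means are identical and only the bin labels differ.

```python
import pandas as pd

# Hypothetical daily sales records.
sales = pd.DataFrame({
    'date': ['2024-01-05', '2024-01-20', '2024-02-10', '2024-02-25'],
    'sales': [100, 200, 300, 500],
})
sales['date'] = pd.to_datetime(sales['date'])
sales = sales.set_index('date')

# Time-based indexing: a partial date string selects every row in that month.
january = sales.loc['2024-01']
print(january)

# Resampling: group rows into calendar months and average each group.
monthly_avg = sales['sales'].resample('MS').mean()
print(monthly_avg)
```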
NumPy Examples:
np.mean(data), np.median(data), np.std(data)
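A short runnable version of these calls, using a made-up data array, also shows one detail worth knowing: np.std computes the population standard deviation by default, and ddof=1 gives the sample version.

```python
import numpy as np

# Hypothetical sample of eight observations.
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = np.mean(data)
median = np.median(data)
pop_std = np.std(data)             # divides by n (population standard deviation)
sample_std = np.std(data, ddof=1)  # divides by n - 1 (sample standard deviation)

print(mean, median, pop_std, sample_std)
```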
Example:
import plotly.express as px
df = px.data.gapminder().query("year == 2007")
fig = px.scatter(df, x="gdpPercap", y="lifeExp", size="pop", color="continent")
fig.show()
Steps:
- Load dataset
- Split data: train_test_split()
- Train model: model.fit()
- Predict: model.predict()
Example:
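from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# X (features) and y (target) are assumed to be loaded; test_size and random_state are illustrative
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)  # learn coefficients from the training split
preds = model.predict(X_test)  # predictions for the held-out split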
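The four steps above can be run end to end on synthetic data. This is a sketch, not a recommended modeling setup: the data-generating line (slope 3, intercept 2, small noise), the 80/20 split, and the random seeds are all assumptions made so the example is reproducible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load dataset: here, synthetic points scattered around y = 3x + 2.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=100)

# Split data: hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model on the training split only.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the held-out split and score with R^2.
preds = model.predict(X_test)
r2 = model.score(X_test, y_test)
print(model.coef_, model.intercept_, r2)
```

Evaluating on rows the model never saw during fit is what makes the score a fair estimate of how it generalizes.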