Advanced Python Exercise Set

Uploaded by

nthumar
Copyright © All Rights Reserved
Advanced Python Data Science Exercise Set

1. Matplotlib: Advanced OHLC Candlestick Plot with Volume Annotations
Using the provided OHLC dataset, plot candlestick-style OHLC bars. Annotate the three
highest-volume days, and make sure the annotation text size adjusts dynamically to prevent
overlaps.

Dataset Preparation Code:

import pandas as pd
import numpy as np

np.random.seed(42)
dates = pd.date_range("2023-01-01", periods=100)
# High is drawn above, and Low below, the Open/Close range, so every bar is valid.
ohlc_df = pd.DataFrame({
    'Date': dates,
    'Open': np.random.uniform(100, 200, 100).round(2),
    'High': np.random.uniform(200, 300, 100).round(2),
    'Low': np.random.uniform(50, 100, 100).round(2),
    'Close': np.random.uniform(100, 200, 100).round(2),
    'Volume': np.random.randint(1_000, 10_000, 100)
})

2. Matplotlib: Multi-Axis Climate Plot with Interactive Hover
Plot Temperature, Humidity, and WindSpeed on a shared x-axis with separate y-axes. Add
hover interactivity using mpl_connect or mplcursors.

Dataset Preparation Code:

# Reuses the pandas (pd) and numpy (np) imports from Exercise 1.
climate_df = pd.DataFrame({
    'Date': pd.date_range("2023-01-01", periods=100),
    'Temperature': np.random.uniform(20, 40, 100),
    'Humidity': np.random.uniform(40, 90, 100),
    'WindSpeed': np.random.uniform(5, 30, 100)
})

3. Plotly: Drilldown Sunburst with Time-Series Update
Create a Plotly Dash app with a sunburst chart for Region > Country > Product > Quarter. On
clicking a segment, show a corresponding time series.

Dataset Preparation Code:

regions = ['Asia', 'Europe', 'America']
countries = {'Asia': ['India', 'China'], 'Europe': ['France', 'Germany'], 'America': ['USA', 'Brazil']}
products = ['A', 'B', 'C']
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
data = []
for region in regions:
    for country in countries[region]:
        for product in products:
            for quarter in quarters:
                data.append({
                    'Region': region,
                    'Country': country,
                    'Product': product,
                    'Quarter': quarter,
                    'Sales': np.random.randint(1000, 10000)
                })
sales_df = pd.DataFrame(data)

4. Plotly: Linked Hover and Animated Subplots
Use Plotly to plot: (a) a choropleth map, (b) a scatter plot, and (c) a time-series bar chart.
Animate monthly data and enable linked hover.

Dataset Preparation Code:

world_df = pd.DataFrame({
    'Country': ['USA', 'India', 'Germany', 'Brazil', 'China'],
    'Sales': np.random.randint(10_000, 100_000, 5),
    'ISO': ['USA', 'IND', 'DEU', 'BRA', 'CHN']
})
product_df = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D'],
    'Price': np.random.uniform(10, 100, 4),
    'Volume': np.random.randint(100, 500, 4)
})
time_series_df = pd.DataFrame({
    # 'M' means month-end; pandas 2.2 and later prefer the alias 'ME'.
    'Month': pd.date_range("2023-01-01", periods=12, freq='M'),
    'Sales': np.random.randint(5000, 15000, 12)
})

5. Pandas: Rolling Average Anomaly Detection
From a MultiIndex dataset of transaction logs, compute rolling averages and flag anomalous
increases in spend.

Dataset Preparation Code:

user_ids = [f'U{i}' for i in range(1, 21)]
dates = pd.date_range('2023-01-01', '2023-04-30')
transaction_data = []
for uid in user_ids:
    # 40 transaction dates per user, sampled with replacement (repeats possible)
    for date in np.random.choice(dates, 40):
        transaction_data.append({
            'UserID': uid,
            'Date': date,
            'Spend': round(np.random.uniform(10, 500), 2)
        })
transaction_df = pd.DataFrame(transaction_data)
transaction_df = transaction_df.sort_values(['UserID', 'Date']).set_index(['UserID', 'Date'])
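
One possible starting point for this exercise is sketched below. It is not a reference solution: the function name flag_spend_anomalies, the window of 5 transactions, and the 2x threshold are all assumptions made for illustration, since the exercise does not define what counts as an anomaly. The baseline for each transaction is the mean of the same user's previous `window` transactions, so a spike does not inflate its own baseline.

```python
import pandas as pd


def flag_spend_anomalies(df, window=5, factor=2.0):
    """Flag rows whose Spend exceeds `factor` times the user's trailing average.

    `window` counts transactions, not calendar days (an assumption).
    """
    out = df.copy()
    per_user = out.groupby(level="UserID")["Spend"]
    # shift(1) excludes the current transaction from its own baseline.
    out["RollingAvg"] = per_user.transform(
        lambda s: s.shift(1).rolling(window, min_periods=window).mean()
    )
    # Comparisons against NaN are False, so rows without a full baseline
    # are never flagged.
    out["Anomaly"] = out["Spend"] > factor * out["RollingAvg"]
    return out


# Tiny hand-made frame with the same index layout as transaction_df.
demo = pd.DataFrame({
    "UserID": ["U1"] * 7,
    "Date": pd.date_range("2023-01-01", periods=7),
    "Spend": [100.0] * 5 + [500.0, 100.0],
}).set_index(["UserID", "Date"])

flagged = flag_spend_anomalies(demo)
print(flagged)
```

In the demo only the 500.0 transaction is flagged, because it is the only one that exceeds twice the average of the five transactions before it. The same call should accept transaction_df directly.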

6. Pandas: Funnel Analysis from Multi-source Data
Using 3 CSVs (users, logins, purchases), compute user funnel conversion metrics based on
time windows.

Dataset Preparation Code:

user_ids = [f'U{i}' for i in range(1, 21)]
users_df = pd.DataFrame({
    'user_id': user_ids,
    'join_date': pd.date_range('2023-01-01', periods=20)
})
logins_df = pd.DataFrame({
    'user_id': np.random.choice(user_ids, 50),
    'login_date': pd.date_range('2023-01-01', periods=50)
})
purchases_df = pd.DataFrame({
    'user_id': np.random.choice(user_ids, 30),
    'purchase_date': pd.date_range('2023-01-10', periods=30),
    'amount': np.random.randint(100, 1000, 30)
})
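
The sketch below shows one way the funnel could be computed, under assumptions the exercise leaves open: the helper name funnel_metrics, a 7-day login window, and a 30-day purchase window (both measured from join_date) are hypothetical choices, and a purchase only counts for users who also passed the login step. It works on DataFrames with the column names used above; CSV files would first be loaded with pd.read_csv and parse_dates.

```python
import pandas as pd


def funnel_metrics(users, logins, purchases, login_days=7, purchase_days=30):
    """Count users at each funnel step: joined -> logged in -> purchased."""

    def users_within(events, date_col, days):
        # Attach each event to its user's join_date, then keep events that
        # fall between the join date and join date + `days`.
        merged = events.merge(users, on="user_id", how="inner")
        delta = merged[date_col] - merged["join_date"]
        in_window = (delta >= pd.Timedelta(0)) & (delta <= pd.Timedelta(days=days))
        return set(merged.loc[in_window, "user_id"])

    joined = set(users["user_id"])
    logged_in = users_within(logins, "login_date", login_days)
    purchased = users_within(purchases, "purchase_date", purchase_days) & logged_in
    return {
        "joined": len(joined),
        "logged_in": len(logged_in),
        "purchased": len(purchased),
        "login_rate": len(logged_in) / len(joined) if joined else float("nan"),
        "purchase_rate": len(purchased) / len(logged_in) if logged_in else float("nan"),
    }


# Small hand-made inputs (not the exercise data) so the result is easy to check.
demo_users = pd.DataFrame({
    "user_id": ["U1", "U2", "U3"],
    "join_date": pd.to_datetime(["2023-01-01"] * 3),
})
demo_logins = pd.DataFrame({
    "user_id": ["U1", "U2"],
    "login_date": pd.to_datetime(["2023-01-02", "2023-01-20"]),  # U2 is too late
})
demo_purchases = pd.DataFrame({
    "user_id": ["U1", "U2"],
    "purchase_date": pd.to_datetime(["2023-01-10", "2023-01-05"]),
    "amount": [250, 400],
})

metrics = funnel_metrics(demo_users, demo_logins, demo_purchases)
print(metrics)
```

In the demo, U2 purchases inside the window but logged in too late, so only U1 reaches the final step. Calling funnel_metrics(users_df, logins_df, purchases_df) applies the same logic to the generated data.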

7. NumPy: Memory-Efficient Weighted Window Function
Apply a custom window function to a large 1D NumPy array (10 million elements or more)
using broadcasting (no loops).

Dataset Preparation Code:

large_array = np.random.rand(10_000_000)
# Goal: apply a custom weighted moving average with window size 5
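
A minimal sketch of one loop-free approach follows. It uses numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later) to expose every length-5 window as a row of a strided view, then takes one dot product per row. The function name and the example weights are assumptions for illustration; the exercise does not specify the weights, and np.convolve or np.correlate would be an equally valid route.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def weighted_moving_average(values, weights):
    """Weighted moving average over every full window ('valid' positions only)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so the weights sum to 1
    # Strided view of shape (len(values) - k + 1, k); no data is copied here.
    windows = sliding_window_view(values, weights.size)
    # Matrix-vector product: one weighted sum per window.
    return windows @ weights


# Hypothetical weights, chosen only for illustration.
example_weights = [1, 2, 3, 2, 1]

# A small array keeps the demo quick; the same call accepts large_array.
rng = np.random.default_rng(0)
sample = rng.random(1_000)
wma = weighted_moving_average(sample, example_weights)
print(wma.shape)  # (996,)
```

The output has len(values) - 4 entries because only complete windows are evaluated. Padding the edges, if needed, is a separate design decision.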

8. NumPy: Vectorized Random Walks with Reset Constraint
Simulate 100,000 random walks of 1000 steps each. Reset a walk to zero whenever it drops
below -10. Track the number of resets and the final position of each walk.

Dataset Preparation Code:

num_walks = 100_000
steps = 1000
random_walks = np.random.choice([-1, 1], size=(num_walks, steps))
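
The reset rule makes each position depend on the previous, possibly reset, position, so a plain cumulative sum is not enough. The sketch below is one hedged compromise rather than a fully loop-free solution: it iterates over the time steps but updates all walks at once with array operations. The function name simulate_with_reset and the reduced demo size are assumptions for illustration.

```python
import numpy as np


def simulate_with_reset(step_matrix, floor=-10):
    """Advance all walks together; reset a walk to 0 when it drops below `floor`.

    step_matrix has shape (num_walks, num_steps) with entries -1 or +1.
    Returns (final_position, reset_count), one entry per walk.
    """
    num_walks, num_steps = step_matrix.shape
    position = np.zeros(num_walks, dtype=np.int64)
    resets = np.zeros(num_walks, dtype=np.int64)
    for t in range(num_steps):          # loop over time only
        position += step_matrix[:, t]   # vectorised across all walks
        below = position < floor        # walks that just crossed the floor
        position[below] = 0
        resets += below                 # booleans add as 0 or 1
    return position, resets


# Reduced size for a quick demo; pass random_walks for the full simulation.
rng = np.random.default_rng(42)
demo_steps = rng.choice([-1, 1], size=(1_000, 1_000))
final_position, reset_count = simulate_with_reset(demo_steps)
print(final_position[:5], reset_count[:5])
```

Because a walk is reset as soon as it goes below -10, no final position can be lower than -10, which gives a quick sanity check on the output.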
