Week 7 - Data Visualization
Conceptual Overview
Time-series data consists of data points collected or recorded at specific time intervals. It’s
crucial in many domains like finance, healthcare, sales, and environmental studies. Visualization
of time-series data allows us to observe trends, patterns, seasonality, and outliers over time.
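Common visualizations include:
Line Charts: Connect data points in time order, making trends easy to follow.
Scatter Plots: Plot each observation individually, which helps reveal variability and outliers.
Heatmaps: Visualize patterns or concentrations over time intervals like hours, days, or weeks.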
---
Code Breakdown:
import pandas as pd
import matplotlib.pyplot as plt
1. Data Creation: We create a dictionary with Date (a time-series index) and corresponding
Sales values. The pd.date_range function generates dates from January 1, 2024, for 10
consecutive days.
2. Line Chart: plt.plot creates a line connecting Sales data points over the Date range. The
marker='o' adds circles at each data point for better visibility.
3. Customization: Labels, grid lines, and rotation for x-axis dates enhance readability.
Output: A line chart depicting a clear upward trend in sales over time, followed by fluctuations.
Observing this, businesses can identify peak sales periods.
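The three steps above can be sketched as follows. This is a minimal reconstruction, not the lesson's original listing: the Sales values, the figure size, and the title are assumptions chosen for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# 1. Data creation: 10 consecutive days starting January 1, 2024.
#    The Sales values are assumed for illustration only.
data = {
    'Date': pd.date_range(start='2024-01-01', periods=10, freq='D'),
    'Sales': [200, 220, 250, 270, 300, 280, 310, 290, 330, 320],
}
df = pd.DataFrame(data)

# 2. Line chart: connect the points and mark each one with a circle
fig = plt.figure(figsize=(10, 6))
plt.plot(df['Date'], df['Sales'], marker='o')

# 3. Customization: labels, grid lines, rotated x-axis dates
plt.title('Daily Sales')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```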
---
Code Explanation:
Scatter Plot: Each sales data point is plotted individually. Unlike a line chart, it doesn’t connect
points, making it suitable for identifying data density or outliers.
Output: The scatter plot shows isolated points for daily sales. While trends are less apparent
than in line charts, variability and clustering patterns are easier to spot.
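A scatter plot of daily sales might look like the sketch below. The sales values, colour, and title are assumptions for illustration; only the 10-day date range comes from the lesson.

```python
import pandas as pd
import matplotlib.pyplot as plt

# 10 consecutive days starting January 1, 2024 (sales values are assumed)
dates = pd.date_range(start='2024-01-01', periods=10, freq='D')
sales = [200, 220, 250, 270, 300, 280, 310, 290, 330, 320]

# Scatter plot: each point is drawn on its own, with no connecting line
fig = plt.figure(figsize=(10, 6))
plt.scatter(dates, sales, color='red')
plt.title('Daily Sales (Scatter)')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```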
---
Code Explanation:
1. Data Generation: np.random.randint creates a 7x4 matrix, simulating sales data for 7 days
across 4 weeks.
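import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Simulated sales: 7 days (rows) x 4 weeks (columns); the value range is illustrative
sales_data = np.random.randint(100, 500, size=(7, 4))
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
weeks = ['Week 1', 'Week 2', 'Week 3', 'Week 4']
# Create a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(sales_data, annot=True, fmt="d", cmap='coolwarm',
            xticklabels=weeks, yticklabels=days)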
plt.title('Weekly Sales Heatmap')
plt.xlabel('Weeks')
plt.ylabel('Days')
plt.show()
Output: The heatmap highlights sales variations across days and weeks. With the coolwarm
colormap, blue cells indicate lower sales, while red cells mark peaks.
---
---
Conceptual Overview
Choropleth Maps: Use color gradients to represent values (e.g., population) over regions.
---
Code Breakdown:
import folium
# Base map (named lagos_map to avoid shadowing Python's built-in map)
lagos_map = folium.Map(location=[6.5244, 3.3792], zoom_start=10) # Lagos, Nigeria
# Add markers
locations = [[6.5244, 3.3792], [6.4654, 3.4067], [6.4500, 3.4000]]
names = ['Mainland', 'Island', 'Ikoyi']
for loc, name in zip(locations, names):
    folium.Marker(location=loc, popup=name).add_to(lagos_map)
# Save map
lagos_map.save("lagos_map.html")
1. Base Map: A Folium map centered on Lagos with zoom_start=10.
Output: An interactive map with markers for Lagos locations. Users can zoom and view
region-specific data points.
---
Code Breakdown:
import pandas as pd
import plotly.express as px
# Sample data
data = {'State': ['Lagos', 'Kano', 'Abuja', 'Rivers'],
        'Population': [9000000, 3000000, 2000000, 1500000]}
df = pd.DataFrame(data)
Choropleth: locations maps states, and color visualizes population using a gradient (Viridis).
Output: A Nigeria map where Lagos has the brightest shade (yellow on the Viridis scale),
representing the highest population. The chart is interactive, so viewers can hover and zoom
to explore it.
---
Code Breakdown:
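import folium
from folium.plugins import HeatMap
# Sample [latitude, longitude, intensity] points (illustrative values)
heat_data = [[6.5244, 3.3792, 500], [6.4654, 3.4067, 300], [6.4500, 3.4000, 200]]
# Base map
lagos_map = folium.Map(location=[6.5244, 3.3792], zoom_start=10)
# Add heatmap
HeatMap(heat_data).add_to(lagos_map)
# Save map
lagos_map.save("heatmap_lagos.html")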
HeatMap Plugin: Adds intensity-based heat visualization using coordinates and values (e.g.,
500 for high density).
Output: A Lagos heatmap in which areas of high data density appear as intense hot spots and
areas of lower density appear as fainter regions.
---
---
Categorical data consists of labels or categories and is typically non-numeric. Examples include
product types, regions, and customer segments. Visualizing categorical data allows us to:
---
Code Breakdown:
# Sample Data
categories = ['Electronics', 'Furniture', 'Clothing', 'Groceries']
sales = [12000, 8000, 15000, 10000]
Output: The bar chart clearly shows that Clothing has the highest sales, while Furniture lags
behind.
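A bar chart for this sample data might be drawn as in the sketch below. The colour, figure size, and labels are assumptions; the categories and sales figures are the lesson's own.

```python
import matplotlib.pyplot as plt

# Sample data from the lesson
categories = ['Electronics', 'Furniture', 'Clothing', 'Groceries']
sales = [12000, 8000, 15000, 10000]

# One bar per category; bar height encodes sales
fig = plt.figure(figsize=(8, 6))
plt.bar(categories, sales, color='skyblue')
plt.title('Sales by Category')
plt.xlabel('Category')
plt.ylabel('Sales ($)')
plt.show()
```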
---
Code Explanation:
Pie Chart: Proportions of sales for each category are displayed as slices.
Output: The pie chart reveals Clothing accounts for the largest sales proportion, while Furniture
occupies the smallest slice.
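The pie chart could be produced along these lines. The sketch assumes the same four categories and sales figures as the bar chart example; the autopct format and start angle are illustrative choices.

```python
import matplotlib.pyplot as plt

# Assumed to be the same sample data as the bar chart example
categories = ['Electronics', 'Furniture', 'Clothing', 'Groceries']
sales = [12000, 8000, 15000, 10000]

# Each slice's angle is proportional to its share of total sales;
# autopct prints that share as a percentage on the slice
fig = plt.figure(figsize=(8, 8))
plt.pie(sales, labels=categories, autopct='%1.1f%%', startangle=90)
plt.title('Sales Distribution by Category')
plt.axis('equal')  # keep the pie circular
plt.show()
```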
---
Code Breakdown:
# Sample Data
regions = ['North', 'South', 'East', 'West']
electronics = [3000, 4000, 2000, 3000]
furniture = [1000, 2000, 3000, 2000]
# Create a stacked bar chart
plt.figure(figsize=(10, 6))
plt.bar(regions, electronics, label='Electronics', color='blue')
plt.bar(regions, furniture, bottom=electronics, label='Furniture', color='orange')
plt.title('Sales by Region and Category')
plt.xlabel('Region')
plt.ylabel('Sales ($)')
plt.legend()
plt.show()
2. Stacking: The bottom parameter ensures the second category (Furniture) is stacked atop the
first (Electronics).
Output: The stacked bar chart highlights total sales by region while showing how much each
category contributes.
---
---
Conceptual Overview
2. Using custom colors and styles to align with themes or brand aesthetics.
---
Code Breakdown:
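import matplotlib.pyplot as plt
# Sample Data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
revenue = [15000, 18000, 20000, 22000, 24000]
# Plot revenue so the annotations have data points to label
plt.figure(figsize=(8, 5))
plt.plot(months, revenue, color='blue', marker='o')
# Add annotations
for i, value in enumerate(revenue):
    plt.text(months[i], value + 500, f"${value}", ha='center')
plt.title('Monthly Revenue')
plt.xlabel('Month')
plt.ylabel('Revenue ($)')
plt.grid(True)
plt.show()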
1. Annotations: plt.text places text at specific coordinates. Here, each data point is annotated
with its revenue value.
---
Code Breakdown:
# Sample Data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
revenue = [15000, 18000, 20000, 22000, 24000]
expenses = [12000, 14000, 16000, 17000, 19000]
twinx(): Creates a secondary y-axis on the same plot for a second dataset (expenses).
Custom Colors: Each axis has its own color theme for clarity.
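A dual-axis plot for this data might look like the following sketch. The colours, markers, and title are assumptions; only the sample data and the use of twinx() come from the lesson.

```python
import matplotlib.pyplot as plt

# Sample data from the lesson
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
revenue = [15000, 18000, 20000, 22000, 24000]
expenses = [12000, 14000, 16000, 17000, 19000]

# Primary axis: revenue, themed in blue
fig, ax1 = plt.subplots(figsize=(10, 6))
ax1.plot(months, revenue, color='blue', marker='o', label='Revenue')
ax1.set_xlabel('Month')
ax1.set_ylabel('Revenue ($)', color='blue')
ax1.tick_params(axis='y', labelcolor='blue')

# Secondary axis: shares the x-axis but has its own y-scale, themed in green
ax2 = ax1.twinx()
ax2.plot(months, expenses, color='green', marker='s', label='Expenses')
ax2.set_ylabel('Expenses ($)', color='green')
ax2.tick_params(axis='y', labelcolor='green')

ax1.set_title('Revenue vs. Expenses')
fig.tight_layout()
plt.show()
```

Because each axis scales independently, the two lines can appear closer or further apart than the raw numbers suggest, so the axis labels and colours matter for reading the chart correctly.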
---
Code Breakdown:
# Sample Data
products = ['Laptop', 'Tablet', 'Smartphone', 'Desktop']
units_sold = [500, 300, 800, 200]
Explanation:
1. Horizontal Bars: barh creates horizontal bars, ideal for long category names.
2. Custom Colors: Hex color codes provide unique, visually appealing hues.
3. Styling: Titles, labels, and grids are customized for better aesthetics and readability.
Output: The plot has a professional and vibrant appearance, with horizontal bars making
product comparisons easy.
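The three points in the explanation can be sketched as follows. The specific hex codes, edge colour, and grid style are assumptions chosen for illustration, not the lesson's original values.

```python
import matplotlib.pyplot as plt

# Sample data from the lesson
products = ['Laptop', 'Tablet', 'Smartphone', 'Desktop']
units_sold = [500, 300, 800, 200]

# Assumed hex colour codes, one per bar
colors = ['#FF5733', '#33C1FF', '#8E44AD', '#2ECC71']

# 1. Horizontal bars  2. Custom colours
fig = plt.figure(figsize=(10, 6))
plt.barh(products, units_sold, color=colors, edgecolor='black')

# 3. Styling: title, labels, and a light dashed grid along the value axis
plt.title('Units Sold by Product', fontsize=14)
plt.xlabel('Units Sold')
plt.ylabel('Product')
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.show()
```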
---
Practical Example 4: Saving High-Resolution Images
Code Breakdown:
# Save a plot
plt.figure(figsize=(8, 5))
plt.plot(months, revenue, color='blue', marker='o', label='Revenue')
plt.title('Monthly Revenue')
plt.xlabel('Month')
plt.ylabel('Revenue ($)')
plt.grid(True)
plt.legend()
plt.savefig('monthly_revenue.png', dpi=300, bbox_inches='tight')
Key Parameters:
1. dpi: Controls resolution (dots per inch). A value of 300 is a common choice for print-quality output.
2. bbox_inches='tight': Crops the saved image to the plot's contents, trimming excess white space around it.
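To see what dpi means in practice, the saved image's pixel size is the figure size in inches multiplied by dpi. The sketch below checks this by saving to an in-memory buffer (an assumption made to avoid writing a file) and deliberately leaves out bbox_inches='tight', since cropping would change the final dimensions. For an 8 x 5 inch figure at 300 dpi, the result should be 2400 x 1500 pixels.

```python
import io
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
revenue = [15000, 18000, 20000, 22000, 24000]

fig = plt.figure(figsize=(8, 5))  # 8 inches wide, 5 inches tall
plt.plot(months, revenue, color='blue', marker='o', label='Revenue')

# Save to memory at 300 dpi, without tight cropping
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=300)
buffer.seek(0)

# Read the PNG back and inspect its size in pixels
image = plt.imread(buffer, format='png')
height_px, width_px = image.shape[:2]
print(width_px, height_px)  # width = 8 in * 300 dpi, height = 5 in * 300 dpi
```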
---
2. Dual Axes: Compare multiple metrics on the same chart without clutter.
3. Custom Themes: Use colors, fonts, and layouts to align with branding or specific goals.
Summary
These two days cover critical tools and techniques for visualizing data effectively. They will
help you develop the ability to present data in compelling ways, catering to diverse audiences
and purposes.