
Week 7 - Data Visualization

Uploaded by

hadiyahaya87
Copyright
© All Rights Reserved

Week 7: Data Visualization with Real-World Data

Day 1: Visualizing Time-Series Data

Conceptual Overview

Time-series data consists of data points collected or recorded at specific time intervals. It’s
crucial in many domains like finance, healthcare, sales, and environmental studies. Visualization
of time-series data allows us to observe trends, patterns, seasonality, and outliers over time.
Common visualizations include:

Line Charts: Best for showing trends over time.

Scatter Plots: Highlight individual data points to examine variability or distribution.

Heatmaps: Visualize patterns or concentrations over time intervals like hours, days, or weeks.

---

Practical Example 1: Line Chart

Code Breakdown:

import pandas as pd
import matplotlib.pyplot as plt

# Generate time-series data


data = {
'Date': pd.date_range(start='2024-01-01', periods=10, freq='D'),
'Sales': [200, 250, 270, 300, 400, 410, 380, 350, 370, 420]
}
df = pd.DataFrame(data)

1. Data Creation: We create a dictionary with a Date column (consecutive daily timestamps) and
corresponding Sales values. The pd.date_range function generates 10 consecutive daily dates
starting on January 1, 2024.

2. DataFrame Creation: The dictionary is converted into a DataFrame using pd.DataFrame.

# Plotting a line chart
plt.figure(figsize=(10, 6))
plt.plot(df['Date'], df['Sales'], marker='o', linestyle='-', color='blue')
plt.title('Daily Sales Over Time')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()

Line Chart: plt.plot creates a line connecting Sales data points over the Date range. The
marker='o' adds circles at each data point for better visibility.

Customization: Labels, grid lines, and rotation for x-axis dates enhance readability.

Output: A line chart showing sales rising from 200 to 410 over the first six days, dipping to 350,
and recovering to 420 by day 10. Observing this, businesses can identify peak sales periods.
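
Peak periods can also be read off numerically instead of by eye. The following is a minimal sketch that reuses the sample sales figures above; the 3-day window is an assumption chosen for illustration, not a recommendation.

```python
import pandas as pd

# Same illustrative sales figures as the line chart example
df = pd.DataFrame({
    'Date': pd.date_range(start='2024-01-01', periods=10, freq='D'),
    'Sales': [200, 250, 270, 300, 400, 410, 380, 350, 370, 420],
})

# Use the dates as the index so time-based operations work naturally
sales = df.set_index('Date')['Sales']

# 3-day rolling mean smooths day-to-day noise (window size is an assumption)
rolling_mean = sales.rolling(window=3).mean()

# Date and value of the highest daily sales figure
peak_date = sales.idxmax()
peak_value = sales.max()

print(rolling_mean.round(1))
print(f"Peak: {peak_value} on {peak_date.date()}")
```

The first two rolling values are NaN because a full 3-day window is not yet available. Plotting rolling_mean alongside the raw series with plt.plot is one way to make the underlying trend easier to see.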

---

Practical Example 2: Scatter Plot

Code Explanation:

# Scatter plot for sales over time


plt.figure(figsize=(10, 6))
plt.scatter(df['Date'], df['Sales'], color='green')
plt.title('Scatter Plot of Daily Sales')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()

Scatter Plot: Each sales data point is plotted individually. Unlike a line chart, it doesn’t connect
points, making it suitable for identifying data density or outliers.

Output: The scatter plot shows isolated points for daily sales. While trends are less apparent
than in line charts, variability and clustering patterns are easier to spot.
---

Practical Example 3: Heatmap (Using Seaborn)

Code Explanation:

import seaborn as sns


import numpy as np

# Generate weekly sales data


sales_data = np.random.randint(200, 500, size=(7, 4)) # 7 days, 4 weeks
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
weeks = ['Week 1', 'Week 2', 'Week 3', 'Week 4']

1. Data Generation: np.random.randint creates a 7x4 matrix, simulating sales data for 7 days
across 4 weeks.

2. Labels: Days and weeks serve as axis labels.
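
With real data, the day-by-week grid usually has to be built from a daily series rather than generated randomly. The sketch below shows one possible way to do that with a pandas pivot; the daily figures and the Day and Week column names are hypothetical.

```python
import pandas as pd

# Hypothetical daily sales for 28 consecutive days; 2024-01-01 is a Monday
daily = pd.DataFrame({
    'Date': pd.date_range(start='2024-01-01', periods=28, freq='D'),
    'Sales': [200 + 10 * i for i in range(28)],
})

# Derive the row label (weekday) and column label (week number)
daily['Day'] = daily['Date'].dt.day_name().str[:3]
daily['Week'] = [f"Week {i // 7 + 1}" for i in range(len(daily))]

# Pivot into a 7x4 grid and put the weekdays in calendar order
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
sales_matrix = daily.pivot(index='Day', columns='Week', values='Sales').reindex(days)

print(sales_matrix)
```

The resulting DataFrame can be passed to sns.heatmap directly, in which case its index and column names serve as the axis tick labels.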

# Create a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(sales_data, annot=True, fmt="d", cmap='coolwarm', xticklabels=weeks,
yticklabels=days)
plt.title('Weekly Sales Heatmap')
plt.xlabel('Weeks')
plt.ylabel('Days')
plt.show()

Heatmap: A heatmap is a grid-based visualization where colors represent data intensity.
annot=True displays the actual values, and cmap='coolwarm' sets the color gradient.

Output: The heatmap highlights sales variations across days and weeks. With the coolwarm
colormap, blue cells indicate lower sales and red cells indicate higher sales, while pale cells
sit near the middle of the range.

---

Key Takeaways for Day 1


Use line charts for trends, scatter plots for data variability, and heatmaps for intensity patterns
over intervals.

Interpreting these visuals helps businesses strategize based on time-driven insights.

---

Day 2: Visualizing Geographical Data

Conceptual Overview

Geographical data visualizations represent spatial distributions, patterns, or trends across
regions. They’re invaluable in fields like marketing, public health, logistics, and environmental
science. Common techniques include:

Scatter Maps: Show point-based locations on a map.

Choropleth Maps: Use color gradients to represent values (e.g., population) over regions.

Geographical Heatmaps: Highlight data density or intensity geographically.

---

Practical Example 1: Scatter Map with Folium

Code Breakdown:

import folium

# Base map (named lagos_map to avoid shadowing Python's built-in map)
lagos_map = folium.Map(location=[6.5244, 3.3792], zoom_start=10) # Lagos, Nigeria

# Add one marker per location; popup text appears when a marker is clicked
locations = [[6.5244, 3.3792], [6.4654, 3.4067], [6.4500, 3.4000]]
names = ['Mainland', 'Island', 'Ikoyi']
for loc, name in zip(locations, names):
    folium.Marker(location=loc, popup=name).add_to(lagos_map)

# Save map as a standalone HTML file
lagos_map.save("lagos_map.html")

1. Base Map: A Folium map centered on Lagos with zoom_start=10.

2. Markers: Latitude-longitude pairs represent locations, and popup=name shows labels on click.

Output: An interactive map with markers for Lagos locations. Users can zoom and view
region-specific data points.

---

Practical Example 2: Choropleth Map with Plotly

Code Breakdown:

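import json
import plotly.express as px

# Sample data
data = {'State': ['Lagos', 'Kano', 'Abuja', 'Rivers'],
'Population': [9000000, 3000000, 2000000, 1500000]}
df = pd.DataFrame(data)

# Plotly has no built-in Nigerian state boundaries, so a GeoJSON file is needed.
# Assumption: 'nigeria_states.geojson' is a hypothetical local file whose
# features store the state name under properties.name.
with open('nigeria_states.geojson') as f:
    nigeria_states = json.load(f)

# Creating a choropleth map
fig = px.choropleth(
    df,
    geojson=nigeria_states,
    locations="State",
    featureidkey="properties.name",
    color="Population",
    title="Population Distribution in Nigeria",
    color_continuous_scale="Viridis"
)
fig.update_geos(fitbounds="locations", visible=False) # zoom to the matched states
fig.show()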

Choropleth: locations names the column that identifies each region, and color maps Population
to a color gradient (Viridis).

Interactivity: Plotly charts allow zooming and hovering for insights.

Output: A map of Nigeria in which Lagos takes the brightest (yellow) end of the Viridis scale,
representing the highest population. The chart is interactive, so readers can zoom and hover to
explore it.
---

Practical Example 3: Geographical Heatmap with Folium

Code Breakdown:

from folium.plugins import HeatMap

# Add Heatmap data


heat_data = [[6.5244, 3.3792, 500], [6.4654, 3.4067, 300], [6.4500, 3.4000, 400]]

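# Base map (named heat_map to avoid shadowing Python's built-in map)
heat_map = folium.Map(location=[6.5244, 3.3792], zoom_start=10)

# Add heatmap layer; each row is [latitude, longitude, weight]
HeatMap(heat_data).add_to(heat_map)

# Save map
heat_map.save("heatmap_lagos.html")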

HeatMap Plugin: Adds intensity-based heat visualization using coordinates and values (e.g.,
500 for high density).

Output: A Lagos heatmap showing areas with high data density (bright spots) and lower density
(darker regions).
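
In practice the points often arrive in a table rather than as a hand-written list. The sketch below shows one way to build the [latitude, longitude, weight] rows from a DataFrame; the column names lat, lon, and weight are assumptions made for this example.

```python
import pandas as pd

# Hypothetical table of locations with an intensity value per point
points = pd.DataFrame({
    'lat': [6.5244, 6.4654, 6.4500],
    'lon': [3.3792, 3.4067, 3.4000],
    'weight': [500, 300, 400],
})

# Convert the three columns into a list of [lat, lon, weight] rows
heat_data = points[['lat', 'lon', 'weight']].values.tolist()

print(heat_data)
```

The resulting list has the same shape as the heat_data literal above, so it can be passed to HeatMap in the same way.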

---

Key Takeaways for Day 2

Scatter Maps: Great for point-based geographical data.

Choropleths: Effective for value distributions across regions.

Geographical Heatmaps: Highlight areas of intensity for actionable insights.

---

Day 3: Visualizing Categorical Data


Conceptual Overview

Categorical data consists of labels or categories and is typically non-numeric. Examples include
product types, regions, and customer segments. Visualizing categorical data allows us to:

Compare categories effectively.

Understand distributions or proportions.

Highlight differences or similarities between groups.

Common visualizations include:

1. Bar Charts: Compare categories quantitatively.

2. Pie Charts: Show proportions or percentages of a whole.

3. Stacked Bar Charts: Compare parts of a whole across multiple categories.

---

Practical Example 1: Bar Chart

Code Breakdown:

# Sample Data
categories = ['Electronics', 'Furniture', 'Clothing', 'Groceries']
sales = [12000, 8000, 15000, 10000]

# Create a bar chart


plt.figure(figsize=(10, 6))
plt.bar(categories, sales, color='skyblue')
plt.title('Sales by Category')
plt.xlabel('Category')
plt.ylabel('Sales ($)')
plt.show()
1. Data: A simple comparison of sales across four product categories.

2. Bar Chart: plt.bar creates vertical bars representing sales values.

Output: The bar chart clearly shows that Clothing has the highest sales (15,000), while Furniture
has the lowest (8,000).
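
Rankings are easier to read when the bars are sorted by value. A minimal sketch, assuming the same sample data, that orders the categories before plotting:

```python
import pandas as pd

categories = ['Electronics', 'Furniture', 'Clothing', 'Groceries']
sales = [12000, 8000, 15000, 10000]

# Pair each category with its sales figure and sort from highest to lowest
sales_by_category = pd.Series(sales, index=categories).sort_values(ascending=False)

print(sales_by_category)
```

Passing sales_by_category.index and sales_by_category.values to plt.bar then draws the bars from highest to lowest.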

---

Practical Example 2: Pie Chart

Code Explanation:

# Create a pie chart


plt.figure(figsize=(8, 8))
plt.pie(sales, labels=categories, autopct='%1.1f%%', startangle=140, colors=['gold', 'lightblue',
'pink', 'lightgreen'])
plt.title('Sales Distribution by Category')
plt.show()

Pie Chart: Proportions of sales for each category are displayed as slices.

Customization: autopct='%1.1f%%' annotates percentages, and startangle=140 rotates the chart
for better alignment.

Output: The pie chart reveals Clothing accounts for the largest sales proportion, while Furniture
occupies the smallest slice.
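
The percentages that autopct prints are each slice's share of the total. As a quick check, the same shares can be computed by hand from the sample data:

```python
categories = ['Electronics', 'Furniture', 'Clothing', 'Groceries']
sales = [12000, 8000, 15000, 10000]

# Each slice is the category's sales divided by total sales, as a percentage
total = sum(sales)
shares = {c: round(100 * s / total, 1) for c, s in zip(categories, sales)}

for category, share in shares.items():
    print(f"{category}: {share}%")
```

Clothing comes out at 33.3% and Furniture at 17.8%, matching the largest and smallest slices described above.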

---

Practical Example 3: Stacked Bar Chart

Code Breakdown:

# Sample Data
regions = ['North', 'South', 'East', 'West']
electronics = [3000, 4000, 2000, 3000]
furniture = [1000, 2000, 3000, 2000]
# Create a stacked bar chart
plt.figure(figsize=(10, 6))
plt.bar(regions, electronics, label='Electronics', color='blue')
plt.bar(regions, furniture, bottom=electronics, label='Furniture', color='orange')
plt.title('Sales by Region and Category')
plt.xlabel('Region')
plt.ylabel('Sales ($)')
plt.legend()
plt.show()

1. Data: Sales are split by region and category.

2. Stacking: The bottom parameter ensures the second category (Furniture) is stacked atop the
first (Electronics).

Output: The stacked bar chart highlights total sales by region while showing how much each
category contributes.
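
With more than two categories, the bottom value for each layer is the running total of the layers beneath it. The sketch below generalizes the example with a loop; the Clothing figures are hypothetical and added only to show a third layer.

```python
import numpy as np
import matplotlib.pyplot as plt

regions = ['North', 'South', 'East', 'West']
series = {
    'Electronics': [3000, 4000, 2000, 3000],
    'Furniture': [1000, 2000, 3000, 2000],
    'Clothing': [1500, 1000, 2500, 500],  # hypothetical third category
}

fig, ax = plt.subplots(figsize=(10, 6))

# Running total of everything already drawn; starts at zero for each region
bottom = np.zeros(len(regions))
for name, values in series.items():
    ax.bar(regions, values, bottom=bottom, label=name)
    bottom = bottom + np.array(values)

ax.set_title('Sales by Region and Category')
ax.set_xlabel('Region')
ax.set_ylabel('Sales ($)')
ax.legend()

# After the loop, bottom holds the total bar height per region
totals = bottom
print(dict(zip(regions, totals)))
```

Call plt.show() to display the figure. Keeping the running total in one array avoids writing a separate bottom expression for every layer.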

---

Key Takeaways for Day 3

Bar Charts: Best for comparisons.

Pie Charts: Effective for showing proportions.

Stacked Bar Charts: Visualize contributions to a whole across multiple categories.

---

Day 4: Advanced Customization with Matplotlib

Conceptual Overview

Customization makes visualizations more informative and visually appealing. Advanced
customization involves:

1. Adding annotations to highlight data points or trends.

2. Using custom colors and styles to align with themes or brand aesthetics.

3. Creating dual-axis charts to compare multiple metrics simultaneously.

4. Saving plots in high resolution for professional presentations.

---

Practical Example 1: Adding Annotations

Code Breakdown:

# Sample Data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
revenue = [15000, 18000, 20000, 22000, 24000]

# Line Chart with Annotations


plt.figure(figsize=(10, 6))
plt.plot(months, revenue, marker='o', linestyle='-', color='purple')

# Add annotations
for i, value in enumerate(revenue):
    # Place each label 500 units above its point, centered horizontally
    plt.text(months[i], value + 500, f"${value}", ha='center')

plt.title('Monthly Revenue')
plt.xlabel('Month')
plt.ylabel('Revenue ($)')
plt.grid(True)
plt.show()

1. Annotations: plt.text places text at specific coordinates. Here, each data point is annotated
with its revenue value.

2. Customization: Markers, labels, and a grid enhance readability.


Output: The chart highlights monthly revenue trends, with values annotated above each point
for quick reference.
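
To draw attention to a single point rather than label every one, Matplotlib's annotate method can attach text with an arrow. This is a minimal sketch on the same revenue data; the label offset is an arbitrary choice, and numeric x positions are used so the arrow coordinates are explicit.

```python
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
revenue = [15000, 18000, 20000, 22000, 24000]

# Plot against numeric positions and label the ticks with month names
positions = list(range(len(months)))
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(positions, revenue, marker='o', linestyle='-', color='purple')
ax.set_xticks(positions)
ax.set_xticklabels(months)

# Find the highest point and annotate it with an arrow
peak_idx = revenue.index(max(revenue))
note = ax.annotate(
    f"Peak: ${revenue[peak_idx]:,}",
    xy=(peak_idx, revenue[peak_idx]),                 # point being highlighted
    xytext=(peak_idx - 1.5, revenue[peak_idx] - 500),  # where the text sits
    arrowprops=dict(arrowstyle='->'),
)

ax.set_title('Monthly Revenue')
ax.set_xlabel('Month')
ax.set_ylabel('Revenue ($)')
ax.grid(True)
```

Call plt.show() to display the figure.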

---

Practical Example 2: Dual-Axis Chart

Code Breakdown:

# Sample Data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
revenue = [15000, 18000, 20000, 22000, 24000]
expenses = [12000, 14000, 16000, 17000, 19000]

# Create dual-axis chart


fig, ax1 = plt.subplots(figsize=(10, 6))

# First y-axis (Revenue)


ax1.plot(months, revenue, color='green', marker='o', label='Revenue')
ax1.set_ylabel('Revenue ($)', color='green')
ax1.tick_params(axis='y', labelcolor='green')

# Second y-axis (Expenses)


ax2 = ax1.twinx()
ax2.plot(months, expenses, color='red', marker='s', label='Expenses')
ax2.set_ylabel('Expenses ($)', color='red')
ax2.tick_params(axis='y', labelcolor='red')

plt.title('Monthly Revenue vs. Expenses')


fig.tight_layout()
plt.show()

1. Dual Axes: ax1 and ax2 share the same x-axis, and each has its own y-axis.

twinx(): Creates a secondary y-axis on the same plot for a second dataset (expenses).

Custom Colors: Each axis has its own color theme for clarity.

Alignment: fig.tight_layout() ensures labels and titles don't overlap.
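
The label arguments in the example only take effect once a legend is drawn, and each axis tracks its own lines separately. One possible way to show both lines in a single legend is to collect the handles from both axes:

```python
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
revenue = [15000, 18000, 20000, 22000, 24000]
expenses = [12000, 14000, 16000, 17000, 19000]

fig, ax1 = plt.subplots(figsize=(10, 6))
ax1.plot(months, revenue, color='green', marker='o', label='Revenue')

ax2 = ax1.twinx()
ax2.plot(months, expenses, color='red', marker='s', label='Expenses')

# Gather line handles and labels from both axes and draw one combined legend
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
legend = ax2.legend(handles1 + handles2, labels1 + labels2, loc='upper left')

fig.tight_layout()
```

Call plt.show() to display the figure. The legend is attached to ax2 here because the twin axis is drawn on top of ax1.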


Output: The dual-axis chart visually compares revenue (green line) and expenses (red line)
across months, making it easier to identify trends or discrepancies.

---

Practical Example 3: Customizing Colors and Themes

Customization can align charts with branding or specific aesthetics.

Code Breakdown:

# Sample Data
products = ['Laptop', 'Tablet', 'Smartphone', 'Desktop']
units_sold = [500, 300, 800, 200]

# Customizing Colors and Themes


plt.figure(figsize=(10, 6))
plt.barh(products, units_sold, color=['#FF5733', '#33FF57', '#3357FF', '#FFC300'])
plt.title('Units Sold by Product', fontsize=16, fontweight='bold', color='darkblue')
plt.xlabel('Units Sold', fontsize=14, color='darkgreen')
plt.ylabel('Products', fontsize=14, color='darkgreen')
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.show()

Explanation:

1. Horizontal Bars: barh creates horizontal bars, ideal for long category names.

2. Custom Colors: Hex color codes provide unique, visually appealing hues.

3. Styling: Titles, labels, and grids are customized for better aesthetics and readability.

Output: The plot has a professional and vibrant appearance, with horizontal bars making
product comparisons easy.

---
Practical Example 4: Saving High-Resolution Images

Saving visualizations in high resolution is critical for professional presentations or publications.

Code Breakdown:

# Save a plot
plt.figure(figsize=(8, 5))
plt.plot(months, revenue, color='blue', marker='o', label='Revenue')
plt.title('Monthly Revenue')
plt.xlabel('Month')
plt.ylabel('Revenue ($)')
plt.grid(True)
plt.legend()
plt.savefig('monthly_revenue.png', dpi=300, bbox_inches='tight')

Key Parameters:

1. dpi: Controls resolution (dots per inch). A value of 300 is a common choice for print.

2. bbox_inches='tight': Ensures the saved image doesn’t have extra white space.

Output: The plot is saved as a high-resolution PNG file named monthly_revenue.png.
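
Matplotlib infers the output format from the file extension, so the same figure can be exported as both a raster and a vector file. A minimal sketch, assuming a hypothetical figures output folder:

```python
from pathlib import Path
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
revenue = [15000, 18000, 20000, 22000, 24000]

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(months, revenue, color='blue', marker='o', label='Revenue')
ax.set_title('Monthly Revenue')
ax.legend()

# Hypothetical output folder; created if it does not exist yet
out_dir = Path('figures')
out_dir.mkdir(exist_ok=True)

# Save the same figure as PNG (raster) and PDF (vector)
saved = []
for ext in ('png', 'pdf'):
    path = out_dir / f"monthly_revenue.{ext}"
    fig.savefig(path, dpi=300, bbox_inches='tight')
    saved.append(path)

plt.close(fig)  # release the figure once it has been written
```

Vector formats such as PDF stay sharp at any zoom level, while dpi mainly affects the PNG output.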

---

Key Takeaways for Day 4

1. Annotations: Add context to visualizations by labeling data points or highlighting trends.

2. Dual Axes: Compare multiple metrics on the same chart without clutter.

3. Custom Themes: Use colors, fonts, and layouts to align with branding or specific goals.

4. Saving Plots: Export high-quality images for professional use.


---

Summary

These four days cover critical tools and techniques for visualizing data effectively. They will
help you develop the ability to present data in compelling ways, catering to diverse audiences
and purposes.
