Visualization
Visualization
o Structure
Without proper visualization, you can never
tell what the data is about.
Topic 1: Visualization
How do we visualize?
Relationship Plots
To show relationships or correlations between two or more variables.
• Scatter Plot: Displays values for two continuous variables as points in a 2D space to reveal correlations or patterns.
• Bubble Plot: An extension of the scatter plot that also shows a third variable through the size of the bubbles.
• Heatmap: Uses a color gradient to represent the intensity of the relationship between two dimensions or categories.
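The three relationship plots above can be sketched in Matplotlib roughly as follows. This is a minimal illustration, not a reference implementation: the data is synthetic, and the variable names (x, y, z, grid) and the output file name are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line to use an interactive window
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data, generated only for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2 * x + rng.normal(scale=0.5, size=50)  # y is correlated with x
z = rng.uniform(10, 200, size=50)           # third variable, used as bubble size
grid = rng.random((4, 5))                   # cell values for a 4 x 5 heatmap

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 4))

# Scatter plot: two continuous variables drawn as points.
ax1.scatter(x, y)
ax1.set_title("Scatter plot")

# Bubble plot: the same points, with marker area (s) encoding a third variable.
ax2.scatter(x, y, s=z, alpha=0.5)
ax2.set_title("Bubble plot")

# Heatmap: a color gradient encodes the value of each cell.
im = ax3.imshow(grid, cmap="viridis")
fig.colorbar(im, ax=ax3)
ax3.set_title("Heatmap")

fig.tight_layout()
fig.savefig("relationship_plots.png")
```

The only difference between the scatter and bubble calls is the s argument, which is why the bubble plot is described as an extension of the scatter plot.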
Composition Plots
To represent a part-to-whole relationship in data.
• Donut Chart: A variation of a pie chart with a central cut-out.
• Treemap: Uses nested rectangles, sized in proportion to their values, to display data hierarchies.
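A donut chart can be sketched in Matplotlib by drawing a pie chart whose wedges are narrower than the full radius. The labels and shares below are hypothetical values chosen for the example, and the output file name is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line to use an interactive window
import matplotlib.pyplot as plt

# Hypothetical category shares, for illustration only.
labels = ["A", "B", "C", "D"]
shares = [40, 30, 20, 10]

fig, ax = plt.subplots(figsize=(5, 5))

# A wedge width below 1 leaves the center empty, turning the pie into a donut.
wedges, texts = ax.pie(shares, labels=labels, wedgeprops={"width": 0.4})
ax.set_title("Donut chart")

fig.savefig("donut_chart.png")
```

Matplotlib has no built-in treemap function; a third-party package such as squarify is one option for that plot type.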
Visualization tool
                      Seaborn                              Matplotlib
Level of Abstraction  High-level                           Low-level
Ease of Use           Easier to learn and use              More complex, requires more code
Default Aesthetics    Built-in themes and color palettes   More customization required
You can choose between Matplotlib (plt) and Seaborn (sns). The same sine-wave plot in each:

Matplotlib:

import matplotlib.pyplot as plt
import numpy as np

# Sample data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Create the plot
plt.figure(figsize=(8, 6))
plt.plot(x, y, label='Sine Wave')

# Find peaks
peaks = np.where(np.diff(np.sign(np.diff(y))) < 0)[0] + 1

# Plot points at peaks
plt.plot(x[peaks], y[peaks], 'ro')

# Add labels, title, and annotation
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('Simple Sine Wave Plot')
plt.text(5, 0.5, 'Peak Value', fontsize=12)

# Add source reference outside the plot
plt.figtext(0.05, 0.05, 'Source: Generated Data', fontsize=10)

plt.grid(True)
plt.legend()
plt.show()

Seaborn:

import matplotlib.pyplot as plt  # needed for the plt.* calls below
import seaborn as sns
import pandas as pd
import numpy as np

sns.set_palette('Set2')

# Sample data in DataFrame format
x = np.linspace(0, 10, 100)
df = pd.DataFrame({'x': x, 'y': np.sin(x)})

# Create the plot
sns.lineplot(x='x', y='y', data=df, label='Sine Wave')

# Find peaks
peaks = np.where(np.diff(np.sign(np.diff(df['y']))) < 0)[0] + 1

# Plot points at peaks
sns.scatterplot(x=df['x'].iloc[peaks], y=df['y'].iloc[peaks], color='red')

# Add labels, title, and annotation
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('Simple Sine Wave Plot')
plt.text(5, 0.5, 'Peak Value', fontsize=12)

# Add source reference outside the plot
plt.figtext(0.05, 0.02, 'Source: Generated Data', fontsize=10)

plt.show()
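The "Find peaks" line in the code above is compact, so a step-by-step sketch may help. The helper name find_peaks_by_sign_change is hypothetical and introduced only for this illustration; the expression inside it is the one used on the slide.

```python
import numpy as np

def find_peaks_by_sign_change(y):
    """Return indices where the slope sign drops (local maxima for data without flat runs)."""
    slope_sign = np.sign(np.diff(y))    # +1 where y rises, -1 where it falls
    turning = np.diff(slope_sign) < 0   # True where the slope sign decreases
    return np.where(turning)[0] + 1     # +1 realigns the result with indices of y

# Tiny worked example: the values 1 and 2 are local maxima.
tiny = np.array([0, 1, 0, 2, 1])
print(find_peaks_by_sign_change(tiny))  # -> [1 3]

# Same data as the slides: the peaks of sin(x) on [0, 10].
x = np.linspace(0, 10, 100)
peaks = find_peaks_by_sign_change(np.sin(x))
print(x[peaks])  # sample points closest to pi/2 and 5*pi/2
```

Each np.diff shortens the array by one element, which is why the final + 1 is needed to point back at the correct positions in y.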
• Prepare data: NumPy arrays x and y for Matplotlib; a pandas DataFrame with columns 'x' and 'y' for Seaborn.
• Create a plot: plt.plot(x, y, ...) in Matplotlib; sns.lineplot(x='x', y='y', data=df, ...) in Seaborn.
• Seaborn has high-level plot functions that are easier to use. In Matplotlib, you have to customize the visualization type yourself.
• Add labels for the axes: plt.xlabel('x-axis') and plt.ylabel('y-axis').
• Seaborn also supports Matplotlib functions: the Seaborn version still calls plt.xlabel, plt.ylabel, plt.title, plt.text, plt.figtext, and plt.show.
Structure of a plot
This could be misleading: the vast majority of US land area is nearly empty, so area on the plot does not reflect population.
Case study: US presidential election