Week 3.1 (PAI)
Matplotlib is a widely used Python library for visualizing data, relationships, and trends. It can
produce many types of graphs, including line plots, scatter plots, bar charts, pie charts, and
histograms, which help in analyzing datasets, exploring patterns, and communicating results.
Most elements of a figure (colors, labels, markers, legends) can be customized. The examples
below use the matplotlib.pyplot interface, conventionally imported as plt.
Bar Graph
import matplotlib.pyplot as plt
# Data
categories = ['A', 'B', 'C', 'D', 'E']
values = [23, 45, 56, 78, 32]
# Plot
plt.bar(categories, values)
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Simple Bar Graph')
plt.show()
# Data
categories = ['A', 'B', 'C', 'D', 'E']
values = [23, 45, 56, 78, 32]
# Plot
plt.barh(categories, values)
plt.xlabel('Values')
plt.ylabel('Categories')
plt.title('Horizontal Bar Graph')
plt.show()
Grouped Bar Graph
This example shows a grouped bar graph where values from two different groups are displayed
side by side for each category. It's suitable for comparing values across multiple groups within the
same categories.
import numpy as np
import matplotlib.pyplot as plt
# Data
categories = ['A', 'B', 'C', 'D', 'E']
values1 = [23, 45, 56, 78, 32]
values2 = [40, 35, 60, 55, 25]
# Plot
x = np.arange(len(categories))
width = 0.35
fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, values1, width, label='Group 1')
rects2 = ax.bar(x + width/2, values2, width, label='Group 2')
ax.set_xlabel('Categories')
ax.set_ylabel('Values')
ax.set_title('Grouped Bar Graph')
ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.legend()
plt.show()
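The side-by-side placement in the grouped bar graph comes from shifting each group's bars by a fraction of the bar width. As a minimal sketch, the same idea can be generalized to any number of groups. The helper names bar_offsets and plot_grouped_bars and the default total_width of 0.7 are assumptions made for this illustration; they are not part of matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt


def bar_offsets(n_groups, total_width=0.7):
    """Return the bar width and the x-offset of each group (hypothetical helper).

    The groups share total_width and are centered on the category tick.
    """
    width = total_width / n_groups
    offsets = (np.arange(n_groups) - (n_groups - 1) / 2) * width
    return width, offsets


def plot_grouped_bars(categories, groups):
    """Draw one set of bars per entry in groups, a dict of label -> values."""
    x = np.arange(len(categories))
    width, offsets = bar_offsets(len(groups))
    fig, ax = plt.subplots()
    for offset, (label, values) in zip(offsets, groups.items()):
        ax.bar(x + offset, values, width, label=label)
    ax.set_xlabel('Categories')
    ax.set_ylabel('Values')
    ax.set_xticks(x)
    ax.set_xticklabels(categories)
    ax.legend()
    return fig, ax
```

With two groups, bar_offsets gives a width of 0.35 and offsets of -width/2 and +width/2, which matches the example above. Calling plot_grouped_bars(categories, {'Group 1': values1, 'Group 2': values2}) followed by plt.show() should produce the same layout.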
# Data
categories = ['A', 'B', 'C', 'D', 'E']
values1 = [23, 45, 56, 78, 32]
values2 = [40, 35, 60, 55, 25]
# Plot
plt.bar(categories, values1, label='Group 1')
plt.bar(categories, values2, bottom=values1, label='Group 2')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Stacked Bar Graph')
plt.legend()
plt.show()
# Data
categories = ['A', 'B', 'C', 'D', 'E']
values = [23, 45, 56, 78, 32]
colors = ['red', 'blue', 'green', 'orange', 'purple']
# Plot
plt.bar(categories, values, color=colors)
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Colored Bar Graph')
plt.show()
Line Graph
# Data
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 35]
# Plot
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Line Graph')
plt.show()
Line Graph with Multiple Lines
This example demonstrates a line graph with multiple lines, each representing a
different dataset. It's useful for comparing trends or patterns between different sets of
data.
# Data
x = [1, 2, 3, 4, 5]
y1 = [10, 20, 25, 30, 35]
y2 = [15, 18, 22, 27, 32]
# Plot
plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Graph with Multiple Lines')
plt.legend()
plt.show()
# Data
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 35]
# Plot
plt.plot(x, y, marker='o')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Graph with Markers')
plt.show()
# Data
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 35]
# Plot
plt.plot(x, y, color='green', linestyle='--', marker='o', markersize=8)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Graph with Customization')
plt.show()
Pie Chart
# Data
labels = ['A', 'B', 'C', 'D']
sizes = [15, 30, 45, 10]
# Plot
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
plt.title('Simple Pie Chart')
plt.show()
# Data
labels = ['A', 'B', 'C', 'D']
sizes = [15, 30, 45, 10]
explode = (0, 0.1, 0, 0) # "explode" the 2nd slice (B)
# Plot
plt.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%')
plt.title('Exploded Pie Chart')
plt.show()
Pie Chart with Custom Colors
This example sets the color of each pie chart slice explicitly, either to improve the chart's
appearance or to emphasize specific categories.
# Data
labels = ['A', 'B', 'C', 'D']
sizes = [15, 30, 45, 10]
colors = ['red', 'blue', 'green', 'orange']
# Plot
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%')
plt.title('Pie Chart with Custom Colors')
plt.show()
# Data
labels_outer = ['Group 1', 'Group 2']
sizes_outer = [60, 40]
labels_inner = ['A', 'B', 'C', 'D']
sizes_inner = [20, 15, 25, 40]
# Plot
fig, ax = plt.subplots()
ax.pie(sizes_outer, labels=labels_outer, radius=1.2)
ax.pie(sizes_inner, labels=labels_inner, radius=0.7)
plt.title('Nested Pie Chart')
plt.show()
Scatter Graph
# Data
x = [1, 2, 3, 4, 5]
y = [10, 15, 20, 25, 30]
# Plot
plt.scatter(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Scatter Plot')
plt.show()
# Data
x = [1, 2, 3, 4, 5]
y = [10, 15, 20, 25, 30]
sizes = [100, 200, 300, 400, 500]
colors = ['red', 'blue', 'green', 'orange', 'purple']
# Plot
plt.scatter(x, y, s=sizes, c=colors, alpha=0.5)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot with Different Colors and Sizes')
plt.show()
# Data
x = [1, 2, 3, 4, 5]
y = [10, 15, 20, 25, 30]
# Plot
plt.scatter(x, y)
# Label each point with its y value
for i, txt in enumerate(y):
    plt.annotate(txt, (x[i], y[i]))
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot with Annotation')
plt.show()
Histogram
A histogram is a graphical representation of the distribution of numerical data. It consists of a
series of adjacent bars, one per bin, where each bin covers a specific range of values. The
height of each bar corresponds to the frequency or count of data points falling within that range.
Histograms provide a visual summary of the underlying data distribution, allowing for quick
insights into the central tendency, spread, and shape of the dataset. They are commonly used in
data analysis and visualization to identify patterns, outliers, and underlying trends within the data.
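To make the link between bar height and frequency concrete, the sketch below counts the values in each bin by hand and compares the result with np.histogram. The small dataset, the bin edges, and the helper name count_per_bin are assumptions chosen for illustration. The helper follows NumPy's convention that every bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
import numpy as np

# Illustrative data and bin edges (assumed for this sketch)
data = [1.0, 1.5, 2.2, 2.8, 3.1, 3.3, 3.9, 4.5]
edges = [1, 2, 3, 4, 5]


def count_per_bin(values, bin_edges):
    """Count how many values fall into each bin (hypothetical helper)."""
    counts = [0] * (len(bin_edges) - 1)
    last = len(bin_edges) - 2  # index of the final bin
    for v in values:
        for i in range(len(bin_edges) - 1):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            # Half-open interval [lo, hi); the last bin also includes hi
            if lo <= v < hi or (i == last and v == hi):
                counts[i] += 1
                break
    return counts


manual_counts = count_per_bin(data, edges)
numpy_counts, _ = np.histogram(data, bins=edges)

print(manual_counts)          # [2, 2, 3, 1]
print(numpy_counts.tolist())  # [2, 2, 3, 1]
```

These counts are the bar heights that plt.hist(data, bins=edges) would draw.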
What is a Bin?
In histograms, bins represent intervals into which the data range is divided. Each bin corresponds
to a specific range of values, and the height of the bar within each bin represents the frequency or
count of data points falling within that range. Choosing an appropriate number of bins is crucial
as it determines the granularity of the histogram; too few bins may oversimplify the data
distribution, while too many bins may introduce noise or obscure underlying patterns. Adjusting
the bin size allows for a balance between capturing important details of the data distribution and
maintaining clarity in visualization.
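The trade-off between too few and too many bins can be checked numerically before plotting. The following sketch, under the assumption of a seeded sample of 1000 normally distributed values, summarizes the same data with 3, 30, and 300 bins. The variable names and the choice of bin counts are illustrative only.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=0, scale=1, size=1000)

summary = {}
for n_bins in (3, 30, 300):
    counts, edges = np.histogram(data, bins=n_bins)
    summary[n_bins] = {
        'width': float(edges[1] - edges[0]),  # all bins have equal width here
        'total': int(counts.sum()),           # every data point lands in some bin
        'empty': int((counts == 0).sum()),    # bins that contain no data
    }
    print(n_bins, summary[n_bins])
```

With very few bins each bar is wide and hides detail; with very many bins each bar holds only a handful of points, so the heights fluctuate and some bins are likely to be empty.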
Simple Histogram
This example illustrates a basic histogram where the distribution of a single dataset is represented
by bars. It's useful for visualizing the frequency or distribution of values within a continuous
variable.
import numpy as np
# Data
data = np.random.normal(loc=0, scale=1, size=1000)
# Plot
plt.hist(data, bins=30)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Simple Histogram')
plt.show()
# Data
data = np.random.normal(loc=0, scale=1, size=1000)
# Plot
plt.hist(data, bins=[-3, -2, -1, 0, 1, 2, 3])
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram with Custom Bins')
plt.show()
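Overlaid Histograms
This example overlays multiple histograms, each representing a different dataset, on the same plot.
It's useful for comparing the distributions of multiple variables or datasets.
# Data (two illustrative samples with different means)
data1 = np.random.normal(loc=0, scale=1, size=1000)
data2 = np.random.normal(loc=2, scale=1, size=1000)
# Plot (alpha=0.5 makes the bars semi-transparent so both datasets stay visible)
plt.hist(data1, bins=30, alpha=0.5, label='Data 1')
plt.hist(data2, bins=30, alpha=0.5, label='Data 2')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Overlaid Histograms')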
plt.legend()
plt.show()
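When two histograms are drawn with bins=30 separately, each dataset gets its own bin edges, so the bars may not line up. One possible refinement, sketched here with assumed sample data, is to compute a single set of edges from the combined data with np.histogram_bin_edges and reuse it for both datasets.

```python
import numpy as np

# Illustrative samples (assumed; any two numeric datasets would work)
rng = np.random.default_rng(seed=1)
data1 = rng.normal(loc=0, scale=1, size=1000)
data2 = rng.normal(loc=2, scale=1, size=1000)

# One common set of 30 bins spanning the range of both datasets
edges = np.histogram_bin_edges(np.concatenate([data1, data2]), bins=30)

# Counts per shared bin; these are directly comparable bin by bin
counts1, _ = np.histogram(data1, bins=edges)
counts2, _ = np.histogram(data2, bins=edges)

print(len(edges))                    # 31 edges define 30 bins
print(counts1.sum(), counts2.sum())  # each dataset is fully covered
```

The same edges array can be passed as bins=edges to both plt.hist calls so that the overlaid bars share identical boundaries.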
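import seaborn as sns
# Data
data = np.random.normal(loc=0, scale=1, size=1000)
# Plot (stat='density' scales the bars to match the 'Density' axis label)
sns.histplot(data, kde=True, stat='density')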
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Histogram with Density Plot')
plt.show()