Unit 5 Matplotlib
Unit-5
Plotting and visualization:
A brief matplotlib API primer, figures and subplots, colors, markers and line styles, ticks,
labels and legends, Annotations and drawing on a subplot, saving plot to file, plotting functions
in pandas, line plot, bar plots, histograms and density plots, and scatter plots.
---------------------------------------------------------------------------------------------------------------------
DATA VISUALIZATION:
Data visualization is the graphical representation of data. It involves transforming data into
visual elements like charts, graphs, and maps to make it easier to understand, analyze, and
communicate.
By converting raw data into visual formats, data visualization helps people identify patterns,
trends, and relationships that might be difficult to discern from numerical data alone.
Data visualization plays a key role in data science and analysis: it lets us grasp datasets
by representing them graphically. Matplotlib, a well-known Python library, offers a range of
tools for generating informative and visually appealing plots and charts. One outstanding
feature of Matplotlib is its versatile, user-friendly interface, the pyplot API, which
simplifies the process of creating plots.
K.M.G. COLLEGE OF ARTS & SCIENCE(AUTONOMOUS)
DEPARTMENT OF DATA SCIENCE
E-NOTES : FUNDAMENTALS OF DATA SCIENCE
Define Matplotlib.
Matplotlib is a Python library used to create visualizations. It supports static,
animated, and interactive visualizations in Python.
Which libraries does Matplotlib work with?
o NumPy: Matplotlib is built on NumPy and is a core part of the SciPy stack, a group of
scientific computing tools for Python.
o pandas: a library used mainly for data manipulation and analysis; its plotting functions
are built on Matplotlib.
A simple example of importing Matplotlib:
import matplotlib.pyplot as plt
Figures and Subplots in Matplotlib
Figures in Matplotlib are the top-level containers for all plot elements, such as axes, lines, and
text. A figure can contain multiple subplots.
Subplots are individual plotting areas within a figure. They are typically arranged in a grid-like
structure.
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])
plt.show()
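The grid arrangement described above can be sketched as follows. This is only an illustration: the 2x2 layout, the figure size, and the sample values are assumptions made for the example, not part of the notes.

```python
import matplotlib.pyplot as plt

# Create one figure holding a 2x2 grid of subplots (Axes objects).
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

x = [1, 2, 3, 4]  # sample values chosen only for illustration

axes[0, 0].plot(x, [1, 4, 9, 16])  # top-left subplot
axes[0, 1].plot(x, [1, 2, 3, 4])   # top-right subplot
axes[1, 0].plot(x, [4, 3, 2, 1])   # bottom-left subplot
axes[1, 1].plot(x, [2, 2, 2, 2])   # bottom-right subplot

axes[0, 0].set_title('Top left')   # each subplot can be styled separately
fig.tight_layout()                 # reduce overlap between neighbouring subplots
```

Here axes is a 2x2 array indexed by row and column. Calling plt.show() afterwards displays the figure.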
Colors
Basic Colors: You can specify colors by name (e.g., 'red', 'blue', 'green').
Hex Codes: You can use hex codes for colors (e.g., '#FF5733' for a specific shade of
orange).
Shorthand Notations: Some colors have shorthand notations (e.g., 'r' for red, 'g' for
green, 'b' for blue).
Example:
plt.plot(x, y, color='red') # Line will be red
plt.plot(x, y, color='#FF5733') # Line will be a specific orange shade
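A minimal, self-contained sketch of the three ways of naming a color is given below. The x and y values are made up so that the snippet can run on its own.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]  # sample values chosen only for illustration

fig, ax = plt.subplots()
line_named, = ax.plot(x, [0, 1, 2, 3], color='red')      # color by name
line_hex, = ax.plot(x, [1, 2, 3, 4], color='#FF5733')    # color by hex code
line_short, = ax.plot(x, [2, 3, 4, 5], color='g')        # shorthand for green
```

All three lines are drawn on the same subplot, which makes the colors easy to compare once plt.show() is called.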
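Markers
Markers are the symbols drawn at each data point, set with the marker argument (e.g., 'o' for
circles, 's' for squares, '^' for triangles).
Line Styles
Line styles control how the line between points is drawn, set with the linestyle argument
(e.g., '-' for solid, '--' for dashed, ':' for dotted, '-.' for dash-dot).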
Shorthand Notation: You can use a single string to combine them (e.g., 'go--' for a green
dashed line with circle markers):
plt.plot(x, y, 'go--') # Equivalent to color='green', marker='o', linestyle='--'
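The equivalence between the shorthand string and the separate keyword arguments can be checked with a short sketch like the one below. The data values and the third line (blue squares, dotted) are assumptions added for illustration.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]  # sample values chosen only for illustration

fig, ax = plt.subplots()

# Shorthand: green ('g'), circle markers ('o'), dashed line ('--').
short, = ax.plot(x, y, 'go--')

# The same styling written out with keyword arguments.
explicit, = ax.plot(x, [v + 1 for v in y],
                    color='green', marker='o', linestyle='--')

# A different combination: blue, square markers, dotted line.
other, = ax.plot(x, [v + 2 for v in y],
                 color='b', marker='s', linestyle=':')
```

The keyword form is longer but easier to read when several style options are set at once.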
Ticks
Purpose: Ticks are the markers along the axes that indicate data values.
Customizing Ticks:
o plt.xticks() and plt.yticks(): Set custom tick positions and labels on the x-axis and
y-axis.
o Example:
plt.xticks([0, 1, 2, 3], ['A', 'B', 'C', 'D']) # Custom labels on the x-axis
plt.yticks([10, 20, 30, 40], ['Low', 'Medium', 'High', 'Very High']) # Custom labels on the y-axis
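When working with an Axes object rather than the plt functions, the same tick customization can be sketched with the Axes methods set_xticks(), set_xticklabels(), set_yticks(), and set_yticklabels(). The plotted values here are assumptions chosen to match the tick positions in the example above.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [10, 20, 30, 40])  # sample values for illustration

# Fix the tick positions first, then attach one label per position.
ax.set_xticks([0, 1, 2, 3])
ax.set_xticklabels(['A', 'B', 'C', 'D'])

ax.set_yticks([10, 20, 30, 40])
ax.set_yticklabels(['Low', 'Medium', 'High', 'Very High'])
```

The number of labels must match the number of tick positions.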
Labels
Font Size and Style: You can customize the font size and style of the labels.
plt.xlabel('Time (hours)', fontsize=14, fontweight='bold')
plt.ylabel('Temperature (°C)', fontsize=14, fontstyle='italic')
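A runnable version of the labelling calls, written against an Axes object, might look like the sketch below. The temperature readings and the title text are made up for illustration.

```python
import matplotlib.pyplot as plt

hours = [0, 6, 12, 18]
temps = [15, 18, 24, 20]  # sample readings, for illustration only

fig, ax = plt.subplots()
ax.plot(hours, temps)

# Axis labels with custom font size, weight, and style.
ax.set_xlabel('Time (hours)', fontsize=14, fontweight='bold')
ax.set_ylabel('Temperature (°C)', fontsize=14, fontstyle='italic')

# A title is set the same way.
ax.set_title('Temperature over a day', fontsize=16)
```

ax.set_xlabel() and ax.set_ylabel() accept the same text options as plt.xlabel() and plt.ylabel().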
Legends
Purpose: Legends describe the different data series or categories in the plot.
Adding a Legend:
o plt.legend(): Adds a legend to the plot. The legend labels are usually taken from
the label argument in the plot() function.
o Example:
plt.plot(x1, y1, label='Dataset 1')
plt.legend() # Shows a legend built from the label arguments
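A slightly fuller sketch with two labelled series is shown below. The second dataset, the legend position, and the legend title are assumptions added for the example.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3]  # sample values chosen only for illustration

fig, ax = plt.subplots()
ax.plot(x, [1, 2, 3], label='Dataset 1')
ax.plot(x, [3, 2, 1], label='Dataset 2')

# Build the legend from the label arguments given above.
legend = ax.legend(loc='upper center', title='Series')
```

A line plotted without a label argument is left out of the legend.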
Annotations
Purpose: Annotations are used to add text labels to specific points in the plot, often to
provide additional information or highlight important data points.
Adding Annotations:
o plt.annotate(): Adds text to a specific point on the plot.
o Basic Usage:
plt.annotate('Important Point', xy=(x_value, y_value), xytext=(x_text, y_text),
    arrowprops=dict(facecolor='black', arrowstyle='->')) # xy is the point marked; xytext is where the text is placed
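As a concrete illustration, the sketch below annotates the highest point of a small made-up dataset. The data, the label text 'Peak', and the text position are assumptions chosen for the example.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 5, 9, 4, 1]  # sample values chosen only for illustration

# Locate the highest data point.
peak_index = y.index(max(y))
peak_x, peak_y = x[peak_index], y[peak_index]

fig, ax = plt.subplots()
ax.plot(x, y, marker='o')

# xy is the point the arrow points at; xytext is where the text sits.
note = ax.annotate('Peak',
                   xy=(peak_x, peak_y),
                   xytext=(peak_x + 1, peak_y),
                   arrowprops=dict(arrowstyle='->', color='black'))
```

By default both xy and xytext are given in data coordinates, so the text position moves with the axes.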
Drawing on a Subplot
Purpose: Drawing shapes like lines, rectangles, circles, or polygons on a subplot can help
to highlight specific areas or patterns in the data.
Common Shapes:
o Line: Use plt.axhline(), plt.axvline(), or plt.plot() to draw horizontal, vertical, or
diagonal lines.
o Rectangle and circle: Create patches.Rectangle() or patches.Circle() and add it with
ax.add_patch().
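The line-drawing calls listed above can be sketched as follows, using the Axes versions of the same functions. The data and the choice to mark the mean of y are assumptions made for the example.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [1, 3, 2, 5, 4]  # sample values chosen only for illustration

fig, ax = plt.subplots()
ax.plot(x, y)

mean_y = sum(y) / len(y)

# Horizontal reference line at the mean of y.
hline = ax.axhline(mean_y, color='gray', linestyle='--')

# Vertical reference line at x = 2.
vline = ax.axvline(2, color='red', linestyle=':')

# Diagonal line drawn as an ordinary two-point plot.
diag, = ax.plot([0, 4], [1, 5], color='green')
```

axhline() and axvline() span the full width or height of the subplot, whatever the axis limits are.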
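import matplotlib.pyplot as plt
import matplotlib.patches as patches # Shape classes such as Rectangle and Circle
fig, ax = plt.subplots()
ax.plot(x, y)
rect = patches.Rectangle((2, 4), 2, 3, linewidth=2, edgecolor='r', facecolor='none')
ax.add_patch(rect) # Adds a rectangle with its lower-left corner at (2, 4), width 2 and height 3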
You can combine annotations with shapes to provide context or further highlight specific
areas:
fig, ax = plt.subplots()
ax.plot(x, y)
ax.add_patch(patches.Rectangle((2, 4), 2, 3, linewidth=2, edgecolor='r', facecolor='none'))
ax.annotate('Highlighted Area', xy=(3, 5.5), xytext=(4, 7),
    arrowprops=dict(facecolor='black', arrowstyle='->'))
Adding Annotations and Shapes: You can add annotations and shapes to individual
subplots by referencing the specific Axes object.
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(x1, y1)
ax2.plot(x2, y2)
ax1.annotate('Point A', xy=(2, 5), xytext=(3, 7),
    arrowprops=dict(facecolor='black', arrowstyle='->'))
ax2.add_patch(patches.Circle((4, 5), radius=0.5, color='green'))
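A self-contained version of this two-subplot idea is sketched below. The values of x1, y1, x2, and y2, the text position, and the unfilled circle styling are assumptions chosen so that the snippet runs on its own.

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Sample data, for illustration only.
x1, y1 = [1, 2, 3, 4], [3, 5, 4, 6]
x2, y2 = [2, 3, 4, 5, 6], [4, 6, 5, 7, 6]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.plot(x1, y1)
ax2.plot(x2, y2)

# The annotation belongs to the left subplot only.
ax1.annotate('Point A', xy=(2, 5), xytext=(3, 3.5),
             arrowprops=dict(arrowstyle='->'))

# The circle belongs to the right subplot only.
circle = patches.Circle((4, 5), radius=0.5,
                        edgecolor='green', facecolor='none')
ax2.add_patch(circle)
ax2.set_aspect('equal')  # equal scaling so the circle is not drawn as an ellipse
```

Each Axes object keeps its own annotations and patches, so drawing on ax1 leaves ax2 unchanged.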
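Plotting functions in pandas
pandas Series and DataFrame objects provide a .plot() method, built on Matplotlib, whose kind
argument selects the plot type.
1. Line Plot
A line plot is the default plot type when using .plot() without specifying kind.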
Example:
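import matplotlib.pyplot as plt
import pandas as pd
# Sample data for illustration only; df is reused in the examples that follow
df = pd.DataFrame({'Year': [2019, 2020, 2021, 2022], 'Sales': [150, 200, 180, 240]})
df.plot(x='Year', y='Sales')
plt.show()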
2. Bar Plot
Creates a bar chart, which is useful for comparing quantities across different categories.
Example:
df.plot(kind='bar', x='Year', y='Sales', color='skyblue')
plt.show()
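3. Histogram
A histogram shows the distribution of a numeric column by grouping its values into bins and
counting how many values fall in each bin.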
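A minimal sketch of a pandas histogram is given below. The Sales values and the choice of five bins are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data, for illustration only.
df = pd.DataFrame({'Sales': [120, 135, 150, 150, 160,
                             175, 180, 180, 185, 210]})

# kind='hist' groups the values into bins and counts each bin.
ax = df['Sales'].plot(kind='hist', bins=5,
                      color='skyblue', edgecolor='black')
ax.set_xlabel('Sales')
```

A density plot is requested in the same way with kind='kde'; as far as I know, pandas relies on SciPy for that plot type, so SciPy needs to be installed.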
4. Scatter Plot
A scatter plot displays individual data points to show relationships between two
variables.
Example:
df.plot(kind='scatter', x='Year', y='Sales', color='red')
plt.show()
5. Box Plot
A box plot (or box-and-whisker plot) shows the distribution of data based on a five-number
summary: minimum, first quartile, median, third quartile, and maximum.
Example:
df.plot(kind='box')
plt.show()
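To connect the box plot to its five-number summary, the sketch below computes the summary with Series.quantile() and then draws the plot. The sales values are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data, for illustration only.
sales = pd.Series([100, 120, 130, 150, 170, 180, 200], name='Sales')

# Five-number summary: min, first quartile, median, third quartile, max.
summary = sales.quantile([0, 0.25, 0.5, 0.75, 1.0])

# The box spans the quartiles and the line inside it marks the median.
ax = sales.plot(kind='box')
```

Note that with Matplotlib's default settings the whiskers reach only as far as 1.5 times the interquartile range from the box, so they coincide with the minimum and maximum only when there are no outliers, as in this sample.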
6. Pie Chart
A pie chart shows proportions of a whole. Typically used with a Series or one column of
a DataFrame.
Example:
df.set_index('Year')['Sales'].plot(kind='pie', autopct='%1.1f%%')
plt.ylabel('') # Remove the default y-label
plt.show()