Unit 5 Matplotlib


K.M.G. COLLEGE OF ARTS & SCIENCE (AUTONOMOUS)


DEPARTMENT OF DATA SCIENCE
E-NOTES : FUNDAMENTALS OF DATA SCIENCE

Unit-5
Plotting and visualization:
A brief matplotlib API primer, figures and subplot, color, markers and line styles, ticks,
labels and legends, Annotations and drawing on a subplot, saving plot to file, plotting functions
in pandas, line plot, bar plots, histograms and density plots, and scatter plots.
---------------------------------------------------------------------------------------------------------------------
DATA VISUALIZATION:

Data visualization is the graphical representation of data. It involves transforming data into
visual elements like charts, graphs, and maps to make it easier to understand, analyze, and
communicate.

By converting raw data into visual formats, data visualization helps people identify patterns,
trends, and relationships that might be difficult to discern from numerical data alone. It's a
powerful tool for:

 Understanding complex data: Visualizations can simplify complex datasets, making
them more accessible to a wider audience.
 Identifying trends and patterns: Visual representations can quickly highlight trends,
outliers, and correlations.
 Communicating findings effectively: Visualizations can convey information more
effectively than text or tables, especially to non-technical audiences.
 Making data-driven decisions: By understanding data through visualizations, decision-
makers can make informed choices based on evidence.

Common data visualization techniques include:

 Line charts: Show trends over time.
 Bar charts: Compare values across categories.
 Scatter plots: Display relationships between two variables.
 Histograms: Show the distribution of a single variable.
 Pie charts: Represent proportions of a whole.
 Maps: Visualize geographic data.
 Heatmaps: Show the intensity of data across a grid.
 Network diagrams: Represent connections between entities.

Data visualization plays a key role in data science and analysis. It enables us to grasp datasets
by representing them visually. Matplotlib, a well-known Python library, offers a range of tools for
generating informative and visually appealing plots and charts. One outstanding feature of
Matplotlib is its user-friendly and versatile interface, the pyplot API, which simplifies the process of
creating plots.
Define Matplotlib.
Matplotlib is a Python library for creating visualizations. It is used to create
static, animated, and interactive visualizations in Python.
Which libraries are used with Matplotlib?
o NumPy: Matplotlib is built on the NumPy library and is a core part of the SciPy stack, a group of scientific
computing tools for Python.
o pandas: A library commonly used together with Matplotlib, mainly for data manipulation and
analysis. The pandas plotting functions are built on Matplotlib.
Simple example code for Matplotlib (the conventional import):
import matplotlib.pyplot as plt
Figures and Subplots in Matplotlib
Figures in Matplotlib are the top-level containers for all plot elements, such as axes, lines, and
text. A figure can contain multiple subplots.
Subplots are individual plotting areas within a figure. They are typically arranged in a grid-like
structure.
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])
plt.show()
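
The grid arrangement described above can be sketched as follows. This is a minimal illustration, not part of the original notes: the 2 x 2 layout, the figure size, and the sample values are assumptions, and the "Agg" backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# One figure that contains a 2 x 2 grid of subplots (Axes objects).
fig, axes = plt.subplots(2, 2, figsize=(6, 4))

x = [1, 2, 3]  # illustrative data
axes[0, 0].plot(x, [4, 5, 6])  # top-left subplot
axes[0, 1].plot(x, [6, 5, 4])  # top-right subplot
axes[1, 0].plot(x, [1, 4, 9])  # bottom-left subplot
axes[1, 1].plot(x, [9, 4, 1])  # bottom-right subplot

fig.tight_layout()  # reduce overlap between neighbouring subplots
# In an interactive session, drop the "Agg" line and call plt.show() here.
```

Here axes is a NumPy array of Axes objects, so each subplot is reached by its row and column index.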

Colors

 Basic Colors: You can specify colors by name (e.g., 'red', 'blue', 'green').
 Hex Codes: You can use hex codes for colors (e.g., '#FF5733' for a specific shade of
orange).
 Shorthand Notations: Some colors have shorthand notations (e.g., 'r' for red, 'g' for
green, 'b' for blue).
 Example:
plt.plot(x, y, color='red') # Line will be red
plt.plot(x, y, color='#FF5733') # Line will be a specific orange shade

Markers

 Purpose: Markers highlight individual data points on a plot.
 Common Marker Styles:
o '.': Point
o 'o': Circle
o '^': Triangle up
o 's': Square
o 'x': Cross
 Example:
plt.plot(x, y, marker='o') # Circle markers at each data point
plt.plot(x, y, marker='x') # Cross markers at each data point

Line Styles

 Purpose: Line styles determine the appearance of lines in plots.
 Common Line Styles:
o '-': Solid line
o '--': Dashed line
o '-.': Dash-dot line
o ':': Dotted line
plt.plot(x, y, linestyle='--') # Dashed line
plt.plot(x, y, linestyle='-.') # Dash-dot line

Combining Colors, Markers, and Line Styles

 You can combine all three in a single plot command:
plt.plot(x, y, color='green', marker='o', linestyle='--') # Green dashed line with circle markers

 Shorthand Notation: You can use a single string to combine them (e.g., 'go--' for a green
dashed line with circle markers):
plt.plot(x, y, 'go--') # Equivalent to color='green', marker='o', linestyle='--'
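
The snippets in this section assume that x and y already exist. A minimal runnable sketch, with made-up data and the "Agg" backend chosen only so it runs without a display, draws one line with keyword arguments and one with the shorthand string, then compares the marker and line style that Matplotlib stored for each:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]  # illustrative data
y = [0, 1, 4, 9]

fig, ax = plt.subplots()

# Keyword form: color, marker, and line style are given separately.
(line_kw,) = ax.plot(x, y, color='green', marker='o', linestyle='--')

# Shorthand form: 'g' (green), 'o' (circle marker), '--' (dashed line).
(line_fmt,) = ax.plot(x, [v + 1 for v in y], 'go--')

# Both lines end up with the same marker and line style.
same_style = (
    line_kw.get_marker() == line_fmt.get_marker()
    and line_kw.get_linestyle() == line_fmt.get_linestyle()
)
print(same_style)
```

The keyword form is usually easier to read, while the shorthand string is convenient for quick exploratory plots.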

Ticks

 Purpose: Ticks are the markers along the axes that indicate data values.
 Customizing Ticks:
o plt.xticks() and plt.yticks(): Set custom tick positions and labels on the x-axis and
y-axis.
o Example:
plt.xticks([0, 1, 2, 3], ['A', 'B', 'C', 'D']) # Custom labels on the x-axis
plt.yticks([10, 20, 30, 40], ['Low', 'Medium', 'High', 'Very High']) # Custom labels on the y-axis

 Rotating Ticks: Rotate tick labels for better readability.
plt.xticks(rotation=45) # Rotate x-axis labels by 45 degrees
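
As a runnable sketch of the tick settings above, the following example plots a small made-up series and then sets positions, labels, and rotation. The data and the "Agg" backend are assumptions made for illustration; the last lines read the settings back from the Axes to confirm what was applied.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [10, 20, 30, 40])  # illustrative data

# Positions, labels, and rotation can be set in a single call.
plt.xticks([0, 1, 2, 3], ['A', 'B', 'C', 'D'], rotation=45)
plt.yticks([10, 20, 30, 40], ['Low', 'Medium', 'High', 'Very High'])

# Read the settings back from the current Axes.
x_labels = [tick.get_text() for tick in ax.get_xticklabels()]
x_rotation = ax.get_xticklabels()[0].get_rotation()
print(x_labels, x_rotation)
```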

Labels

 Purpose: Labels describe the data being plotted on each axis.
 Adding Labels:
o plt.xlabel(): Adds a label to the x-axis.
o plt.ylabel(): Adds a label to the y-axis.
o Example:
plt.xlabel('Time (hours)') # Label for x-axis
plt.ylabel('Temperature (°C)') # Label for y-axis

 Font Size and Style: You can customize the font size and style of the labels.
plt.xlabel('Time (hours)', fontsize=14, fontweight='bold')
plt.ylabel('Temperature (°C)', fontsize=14, fontstyle='italic')

Legends

 Purpose: Legends describe the different data series or categories in the plot.
 Adding a Legend:
o plt.legend(): Adds a legend to the plot. The legend labels are usually taken from
the label argument in the plot() function.
o Example:
plt.plot(x1, y1, label='Dataset 1')
plt.plot(x2, y2, label='Dataset 2')
plt.legend() # Add a legend to distinguish between the datasets

 Customizing the Legend:
o Location: Control where the legend appears (e.g., 'upper right', 'lower left').
plt.legend(loc='upper left')
o Font Size: Adjust the size of the legend text.
plt.legend(fontsize='large')
o Title: Add a title to the legend.
plt.legend(title='Legend Title')
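
Putting axis labels and a legend together, a minimal sketch might look like the following. The temperature readings are invented for illustration, and the "Agg" backend is used only so the code runs without a display. Because each call to plt.legend() builds a new legend, the location, font size, and title are passed in one call here.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

hours = [0, 1, 2, 3]  # illustrative data
plt.figure()
plt.plot(hours, [20, 22, 25, 23], label='Dataset 1')
plt.plot(hours, [18, 19, 21, 22], label='Dataset 2')

plt.xlabel('Time (hours)', fontsize=14, fontweight='bold')
plt.ylabel('Temperature (°C)', fontsize=14, fontstyle='italic')

# One call sets the location, font size, and title of the legend.
legend = plt.legend(loc='upper left', fontsize='large', title='Legend Title')

legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```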

Annotations

 Purpose: Annotations are used to add text labels to specific points in the plot, often to
provide additional information or highlight important data points.
 Adding Annotations:
o plt.annotate(): Adds text to a specific point on the plot.
o Basic Usage:
plt.annotate('Important Point', xy=(x_value, y_value), xytext=(x_text, y_text),
arrowprops=dict(facecolor='black', arrowstyle='->'))
 xy=(x_value, y_value): Coordinates of the point being annotated.
 xytext=(x_text, y_text): Position of the text. By default this is given in the same data
coordinates as xy, not as an offset; pass textcoords='offset points' to place the text relative
to the point.
 arrowprops: Defines the properties of the arrow connecting the text to the
point.
o Example:
plt.plot(x, y)
plt.annotate('Peak', xy=(3, 10), xytext=(4, 15),
arrowprops=dict(facecolor='blue', arrowstyle='->'))
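
In practice the annotated point is often computed from the data rather than typed in by hand. The sketch below, which uses made-up values and the "Agg" backend as assumptions, finds the highest point of a series and annotates it, with the text placed one unit to the right of and above the peak in data coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]  # illustrative data
y = [2, 6, 10, 7, 3]

# Locate the highest point in the series.
peak_index = y.index(max(y))
peak_x, peak_y = x[peak_index], y[peak_index]

fig, ax = plt.subplots()
ax.plot(x, y)
note = ax.annotate(
    'Peak',
    xy=(peak_x, peak_y),              # point being annotated
    xytext=(peak_x + 1, peak_y + 1),  # text position, in data coordinates
    arrowprops=dict(arrowstyle='->', color='blue'),
)
ax.set_ylim(0, 13)  # leave room above the peak for the text
```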

Drawing Shapes on a Subplot

 Purpose: Drawing shapes like lines, rectangles, circles, or polygons on a subplot can help
to highlight specific areas or patterns in the data.
 Common Shapes:
o Line: Use plt.axhline(), plt.axvline(), or plt.plot() to draw horizontal, vertical, or
diagonal lines.

plt.axhline(y=5, color='red', linestyle='--') # Horizontal line at y=5
plt.axvline(x=2, color='green', linestyle=':') # Vertical line at x=2

o Rectangle: Use plt.gca().add_patch() to add a rectangle.
import matplotlib.patches as patches

fig, ax = plt.subplots()
ax.plot(x, y)
rect = patches.Rectangle((2, 4), 2, 3, linewidth=2, edgecolor='r', facecolor='none')
ax.add_patch(rect) # Adds a rectangle starting at (2,4) with width 2 and height 3

o Circle: Use plt.gca().add_patch() with patches.Circle.
circle = patches.Circle((4, 5), radius=1, linewidth=2, edgecolor='blue', facecolor='none')
ax.add_patch(circle) # Adds a circle centered at (4,5) with a radius of 1

o Polygon: Use plt.gca().add_patch() with patches.Polygon.
polygon = patches.Polygon([[1, 2], [3, 4], [5, 1]], closed=True, edgecolor='purple')
ax.add_patch(polygon) # Adds a polygon with specified vertices
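
The three shape snippets above share one Axes and assume that x and y are defined. A self-contained sketch that draws all three on a single subplot could look like this. The plotted data, the axis limits, and the unfilled polygon are illustrative assumptions, and the "Agg" backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.patches as patches

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3, 4, 5, 6], [1, 3, 4, 6, 5, 4, 2])  # illustrative data

# Same shapes as in the notes; facecolor='none' leaves them unfilled.
rect = patches.Rectangle((2, 4), 2, 3, linewidth=2, edgecolor='r', facecolor='none')
circle = patches.Circle((4, 5), radius=1, linewidth=2, edgecolor='blue', facecolor='none')
polygon = patches.Polygon([[1, 2], [3, 4], [5, 1]], closed=True,
                          edgecolor='purple', facecolor='none')

for shape in (rect, circle, polygon):
    ax.add_patch(shape)

ax.set_xlim(0, 7)       # make sure every shape is inside the visible area
ax.set_ylim(0, 8)
ax.set_aspect('equal')  # equal scaling so the circle is not drawn as an ellipse
```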

Combining Annotations and Shapes

 You can combine annotations with shapes to provide context or further highlight specific
areas:
fig, ax = plt.subplots()
ax.plot(x, y)
ax.add_patch(patches.Rectangle((2, 4), 2, 3, linewidth=2, edgecolor='r', facecolor='none'))
ax.annotate('Highlighted Area', xy=(3, 5.5), xytext=(4, 7),
arrowprops=dict(facecolor='black', arrowstyle='->'))

Working with Multiple Subplots

 Adding Annotations and Shapes: You can add annotations and shapes to individual
subplots by referencing the specific Axes object.
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(x1, y1)
ax2.plot(x2, y2)
ax1.annotate('Point A', xy=(2, 5), xytext=(3, 7), arrowprops=dict(facecolor='black',
arrowstyle='->'))
ax2.add_patch(patches.Circle((4, 5), radius=0.5, color='green'))
Plotting functions in pandas

1. Line Plot (Default)

 A line plot is the default plot type when using .plot() without specifying kind.
 Example:
import matplotlib.pyplot as plt
import pandas as pd

data = {'Year': [2020, 2021, 2022, 2023],
'Sales': [200, 250, 300, 350]}
df = pd.DataFrame(data)

df.plot(x='Year', y='Sales')
plt.show()
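
The .plot() method returns a Matplotlib Axes object, so the styling tools from the earlier sections also apply to pandas plots. The following sketch reuses the sales data from the example above. The marker, title, and tick settings are illustrative additions, and the "Agg" backend is an assumption made so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import pandas as pd

df = pd.DataFrame({'Year': [2020, 2021, 2022, 2023],
                   'Sales': [200, 250, 300, 350]})

# df.plot() returns the Axes it drew on.
ax = df.plot(x='Year', y='Sales', marker='o')

# Customize the returned Axes with ordinary Matplotlib calls.
ax.set_title('Sales by year')
ax.set_ylabel('Sales')
ax.set_xticks(df['Year'].tolist())  # one tick per year instead of fractional years
```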

2. Bar Plot

 Creates a bar chart, which is useful for comparing quantities across different categories.
 Example:
df.plot(kind='bar', x='Year', y='Sales', color='skyblue')
plt.show()

3. Histogram

 A histogram is used to display the distribution of a dataset.
 Example:
df['Sales'].plot(kind='hist', bins=5, color='purple')
plt.show()
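
With only four values a histogram shows very little, so the sketch below uses a larger sample to make the binning visible. The 200 normally distributed values are randomly generated for illustration and are not real sales figures; the "Agg" backend is likewise an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical data: 200 values drawn around 300, with a fixed seed for repeatability.
rng = np.random.default_rng(0)
sales = pd.Series(rng.normal(loc=300, scale=50, size=200), name='Sales')

ax = sales.plot(kind='hist', bins=5, color='purple', edgecolor='black')

# Each bar is a rectangle whose height is the number of values in that bin.
bar_counts = [patch.get_height() for patch in ax.patches]
print(bar_counts)
```

The bar heights add up to the number of observations. A density plot is the smoothed counterpart of a histogram; in pandas it is requested with kind='kde', which requires SciPy to be installed.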

4. Scatter Plot

 A scatter plot displays individual data points to show relationships between two
variables.
 Example:
df.plot(kind='scatter', x='Year', y='Sales', color='red')
plt.show()

5. Box Plot

 A box plot (or box-and-whisker plot) shows the distribution of data based on a five-
number summary.
 Example:
df['Sales'].plot(kind='box') # Box plot of the Sales column only, so Year does not distort the scale
plt.show()
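
The five-number summary behind a box plot consists of the minimum, first quartile, median, third quartile, and maximum. A minimal sketch that computes it for the sales values used in these notes, assuming pandas' default linear interpolation for quantiles and the "Agg" backend so the code runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import pandas as pd

sales = pd.Series([200, 250, 300, 350], name='Sales')

# Minimum, first quartile, median, third quartile, maximum.
five_number_summary = sales.quantile([0, 0.25, 0.5, 0.75, 1.0])
print(five_number_summary.tolist())  # [200.0, 237.5, 275.0, 312.5, 350.0]

# The box spans the quartiles and the line inside the box marks the median.
ax = sales.plot(kind='box')
```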

6. Pie Chart

 A pie chart shows proportions of a whole. Typically used with a Series or one column of
a DataFrame.
 Example:
df.set_index('Year')['Sales'].plot(kind='pie', autopct='%1.1f%%')
plt.ylabel('') # Remove the default y-label
plt.show()
