Unit 5

This document covers data visualization techniques using Matplotlib and Seaborn, including various plot types such as line plots, scatter plots, histograms, and error visualizations. It discusses the advantages and disadvantages of data visualization, as well as methods for creating multiple subplots and three-dimensional plots. Additionally, it highlights the importance of visualizing data distributions and relationships to identify trends and patterns.
Copyright
© All Rights Reserved

UNIT-5

Visualization with Matplotlib: Simple Line plots, Scatter plots, Visualizing errors, Density and Contour plots, Histograms, Binnings, Multiple subplots, Three-dimensional plotting with Matplotlib, Geographic data with Basemap, Visualization with Seaborn
Data visualization
Data visualization is the graphical representation of information and
data. By using visual elements like charts, graphs, and maps, data
visualization tools provide an accessible way to see and understand
trends, outliers, and patterns in data.

Advantages of data visualization include:

• Easily sharing information.
• Interactively exploring opportunities.
• Visualizing patterns and relationships.

Disadvantages include:

• Information can be biased or inaccurate.
• A visible correlation can be mistaken for causation.
• Core messages can get lost in translation.
Importing Matplotlib
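The import code for this slide did not survive in the text. A minimal sketch of the usual convention, assuming the common aliases mpl and plt:

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

# Print the installed version to confirm the import worked
print(mpl.__version__)
```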

Setting Styles
We will use the plt.style directive to choose appropriate aesthetic styles
for our figures.
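A short sketch of how plt.style can be used. The style names 'classic' and 'ggplot' are chosen here only for illustration; the slide does not say which style it uses.

```python
import matplotlib.pyplot as plt

# List the style names shipped with the installed Matplotlib
print(plt.style.available)

# Apply one style to every figure created after this call
plt.style.use('classic')

# Or apply a style temporarily, only inside the with-block
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
```

The with-block form is handy when a single figure needs a different look from the rest of the notebook.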
Simple Line Plots
• Perhaps the simplest of all plots is the visualization of a single
function $y = f(x)$.

• Here we will take a first look at creating a simple plot of this type. As
with all the following sections, we'll start by setting up the notebook
for plotting.
• A line plot shows how something changes over time or in relation to
another variable. It connects data points with straight lines, making it
easy to spot trends.
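As an illustration of plotting a single function y = f(x), the sketch below draws a sine curve. The choice of sin(x) and the range 0 to 10 are assumptions made for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)   # 100 evenly spaced points from 0 to 10
y = np.sin(x)                 # the function y = f(x) to draw

fig, ax = plt.subplots()
ax.plot(x, y, label='sin(x)')  # points are joined by straight segments
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend()
```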
2. Bar Plot
• Shows quantities for different categories using bars.
• Use it when comparing values across categories.
• Example: Sales by product type.

plt.bar(['Apples', 'Bananas', 'Oranges'], [50, 75, 30])

3. Histogram
• Shows the distribution of numerical data by grouping into bins.
• Use it when you want to understand the frequency of data ranges.
• Example: Distribution of students’ scores in a test.

plt.hist([55, 61, 70, 70, 72, 85, 90, 95, 95, 95], bins=5)
4. Scatter Plot
• Plots individual points to show the relationship between two variables.
• Use it when you're analyzing correlation or patterns.
• Example: Hours studied vs. test score.

plt.scatter([1, 2, 3, 4], [50, 60, 70, 80])

5. Box Plot (Box-and-Whisker Plot)


• Summarizes data with median, quartiles, and outliers.
• Use it when you want to visualize the spread and skew of your data.
• Example: Comparing salary distributions for different departments.

sns.boxplot(data=[[30000, 35000, 40000], [50000, 52000, 58000]])


Different Plots
1. Line Plot
• Shows trends over time by connecting points with lines.
• Use it when you want to see how something changes over time.
• Example: Temperature across 7 days.


plt.plot([1, 2, 3], [10, 15, 13])
6. Pie Chart
• A circular chart showing proportions as slices.
• Use it when you want to show percentage breakdowns.
• Example: Market share by brand.

plt.pie([40, 30, 20, 10], labels=['Brand A', 'B', 'C', 'D'], autopct='%1.1f%%')

7. Heatmap
• A color-coded matrix that shows correlations or frequencies.
• Use it when you're visualizing a grid of values (e.g., correlations).
• Example: Correlation between different features in a dataset.
import seaborn as sns
import numpy as np
data = np.random.rand(5,5)
sns.heatmap(data, annot=True)

8. Pair Plot
• Multiple scatter plots in a grid to show relationships in multi-variable data.
• Use it when you're exploring relationships between several features.
• Example: Iris dataset (petal length, width, species, etc.)

sns.pairplot(sns.load_dataset('iris'), hue='species')
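The Matplotlib one-liners in this list assume that plt is already imported and that a figure is displayed afterwards. A runnable sketch that places the bar, histogram, scatter, and pie examples side by side follows; the grid layout and titles are additions for illustration, and the Seaborn examples are left out so that only Matplotlib is required.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Bar plot: one bar per category
axes[0, 0].bar(['Apples', 'Bananas', 'Oranges'], [50, 75, 30])
axes[0, 0].set_title('Bar plot')

# Histogram: ten test scores grouped into 5 bins
scores = [55, 61, 70, 70, 72, 85, 90, 95, 95, 95]
counts, edges, _ = axes[0, 1].hist(scores, bins=5)
axes[0, 1].set_title('Histogram')

# Scatter plot: hours studied vs. test score
axes[1, 0].scatter([1, 2, 3, 4], [50, 60, 70, 80])
axes[1, 0].set_title('Scatter plot')

# Pie chart: slices labelled with their percentage
axes[1, 1].pie([40, 30, 20, 10], labels=['Brand A', 'B', 'C', 'D'],
               autopct='%1.1f%%')
axes[1, 1].set_title('Pie chart')

fig.tight_layout()
```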
Simple Scatter Plots
• A scatter plot is a graph of points (dots) plotted against a horizontal axis (x-axis) and a vertical axis (y-axis) to show how one variable relates to another.

Purpose of a Scatter Plot

• To observe relationships/correlations between two variables (e.g., height vs weight, hours studied vs exam score)
• To identify trends, such as positive, negative, or no correlation
• To spot outliers (data points that don't fit the pattern)
• To visualize clusters or groupings of data
• Choose two variables you want to compare.
  Example: Study hours (x-axis) vs. test score (y-axis)
• Plot the data as points on the graph, each point representing one observation: (x₁, y₁), (x₂, y₂), etc.
• Look for patterns:
  Upward trend → Positive correlation
  Downward trend → Negative correlation
  Random spread → No correlation
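The pattern check can also be done numerically. The sketch below plots hypothetical study-hours data (the numbers are invented for illustration) and computes the Pearson correlation coefficient with np.corrcoef: values near +1 match an upward trend, values near -1 a downward trend, and values near 0 no linear trend.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical observations: hours studied and the matching test scores
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
scores = np.array([52, 55, 61, 64, 70, 74, 79, 85])

fig, ax = plt.subplots()
ax.scatter(hours, scores)          # one dot per (x, y) observation
ax.set_xlabel('Study hours')
ax.set_ylabel('Test score')

# Pearson correlation coefficient between the two variables
r = np.corrcoef(hours, scores)[0, 1]
ax.set_title(f'r = {r:.2f}')
```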
Visualizing Errors
When you plot data, you often want to answer:

• How reliable are these points?
• Is the variation random noise or a real pattern?
• How confident are we in the measurements?

Error bars help answer these questions by showing how much the data might vary. They usually represent uncertainty in measurements or predictions. This could be:

• Standard deviation (how spread out the data is)
• Standard error (uncertainty of the mean)
• Confidence intervals
• Measurement precision

plt.errorbar(x, y, yerr=dy, fmt='.k');

With dy = 0.8, this plot says: "Here are the observed data points (y), but each has a possible error of ±0.8." The vertical lines (error bars) show that range.
• Points = the measured or predicted values.
• Vertical lines = the possible range above and below each point, due to noise or uncertainty.
• If error bars don't overlap between groups/points → the difference is probably real; how strong the evidence is depends on what the bars represent (standard deviation, standard error, or confidence interval).
• If they do overlap → the difference might be due to noise.
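The plt.errorbar call in this section uses x, y, and dy without defining them. A minimal runnable sketch follows, assuming noisy samples of sin(x) with an error of 0.8 on every point; the underlying function and the random seed are assumptions made for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)   # fixed seed so the figure is reproducible

x = np.linspace(0, 10, 50)
dy = 0.8                         # assumed size of the error on every point
y = np.sin(x) + dy * rng.standard_normal(50)

fig, ax = plt.subplots()
# fmt='.k' draws black dots with no connecting line
container = ax.errorbar(x, y, yerr=dy, fmt='.k')
```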
Types of error bars
• Symmetric Vertical Error Bars: The error extends equally above and below each point.

• Asymmetric Vertical Error Bars: The upper and lower errors of a point differ.

• Horizontal Error Bars: Error is shown in the x-direction instead of y.
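The three types map onto the yerr and xerr arguments of errorbar. A sketch with made-up values: a scalar gives symmetric bars, a (2, N) array gives separate lower and upper errors, and xerr switches the direction.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 6)
y = np.array([2.0, 3.5, 3.0, 4.5, 5.0])

fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)

# Symmetric: one value extends equally above and below each point
axes[0].errorbar(x, y, yerr=0.5, fmt='o')
axes[0].set_title('Symmetric')

# Asymmetric: a (2, N) array, first row lower errors, second row upper errors
lower = np.array([0.2, 0.4, 0.1, 0.6, 0.3])
upper = np.array([0.6, 0.2, 0.5, 0.3, 0.8])
yerr_asym = np.vstack([lower, upper])
axes[1].errorbar(x, y, yerr=yerr_asym, fmt='o')
axes[1].set_title('Asymmetric')

# Horizontal: the uncertainty is in x instead of y
axes[2].errorbar(x, y, xerr=0.3, fmt='o')
axes[2].set_title('Horizontal')
```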


Density and Contour Plots
Sometimes it is useful to display three-dimensional data in two
dimensions using contours or color-coded regions. There are three
Matplotlib functions that can be helpful for this task: plt.contour for
contour plots, plt.contourf for filled contour plots, and plt.imshow for
showing images.
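A sketch of the three functions applied to the same grid of values. The function z = sin(x)·cos(y) and the grid sizes are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)       # 2D coordinate grids, each of shape (40, 50)
Z = np.sin(X) * np.cos(Y)      # the third dimension, one value per grid point

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Contour lines only
axes[0].contour(X, Y, Z, levels=10, colors='black')
axes[0].set_title('contour')

# Filled contours with a colorbar
filled = axes[1].contourf(X, Y, Z, levels=20, cmap='viridis')
fig.colorbar(filled, ax=axes[1])
axes[1].set_title('contourf')

# Image view; origin='lower' puts y = 0 at the bottom like the contour plots
image = axes[2].imshow(Z, extent=[0, 5, 0, 5], origin='lower', cmap='viridis')
axes[2].set_title('imshow')
```

Note that imshow does not take the x and y grids; the extent argument supplies the axis range instead.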
Density Plot (2D Histogram or KDE)
• Shows how data points are concentrated in different areas of a 2D space.
• Think of it like a heatmap of how many points are in different regions.
• Often created with kernel density estimation (KDE) or 2D histograms.

Contour Plot
• A topographic map-style plot.
• Shows lines of constant density: each contour line connects points of equal density.
• It's like slicing a 3D density surface into 2D layers.
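One way to build both views from raw points is a 2D histogram with contour lines drawn over the bin counts. The sketch below uses hypothetical data drawn from a correlated 2D normal distribution; a KDE would be a smoother alternative and is not shown here.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Hypothetical data: 5000 points from a correlated 2D normal distribution
mean = [0, 0]
cov = [[1, 0.6], [0.6, 1]]
x, y = rng.multivariate_normal(mean, cov, size=5000).T

fig, ax = plt.subplots()
# Density plot: color encodes how many points fall in each 2D bin
counts, xedges, yedges, image = ax.hist2d(x, y, bins=30, cmap='Blues')
fig.colorbar(image, ax=ax, label='points per bin')

# Contour lines over the bin centers; counts is indexed [x_bin, y_bin],
# while contour expects rows to follow y, hence the transpose
xc = 0.5 * (xedges[:-1] + xedges[1:])
yc = 0.5 * (yedges[:-1] + yedges[1:])
ax.contour(xc, yc, counts.T, levels=5, colors='black', linewidths=0.8)
```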
Histograms, Binnings, Multiple subplots
• A histogram is a graphical representation of the distribution of numerical data. It's an estimate
of the probability distribution of a continuous variable and was first introduced by Karl Pearson.

Key characteristics:
• Consists of adjacent rectangles (bins)
• The area of each rectangle is proportional to the frequency of data points in that bin (with equal-width bins, so is its height)
• X-axis represents the range of data values divided into intervals
• Y-axis represents the frequency or count of observations in each interval

Purpose:
• Visualize data distribution
• Identify patterns (normal, skewed, bimodal distributions)
• Spot outliers
• Understand data spread and central tendency
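A sketch of a count histogram next to a density histogram, using hypothetical test scores drawn from a normal distribution. With density=True the bar areas sum to 1, which is what makes the histogram an estimate of a probability distribution.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
scores = rng.normal(loc=70, scale=10, size=500)   # hypothetical test scores

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))

# Left: y-axis is the number of observations in each bin
counts, edges, _ = ax1.hist(scores, bins=20)
ax1.set_ylabel('count')

# Right: y-axis is rescaled so that the total bar area equals 1
density, edges_d, _ = ax2.hist(scores, bins=20, density=True)
ax2.set_ylabel('density')

# Area of each bar = height * bin width
total_area = np.sum(density * np.diff(edges_d))
```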
Binning
Binning (or bucketing) is the process of dividing the range of values in a dataset into a series
of intervals (bins) for histogram creation or data discretization.
Key aspects:
• Bin Width/Size: The range of values each bin covers
• Number of Bins: How many intervals to create
• Bin Edges: The boundaries between bins
Common binning strategies:
• Fixed-width binning: Equal-sized intervals (e.g., 0-10, 10-20, 20-30)
• Variable-width binning: Unequal intervals based on data density
• Square-root choice: Number of bins = √n (n = number of data points)
Impact of binning:
• Too few bins may oversimplify the distribution
• Too many bins may show too much noise
• A well-chosen bin width reveals the shape of the underlying distribution
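The binning strategies can be tried with np.histogram. In this sketch the data are hypothetical, and quantile-based edges stand in for variable-width binning; that is one possible way to follow data density, not the only one.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.normal(size=400)      # hypothetical sample, n = 400

# Square-root choice: number of bins = sqrt(n), rounded up
n_bins = int(np.ceil(np.sqrt(len(data))))

# Fixed-width binning: np.histogram splits the data range into equal intervals
counts, edges = np.histogram(data, bins=n_bins)
widths = np.diff(edges)          # every entry is the same bin width

# Variable-width binning: edges at the deciles, so bins are narrow where
# the data are dense and wide where they are sparse
quantile_edges = np.quantile(data, np.linspace(0, 1, 11))
q_counts, _ = np.histogram(data, bins=quantile_edges)

print(n_bins, widths[0], q_counts)
```

Passing the same edges to plt.hist through its bins argument draws the corresponding histogram.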
Multiple subplots
Multiple subplots (or small multiples) are arrangements of several plots in a
single figure to enable comparison across different variables or categories.

Common types of subplot arrangements:


• Grid layout (rows × columns)
• Shared axis subplots (aligned scales for comparison)
• Faceted plots (by category variables)

Benefits:
• Compare distributions across categories
• Visualize relationships between multiple variables
• Save space while showing comprehensive information
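A sketch of a grid layout with shared axes, using plt.subplots. The three groups and their values are hypothetical; sharing both axes keeps the scales aligned so the distributions can be compared directly.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
# Hypothetical scores for three categories
groups = {
    'A': rng.normal(60, 8, 200),
    'B': rng.normal(70, 5, 200),
    'C': rng.normal(75, 12, 200),
}

# 1 row x 3 columns; sharex/sharey give every panel the same scales
fig, axes = plt.subplots(1, 3, figsize=(10, 3), sharex=True, sharey=True)

for ax, (name, values) in zip(axes, groups.items()):
    ax.hist(values, bins=15)
    ax.set_title(f'Group {name}')

axes[0].set_ylabel('count')
fig.tight_layout()
```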
Three-dimensional plotting with Matplotlib
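The content of this section is not included in the text. As a hedged starting point only, three-dimensional axes can be created by passing projection='3d' when adding a subplot; the helix drawn below is an arbitrary example.

```python
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(projection='3d')   # axes with x, y, and z directions

# A helix: x and y trace a circle while z increases
z = np.linspace(0, 15, 500)
x = np.sin(z)
y = np.cos(z)

ax.plot(x, y, z)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
```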
