Unit 5

The document provides an overview of data visualization, highlighting its importance in simplifying complex data and aiding decision-making across various industries. It discusses various visualization tools and techniques, including Matplotlib, Seaborn, and interactive tools like Tableau, while also covering key concepts such as exploratory vs. explanatory visualization and the use of visual variables. Additionally, it includes practical examples of creating different types of plots using Python libraries.

Uploaded by

akhilakrosuri
Copyright
© All Rights Reserved

Unit – V

Data Visualization

Introduction

Matplotlib, Line Plots, Area Plots, Histograms

Bar Charts, Pie Charts, Box Plots, Scatter Plots, Bubble Plots

Advanced Visualization Tools

Waffle Charts, Word Clouds, Seaborn and Regression Plots

Creating Maps & Visualizing Geospatial Data – Folium, Maps with Markers, Choropleth
Maps.

An Introduction

Data visualization is the graphical representation of information and data. By
using visual elements like charts, graphs, and maps, data visualization tools help in
making complex data more accessible, understandable, and usable.

Visualizing data is a crucial part of data analysis as it helps people uncover insights that
may be hidden in raw tables.

Data visualization tools help in creating visual representations of the data that can
highlight trends, patterns, correlations, and outliers that would be harder to see in a
dataset presented in raw form.

The Need for Data Visualization Tools

1. Simplification of Complex Data: Data visualization tools simplify complex
datasets into visual formats such as graphs or charts. This allows decision-makers
to quickly understand the data and make informed decisions.
2. Identifying Trends and Patterns: With large datasets, spotting trends manually
can be tedious and error-prone. Data visualization makes it easier to identify
trends, patterns, and correlations by representing them visually.
3. Improved Decision-Making: Visualized data helps stakeholders quickly grasp
insights and can aid in faster, more accurate decision-making.
4. Enhanced Data Interpretation: With visualizations, it’s easier to communicate
findings. Rather than going through lengthy reports or tables, stakeholders can
immediately understand key points with graphs and charts.
5. Engagement and Clarity: Interactive visualizations are engaging and can be used
in presentations, which makes data more accessible to a wider audience, even
those without technical expertise.
6. Support for Analytical Thinking: By making the relationships between data
elements clear, data visualizations foster more in-depth analysis and discussions.

Where Data Visualization Tools Are Useful

Data visualization tools are used in various industries, including:

• Business and Marketing: To track sales, understand customer behaviour, monitor performance metrics, and make marketing decisions.
• Healthcare: For visualizing patient data, health trends, and performance statistics.
• Finance: To track market performance, assess risk, and analyze portfolios.
• Education: For visualizing student data, performance metrics, and trends over time.
• Government and Public Services: To represent census data, crime statistics, and public spending.
• Scientific Research: To visualize experimental data, findings, and trends in academic research.

Key Concepts in Data Visualization:

Exploratory vs. Explanatory Visualization:

• Exploratory Visualization: Used for data exploration and analysis, helping to
identify patterns or anomalies in the data.

• Explanatory Visualization: Designed to communicate specific insights or findings to a broader audience.


Visual Variables:

• Visual variables such as color, size, shape, and position are used to encode data
attributes. For example, a scatter plot might use the x-axis and y-axis to represent two
variables and color to represent a third variable.
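
The scatter-plot example above can be sketched in a few lines. This is only an illustration under assumed data: the variable names (height, weight, age) and their values are hypothetical and generated at random.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: three variables measured on the same 50 items.
rng = np.random.default_rng(0)
height = rng.uniform(150, 190, 50)                  # encoded as x position
weight = 0.9 * height - 90 + rng.normal(0, 5, 50)   # encoded as y position
age = rng.uniform(18, 65, 50)                       # encoded as color

fig, ax = plt.subplots()
points = ax.scatter(height, weight, c=age, cmap="viridis")
fig.colorbar(points, ax=ax, label="Age (years)")
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Weight (kg)")
ax.set_title("Position encodes two variables, color a third")
fig.savefig("visual_variables.png")  # use plt.show() in an interactive session
```

The colorbar acts as the legend for the third variable, so a reader can map each color back to a value.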

Types of Visualizations:

• Charts and Graphs: Line charts, bar charts, pie charts, histograms, scatter plots,
etc.

• Maps and Geospatial Visualizations: Representing data on maps for spatial
analysis.

• Infographics: Combining text, visuals, and data to tell a story.

• Dashboards: Interactive displays that allow users to explore data.

Storytelling with Data:

• Effectively using visualizations to tell a compelling story that leads the viewer
through key insights.

Color Theory:

• Understanding how to use color effectively in visualizations to highlight,
categorize, or differentiate data.

Data Integrity and Accuracy:

• Ensuring that visualizations accurately represent the underlying data and avoid
misleading interpretations.

Types of Data Visualization Tools

There are different types of data visualization tools, each serving a unique purpose.

Broadly, they can be divided into two categories:


1. Static Visualization Tools: These are tools that create simple, non-interactive
visualizations that can be printed or presented as images.
o Charts and Graphs: Bar charts, line graphs, scatter plots, pie charts, etc.
o Infographics: Combining text, images, and data visualizations for
storytelling.

Some tools for static visualizations include:

o Microsoft Excel: Basic charting tools and graphs.


o Google Sheets: Offers some basic graphing options.
o Tableau (Static mode): Although Tableau is often interactive, it can also
be used to generate static visualizations.
2. Interactive Visualization Tools: These tools allow users to interact with the
data, such as filtering, zooming, or hovering over different parts of the
visualization for additional information.

Some popular tools for interactive visualizations include:

o Tableau: For creating interactive dashboards and charts.


o Power BI: A Microsoft tool for business analytics that allows for interactive
reports and dashboards.
o QlikView: A business intelligence tool that provides interactive data
exploration.
o D3.js: A JavaScript library for producing interactive, web-based
visualizations.

Data Visualization Libraries in Python

Python is one of the most popular languages for data analysis, and it offers several
libraries for data visualization.

Below are some key libraries used for visualizing data:


Matplotlib

o One of the oldest and most widely used Python libraries for static,
animated, and interactive visualizations.
o Provides a variety of plot types, including line plots, bar charts, histograms,
scatter plots, and more.
o It allows you to control all aspects of the plot, such as axes, titles, labels, and
legends.

Key Features of Matplotlib

1. Support for Various Plot Types:


• Matplotlib supports a wide range of plot types, including line plots, scatter
plots, bar plots, histograms, pie charts, 3D plots, and more.
2. Customization and Styling:
• Users can customize virtually every aspect of a plot, including colors, line
styles, markers, fonts, labels, and annotations.
3. Publication-Quality Output:
• Matplotlib produces high-quality visualizations suitable for inclusion in
research papers, articles, and presentations.
4. Multiple Backends:
• Matplotlib supports various backends for rendering plots, allowing users
to choose between different output formats such as PNG, PDF, SVG, and interactive
environments like Jupyter Notebooks.
5. Seamless Integration with NumPy:
• Matplotlib is designed to work seamlessly with NumPy arrays, making it
convenient for visualizing numerical data.
6. Interactive Mode:
• Matplotlib can be used interactively in IPython or Jupyter environments,
allowing users to explore and modify plots dynamically.
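
As a small illustration of the output formats mentioned under "Multiple Backends", the sketch below renders one figure as PNG, PDF, and SVG. It writes to in-memory buffers only so that the result can be inspected; the data values are arbitrary.

```python
import io
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])
ax.set_title("Same figure, three formats")

# savefig chooses the renderer from the requested format.
outputs = {}
for fmt in ("png", "pdf", "svg"):
    buffer = io.BytesIO()
    fig.savefig(buffer, format=fmt)
    outputs[fmt] = buffer.getvalue()
    print(fmt, len(outputs[fmt]), "bytes")
```

To write to disk instead, pass a file name such as "plot.pdf" to savefig; the format is then inferred from the extension.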

Matplotlib Basics:

Matplotlib is typically used in conjunction with NumPy for handling numerical data. The
basic steps to create a simple plot are:
1. Importing Matplotlib:
• import matplotlib.pyplot as plt
2. Creating Data:
• Define your data using NumPy arrays or lists.
3. Creating a Plot:
• Use Matplotlib functions to create a figure and one or more axes (subplots).
Examples (x, y, height, and data stand for your own arrays or lists):
# Line plot
plt.plot(x, y)
# Scatter plot
plt.scatter(x, y)
# Bar plot
plt.bar(x, height)
# Histogram
plt.hist(data, bins=10)
4. Adding Labels and Titles:
• Add labels to the axes and a title to the plot for clarity.
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Plot Title')
5. Displaying the Plot:
• Use plt.show() to display the plot in a standalone script or non-interactive mode.
• In Jupyter Notebooks, %matplotlib inline may be used to display plots inline.
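
The "Creating a Plot" step mentions a figure with one or more axes (subplots). A minimal sketch of that explicit style is given here, assuming two stacked subplots of sine and cosine; the names ax_sin and ax_cos are chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

# One figure holding two axes (subplots) stacked vertically.
fig, (ax_sin, ax_cos) = plt.subplots(2, 1, sharex=True, figsize=(6, 5))

ax_sin.plot(x, np.sin(x))
ax_sin.set_ylabel("sin(x)")
ax_sin.set_title("Two subplots sharing one x-axis")

ax_cos.plot(x, np.cos(x), color="orange")
ax_cos.set_xlabel("x")
ax_cos.set_ylabel("cos(x)")

fig.tight_layout()
fig.savefig("subplots.png")  # use plt.show() in an interactive session
```

Working with the Axes objects directly keeps each subplot's labels and data separate, which is harder to do with the plt.xlabel / plt.title shortcuts.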
Example:
Here's a simple example of creating a line plot using Matplotlib:
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)


y = np.sin(x)
plt.plot(x, y)

# Add labels and title


plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Sine Wave')

# Display the plot


plt.show()

Output:

This code creates a plot of a sine wave using the matplotlib library.

For better understanding:

1. x = np.linspace(0, 2 * np.pi, 100)


 np.linspace(start, stop, num) is a function from the NumPy library that returns
evenly spaced numbers over a specified range.
 Arguments:

0: The starting value of the range (0 in this example).

2 * np.pi: The stopping value of the range (2π, which is approximately 6.2832).
This corresponds to one full period of the sine wave.

100: The number of values to generate between 0 and 2π (inclusive).

This line generates 100 values between 0 and 2π, which will serve as the x-
coordinates for plotting the sine wave.

2. y = np.sin(x)

 np.sin(x) computes the sine of each value in the x array.

This line calculates the sine values corresponding to each x value. The
resulting y array will contain the sine of all 100 x values.

Together with x, this array supplies the y-coordinates of the sine wave. Because x
runs from 0 to 2π, the values cover exactly one period.

3. plt.plot(x, y)

 plt.plot(x, y) is a function from matplotlib.pyplot that plots y versus x on a graph.

This line creates the sine wave plot, where the x-values are plotted on the
horizontal axis (X-axis), and the sine values (y-values) are plotted on the vertical
axis (Y-axis).

4. plt.xlabel('X-axis')

 This function sets the label for the X-axis.

It adds the label "X-axis" to the horizontal axis.


5. plt.ylabel('Y-axis')

 This function sets the label for the Y-axis.

It adds the label "Y-axis" to the vertical axis.

6. plt.title('Sine Wave')

 This function sets the title of the plot.

It adds the title "Sine Wave" at the top of the plot.

7. plt.show()

 This function displays the plot in the output window.

It renders the plot and shows the sine wave on the screen with the specified
labels and title.
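
The walkthrough above can be checked numerically before plotting. The short sketch below prints the properties described for np.linspace and np.sin; it uses only NumPy.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

print(x.shape)            # (100,)
print(x[0], x[-1])        # first and last values: 0.0 and 2*pi
print(x[1] - x[0])        # constant spacing of 2*pi / 99
print(y.min(), y.max())   # close to -1 and 1
```

Note that the spacing is 2π / 99 rather than 2π / 100, because both endpoints are included in the 100 values.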

Visual Representation:

The resulting plot will display a sine wave, showing the relationship between x (the
input angle) and y (the sine of that angle). The x-values range from 0 to 2π, and the sine
function oscillates between -1 and 1, creating a smooth periodic curve.

Final Plot:

 The X-axis will represent the angle (from 0 to 2π).


 The Y-axis will represent the sine of each angle.
 The plot will display one full cycle of the sine wave.

Installation:
If you haven't installed Matplotlib yet, you can install it from the command line using:
pip install matplotlib
Basic Line Plot:

import matplotlib.pyplot as plt


import numpy as np
# Create data
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Create a simple line plot
plt.plot(x, y)
# Add labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Line Plot')
# Show the plot
plt.show()

This script creates a basic line plot of the sine function.


Output:

import matplotlib.pyplot as plt


x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
plt.plot(x, y)
plt.title('Simple Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
Output:

Line Plot

A line plot (also called a line chart) is used to display data points in a sequence,
typically over time. It connects the individual data points with a line, making it ideal for
showing trends or patterns in data.

When to Use a Line Plot:

• Trends over Time: When you want to visualize how a variable changes over time (e.g., stock prices, temperature over the year, sales trends).
Example: You might use a line plot to show monthly sales figures over the past year or the change in temperature over several weeks.
• Continuous Data: Line plots are best for continuous data where each data point is connected to its predecessor.
• Relationships: To show relationships between two variables, especially when you expect one variable to depend on the other (i.e., y increases or decreases as x increases).

Example: Showing the relationship between distance and time for an object
moving at a constant speed.
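
The distance and time example can be sketched as follows. The speed of 5 metres per second is an assumed value chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

speed = 5.0                 # metres per second (assumed value)
time = np.arange(0, 11)     # 0, 1, ..., 10 seconds
distance = speed * time     # constant speed gives a straight line

fig, ax = plt.subplots()
ax.plot(time, distance, marker="o")
ax.set_xlabel("Time (s)")
ax.set_ylabel("Distance (m)")
ax.set_title("Distance vs. Time at Constant Speed")
fig.savefig("distance_time.png")  # use plt.show() in an interactive session
```

The slope of the line equals the speed, which is why a constant speed appears as a straight line.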

import matplotlib.pyplot as plt


import numpy as np

# Create some sample data (e.g., monthly sales over a year)


months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
sales = [250, 320, 400, 480, 510, 600, 720, 800, 750, 710, 650, 600]

plt.plot(months, sales, marker='o')


plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Monthly Sales Trend')
plt.show()

Output:
Scatter Plot:

import matplotlib.pyplot as plt


import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)
plt.show()
A scatter plot is used to show the relationship between two numerical variables
by plotting individual data points as dots. It helps you visualize the correlation,
distribution, or trends between variables.
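
A scatter plot shows a correlation visually; the same relationship can also be summarized with numbers. The sketch below is one possible way to do that, using synthetic "hours studied" and "exam score" values that are assumptions, not real data.

```python
import numpy as np

# Synthetic data: scores rise with hours studied, plus random noise.
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, 50)
scores = 50 + 4 * hours + rng.normal(0, 5, 50)

r = np.corrcoef(hours, scores)[0, 1]             # Pearson correlation coefficient
slope, intercept = np.polyfit(hours, scores, 1)  # least-squares straight line
print(f"r = {r:.2f}, fitted line: score = {slope:.2f} * hours + {intercept:.2f}")
```

A value of r near 1 or -1 indicates a strong linear relationship, while a value near 0 suggests little or no linear relationship (a non-linear pattern may still exist, which is why the plot itself remains useful).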

When to Use a Scatter Plot:

• Correlation Between Variables: When you want to explore the relationship between two variables (e.g., does one variable increase as the other increases?).
Example: You can use a scatter plot to check the relationship between hours studied and exam scores.
• Outliers and Clusters: Scatter plots help identify outliers (data points that don't fit the pattern) or clusters (groups of data points).
• Non-linear Relationships: Scatter plots are particularly useful when you don't know if the relationship between variables is linear, and you just want to see the spread of data points.
Example: You might use a scatter plot to see if there's any correlation between age and income.
# Generate some example data
np.random.seed(0)
x = np.random.rand(50) * 10 # Random x values
y = 2 * x + 5 + np.random.randn(50) * 2 # Linear relationship with some noise

# Scatter plot
plt.scatter(x, y)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot: X vs Y')
plt.show()

Output:


Bar Plot:

import matplotlib.pyplot as plt


import numpy as np
x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])
plt.bar(x,y )
plt.show()

Output:
A bar plot (also called a bar chart) is used to compare discrete categories of data
by using rectangular bars with lengths proportional to the values they represent. Bar
plots are especially useful for comparing quantities across different groups or
categories.

When to Use a Bar Plot:


 Categorical Data Comparison: Use a bar plot when you have categorical data
e.g., comparing sales across different products or regions.
 Discrete Data: Bar plots are ideal when you want to display and compare
distinct groups.
 Non-time Series: Unlike line plots, bar plots don't require the data to be ordered
or continuous. You can compare categories without any particular sequence.
import matplotlib.pyplot as plt
# Sample data (e.g., sales of different products)
products = ['Product A', 'Product B', 'Product C', 'Product D']
sales = [250, 320, 400, 480]
# Create a bar plot
plt.bar(products, sales, color='skyblue')
# Add labels and title
plt.xlabel('Products')
plt.ylabel('Sales')
plt.title('Sales Comparison of Products')
# Display the plot
plt.show()

Output:

Key Use Cases for Bar Plots:


 Comparing sales of different products or categories.
 Comparing revenue across various business units.
 Showing the number of items in different categories (e.g., number of users in
different regions).
 Comparing the performance of different teams or departments.

Area Plot:

import matplotlib.pyplot as plt


import numpy as np
# Create data
x = np.linspace(0, 10, 100)
y1 = x
y2 = x**2
y3 = x**3
# Create an area plot
plt.fill_between(x, y1, label='Linear', alpha=0.5)
plt.fill_between(x, y2, label='Quadratic', alpha=0.5)
plt.fill_between(x, y3, label='Cubic', alpha=0.5)
# Add labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Area Plot')
# Add a legend
plt.legend()
# Show the plot
plt.show()

Output:
An area plot is a type of plot that displays the data along with the area between
the curve and the x-axis, often used to show the cumulative total of the data. It is
particularly useful when you want to visualize the relationship between quantities over
time and highlight the magnitude of the change.

When to Use an Area Plot:


 Cumulative Data: Use an area plot when you want to display cumulative or
stacked data, showing the total value over time or categories.
 Trends Over Time: An area plot can show trends, and is particularly useful
when comparing the total magnitude of multiple time-series data or to show how
each part contributes to the total.
 Part-to-whole Relationship: When you want to see how individual categories
contribute to a whole (stacked area plot).
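
The stacked area plot mentioned in the list above can be sketched with Matplotlib's stackplot function. The monthly sales figures for three products are hypothetical values used only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical monthly sales for three products.
months = np.arange(1, 7)
product_a = np.array([10, 12, 15, 14, 18, 20])
product_b = np.array([5, 7, 6, 9, 11, 12])
product_c = np.array([3, 4, 4, 6, 5, 8])

fig, ax = plt.subplots()
# stackplot draws each series on top of the previous ones,
# so the upper edge of the top band is the total.
ax.stackplot(months, product_a, product_b, product_c,
             labels=["Product A", "Product B", "Product C"], alpha=0.7)
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Stacked Area Plot")
ax.legend(loc="upper left")
fig.savefig("stacked_area.png")  # use plt.show() in an interactive session

total = product_a + product_b + product_c
print(total)
```

In contrast, calling fill_between once per series fills every area down to the x-axis, so the areas overlap instead of adding up.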

import matplotlib.pyplot as plt


import numpy as np
# Sample data (e.g., sales over time)
x = np.linspace(0, 10, 10)
y1 = np.sin(x) + 1
y2 = np.cos(x) + 1

# Create an area plot


plt.fill_between(x, y1, color="skyblue", alpha=0.4, label="Sales A")
plt.fill_between(x, y2, color="orange", alpha=0.4, label="Sales B")

# Add labels and title


plt.xlabel('Time')
plt.ylabel('Sales')
plt.title('Area Plot Showing Sales Over Time')
plt.legend()

# Display the plot


plt.show()
Output:

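This area plot overlays two datasets (Sales A and Sales B) over time. The area under
each curve is filled with a semi-transparent color, so the two magnitudes can be compared
at each point. Because both areas start from the x-axis, the plot does not show a
cumulative total; a stacked area plot is needed for that.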

Key Use Cases for Area Plots:

 Showing changes in the cumulative total over time.


 Visualizing multiple time-series and comparing their trends.
 Highlighting the magnitude of change over time or categories.
 Comparing different groups or categories stacked on top of each other (stacked
area plot).

Pie Chart

A pie chart is used to show the proportion or percentage of categories in a
whole. Each slice of the pie represents a category, and the size of each slice corresponds
to the proportion of that category in relation to the entire dataset.
import matplotlib.pyplot as plt

# Sample data (e.g., market share of companies)


labels = ['Company A', 'Company B', 'Company C', 'Company D']
sizes = [30, 40, 20, 10]

# Create a pie chart


plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90)
plt.title('Market Share of Companies')
plt.show()

Output:

When to Use a Pie Chart:

 Proportional Data: When you want to visualize the relative sizes of parts that
make up a whole.
 Category Comparison: When you have a small number of categories (ideally 2
to 6) and want to see how each category contributes to the total.
 Parts of a Whole: When the sum of the data points is important, and you want to
emphasize how each category contributes to the total.
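
To make the "parts of a whole" idea concrete, the sketch below computes each slice's share before drawing the pie. The unit counts are hypothetical, and pulling the largest slice outward with explode is just one optional way to add emphasis.

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ["Company A", "Company B", "Company C", "Company D"]
units = np.array([120, 160, 80, 40])   # hypothetical raw values

# Each slice's share of the whole, as a percentage.
percentages = units / units.sum() * 100
print(dict(zip(labels, percentages.round(1))))

# Pull the largest slice slightly out of the pie to emphasise it.
explode = [0.1 if u == units.max() else 0 for u in units]

fig, ax = plt.subplots()
ax.pie(units, labels=labels, explode=explode, autopct="%1.1f%%", startangle=90)
ax.set_title("Market Share of Companies")
fig.savefig("market_share.png")  # use plt.show() in an interactive session
```

The pie function performs the same normalization internally, so raw values can be passed directly; the autopct format string only controls how the resulting percentages are printed.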

Example Use Cases for Pie Charts:

 Market Share: Showing the market share of different companies in an industry.


 Budget Allocation: Representing how a company's budget is divided among
different departments.
 Survey Results: Showing the percentage breakdown of survey responses across
different categories.

Box Plots

Box Plot (Box-and-Whisker Plot)

A box plot is used to represent the distribution of data based on a five-number
summary (minimum, first quartile, median, third quartile, and maximum) and to
visualize the spread and outliers in the dataset.

import matplotlib.pyplot as plt


import numpy as np

# Sample data (e.g., exam scores of students in different groups)
data = [np.random.normal(70, 10, 100), np.random.normal(80, 15, 100),
        np.random.normal(90, 5, 100)]

# Create box plot; group names are set with plt.xticks because the
# labels= argument of boxplot is deprecated in recent Matplotlib releases
plt.boxplot(data)
plt.xticks([1, 2, 3], ['Group 1', 'Group 2', 'Group 3'])
plt.title('Box Plot: Exam Scores Comparison')
plt.ylabel('Scores')
plt.show()

Output:
When to Use a Box Plot:

 Data Distribution: When you want to show the spread of data and identify the
central tendency and variability.
 Outliers Detection: Useful for detecting outliers or extreme values in a dataset.
 Comparing Multiple Groups: When comparing distributions of multiple groups
(e.g., comparing test scores across different groups).
 Skewness and Symmetry: When you want to check if the data is symmetric or
skewed (left-skewed or right-skewed).
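
The numbers behind a box plot can be computed directly. The sketch below works out the five-number summary and the usual outlier limits for a small set of hypothetical exam scores; it assumes NumPy's default (linear) percentile method.

```python
import numpy as np

scores = np.array([52, 61, 64, 67, 70, 71, 73, 75, 78, 80, 83, 99])

minimum, q1, median, q3, maximum = np.percentile(scores, [0, 25, 50, 75, 100])
iqr = q3 - q1   # interquartile range: the height of the box

# Matplotlib's default whiskers reach the most extreme points within
# 1.5 * IQR of the box; points beyond that are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

print(minimum, q1, median, q3, maximum)
print("outliers:", outliers)
```

Because of the 1.5 * IQR rule, the whisker ends in a default Matplotlib box plot coincide with the minimum and maximum only when the data contain no outliers.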

Example Use Cases for Box Plots:

 Salary Distribution: Showing the distribution of salaries across different


departments in a company.
 Exam Scores: Comparing the distribution of test scores across different schools.
 Stock Price Volatility: Showing the distribution of stock prices over time.
Histograms

A histogram is used to show the frequency distribution of a dataset. It groups data into
bins (intervals) and displays the number of data points that fall into each bin.
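
The binning step can be seen without drawing anything. The sketch below uses np.histogram on a small set of hypothetical ages to count how many values fall into each of four equal-width bins.

```python
import numpy as np

ages = np.array([21, 22, 25, 27, 28, 30, 31, 31, 33, 35, 38, 44])

# Four equal-width bins spanning 20 to 44; the last bin includes its right edge.
counts, edges = np.histogram(ages, bins=4, range=(20, 44))
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {count}")
```

plt.hist performs this same counting and then draws one bar per bin, so changing the bins argument changes the counts as well as the look of the plot.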

import matplotlib.pyplot as plt


import numpy as np

# Sample data (e.g., ages of individuals in a group)


data = np.random.normal(30, 5, 1000)

# Create histogram
plt.hist(data, bins=20, color='skyblue', edgecolor='black')
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

Output:
When to Use a Histogram:

 Frequency Distribution: When you want to visualize the distribution of data


points across different ranges.
 Continuous Data: Useful for continuous data that can take on many values
within a given range.
 Identifying Patterns: When you want to identify the underlying frequency
pattern in the dataset (e.g., normal distribution, skewness).

Example Use Cases for Histograms:

 Income Distribution: Showing the distribution of incomes within a population.


 Height Distribution: Representing the distribution of heights within a sample.
 Age Distribution: Showing how age is distributed in a group of people.

Bubble Plots

A bubble plot is a variation of a scatter plot, where each point is represented as a
bubble. In addition to the x and y coordinates, each bubble has a size (and sometimes a
color) to represent an additional dimension of data.
import matplotlib.pyplot as plt
import numpy as np

# Sample data
x = np.random.rand(20) * 10 # Random x values
y = np.random.rand(20) * 10 # Random y values
sizes = np.random.rand(20) * 1000 # Bubble sizes

# Create a bubble plot


plt.scatter(x, y, s=sizes, alpha=0.5, color='green')
plt.title('Bubble Plot Example')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
Output:
When to Use a Bubble Plot:

• Multidimensional Data: When you have three (or more) dimensions to visualize. A bubble plot can represent data with x and y positions, along with bubble size and/or color to represent additional data dimensions.
• Comparing Groups: When you want to compare different categories or groups with respect to two continuous variables, while also representing another feature with bubble size or color.
• Trends with Size Impact: When you want to visualize how the size of one variable affects the relationship between two other variables.
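
A bubble plot built from named data columns, rather than random values, might look like the sketch below. The company figures are hypothetical, and the scaling factor of 40 is an arbitrary choice to make the bubbles visible.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical companies: employees, revenue, and market share.
employees = np.array([50, 120, 300, 450, 800])
revenue = np.array([2.0, 5.5, 12.0, 20.0, 41.0])   # millions
market_share = np.array([2, 5, 13, 25, 55])        # percent

# `s` is the marker area in points squared, so scaling it linearly
# keeps bubble area proportional to the value.
sizes = market_share * 40

fig, ax = plt.subplots()
bubbles = ax.scatter(employees, revenue, s=sizes, c=market_share,
                     cmap="viridis", alpha=0.6, edgecolors="black")
fig.colorbar(bubbles, ax=ax, label="Market share (%)")
ax.set_xlabel("Employees")
ax.set_ylabel("Revenue (millions)")
ax.set_title("Revenue vs. Employees (bubble size = market share)")
fig.savefig("bubble_companies.png")  # use plt.show() in an interactive session
```

Here the same variable drives both size and color, which makes the third dimension easier to read; a fourth variable could be passed to c instead.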

Example Use Cases for Bubble Plots:

• Company Revenue vs Employees: A bubble plot could represent companies, where the x-axis shows the number of employees, the y-axis shows revenue, and the bubble size represents the market share.
• Product Sales: Visualizing the relationship between advertising spend (x-axis), product sales (y-axis), and product category size (bubble size).
• Stock Performance: Plotting stock price against a second metric of interest, with bubble size representing each company's total market capitalization.

Seaborn

o Built on top of Matplotlib, Seaborn is a powerful library for statistical data
visualization.
o It simplifies complex visualizations, with built-in themes and color palettes.
o Seaborn makes it easy to create heatmaps, box plots, violin plots, and pair
plots.
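
As one example of the plot types listed above, the sketch below draws a correlation heatmap. It uses synthetic data (the column names advertising, sales, and visitors are made up) so that no dataset download is needed.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data: sales depend on advertising, visitors are independent.
rng = np.random.default_rng(1)
advertising = rng.uniform(0, 10, 100)
df = pd.DataFrame({
    "advertising": advertising,
    "sales": 50 + 4 * advertising + rng.normal(0, 5, 100),
    "visitors": rng.uniform(100, 500, 100),
})

corr = df.corr()   # pairwise Pearson correlations

fig, ax = plt.subplots()
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation Heatmap")
fig.tight_layout()
fig.savefig("heatmap.png")  # use plt.show() in an interactive session
```

Fixing vmin and vmax at -1 and 1 keeps the color scale centered on zero, so positive and negative correlations are easy to tell apart.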

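import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="darkgrid")
# load_dataset downloads the example "tips" dataset, so it needs internet access
data = sns.load_dataset("tips")
sns.barplot(x="day", y="total_bill", data=data)
plt.show()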
Output:

Plotly

o A library for creating interactive, web-based visualizations.


o It supports a wide range of chart types and is particularly useful for
dashboards.
o It is compatible with web frameworks like Dash for creating fully
interactive applications.

import plotly.express as px
data = px.data.iris()
fig = px.scatter(data, x="sepal_width", y="sepal_length", color="species")
fig.show()

Bokeh

o Another interactive visualization library, primarily used for creating
dashboards and web applications.
o Provides high-performance interactivity, allowing the creation of custom,
interactive visualizations.

from bokeh.plotting import figure, show


p = figure(title="Simple Line Plot", x_axis_label="X", y_axis_label="Y")
p.line([1, 2, 3, 4, 5], [2, 3, 5, 7, 11], legend_label="Trend", line_width=2)
show(p)

Altair

o Altair is a declarative statistical visualization library based on Vega and
Vega-Lite.
o It is known for being concise and simple to use, making it suitable for rapid
prototyping.

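import altair as alt
import pandas as pd

data = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [2, 3, 5, 7, 11]})
chart = alt.Chart(data).mark_line().encode(x='x', y='y')
# chart.show() may need an extra viewer package on older Altair versions;
# chart.save('line_chart.html') is an alternative that writes a standalone file
chart.show()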

ggplot (plotnine)

o Inspired by the ggplot2 library in R, plotnine is a Python library for creating
declarative statistical graphics.
o It follows the grammar of graphics, where you can build visualizations
layer by layer.

import pandas as pd
from plotnine import ggplot, aes, geom_line

data = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [2, 3, 5, 7, 11]})
plot = ggplot(data, aes(x='x', y='y')) + geom_line()
# A notebook renders `plot` when it is the last expression; a script can save it
plot.save('line_plot.png')

Pyplot

o Pyplot (matplotlib.pyplot) is a module within Matplotlib, not a separate library; it
provides a simpler, state-based interface for generating plots.
o Often used for quick visualizations, especially in data analysis workflows.

Example of How Data Visualization Tools Help in Data Analysis

Let’s look at a practical example of how data visualization tools can help in data analysis,
using Python's Seaborn and Matplotlib libraries.

Scenario: Analyzing Sales Data

Imagine you are working as a data analyst at a retail company, and you have a dataset
containing sales information for the last year. The dataset includes the following columns:

 Date: Date of the sale


 Product: Product name
 Region: Sales region
 Sales Amount: The dollar value of the sale
 Units Sold: The number of units sold

You want to analyse:

1. The trends in sales over time.


2. Which products are the best sellers.
3. Sales distribution across different regions.
4. Total Sales of best sellers
5. Product wise sales

Step 1: Loading the Data

import pandas as pd

# Sample dataset
data = {
    "Date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"],
    "Product": ["Product A", "Product B", "Product A", "Product C", "Product B"],
    "Region": ["North", "South", "East", "West", "North"],
    "Sales Amount": [100, 150, 200, 250, 300],
    "Units Sold": [10, 15, 20, 25, 30]
}
df = pd.DataFrame(data)

# Ensure Date is in datetime format
df["Date"] = pd.to_datetime(df["Date"])

Step 2: Visualizing Sales Trends Over Time (Line Plot)

You want to understand how the sales amount has changed over time. A line plot can
help you visualize trends.

import seaborn as sns


import matplotlib.pyplot as plt

# Line plot to show sales trends over time


plt.figure(figsize=(10, 6))
sns.lineplot(x='Date', y='Sales Amount', data=df, marker='o')
plt.title('Sales Amount Over Time')
plt.xlabel('Date')
plt.ylabel('Sales Amount ($)')
plt.xticks(rotation=45)
plt.show()

Output:-
Insight from the Line Plot:

This line plot helps you quickly identify whether there are certain periods where sales
were higher or lower. For example, you may see a peak in sales on certain days or periods
with lower sales, which could be linked to external factors such as holidays, marketing
campaigns, or promotions.

Step 3: Visualizing Best-Selling Products (Bar Plot)

Next, you want to know which products are the best sellers based on sales amount. A
bar plot can provide a clear comparison between different products.

# Bar plot to show total sales amount per product
# (estimator=sum adds up the rows for each product; the default plots the mean)
plt.figure(figsize=(10, 6))
sns.barplot(x='Product', y='Sales Amount', data=df, estimator=sum)
plt.title('Total Sales Amount by Product')
plt.xlabel('Product')
plt.ylabel('Sales Amount ($)')
plt.show()

Output:-

Insight from the Bar Plot:

This bar plot allows you to compare the sales performance of each product. From the plot,
you can quickly determine which product generates the most revenue. For example, if
"Product A" shows the highest sales, you might prioritize marketing efforts or stock
management around it.
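
Because sns.barplot aggregates with the mean unless told otherwise, a ranking by total revenue is easiest to confirm with a short pandas calculation. The sketch below repeats the relevant columns of the sample dataset so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Product A", "Product B", "Product A", "Product C", "Product B"],
    "Sales Amount": [100, 150, 200, 250, 300],
})

# Total revenue per product, highest first.
totals = df.groupby("Product")["Sales Amount"].sum().sort_values(ascending=False)
print(totals)
```

For this sample, Product B has the highest total (450), followed by Product A (300) and Product C (250).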

Step 4: Visualizing Sales Distribution by Region (Box Plot)

You want to understand how sales vary across different regions. A box plot is useful to
visualize the spread of sales values and identify any outliers in the data.

# Box plot to show sales distribution by region


plt.figure(figsize=(10, 6))
sns.boxplot(x='Region', y='Sales Amount', data=df)
plt.title('Sales Amount Distribution by Region')
plt.xlabel('Region')
plt.ylabel('Sales Amount ($)')
plt.show()

Output:-

Insight from the Box Plot:

This box plot helps you identify the distribution of sales across different regions. For
example, if the "North" region has a higher median sales value and fewer outliers, you
might conclude that it’s a more stable and profitable region. On the other hand, if another
region shows more variability (larger spread), you could investigate whether certain
factors like promotions or seasonal changes are impacting sales in that region.
Summary of How Data Visualization Helps

 Trends and Patterns: The line plot helps you identify trends over time. You can
see whether sales are increasing, decreasing, or fluctuating based on specific
periods.
 Comparative Analysis: The bar plot provides a clear comparison between
products. You can identify the best-selling products and allocate resources
accordingly.
 Identifying Outliers and Variability: The box plot shows the distribution of sales
data across regions, highlighting outliers and the spread of sales figures. This helps
in understanding regional differences in performance and focusing on regions that
need improvement.

Additional visualizations, including Area Plots, Histograms, Pie Charts, Box Plots,
Scatter Plots, and Bubble Plots

1. Area Plot

Area plots are useful for visualizing trends over time, similar to line plots but with the
area under the line filled to highlight the magnitude.

Code Example for Area Plot:


# Aggregate sales by month
# (the five-row sample covers a single month; a longer date range is needed to see a filled area)
df['Month'] = df['Date'].dt.to_period('M')
sales_by_month = df.groupby('Month')['Sales Amount'].sum()

# Plot area chart
plt.figure(figsize=(10, 6))
sales_by_month.plot(kind='area', color='skyblue', alpha=0.4)
plt.title('Sales Trend Over Time (Area Plot)')
plt.xlabel('Month')
plt.ylabel('Sales Amount')
plt.show()

Output:-
Visualization: An area plot of sales trends will fill the area below the line, showing the
volume of sales more clearly.

2. Histogram

Histograms are great for understanding the distribution of a single variable, such as
sales amount or units sold.

Code Example for Histogram:


# Histogram for Sales Amount
plt.figure(figsize=(10, 6))
plt.hist(df['Sales Amount'], bins=20, color='skyblue', edgecolor='black')
plt.title('Distribution of Sales Amount')
plt.xlabel('Sales Amount')
plt.ylabel('Frequency')
plt.show()

Output:-
# Histogram for Units Sold
plt.figure(figsize=(10, 6))
plt.hist(df['Units Sold'], bins=20, color='lightcoral', edgecolor='black')
plt.title('Distribution of Units Sold')
plt.xlabel('Units Sold')
plt.ylabel('Frequency')
plt.show()

Output:
Visualization: Histograms will show the frequency of different ranges of sales amounts
and units sold. You can see whether most sales fall into small or large ranges.
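
To make the idea of "ranges" concrete, here is a minimal sketch of the counting a histogram performs. The sales values and the helper name histogram_counts are hypothetical and chosen for illustration; in practice plt.hist() does this binning for you.

```python
# Hypothetical sales amounts; real values would come from df['Sales Amount'].
sales = [120, 150, 180, 200, 220, 250, 300, 320, 450, 900]

def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin rather than a bin of its own.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

counts = histogram_counts(sales, bins=4)
print(counts)  # one count per bin, lowest range first
```

Most of these sample values land in the first bin, which is the kind of right-skewed shape a histogram makes visible at a glance.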

3. Pie Chart

Pie charts are great for showing the percentage distribution of a whole, such as regional
sales contributions.

Code Example for Pie Chart:


# Pie chart for sales distribution across regions
sales_by_region = df.groupby('Region')['Sales Amount'].sum()
plt.figure(figsize=(8, 6))
sales_by_region.plot(kind='pie', autopct='%1.1f%%', startangle=90, cmap='tab20')
plt.title('Sales Distribution Across Regions')
plt.ylabel('') # Remove the default y-label for pie chart
plt.show()

Output:
Visualization: A pie chart will clearly show the proportion of sales from each region as
segments of the pie.

4. Box Plot

Box plots (also called box-and-whisker plots) are useful for visualizing the spread of a
dataset, including the median, quartiles, and outliers.

Code Example for Box Plot:


# Box plot for Sales Amount to detect outliers and spread
plt.figure(figsize=(8, 6))
sns.boxplot(x=df['Sales Amount'], color='skyblue')
plt.title('Sales Amount Distribution (Box Plot)')
plt.xlabel('Sales Amount')
plt.show()
output:

# Box plot for Units Sold to detect outliers and spread


plt.figure(figsize=(8, 6))
sns.boxplot(x=df['Units Sold'], color='lightgreen')
plt.title('Units Sold Distribution (Box Plot)')
plt.xlabel('Units Sold')
plt.show()

Output:
Visualization: The box plot will display the median, interquartile range (IQR), and any
potential outliers in the sales amount and units sold.
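
The quantities a box plot draws can also be computed directly. The sketch below uses hypothetical sales values and the common 1.5 x IQR rule for flagging outliers, which is the default whisker setting in Matplotlib and Seaborn; treat it as an illustration rather than a replacement for the plot.

```python
import statistics

# Hypothetical sales amounts used only for illustration.
sales = [120, 150, 180, 200, 220, 250, 300, 320, 450, 900]

# Quartiles with linear interpolation between data points.
q1, median, q3 = statistics.quantiles(sales, n=4, method="inclusive")
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as individual outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in sales if v < lower_fence or v > upper_fence]

print(q1, median, q3, iqr, outliers)
```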

5. Scatter Plot

Scatter plots are useful for identifying relationships or correlations between two
numeric variables, such as Units Sold vs Sales Amount.

Code Example for Scatter Plot:


# Scatter plot to see correlation between Units Sold and Sales Amount
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Units Sold', y='Sales Amount', data=df, color='purple')
plt.title('Units Sold vs Sales Amount (Scatter Plot)')
plt.xlabel('Units Sold')
plt.ylabel('Sales Amount')
plt.show()

Output:

Visualization: A scatter plot will allow you to visually assess if there's a relationship
between the number of units sold and the sales amount.
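
A visual impression of a relationship can be backed up with a number. The following sketch computes the Pearson correlation coefficient by hand for a small hypothetical dataset; the helper name pearson_r is an assumption made for this example.

```python
from math import sqrt

# Hypothetical paired observations (units sold, sales amount).
units = [1, 2, 3, 4, 5]
amount = [12, 19, 33, 38, 52]

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

r = pearson_r(units, amount)
print(round(r, 3))  # values close to 1 indicate a strong positive linear relationship
```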

6. Bubble Plot

Bubble plots are an extension of scatter plots where the size of the marker represents a
third variable. In the example below, the axes show Units Sold and Sales Amount, the
bubble size also encodes Sales Amount, and the color encodes Region.

Code Example for Bubble Plot:


# Bubble plot to show Units Sold vs Sales Amount with bubble size based on Sales Amount
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Units Sold', y='Sales Amount', size='Sales Amount', sizes=(50, 500),
data=df, hue='Region', palette='Set2')
plt.title('Bubble Plot: Units Sold vs Sales Amount (Bubble size = Sales Amount)')
plt.xlabel('Units Sold')
plt.ylabel('Sales Amount')
plt.legend()
plt.show()

Output:

Visualization: The bubble plot will show the correlation between units sold and sales
amount, with the bubble size representing the total sales amount, and color coding
based on regions.
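
The sizes=(50, 500) argument gives the smallest and largest marker size, and values in between are scaled into that range. A minimal sketch of such a linear scaling is shown below; the function name scale_sizes and the sample values are hypothetical, and Seaborn's internal implementation may differ in detail.

```python
def scale_sizes(values, min_size=50, max_size=500):
    """Map values linearly so the smallest gets min_size and the largest max_size."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: use a single mid-range size.
        return [(min_size + max_size) / 2] * len(values)
    return [min_size + (v - lo) / (hi - lo) * (max_size - min_size) for v in values]

# Hypothetical sales amounts driving the bubble size.
sizes = scale_sizes([100, 250, 400])
print(sizes)
```

Note that Matplotlib interprets scatter marker sizes as areas, so a bubble with twice the size value covers twice the area, not twice the diameter.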

Subplot

In Matplotlib, the subplot() function adds one subplot at a given position in a grid of
subplots within a single figure.

The argument 111 is a shorthand way of specifying the grid layout and the
location of the subplot within the grid.

subplot(111):

 The first digit (1): This specifies the number of rows in the grid.
 The second digit (1): This specifies the number of columns in the grid.
 The third digit (1): This specifies the index of the subplot you want to create in
that grid.

subplot(111) means:

 1 row (only one row).
 1 column (only one column).
 The first (and only) plot in this grid.

Effectively, subplot(111) is just a shorthand way of saying, "Create a single subplot,
which will take up the entire figure."

Example: Creating Multiple Subplots

If you wanted to create multiple subplots (say, a 2x2 grid), you would use the following
code:

import matplotlib.pyplot as plt

# Create a 2x2 grid of subplots


fig = plt.figure(figsize=(10, 8))

# Subplot 1 (top-left)
ax1 = fig.add_subplot(221)

# Subplot 2 (top-right)
ax2 = fig.add_subplot(222)

# Subplot 3 (bottom-left)
ax3 = fig.add_subplot(223)
# Subplot 4 (bottom-right)
ax4 = fig.add_subplot(224)

# Example: Plot something in the first subplot


ax1.plot([1, 2, 3], [4, 5, 6])

# Display the figure


plt.show()

Explanation:

 221: 2 rows, 2 columns, and the first subplot (top-left).
 222: 2 rows, 2 columns, and the second subplot (top-right).
 223: 2 rows, 2 columns, and the third subplot (bottom-left).
 224: 2 rows, 2 columns, and the fourth subplot (bottom-right).

For multiple subplots, you can use subplot(nrows, ncols, index) to specify the grid
dimensions and the position of each subplot.
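
The three-digit shorthand can be unpacked mechanically. The sketch below uses two hypothetical helper functions (they are not part of Matplotlib) to show how a code such as 223 maps to grid dimensions and to a row and column position.

```python
def decode_subplot_code(code):
    """Split a three-digit code such as 223 into (nrows, ncols, index)."""
    nrows, rest = divmod(code, 100)
    ncols, index = divmod(rest, 10)
    if not 1 <= index <= nrows * ncols:
        raise ValueError(f"index {index} is outside a {nrows}x{ncols} grid")
    return nrows, ncols, index

def grid_position(nrows, ncols, index):
    """Zero-based (row, column); indices run left to right, then top to bottom."""
    return divmod(index - 1, ncols)

nrows, ncols, index = decode_subplot_code(223)
print(nrows, ncols, index, grid_position(nrows, ncols, index))
```

The shorthand only works while every number is a single digit; for larger grids the three-argument form subplot(nrows, ncols, index) is needed.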

Advanced Visualization Tools:

For advanced data visualization in Python, consider using libraries such as:

Plotly: for interactive web-based plots,
Seaborn: for statistical graphics, and
HoloViews: for complex visualizations,

alongside the fundamental Matplotlib for creating a wide range of plots.

Plotly:

Known for its flexibility and ability to create interactive web-based visualizations.

Supports a wide range of chart types, including 3D plots.

Excellent for creating dashboards and sharing visualizations online.

Seaborn:

Built on top of Matplotlib, Seaborn simplifies the creation of visually appealing statistical
graphics.

Provides high-level abstractions for creating complex plots with minimal code.

Ideal for exploratory data analysis and presenting insights.

HoloViews:

Focuses on building complex visualizations effortlessly by annotating data with semantic
information.

Enables interactive visualizations with minimal code.

Suitable for exploring complex datasets and creating interactive dashboards.

Other Libraries:

Geoplotlib: For creating maps and plotting geographical data.


Folium: For creating interactive maps using Leaflet.js.
Plotnine: A Python library that implements the grammar of graphics from the
R library ggplot2, useful for creating beautiful and complex visualizations.

The following sections explore Waffle Charts, Word Clouds, and Regression Plots.

These tools provide powerful ways to represent data visually.

Waffle Charts (Using Matplotlib)

Waffle charts are a popular alternative to pie charts. They display proportions as colored
cells in a grid, showing how each category contributes to a total, and are often easier to
read than a pie chart when there are several categories.

Seaborn doesn't have a built-in waffle chart function, but we can create one using
Matplotlib and some custom logic.

Waffle Chart Code Example:

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap
from matplotlib.patches import Patch

# Data: Distribution of sales across regions
sales_by_region = {
    'North': 1000,
    'South': 1500,
    'East': 1200,
    'West': 800
}

# Grid size of the waffle chart (10x10 grid = 100 cells)
rows = 10
cols = 10
total_cells = rows * cols

# Calculate the number of cells for each region
regions = list(sales_by_region.keys())
values = np.array(list(sales_by_region.values()))
cell_counts = np.round(values / values.sum() * total_cells).astype(int)
cell_counts[-1] = total_cells - cell_counts[:-1].sum()  # absorb rounding error

# Build the grid: each cell holds the index of its region
grid = np.repeat(np.arange(len(regions)), cell_counts).reshape(rows, cols)

# Draw the grid with one color per region and add a legend
colors = plt.cm.Paired.colors[:len(regions)]
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
ax.imshow(grid, cmap=ListedColormap(colors), interpolation='nearest')
ax.legend(handles=[Patch(color=c, label=r) for r, c in zip(regions, colors)],
          bbox_to_anchor=(1.02, 1), loc='upper left')

ax.set_xticks([])
ax.set_yticks([])
plt.title('Sales Distribution Across Regions (Waffle Chart)', fontsize=14)
plt.show()


Example 2

This example uses the pywaffle package (install it with pip install pywaffle).

import pandas as pd
import matplotlib.pyplot as plt
from pywaffle import Waffle
# creation of a dataframe
data ={'Fruits': ['Apples', 'Banana', 'Mango', 'Strawberry', 'Orange'], 'stock': [20, 11, 18,
25, 8] }
df = pd.DataFrame(data)
#To plot the waffle Chart
fig = plt.figure(FigureClass = Waffle, columns=10, rows = 5, values = df.stock,
icons='face-smile',labels=list(df.Fruits))
plt.show()
Output:

Example 3

import pandas as pd
import matplotlib.pyplot as plt
from pywaffle import Waffle
# creation of a dataframe
data ={'Fruits': ['Apples', 'Banana', 'Mango', 'Strawberry', 'Orange'], 'stock': [20, 11, 18,
25, 8] }
df = pd.DataFrame(data)
#To plot the waffle Chart
fig = plt.figure(FigureClass = Waffle, columns=10, rows = 5, values = df.stock,
icons='cat',labels=list(df.Fruits))
plt.show()

Output:

Word Clouds

Word Clouds are a popular tool to visualize text data. They can be used to
display the most frequent terms or categories in a visually appealing way. Here, you can
visualize the most frequently sold products or regions, for example.

Word Cloud Code Example:

To generate word clouds, you need the wordcloud package. If it's not installed, use:

pip install wordcloud


Create a Word Cloud for the products:

from wordcloud import WordCloud


import matplotlib.pyplot as plt

# Data: Frequency of Products Sold


product_sales = df['Product'].value_counts()

# Generate Word Cloud


wordcloud = WordCloud(width=800, height=400,
background_color='white').generate_from_frequencies(product_sales)

# Display the Word Cloud


plt.figure(figsize=(10, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off') # Turn off axis
plt.title('Word Cloud: Frequency of Products Sold', fontsize=14)
plt.show()

Output:
Example :

# Import necessary libraries

from wordcloud import WordCloud

import matplotlib.pyplot as plt

# Sample text (adjacent string literals inside parentheses are joined into one string)

text = ("Python is a great programming language for data analysis and machine learning. "
        "Python is also used for web development.")

# Create a WordCloud object

wc = WordCloud(width=800, height=400, background_color="white")

# Generate the word cloud from the text

wc.generate(text)

# Display the word cloud using matplotlib

plt.figure(figsize=(10, 6))

plt.imshow(wc, interpolation='bilinear') # Display the image

plt.axis('off') # Hide axes

plt.show()
Explanation:

1. Text Data: The variable text is a simple string that contains a few words. You can
replace this with any text data or dataset you have.
2. WordCloud Object: The WordCloud object is created with specified width,
height, and background color.
3. generate() Method: This method generates the word cloud from the input text.
4. plt.imshow(): This function from matplotlib is used to display the generated
word cloud.

Result:

Running this code will display a word cloud where words like "Python" appear larger
since they occur more frequently in the text.
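
The sizing is driven by word frequencies. The sketch below counts them with the standard library for the same sample text; the tiny stopword set is an assumption made for this example, while the wordcloud package applies its own, much larger stopword list.

```python
import re
from collections import Counter

text = ("Python is a great programming language for data analysis and machine learning. "
        "Python is also used for web development.")

# Illustrative stopword list; common filler words are ignored when counting.
STOPWORDS = {"is", "a", "for", "and", "also"}

words = re.findall(r"[a-z]+", text.lower())
counts = Counter(w for w in words if w not in STOPWORDS)

print(counts.most_common(3))
```

A dictionary of counts like this can also be passed to WordCloud.generate_from_frequencies() when the frequencies are already known.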

Regression Plots (Using Seaborn)

Seaborn makes it easy to plot regression lines and assess relationships between
variables.
For example, you can use regression plots to visualize the relationship between Units
Sold and Sales Amount.

Regression Plot Code Example:

import seaborn as sns


import matplotlib.pyplot as plt

# Regression plot: Units Sold vs. Sales Amount


plt.figure(figsize=(10, 6))
sns.regplot(x='Units Sold', y='Sales Amount', data=df, scatter_kws={'color': 'blue'},
line_kws={'color': 'red'})
plt.title('Regression Plot: Units Sold vs Sales Amount', fontsize=14)
plt.xlabel('Units Sold')
plt.ylabel('Sales Amount')
plt.show()

Output:
Explanation:

 sns.regplot() fits a regression line and plots a scatter plot at the same time.
 The scatter_kws argument allows customization of the scatter plot (color, size,
etc.).
 The line_kws argument customizes the regression line style.
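
By default the line drawn by sns.regplot() is an ordinary least-squares fit. The sketch below computes the slope and intercept of such a line by hand for a small hypothetical dataset; the helper name fit_line is an assumption made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired observations (units sold, sales amount).
units = [1, 2, 3, 4, 5]
amount = [12, 19, 33, 38, 52]

slope, intercept = fit_line(units, amount)
print(round(slope, 2), round(intercept, 2))
```

Here the slope can be read as the estimated increase in sales amount for each additional unit sold.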

These advanced visualizations can help reveal insights from your data, especially
when exploring trends, distributions, and relationships.

Each type of plot can be customized to suit your specific analysis.

Creating Maps:

To create maps and visualize geospatial data in Python, you can use various libraries

such as folium,
geopandas,
plotly, and
cartopy.

Each has its own strengths, such as interactivity, customization, and ease of use for
geospatial operations.

The sections below show how to:

1. Create maps
2. Add markers to maps
3. Visualize geospatial data (such as points, polygons, etc.)
4. Work through examples with Folium (interactive maps)

Using Folium to Create Interactive Maps and Add Markers

Folium is an excellent choice for creating interactive maps. It is simple and
provides many functionalities such as adding markers, popups, and polygons to the
map.

Step 1. Install Folium (if not already installed)

If you're using Google Colab, you can install Folium by running:

!pip install folium

For local Python environments, you can install Folium using:

pip install folium


Step 2 : Create a Simple Map with Markers

import folium
# Create a map centered around a specific location (latitude and longitude)
# For example, let's center it around New York City (latitude: 40.7128, longitude: -74.0060)
map_center = [40.7128, -74.0060] # New York City coordinates

# create a map and set initial zoom level


mymap = folium.Map(location=map_center, zoom_start=12)

# Add a marker to the map (for New York City)


folium.Marker([40.7128, -74.0060], popup="New York City").add_to(mymap)

# Display the map


mymap
Explanation:

1. Create a Map: The folium.Map() function is used to create a map. You specify the
center of the map using latitude and longitude (location=[40.7128, -74.0060] for
New York City).
2. Zoom Level: The zoom_start=12 argument sets the zoom level of the map when
it first loads.
3. Marker: A marker is added at the location of New York City, with a pop-up text
of "New York City".
4. Display the Map: In Jupyter or Google Colab, the mymap object will display the
interactive map.

Output:

The code will generate an interactive map centered on New York City, and when
you click on the marker, a popup will appear with the text "New York City."

Note: In Colab, the map will render directly in the notebook without needing plt.show().
If you're using Jupyter Notebook, just displaying the mymap object will show the map as
well.
Example: Creating an Interactive Map with Markers

import folium

# Latitude and Longitude for New York City, Los Angeles, Chicago, and London
nyc = [40.7128, -74.0060] # New York City
la = [34.0522, -118.2437] # Los Angeles
chicago = [41.8781, -87.6298] # Chicago
london = [51.5074, -0.1278] # London

# Center the map between the cities (average of latitudes and longitudes)
map_center = [ (nyc[0] + la[0] + chicago[0] + london[0]) / 4,
(nyc[1] + la[1] + chicago[1] + london[1]) / 4 ]

# Create the map with a zoom level that covers all locations
mymap = folium.Map(location=map_center, zoom_start=2)
# Adjust zoom_start to fit all cities

# Add markers for each city


folium.Marker(nyc, popup="New York City").add_to(mymap)
folium.Marker(la, popup="Los Angeles").add_to(mymap)
folium.Marker(chicago, popup="Chicago").add_to(mymap)
folium.Marker(london, popup="London",
icon=folium.Icon(color='green')).add_to(mymap)

# Save the map as an HTML file and display it


mymap.save("city_map.html")

# Display the map in Jupyter Notebook (optional)


mymap

After running the code, the map is saved as city_map.html. You can open this
HTML file in a browser to see the map with interactive features like zooming and
panning.

In this example:

 Map center: The map's initial view is centered on the average latitude and longitude
of the four cities, with zoom_start=2 so that all of them are visible.
 Markers: We added markers for New York City, Los Angeles, and Chicago with
popups displaying the city names.
 Custom Icons: A marker for London is added with a custom green icon.
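
Instead of guessing a zoom level, the extent of the markers can be computed. The sketch below derives a bounding box from the city coordinates using only the standard library; the helper name bounding_box is hypothetical. Folium maps have a fit_bounds() method that accepts such a pair of corners, shown here only as a comment so the sketch runs without Folium installed.

```python
# City coordinates as (latitude, longitude), taken from the example above.
cities = {
    "New York City": (40.7128, -74.0060),
    "Los Angeles": (34.0522, -118.2437),
    "Chicago": (41.8781, -87.6298),
    "London": (51.5074, -0.1278),
}

def bounding_box(points):
    """Return [[south, west], [north, east]] enclosing all (lat, lon) points."""
    points = list(points)
    lats = [lat for lat, lon in points]
    lons = [lon for lat, lon in points]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]

bounds = bounding_box(cities.values())
print(bounds)

# With Folium available, the map could then be fitted to the markers:
# mymap.fit_bounds(bounds)
```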

Map with Different Icons and Customization

You can also customize the map with different icons for the markers, which can
represent different types of locations (e.g., parks, restaurants, landmarks, etc.).

import folium
# Create a map centered at New York City
mymap = folium.Map(location=[40.7128, -74.0060], zoom_start=12)

# Adding a custom marker with a different icon


folium.Marker(
[40.7128, -74.0060],
popup="New York City",
icon=folium.Icon(color='blue', icon='cloud')).add_to(mymap)

# Adding another marker with a custom icon


folium.Marker(
[34.0522, -118.2437],
popup="Los Angeles",
icon=folium.Icon(color='red', icon='info-sign')).add_to(mymap)

# Save the map as an HTML file


mymap.save("custom_city_map.html")

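This will create a map with two markers, one for New York City and another for Los
Angeles, with custom icons and different colors. Because the map starts at zoom level 12
over New York City, zoom out to see the Los Angeles marker.
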
Visualizing Geospatial Data using Folium in Python

One of the most important tasks for someone working on datasets with
countries, cities, etc. is to understand the relationship between the data's
physical location and its geographical context. One way to visualize such
data is with Folium.

Folium is a powerful data visualization library in Python that was built
primarily to help people visualize geospatial data. With Folium, one can create
a map of any location in the world. Folium is a Python wrapper for
Leaflet.js, a JavaScript library for plotting interactive maps.

Using folium.Map(), we will create a base map and store it in an object. This
function takes location coordinates and zoom values as arguments.

Syntax: folium.Map(location, tiles="OpenStreetMap", zoom_start=4)

Parameters:
 location: list of location coordinates, [latitude, longitude]
 tiles: default is OpenStreetMap. Other options, depending on the Folium version:
Stamen Terrain, Stamen Toner, Mapbox Bright, etc.
 zoom_start: int, the initial zoom level

# import the folium, pandas libraries
import folium
import pandas as pd

# initialize the map and store it in an m object
m = folium.Map(location=[40, -95], zoom_start=4)

# save the map to an HTML file, then show it in the notebook
m.save('my_map.html')
m

2. Visualizing Geospatial Data with Folium: Heatmaps

One powerful visualization is creating heatmaps to show the density of
geospatial points. Here's how to do it using folium and folium.plugins.HeatMap.

Example: Creating a Heatmap on a Map


import folium
from folium.plugins import HeatMap

# Sample data: List of coordinates (latitude, longitude)


data = [
[40.7128, -74.0060], # New York City
[34.0522, -118.2437], # Los Angeles
[41.8781, -87.6298], # Chicago
[51.5074, -0.1278], # London
[48.8566, 2.3522], # Paris
[40.730610, -73.935242], # Brooklyn
[37.7749, -122.4194], # San Francisco
[34.0522, -118.2437], # Los Angeles
]
# Create a map centered at an average location
map_center = [39.8283, -98.5795] # Roughly the center of the US
mymap = folium.Map(location=map_center, zoom_start=3)

# Add heatmap to the map


HeatMap(data).add_to(mymap)

# Save the map as an HTML file


mymap.save("heatmap.html")

output:

In this example:

 Heatmap: Points are used to create a heatmap visualization, where areas with a
higher concentration of points are highlighted.
 Folium Plugins: HeatMap is used from folium.plugins to create the heatmap.
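
When the same location occurs many times, the repeats can be collapsed into a weight. The sketch below counts duplicate coordinates with the standard library and builds [latitude, longitude, weight] triples, a form that HeatMap also accepts; the short point list is a hypothetical subset of the data above.

```python
from collections import Counter

# Hypothetical observations; Los Angeles appears twice.
points = [
    (40.7128, -74.0060),   # New York City
    (34.0522, -118.2437),  # Los Angeles
    (41.8781, -87.6298),   # Chicago
    (34.0522, -118.2437),  # Los Angeles
]

# Count how often each coordinate occurs and use the count as the weight.
counts = Counter(points)
weighted = [[lat, lon, n] for (lat, lon), n in counts.items()]

print(weighted)

# With Folium available: HeatMap(weighted).add_to(mymap)
```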
3. Using Geopandas for Geospatial Data Analysis

GeoPandas is a library for geospatial data analysis that builds on Pandas and allows
you to read, manipulate, and plot geospatial data formats like shapefiles, GeoJSON, and
more.

Install geopandas:
pip install geopandas

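Example: Plotting a Map of the United States with GeoPandas


import geopandas as gpd
import matplotlib.pyplot as plt

# Load the low-resolution world map bundled with GeoPandas.
# Note: gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0; with newer
# versions, download the Natural Earth countries file and pass its path to gpd.read_file().
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))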

# Filter for the United States (from the world shapefile)


usa = world[world['name'] == 'United States of America']

# Plot the USA map


usa.plot(figsize=(10, 6))
plt.title("United States Map")
plt.show()

In this example:

 GeoPandas: The naturalearth_lowres dataset is used, which contains a global
map with country polygons.
 Shapefiles/GeoJSON: You can load and plot shapefiles or GeoJSON files directly.

4. Visualizing Geospatial Data with Plotly

Plotly can be used to create interactive visualizations for geospatial data. Plotly
provides different types of maps such as scatter_geo and choropleth maps.

Install plotly:
pip install plotly
Example: Creating an Interactive Map with Plotly
import plotly.express as px

# Sample data with latitude, longitude, and population


data = {
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
'Latitude': [40.7128, 34.0522, 41.8781, 29.7604, 33.4484],
'Longitude': [-74.0060, -118.2437, -87.6298, -95.3698, -112.0740],
'Population': [8175133, 3792621, 2695598, 2129784, 1660272]
}

# Convert to DataFrame
import pandas as pd
df = pd.DataFrame(data)

# Plot interactive map with Plotly


fig = px.scatter_geo(df, lat='Latitude', lon='Longitude', text='City', size='Population',
title="US Cities by Population", hover_name="City")

fig.update_geos(showland=True, landcolor="lightgray", projection_type="mercator")


fig.show()

Output:
 Plotly Express: This code creates an interactive map that visualizes cities in the
U.S. by their population.
 Interactive Features: You can zoom in, zoom out, and hover over points to see
more information.

5. Using Cartopy for Static Maps

Cartopy is another library used to create static maps, especially for scientific
applications, and it can be paired with matplotlib for customization.

Install Cartopy:
pip install cartopy
Example: Creating a Simple Map with Cartopy
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
# Create a plot with a specific projection
fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(111, projection=ccrs.PlateCarree()) # PlateCarree projection
# Add coastlines and country borders
ax.coastlines()
ax.add_feature(cfeature.BORDERS)
# Add a title
plt.title('World Map using Cartopy')
# Display the map
plt.show()

Cartopy allows for advanced map projections and the addition of various map
features (e.g., coastlines, borders).

Key notes:

1. Create interactive maps using Folium, including adding markers and
visualizing heatmaps.
2. Visualize geospatial data using GeoPandas for simple static maps.
3. Create interactive maps using Plotly for data visualization.
4. Create scientific static maps using Cartopy for geospatial analysis.

Choropleth Maps

Choropleth maps are useful for visualizing the intensity of a variable across different
geographic regions. These maps use different colors to represent different data values,
and they are commonly used to visualize things like population density, election results,
or sales data across regions.
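
One way to turn values into a small number of color classes is equal-interval binning. The sketch below is a minimal standard-library illustration with hypothetical region names and values; the mapping libraries shown in this section use their own color scales and binning options.

```python
from bisect import bisect_right

def equal_interval_breaks(values, classes):
    """Thresholds that split the range min..max into `classes` equal-width bins."""
    values = list(values)
    lo, hi = min(values), max(values)
    step = (hi - lo) / classes
    return [lo + step * i for i in range(1, classes)]

def classify(value, breaks):
    """Class index (0 = lowest); a value equal to a break goes to the higher class."""
    return bisect_right(breaks, value)

# Hypothetical regions and values.
population = {"Region A": 10, "Region B": 25, "Region C": 40, "Region D": 100}

breaks = equal_interval_breaks(population.values(), classes=3)
color_class = {name: classify(v, breaks) for name, v in population.items()}

print(breaks, color_class)
```

Each class index would then be assigned one color from a sequential palette, with darker colors for higher classes.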

1. Plotly: Choropleth Map of US States by Population

This example shows how to create a choropleth map for US states based on population
using Plotly.

Example: Choropleth Map for US States by Population


import plotly.express as px
import pandas as pd

# Sample data: Population for ten US states, with two-letter state codes
data = {
    'State': ['California', 'Texas', 'Florida', 'New York', 'Pennsylvania', 'Illinois', 'Ohio',
              'Georgia', 'North Carolina', 'Michigan'],
    'Code': ['CA', 'TX', 'FL', 'NY', 'PA', 'IL', 'OH', 'GA', 'NC', 'MI'],
    'Population': [39538223, 29145505, 21538187, 20201249, 13002700, 12671821,
                   11689100, 10519475, 10439388, 9986857],
}

# Convert to DataFrame
df = pd.DataFrame(data)

# Create the choropleth map
fig = px.choropleth(df,
                    locations='Code', # 'USA-states' matches two-letter abbreviations, not full names
                    locationmode='USA-states',
                    scope='usa', # restrict the view to the United States
                    color='Population',
                    hover_name='State',
                    title="US States by Population",
                    color_continuous_scale="Viridis")

# Show the map


fig.show()

Explanation:

 Plotly allows easy creation of choropleth maps by using px.choropleth(). The
locations parameter names the column that identifies each geographic region. With
locationmode='USA-states', that column must hold two-letter state abbreviations.
 The map colors regions based on the Population column from the data.

Output:

An interactive choropleth map will be displayed where each state is colored based on its
population. Hovering over each state shows the population of that state.

2. Folium: Choropleth Map with GeoJSON Data (Custom Boundaries)

Folium can also be used to create choropleth maps using GeoJSON data. This method
allows for more complex boundaries, such as counties, countries, or custom regions.

Example: Choropleth Map for US States using GeoJSON Data

import folium
import requests

# Fetch GeoJSON data for US states (you can replace this with your own GeoJSON file)
url = 'https://raw.githubusercontent.com/codeforamerica/click_that_hood/master/public/data/us-states.geojson'
response = requests.get(url)
geojson_data = response.json()
# Sample data for state populations (You can replace this with real data)
state_data = {
'California': 39538223,
'Texas': 29145505,
'Florida': 21538187,
'New York': 20201249,
'Pennsylvania': 13002700,
'Illinois': 12671821,
'Ohio': 11689100,
'Georgia': 10519475,
'North Carolina': 10439388,
'Michigan': 9986857,
}

# Create a Folium map centered around the US


mymap = folium.Map(location=[37.0902, -95.7129], zoom_start=5)

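# Add the choropleth layer to the map
# Choropleth is documented to take a pandas DataFrame or Series, so convert the dictionary first
import pandas as pd
state_df = pd.DataFrame(list(state_data.items()), columns=['State', 'Population'])

folium.Choropleth(
    geo_data=geojson_data,
    data=state_df,
    columns=['State', 'Population'], # key column, value column
    key_on='feature.properties.name', # GeoJSON property that holds the region name
    fill_color='YlGn', # Color scheme
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Population',
).add_to(mymap)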

# Save the map as an HTML file


mymap.save("folium_choropleth_map.html")
# Display the map in Jupyter Notebook (if using Jupyter)
mymap

Explanation:

 We use GeoJSON data for US states, and Folium creates a choropleth map using
the Choropleth class.
 The state_data dictionary contains population data for each state. The
key_on='feature.properties.name' argument specifies that the GeoJSON file's
name field is used to match the population data.
 The color is determined by the Population values, and you can customize the
color scale using the fill_color argument.
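
Because the join between the data and the GeoJSON is done purely by name, a spelling difference silently leaves a region without a value. The sketch below checks for such mismatches before plotting; the small inline GeoJSON structure and the helper name unmatched_keys are hypothetical stand-ins for the downloaded file.

```python
# Hypothetical, heavily shortened GeoJSON: only the properties matter for this check.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "California"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Texas"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "New York"}, "geometry": None},
    ],
}

# Hypothetical data with one misspelled key.
state_data = {"California": 39538223, "Texas": 29145505, "NewYork": 20201249}

def unmatched_keys(data, geojson, prop="name"):
    """Keys in `data` that have no feature with the same properties[prop]."""
    names = {feature["properties"][prop] for feature in geojson["features"]}
    return sorted(set(data) - names)

missing = unmatched_keys(state_data, geojson)
print(missing)
```

Running a check like this first makes it clear whether an uncolored region reflects missing data or only a naming mismatch.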

Output:

An interactive choropleth map will be displayed with US states colored based on their
population. You can zoom and pan the map. Choropleth does not show values on hover by
default, so displaying the population per state requires adding a tooltip layer.

3. Folium: Choropleth Map with Custom Boundaries (Country Level)

You can also create choropleth maps for countries using GeoJSON files that include
country boundaries.

Example: Choropleth Map for World Countries (Sample GDP Data)

import folium
import requests

# Fetch GeoJSON data for countries in the world
url = 'https://raw.githubusercontent.com/johan/world.geo.json/master/countries.geo.json'
response = requests.get(url)
geojson_data = response.json()
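# Sample data: GDP of countries in millions of USD (You can replace this with real data)
# Keys must match the 'name' property in the GeoJSON exactly; this file is assumed to
# list the US as 'United States of America'
country_data = {
'United States of America': 21137518,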
'China': 14140163,
'Japan': 5081770,
'Germany': 3845630,
'India': 2713080,
'United Kingdom': 2715135,
'France': 2715518,
'Italy': 2001934,
'Canada': 1647126,
'South Korea': 1639534,
}
# Create a map centered around the world
mymap = folium.Map(location=[20, 0], zoom_start=2)

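# Add the choropleth layer to the map for GDP
# Choropleth is documented to take a pandas DataFrame or Series, so convert the dictionary first
import pandas as pd
country_df = pd.DataFrame(list(country_data.items()), columns=['Country', 'GDP'])

folium.Choropleth(
    geo_data=geojson_data,
    data=country_df,
    columns=['Country', 'GDP'], # key column, value column
    key_on='feature.properties.name', # Name field in GeoJSON file
    fill_color='YlOrRd', # Color scheme
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='GDP in Million USD', # the sample values are in millions of USD
).add_to(mymap)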

# Save the map as an HTML file


mymap.save("world_choropleth_map.html")

# Display the map in Jupyter Notebook (if using Jupyter)


mymap

Explanation:

 We use GeoJSON data for the countries of the world.
 The country_data dictionary contains the GDP values for a selection of
countries.
 Folium's Choropleth class is used to color countries based on their GDP.

Output:

An interactive choropleth map will be displayed showing countries colored according to
their GDP. You can zoom and pan the map. Choropleth does not show values on hover by
default, so seeing the specific GDP value for a country requires adding a tooltip layer.

4. Plotly: Choropleth Map for Global Regions (Interactive)

Plotly allows you to create highly interactive choropleth maps for global regions as well.
The data can be anything you want to visualize—population, GDP, etc.

Example: Global Population Choropleth Map


import plotly.express as px
import pandas as pd

# Example data: Population of countries (You can replace this with real data)
data = {
'Country': ['United States', 'China', 'India', 'Indonesia', 'Pakistan', 'Brazil', 'Nigeria',
'Bangladesh', 'Russia', 'Mexico'],
'Population': [331002651, 1439323776, 1380004385, 273523615, 220892340,
212559417, 206139589, 164689383, 145912025, 128933395],
}

# Convert to DataFrame
df = pd.DataFrame(data)

# Plot choropleth map


fig = px.choropleth(df,
locations='Country',
locationmode='country names',
color='Population',
hover_name='Country',
color_continuous_scale='Blues',
title="Global Population by Country")

# Show map
fig.show()

Explanation:

 This Plotly example visualizes the global population by country using a
choropleth map.
 The countries are colored based on their population, with a continuous color
scale (Blues in this case).

Choropleth maps are an excellent way to visualize geographic data in Python, and you
can create them using different libraries such as Plotly and Folium.

The following examples show Waffle Charts and Word Clouds with simple use cases,
including multiple variations for each. These are well suited to visualizing categorical
data and text data, respectively.

1. Waffle Chart

A Waffle Chart is typically drawn as a 10x10 grid (100 cells), where each cell represents
1% of the total and is colored by category. It's a great alternative to pie charts when you
want a more visual, grid-based display of proportions.

Example 1: Basic Waffle Chart for Sales Distribution by Region


import matplotlib.pyplot as plt
import numpy as np

# Sales distribution by region


regions = ['North', 'South', 'East', 'West']
values = [40, 30, 20, 10]

# Normalize values to fit into 100 cells


total_cells = 100
percentages = [v / sum(values) * total_cells for v in values]

# Create a grid for the waffle chart (10x10 grid)


waffle_data = np.zeros((10, 10))

# Fill the grid with the regions based on the percentage of sales
start = 0
colors = plt.cm.Paired.colors # Using color palette for regions
for i, percentage in enumerate(percentages):
    end = start + int(round(percentage))
    waffle_data.flat[start:end] = i + 1 # Fill cells for the region
    start = end

# Plot the grid: one colored square per cell
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111)
for i in range(10):
    for j in range(10):
        ax.add_patch(plt.Rectangle((j, 9-i), 1, 1, color=colors[int(waffle_data[i, j]) - 1]))

# Remove axis and add title


ax.set_xticks([])
ax.set_yticks([])
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
plt.title('Sales Distribution Across Regions (Waffle Chart)', fontsize=14)
plt.show()

Explanation:

 We have a simple sales distribution by region. The waffle chart is represented by
a 10x10 grid (100 cells) where each region's sales are mapped proportionally.
 The colors differentiate each region.
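
The example values above divide evenly into 100 cells, but real data rarely does. A minimal sketch of one common way to handle this, the largest-remainder method, is shown below; the function name allocate_cells is hypothetical and the input reuses the regional sales figures from the earlier waffle chart.

```python
def allocate_cells(values, total_cells=100):
    """Split total_cells proportionally, giving leftover cells to the largest remainders."""
    total = sum(values)
    exact = [v / total * total_cells for v in values]
    cells = [int(e) for e in exact]  # round every share down first
    leftover = total_cells - sum(cells)
    # Hand out the remaining cells to the shares that lost the most by rounding down.
    by_remainder = sorted(range(len(values)),
                          key=lambda i: exact[i] - cells[i], reverse=True)
    for i in by_remainder[:leftover]:
        cells[i] += 1
    return cells

# Sales for North, South, East, West.
cells = allocate_cells([1000, 1500, 1200, 800])
print(cells, sum(cells))
```

This guarantees that the cell counts always add up to the full grid, so no square is left uncolored.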

Example 2: Waffle Chart with Multiple Categories (Sales, Marketing, and Development)

import matplotlib.pyplot as plt
import numpy as np

# Data for different departments
departments = ['Sales', 'Marketing', 'Development']
values = [40, 35, 25]

# Normalize values to fit into 100 cells
total_cells = 100
percentages = [v / sum(values) * total_cells for v in values]

# Create a grid for the waffle chart (10x10 grid)
waffle_data = np.zeros((10, 10))

# Fill the grid with the categories based on the percentage
start = 0
colors = ['#ff9999', '#66b3ff', '#99ff99']  # Custom colors
for i, percentage in enumerate(percentages):
    end = int(start + percentage)
    # ravel() returns a flat view here, so the assignment updates waffle_data
    waffle_data.ravel()[start:end] = i + 1  # Fill cells for the department
    start = end

# Plot the grid
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111)
for i in range(10):
    for j in range(10):
        # Row 0 is drawn at the top; cell values are 1-based department numbers
        ax.add_patch(plt.Rectangle((j, 9 - i), 1, 1, color=colors[int(waffle_data[i, j]) - 1]))

# Remove axis and add title
ax.set_xticks([])
ax.set_yticks([])
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
plt.title('Budget Distribution by Department (Waffle Chart)', fontsize=14)
plt.show()

Explanation:

• Here, the waffle chart is used to represent budget distribution across
departments (Sales, Marketing, and Development), with different colors
representing each department.
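
Colors alone do not tell the reader which department is which, so a legend can help. The following is a minimal sketch, assuming the cell counts are already whole numbers that sum to 100; draw_waffle is a hypothetical helper name, not a Matplotlib function, and the white cell borders are an added styling choice.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Patch, Rectangle


def draw_waffle(labels, counts, colors, title):
    """Draw a 10x10 waffle chart with a legend; counts must sum to 100."""
    if sum(counts) != 100:
        raise ValueError("counts must sum to 100")

    # One category index per cell, listed in row-major order
    cell_owner = [k for k, n in enumerate(counts) for _ in range(n)]

    fig, ax = plt.subplots(figsize=(8, 8))
    for cell, k in enumerate(cell_owner):
        row, col = divmod(cell, 10)
        # Row 0 is drawn at the top of the axes
        ax.add_patch(Rectangle((col, 9 - row), 1, 1,
                               facecolor=colors[k], edgecolor='white'))

    ax.set_xlim(0, 10)
    ax.set_ylim(0, 10)
    ax.set_aspect('equal')
    ax.set_xticks([])
    ax.set_yticks([])

    # One legend entry per category, labelled with its share
    handles = [Patch(facecolor=colors[k], label=f"{labels[k]} ({counts[k]}%)")
               for k in range(len(labels))]
    ax.legend(handles=handles, loc='upper center',
              bbox_to_anchor=(0.5, -0.02), ncol=len(labels))
    ax.set_title(title, fontsize=14)
    return fig, ax


fig, ax = draw_waffle(
    ['Sales', 'Marketing', 'Development'],
    [40, 35, 25],
    ['#ff9999', '#66b3ff', '#99ff99'],
    'Budget Distribution by Department (Waffle Chart)',
)
```

Call plt.show() afterwards to display the figure, as in the examples above.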

2. Word Cloud
A Word Cloud is a visual representation of text data, where the size of each word
indicates its frequency or importance in the dataset.

Example 1: Basic Word Cloud from Text Data


from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Sample text data (adjacent string literals inside parentheses are joined)
text = ("Data science is an inter-disciplinary field that uses scientific methods, "
        "processes, algorithms and systems to extract knowledge and insights from "
        "structured and unstructured data.")

# Generate a word cloud
wordcloud = WordCloud(width=800, height=400,
                      background_color='white').generate(text)

# Display the word cloud
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Basic Word Cloud', fontsize=16)
plt.show()

Explanation:

• In this example, a word cloud is generated from a simple text snippet, with the
size of each word indicating its frequency.
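
Because word size follows word frequency, it can help to inspect the counts directly. The following is a rough sketch using only the Python standard library. The tokenization here (lowercasing and keeping runs of letters) is a simplifying assumption and does not reproduce what the wordcloud library does internally, since the library also drops stopwords by default and can group frequent word pairs.

```python
import re
from collections import Counter

text = ("Data science is an inter-disciplinary field that uses scientific methods, "
        "processes, algorithms and systems to extract knowledge and insights from "
        "structured and unstructured data.")

# Lowercase the text and keep only runs of letters as words
words = re.findall(r"[a-z]+", text.lower())

# Count how often each word occurs
frequencies = Counter(words)

print(frequencies.most_common(2))  # [('and', 3), ('data', 2)]
```

The most frequent token in the raw counts is "and", which shows why stopword removal matters before drawing. The wordcloud library also provides generate_from_frequencies, which accepts a word-to-count mapping such as this one.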

Example 2: Word Cloud with Custom Settings (Frequent Words from a Review Dataset)

from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Sample text data from reviews
reviews = """
Great product! Fast delivery and excellent customer service. I will definitely buy again.
The quality of the product was better than expected. Fast shipping and great customer support.
I am very happy with my purchase. Excellent quality and fast delivery. Highly recommend.
"""

# Generate word cloud with custom settings.
# Note: passing stopwords replaces the library's default stopword list,
# so only the words listed here are removed.
wordcloud = WordCloud(
    width=800,
    height=400,
    background_color='black',
    stopwords={"and", "the", "is", "with", "I", "to", "a"},
    colormap='Blues'
).generate(reviews)

# Display the word cloud
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Product Reviews Word Cloud', fontsize=16)
plt.show()

Explanation:

• This word cloud is created from a set of product reviews, with common
stopwords removed to highlight more meaningful words. The background color
is set to black, and the colormap is set to 'Blues' for a cooler tone.

Example 3: Word Cloud with Image Mask

A word cloud can also be shaped according to a custom image (e.g., a logo or any other
image shape).

from wordcloud import WordCloud
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

# Load a mask image (e.g., a star-shaped image); replace with a valid image path.
# Pure white pixels in the mask are left empty; words are drawn everywhere else.
mask_image = np.array(Image.open("star_shape.png"))

# Sample text data (adjacent string literals inside parentheses are joined)
text = ("Python is great for data science. Python is great for machine learning. "
        "Python is great for analysis.")

# Generate word cloud with custom mask
wordcloud = WordCloud(
    mask=mask_image,
    contour_width=1,
    contour_color='black',
    background_color='white'
).generate(text)

# Display the word cloud
plt.figure(figsize=(10, 10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Word Cloud with Image Mask', fontsize=16)
plt.show()

Explanation:

• In this case, the word cloud is shaped according to a custom image (e.g., a star).
You would need an image file for the mask, and the text is visualized in the shape
of that image.
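
If no suitable image file is at hand, a mask can also be built as an array. The following is a minimal sketch, assuming the wordcloud convention that pure white (255) mask entries are excluded and all other entries may hold words; circular_mask is a hypothetical helper name, and the circle shape is only an illustration.

```python
import numpy as np


def circular_mask(size=400):
    """Return a size x size uint8 array: 0 inside a centered circle, 255 outside."""
    # Open grids of row (yy) and column (xx) indices that broadcast together
    yy, xx = np.ogrid[:size, :size]
    center = (size - 1) / 2
    radius = size / 2

    # True for every pixel that lies outside the circle
    outside = (xx - center) ** 2 + (yy - center) ** 2 > radius ** 2

    mask = np.zeros((size, size), dtype=np.uint8)
    mask[outside] = 255  # white: no words are placed here
    return mask


mask_image = circular_mask(400)
print(mask_image.shape, mask_image.dtype)  # (400, 400) uint8
```

The resulting array can be passed as mask=mask_image in the example above in place of the loaded image.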
