Description of Data Visualization Tools

machine learning technics

Uploaded by

gateracalvin.c
Copyright
© © All Rights Reserved

1.4. Interpret Data Visualization


Interpreting a data visualization means analyzing visual representations of data to extract
meaningful insights.

Description of data Visualization tools


• Matplotlib: It is a popular Python library for creating static, interactive, and animated
visualizations. It provides a flexible way to generate plots and charts, and it’s widely used
for data analysis and visualization tasks.

Matplotlib supports various types of plots, including:

− Line Plot: plt.plot(x, y)
− Scatter Plot: plt.scatter(x, y)
− Bar Chart: plt.bar(x, y)
− Histogram: plt.hist(data)
− Pie Chart: plt.pie(sizes, labels=labels)
− Box Plot: plt.boxplot(data)

Description of Matplotlib figure

Figure 1: parts of matplotlib figure
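
Since the figure itself is not reproduced here, the following minimal sketch builds a small plot and names the main parts as it goes. The data values and label texts are illustrative only, and the choice of which parts to show (Figure, Axes, line, title, axis labels, grid, legend) is an assumption about what the original figure depicted.

```python
import matplotlib.pyplot as plt

# One Figure (the whole canvas) holding one Axes (the plotting area).
fig, ax = plt.subplots(figsize=(6, 4))

# A line with markers, drawn on the Axes.
x = [10, 20, 30, 40]
y = [20, 25, 35, 55]
ax.plot(x, y, marker="o", label="sample series")

ax.set_title("Parts of a Matplotlib figure")  # title of the Axes
ax.set_xlabel("X-Axis")                       # x-axis label
ax.set_ylabel("Y-Axis")                       # y-axis label
ax.grid(True)                                 # grid lines
ax.legend()                                   # legend built from the label above

fig.tight_layout()
```

Call plt.show() afterwards to display the figure.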

• Seaborn: A Python library used mainly for statistical plotting. It is built on top of
Matplotlib and provides attractive default styles and color palettes for statistical
plots.
• Plotly: Enables Python users to create interactive, web-based visualizations that can be
displayed in Jupyter notebooks, saved to standalone HTML files, or served as part of
pure Python-built web applications.

• Tableau: A data visualization tool that helps users analyze data and gain business insights
through interactive charts and graphs arranged in dashboards and worksheets.

It connects to many data sources, such as Microsoft Excel, corporate data warehouses, and
web-based data.

• Power BI: Microsoft Power BI (business intelligence) is a business analytics tool that lets
users create dashboards and reports and aggregate, analyze, visualize, and share data.

Power BI charts and visualizations include: Bar charts, Line charts, Combo charts, Doughnut
charts, Ribbon charts, Waterfall charts, Scatter diagrams, Matrix, and Pie charts.
Use Types of Data Visualization
Let’s use matplotlib to plot various charts

• Step1: Installation of Matplotlib:

C:\Users\Your Name>pip install matplotlib (with pip)

C:\Users\Your Name>conda install matplotlib (with Anaconda)

• Step2: Import Matplotlib in the application

Most of the Matplotlib utilities lie under the pyplot submodule, which is usually imported under
the alias plt:

• import matplotlib

• import matplotlib.pyplot as plt

Line Chart
A line chart is one of the basic plots and can be created using the plot() function. It is used to
represent the relationship between two variables, X and Y, each on its own axis.

Example1: Draw two points in the diagram, one at position (1, 3) and one at position (8, 10):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
# the 'o' format string draws markers only, with no connecting line
plt.plot(xpoints, ypoints, 'o')
plt.show()
Example2: Draw a line in a diagram from position (1, 3) to position (8, 10):

import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints)
plt.show()

Example3: Draw a line in a diagram through the points (10,20), (20,25), (30,35), (40,55):
Answer:
import matplotlib.pyplot as plt
# initializing the data
x = [10, 20, 30, 40]
y = [20, 25, 35, 55]
# plotting the data
plt.plot(x, y)
# optional styled variant of the same line:
# plt.plot(x, y, color='green', linewidth=3, marker='o',
#          markersize=15, linestyle='--')
# Adding title to the plot
plt.title("Line Chart")
# Adding label on the y-axis
plt.ylabel('Y-Axis')
# Adding label on the x-axis
plt.xlabel('X-Axis')
plt.show()

Creating Scatter Plot


A scatter plot is a type of data visualization that uses dots to represent values for two different
variables. Each dot on the plot represents a single data point, with its position determined by the
values of the two variables. In brief, a scatter plot is a diagram where each value in the data set
is represented by a dot.

Interpret and use a scatter plot:

− Axes: The x-axis represents one variable, and the y-axis represents the other.
− Pattern Recognition: By looking at the arrangement of dots, you can identify patterns or
trends.

− Correlation: Scatter plots can help determine the correlation between variables.

− Outliers: Points that are far away from the general cluster of data can be identified as outliers.
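
The correlation point above can also be checked numerically. The sketch below computes the Pearson correlation coefficient in plain Python for the hours-studied and exam-score values used in this section's examples. The helper name pearson_r is hypothetical, and Pearson's r is only one of several possible measures, assumed here because it is the most common.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]
scores = [50, 55, 60, 65, 70]
r = pearson_r(hours, scores)
print(round(r, 3))  # 1.0
```

A value near +1 means the dots rise along a straight line, a value near -1 means they fall along one, and a value near 0 means there is no linear pattern.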

Example use cases include, but are not limited to:

Correlation Between Study Hours and Exam Scores, Relationship Between Age and Income,
Height vs Weight, City Population vs Crime Rate, Temperature vs Ice Cream Sales, Satisfaction
vs Salary, Education Level vs Job Performance…

With Pyplot, you can use the scatter() function to draw a scatter plot.
The scatter() function plots one dot for each observation. It needs two arrays of the same length,
one for the values of the x-axis, and one for values on the y-axis.

Example1: A simple scatter plot:

import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)
plt.show()

Example2: Draw two plots on the same figure:

import matplotlib.pyplot as plt
import numpy as np
# day one, the age and speed of 13 cars:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)

# day two, the age and speed of 15 cars:
x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x, y)
plt.show()
Example3: Create a scatter plot of the hours studied versus the exam scores based on the given
data. Also include the x- and y-axis labels and a plot title.
import matplotlib.pyplot as plt
# given data
hours = [1, 2, 3, 4, 5]
scores = [50, 55, 60, 65, 70]
plt.scatter(hours, scores)
plt.xlabel('Hours studied')
plt.ylabel('Exam score')
plt.title('Hours Studied vs Exam Score')
plt.show()

Creating Bar Charts


Bar charts are a versatile and widely used tool for visualizing categorical data and comparing
different groups or categories.

Example Use Cases: Sales by Region, Survey Results by Age Group, Monthly Revenue.

With Pyplot, you can use the bar() function to draw bar graphs. If you want the bars to be
displayed horizontally instead of vertically, use the barh() function.

The categories and their values are passed as the first and second arguments, as arrays.

Example1: Create a bar chart of the sales made by a vendor in each region.

import matplotlib.pyplot as plt

# Sample Data
regions = ['North', 'South', 'East', 'West']
values = [200, 150, 300, 250]

# Create Bar Chart
plt.bar(regions, values, color='lightgreen')
plt.xlabel('Regions')
plt.ylabel('Sales')
plt.title('Sales by Region')
plt.show()
Example2: Draw 4 horizontal bars

import matplotlib.pyplot as plt
import numpy as np

x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])

plt.barh(x, y, color='orange')
# with horizontal bars, the values run along the x-axis and the categories along the y-axis
plt.xlabel('Quantity')
plt.ylabel('Letter')
plt.title('Quantity of Letters')
plt.show()

Creating Histograms
A histogram is a type of bar chart that displays the distribution of a set of continuous numerical
data. It shows how frequently data points fall within different intervals, or bins.

The x-axis of the graph represents the class intervals, and the y-axis shows the frequency
corresponding to each class interval. There are no gaps between consecutive rectangles because
the class intervals of a continuous series are adjacent.

Example Use Cases: Distribution of Exam Scores, Age Distribution of Survey Respondents, Height
Distribution of a Population…

The hist() function takes an array of numbers as an argument and creates a histogram from it.
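
To make the idea of class intervals concrete, here is a minimal sketch that counts how many values fall into each bin, in plain Python. The helper name bin_counts is hypothetical. The edge handling (each bin includes its left edge, and only the last bin includes its right edge) follows the convention NumPy documents for its histogram function, and is assumed here to match what hist() draws.

```python
def bin_counts(values, edges):
    """Count how many values fall in each bin defined by sorted edges.

    Each bin includes its left edge and excludes its right edge, except
    the last bin, which also includes its right edge.
    """
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            in_last = (i == last and v == edges[-1])
            if edges[i] <= v < edges[i + 1] or in_last:
                counts[i] += 1
                break
    return counts

scores = [55, 62, 68, 70, 75, 80, 85, 90, 95, 100]
edges = [55, 65, 75, 85, 95, 105]   # five bins of width 10
print(bin_counts(scores, edges))    # [2, 2, 2, 2, 2]
```

Each count is the height of one rectangle in the histogram.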

Creating a Histogram with Equal Class Intervals

Example1: With the dataset of exam scores: [55, 62, 68, 70, 75, 80, 85, 90, 95, 100], create a
histogram.

import matplotlib.pyplot as plt

# Sample Data
data = [55, 62, 68, 70, 75, 80, 85, 90, 95, 100]

# Create Histogram with equal class intervals (5 bins of width 10)
plt.hist(data, bins=5, range=(55, 105),
         color='skyblue', edgecolor='black')
plt.xlabel('Score Range')
plt.ylabel('Frequency')
plt.title('Histogram of Exam Scores with Equal Class Intervals')
plt.show()

Creating the Histogram with Unequal Class Intervals


Example1: With the dataset of exam scores: [45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100],
create a histogram.

import matplotlib.pyplot as plt

# Sample Data
data = [45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# Define the custom bin edges (the first bin is narrower than the others)
bins = [45, 50, 60, 70, 80, 90, 100]

# Create Histogram (color names must be passed as strings)
plt.hist(data, bins=bins, edgecolor='red', color='grey')
plt.xlabel('Score Range')
plt.ylabel('Frequency')
plt.title('Histogram of Exam Scores with Unequal Class Intervals')
plt.show()
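
When class widths differ, raw counts can overstate the wider bins. One common convention is to compare frequency density (count divided by class width) instead. The sketch below is an illustration under that assumption, not part of the original example; it computes both quantities for the same data and bin edges. Matplotlib's hist() also accepts a density=True argument that applies a related normalization.

```python
data = [45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
edges = [45, 50, 60, 70, 80, 90, 100]

rows = []
for i in range(len(edges) - 1):
    left, right = edges[i], edges[i + 1]
    is_last = i == len(edges) - 2
    # Left edge included; right edge included only for the last bin.
    count = sum(1 for v in data if left <= v < right or (is_last and v == right))
    width = right - left
    rows.append((left, right, count, count / width))

for left, right, count, density in rows:
    print(f"{left}-{right}: count={count}, density={density:.2f}")
```

Here the narrow first bin holds only one value, yet its density equals that of the neighboring bins, which hold two values each over twice the width.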

Working with Seaborn


Seaborn is a powerful Python data visualization library built on top of Matplotlib. It provides a high-level
interface for drawing attractive and informative statistical graphics. By combining different plots,
customizing styles, and utilizing the built-in datasets, you can effectively communicate insights from your
data.

1. Installation and Import:

− Ensure Seaborn is installed: pip install seaborn
− Import necessary libraries:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
− Load your data: Read your data into a Pandas DataFrame:
# Create sample data
data = {"x": [1, 2, 3, 4, 5], "y": [2, 4, 5, 3, 6]}
df = pd.DataFrame(data)
# Or, assuming you have a CSV file named 'your_data.csv'
df = pd.read_csv('your_data.csv')
# Or load the built-in iris dataset
iris = sns.load_dataset("iris")

− Choose a visualization type: Based on your data and the questions you
want to answer, select a suitable visualization function (replace the
placeholder column names with columns from your DataFrame).
sns.lineplot(x='x_column', y='y_column', data=df)
sns.scatterplot(x='x_column', y='y_column', data=df)
sns.barplot(x='categorical_column', y='numerical_column', data=df)
sns.histplot(df['numerical_column'], bins=30)
sns.boxplot(x='categorical_column', y='numerical_column', data=df)

− Customize your plot: Use Seaborn's options to customize the appearance
of your plot, such as colors, labels, and titles.
Color palettes: sns.color_palette()
Styles: sns.set_style()
Themes: sns.set_theme()
Hue: hue parameter in many plots
Legends: plt.legend()
− Display the plot: Use plt.show() to display the plot.
− Save your visualizations (call savefig() before plt.show()):
plt.savefig("visualization.png")
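
The steps above can be combined into one short script. This is a minimal sketch that uses the small sample DataFrame from the loading step. The title and axis label texts are illustrative, and the choice of a scatter plot is an assumption; any of the listed functions could be substituted.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load: the sample data from the loading step.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 5, 3, 6]})

# Customize globally: theme and grid style.
sns.set_theme(style="whitegrid")

# Choose a visualization: scatter plot of y against x on an explicit Axes.
fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(data=df, x="x", y="y", ax=ax)

# Customize this plot: title and axis labels.
ax.set_title("Sample scatter plot")
ax.set_xlabel("x value")
ax.set_ylabel("y value")

# fig.savefig("visualization.png")  # uncomment to save the figure
# plt.show()                         # uncomment to display the figure
```

Passing ax=ax keeps the Seaborn plot on a known Matplotlib Axes, so the usual Matplotlib methods can be used to adjust it.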

seaborn Function Classifications

− Relational Plots, relplot(): scatterplot(), lineplot()
− Distribution Plots, displot(): histplot(), kdeplot(), ecdfplot(), rugplot()
− Categorical Plots, catplot(): stripplot(), swarmplot(), boxplot(), violinplot(), boxenplot(),
barplot(), pointplot(), countplot()

Using scatter function with seaborn:

import seaborn as sns


import matplotlib.pyplot as plt
import pandas as pd
tips = sns.load_dataset('tips')
sns.scatterplot(data=tips, x='total_bill', y='tip',
hue='day')
plt.title('Total Bill vs Tip')
plt.show()
Using line function with seaborn:

import seaborn as sns


import matplotlib.pyplot as plt
import pandas as pd
tips = sns.load_dataset('tips')
sns.lineplot(data=tips, x='size', y='total_bill',
estimator='mean')
plt.title('Average Total Bill by Size')
plt.show()

Using combined functions with seaborn:

import seaborn as sns


import matplotlib.pyplot as plt
import pandas as pd
tips = sns.load_dataset('tips')
plt.figure(figsize=(12, 6))
# Scatter plot
sns.scatterplot(data=tips, x='total_bill', y='tip',
hue='time', alpha=0.7)
# Regression line
sns.regplot(data=tips, x='total_bill', y='tip',
scatter=False, color='red')
plt.title('Total Bill vs Tip with Regression Line')
plt.show()
Using histogram function with seaborn:

import seaborn as sns


import matplotlib.pyplot as plt
import pandas as pd
penguins = sns.load_dataset("penguins")
sns.histplot(data=penguins,
x="flipper_length_mm", hue="species",
multiple="stack")
plt.show()

Creating a Box Plot


The box plot (or box-and-whisker plot) is a standardized way of displaying the distribution of a
dataset. It gives a concise overview of the data's central tendency, variability, and potential
outliers.

Install the libraries: C:\Users\Your Name> pip install matplotlib seaborn

Example of (ordered) dataset: [5, 7, 8, 12, 13, 14, 15, 18, 20, 22]

− Median: The middle value of the dataset when it is sorted in ascending order.

Find median: Median = (13 + 14) / 2 = 13.5

− Quartiles: These divide the data into four equal parts:

o Q1 (First Quartile): The 25th percentile, or the median of the lower half of the
dataset. For this dataset, it’s the median of 5, 7, 8, 12, 13, which is 8.

o Q3 (Third Quartile): The 75th percentile, or the median of the upper half of the
dataset. For this dataset, it’s the median of 14, 15, 18, 20, 22, which is 18.

− Interquartile Range (IQR): The range between Q1 and Q3 (Q3 - Q1), showing the spread
of the middle 50% of the data. IQR = Q3 - Q1 = 18 - 8 = 10

− Whiskers: Lines extending from Q1 and Q3 to the lowest and highest data values that lie
within 1.5 * IQR of the quartiles.
Lower limit = Q1 - 1.5 * IQR = 8 - 1.5 * 10 = -7. No value is below -7, so the lower whisker
ends at the lowest value, 5.

Upper limit = Q3 + 1.5 * IQR = 18 + 1.5 * 10 = 33. No value is above 33, so the upper whisker
ends at the highest value, 22.
− Outliers: Data points that fall outside these limits. They are plotted as individual points
beyond the whiskers. This dataset has none.
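
The hand calculation above can be reproduced with Python's standard statistics module. This is a minimal sketch using the median-of-halves method from the text. The variable names are illustrative, and "fence" is used here for the Q1 - 1.5 * IQR and Q3 + 1.5 * IQR limits. The last line shows that an interpolation-based quartile definition gives slightly different values, so the box drawn by a plotting library may not match the hand calculation exactly.

```python
from statistics import median, quantiles

data = sorted([5, 7, 8, 12, 13, 14, 15, 18, 20, 22])
n = len(data)

# Median-of-halves method, as in the hand calculation.
lower_half = data[: n // 2]
upper_half = data[(n + 1) // 2 :]
med = median(data)
q1 = median(lower_half)
q3 = median(upper_half)
iqr = q3 - q1

# Limits (fences) beyond which points count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data values inside the fences.
inside = [v for v in data if lower_fence <= v <= upper_fence]
lower_whisker, upper_whisker = min(inside), max(inside)
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print(med, q1, q3, iqr)               # 13.5 8 18 10
print(lower_fence, upper_fence)       # -7.0 33.0
print(lower_whisker, upper_whisker)   # 5 22
print(outliers)                       # []

# An interpolation-based definition gives slightly different quartiles.
print(quantiles(data, n=4, method="inclusive"))  # [9.0, 13.5, 17.25]
```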

Example1: With the following dataset: [5, 7, 8, 12, 13, 14, 15, 18, 20, 22], create a box plot.

import matplotlib.pyplot as plt
import seaborn as sns

# Sample data
data = [5, 7, 8, 12, 13, 14, 15, 18, 20, 22]

# Create a box plot
sns.boxplot(data=data)

# Display the plot
plt.show()

Creating Heat map


A heatmap (also written heat map) depicts the values of a main variable of interest across two axis
variables as a grid of colored squares, using color to represent the intensity or magnitude of each
value. It is often used to display data where the values are spatially distributed or to show how a
variable changes across different conditions.
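
To illustrate the grid idea, the sketch below turns a list of individual records into the two-dimensional matrix that a heat map colors cell by cell. The records, day names, and hours are hypothetical values chosen for illustration only.

```python
# Hypothetical long-form records: (day, hour, activity count).
records = [
    ("Mon", 9, 12), ("Mon", 10, 18), ("Tue", 9, 7),
    ("Tue", 11, 15), ("Wed", 10, 20), ("Wed", 11, 5),
]

days = ["Mon", "Tue", "Wed"]   # row labels (first axis variable)
hours = [9, 10, 11]            # column labels (second axis variable)

# Start with zeros so missing (day, hour) pairs show as the lowest intensity.
grid = [[0 for _ in hours] for _ in days]
for day, hour, value in records:
    grid[days.index(day)][hours.index(hour)] = value

for day, row in zip(days, grid):
    print(day, row)
```

A grid like this can then be passed to a heat map function, for example sns.heatmap(grid, xticklabels=hours, yticklabels=days).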
Use cases of Heat Maps:
o Business Analytics: To visualize data like customer activity, sales
performance, or market trends.
o Geographic Data: For showing geographic distribution of data points, such
as crime rates in different areas or population density.
o Website Analytics: To understand user behavior, such as which areas of a
webpage receive the most clicks or attention.
o Scientific Research: For displaying complex data, such as gene expression
levels in bioinformatics or patterns in physical experiments.
o Sports Analytics: To visualize player performance, such as where a player
spends most of their time on the field or court.
o Social Media: To analyze trends and interactions, like the frequency of posts
or sentiments across different regions or time periods.
Python offers powerful libraries for creating heat maps, such as matplotlib, seaborn, or plotly.
Install libraries in python: C:\Users\Your Name> pip install matplotlib seaborn pandas

Example1: Plot the given data using a heat map.

import seaborn as sns


import matplotlib.pyplot as plt
import pandas as pd

# Example data
data = pd.DataFrame({
'A': [1, 2, 3, 4],
'B': [2, 3, 4, 5],
'C': [3, 4, 5, 6] })

# Create heat map


sns.heatmap(data, annot=True, cmap='YlGnBu')

# Show the plot


plt.show()

Example2: Show the distribution of data across geographic locations.

import plotly.express as px
import pandas as pd

# Sample data: Geographic heatmap data
data = pd.DataFrame({
'Latitude': [37.77, 40.71, 34.05, 41.87, 47.61],
'Longitude': [-122.42, -74.01, -118.24, -87.63, -122.33],
'Value': [100, 200, 150, 300, 250] })

# Create a heatmap; the "open-street-map" style needs no access token
fig = px.density_mapbox(data, lat='Latitude', lon='Longitude',
z='Value', radius=10, center=dict(lat=37.77, lon=-122.42),
zoom=3, mapbox_style="open-street-map")

# Add a title and show the plot
fig.update_layout(title='Geographical Heat Map')
fig.show()

Example3: Show activity levels over time (e.g., hourly activity over days of the week).
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Sample data: Create a matrix of activity levels


data = np.random.rand(7, 24) # 7 days, 24 hours

# Create a heatmap
sns.heatmap(data, cmap='YlGnBu', cbar=True)

# Customize labels
plt.title('Hourly Activity Over Days of the Week')
plt.xlabel('Hour of the Day')
plt.ylabel('Day of the Week')
plt.xticks(ticks=np.arange(24) + 0.5, labels=[f'{h}:00' for h
in range(24)])
plt.yticks(ticks=np.arange(7) + 0.5, labels=['Mon', 'Tue',
'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])

# Show the plot


plt.show()

1.5. Applying data Visualization Best Practices

Effective data visualization is essential for communicating insights clearly and compellingly. It
allows us to quickly interpret information and make decisions based on complex data sets.
However, its effectiveness depends significantly on how it is executed. By adhering to the
following best practices, you can create visualizations that are both visually appealing and
informative.
1. Know your audience
The visualization should match the audience’s expertise and allow viewers
to view and process data easily and quickly.
2. Choose the Right Type of Visualization (Chart Type)
Common types include bar charts for comparisons (categorical), line
graphs for trends over time, scatter plots for relationships.
3. Keep the visualization Simple
Avoid clutter by removing unnecessary elements. Focus on the key
message you want to convey. Use clear labels, titles, and legends.
4. Use Clear and Consistent Labeling:
Label axes and data points: Clearly indicate what each axis represents
and the values of data points.
Use consistent units: Ensure that units are consistent throughout the
visualization.
5. Choose Appropriate Colors:
Use a color palette that is visually appealing: Avoid overly bright or
contrasting colors that can be difficult to read.
Consider color blindness: Choose colors that are easily distinguishable
for people with color vision deficiencies.
6. Provide Context
Include context by adding annotations, reference lines, or background
information to help the audience interpret the data.
7. Use Consistent Scales
When comparing multiple visualizations, use the same scales and units
to prevent misinterpretation.
8. Highlight Key Insights
Draw attention to the most important data points or trends using
emphasis techniques like bold text, larger sizes, or contrasting colors.
9. Tell a Story
Organize your visualizations in a logical sequence: Guide the viewer
through the key points of your story.
Use annotations and explanations: Provide context and additional
information as needed.
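
Several of these practices can be applied together in a few lines of Matplotlib. The sketch below reuses the sales-by-region values from the bar chart example. The highlight color, reference line, annotation text, and title wording are illustrative choices, not prescriptions from the original text.

```python
import matplotlib.pyplot as plt

# Sales-by-region values from the bar chart example.
regions = ["North", "South", "East", "West"]
sales = [200, 150, 300, 250]

# Highlight the key insight (the top region) with one contrasting color.
# Both hex codes come from the Okabe-Ito color-blind-friendly palette.
top = sales.index(max(sales))
colors = ["#56B4E9"] * len(sales)
colors[top] = "#D55E00"

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(regions, sales, color=colors)

# Clear labels and a title that states the message.
ax.set_xlabel("Region")
ax.set_ylabel("Sales")
ax.set_title("East leads sales by region")

# Context: a reference line at the average, plus an annotation.
average = sum(sales) / len(sales)
ax.axhline(average, color="black", linestyle="--", linewidth=1)
ax.annotate("Highest sales", xy=(top, sales[top]),
            xytext=(top, sales[top] + 20), ha="center")
ax.set_ylim(0, max(sales) + 50)

# Keep it simple: hide frame lines that carry no information.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
fig.tight_layout()
```

Call plt.show() to display the chart. Using a single accent color against a muted base keeps attention on the one bar that carries the message.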
