Description of Data Visualization Tools
• Seaborn: a Python library used mainly for statistical plotting. It is built on top of
Matplotlib and provides attractive default styles and color palettes that make statistical
plots more appealing.
• Plotly: a library that lets Python users create interactive, web-based
visualizations that can be displayed in Jupyter notebooks, saved to standalone HTML files,
or served as part of web applications built purely in Python.
• Tableau: a data visualization tool that helps users analyze data and gain business insights
through interactive charts and graphs in dashboards and worksheets.
It connects easily to many data sources, such as Microsoft Excel, corporate data warehouses, or
web-based data.
• Power BI: Microsoft Power BI (business intelligence) is a business analytics tool that lets users
create dashboards and reports, and aggregate, analyze, visualize, and share data.
Power BI chart and visualization types include bar charts, line charts, combo charts, doughnut
charts, ribbon charts, waterfall charts, scatter diagrams, matrices, and pie charts.
Types of Data Visualization
Let’s use Matplotlib to plot various charts.
Most Matplotlib utilities live in the pyplot submodule, which is usually imported under the alias plt:
import matplotlib.pyplot as plt
Line Chart
A line chart is one of the basic plots and can be created with the plot() function. It
represents the relationship between two variables, X and Y, plotted on different axes.
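A minimal line chart sketch with Pyplot; the x and y values are illustrative, not real data:

```python
import matplotlib.pyplot as plt

# Illustrative data: five x values and their corresponding y values
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

# plot() connects consecutive (x, y) points with straight line segments;
# marker='o' also draws a dot at each data point
lines = plt.plot(x, y, marker='o')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Line Chart')
plt.show()
```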
Scatter Plot
− Axes: The x-axis represents one variable, and the y-axis represents the other.
− Pattern Recognition: By looking at the arrangement of dots, you can identify patterns or
trends.
− Correlation: Scatter plots can help determine the correlation between variables.
− Outliers: Points that are far away from the general cluster of data can be identified as outliers.
Example Use Cases: Correlation Between Study Hours and Exam Scores, Relationship Between Age
and Income, Height vs Weight, City Population vs Crime Rate, Temperature vs Ice Cream Sales,
Satisfaction vs Salary, Education Level vs Job Performance…
With Pyplot, you can use the scatter() function to draw a scatter plot.
The scatter() function plots one dot for each observation. It needs two arrays of the same length:
one for the values on the x-axis and one for the values on the y-axis.
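A minimal scatter plot sketch for the study-hours example; the eight data points are hypothetical. np.corrcoef is used here only to put a number on the correlation that the plot shows visually:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: study hours and exam scores for eight students
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
scores = np.array([52, 55, 61, 64, 70, 74, 79, 85])

# Pearson correlation coefficient: values near +1 indicate a strong positive linear relationship
r = np.corrcoef(hours, scores)[0, 1]

# scatter() draws one dot per (hours, score) pair; both arrays must have the same length
plt.scatter(hours, scores)
plt.xlabel('Study Hours')
plt.ylabel('Exam Score')
plt.title(f'Study Hours vs Exam Scores (r = {r:.2f})')
plt.show()
```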
Bar Chart
Example Use Cases: Sales by Region, Survey Results by Age Group, Monthly Revenue…
With Pyplot, you can use the bar() function to draw bar graphs. If you want the bars to be
displayed horizontally instead of vertically, use the barh() function:
The first argument holds the categories and the second holds their values, both as arrays.
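import matplotlib.pyplot as plt
# Sample data
regions = ['North', 'South', 'East', 'West']
values = [200, 150, 300, 250]
# barh() puts the categories on the y-axis and their values on the x-axis
plt.barh(regions, values, color='orange')
plt.xlabel('Value')
plt.ylabel('Region')
plt.title('Values by Region')
plt.show()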
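The vertical version uses bar() with the same two arguments. A minimal sketch reusing the sample regions and values; the axis labels and title are illustrative:

```python
import matplotlib.pyplot as plt

# Sample data from the barh() example
regions = ['North', 'South', 'East', 'West']
values = [200, 150, 300, 250]

# bar() draws vertical bars: categories on the x-axis, values on the y-axis
bars = plt.bar(regions, values, color='orange')
plt.xlabel('Region')
plt.ylabel('Value')
plt.title('Values by Region')
plt.show()
```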
Creating Histograms
A histogram is a type of bar chart that displays the distribution of a set of continuous or numerical
data. It’s used to visualize the frequency distribution of data points within different intervals or
bins.
The x-axis of the graph represents the class intervals, and the y-axis shows the frequency of
each class interval. There are no gaps between consecutive rectangles, because a histogram
represents the frequency distribution of a continuous series.
Example Use Cases: Distribution of Exam Scores, Age Distribution of Survey Respondents, Height
Distribution of a Population…
The hist() function takes an array of numbers as its argument and uses it to create a histogram.
import matplotlib.pyplot as plt
# Sample data: exam scores
data = [45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
# Hypothetical bin edges giving unequal class intervals
bins = [40, 60, 70, 80, 100]
# Create histogram
plt.hist(data, bins=bins, edgecolor='red', color='grey')
plt.xlabel('Score Range')
plt.ylabel('Frequency')
plt.title('Histogram of Exam Scores with Unequal Class Intervals')
plt.show()
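With unequal class intervals, a wide bin collects more values simply because it is wider, so raw frequencies can mislead. A minimal sketch, assuming hypothetical bin edges 40-60, 60-70, 70-80, and 80-100, computes the frequencies with np.histogram and divides by each bin's width to get frequency density. Passing density=True to plt.hist() applies a similar correction (it also divides by the total count):

```python
import numpy as np

# Exam scores from the histogram example
data = [45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# Hypothetical unequal bin edges: 40-60, 60-70, 70-80, 80-100
edges = [40, 60, 70, 80, 100]

# Frequency per bin (each bin includes its left edge; the last bin also includes 100)
counts, _ = np.histogram(data, bins=edges)

# Frequency density = frequency / class width, comparable across unequal widths
widths = np.diff(edges)
density = counts / widths
print(counts, density)
```

Here the 80-100 bin holds 5 scores versus 2 in each 10-point bin, but per unit of width the difference is much smaller (0.25 versus 0.2).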
Choose a visualization type: Based on your data and the questions you
want to answer, select a suitable visualization function.
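import seaborn as sns
# df is a pandas DataFrame; replace the column names with your own
sns.lineplot(x='x_column', y='y_column', data=df)                    # trends
sns.scatterplot(x='x_column', y='y_column', data=df)                 # relationships
sns.barplot(x='categorical_column', y='numerical_column', data=df)   # comparisons across categories
sns.histplot(df['numerical_column'], bins=30)                        # distribution of one variable
sns.boxplot(x='categorical_column', y='numerical_column', data=df)   # spread and outliers per category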
Other seaborn plotting functions include rugplot(), violinplot(), barplot(), pointplot(), and countplot().
Categorical Plots
Using the scatter function with seaborn:
Install the seaborn library (and Matplotlib) from the command prompt, not from inside Python:
C:\Users\Your Name> pip install matplotlib seaborn
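A minimal seaborn scatter plot sketch, assuming a small hypothetical DataFrame; the hue parameter colors the dots by a categorical column:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical DataFrame: study hours, exam scores, and a class group label
df = pd.DataFrame({
    'hours': [1, 2, 3, 4, 5, 6],
    'score': [52, 58, 61, 67, 72, 80],
    'group': ['A', 'B', 'A', 'B', 'A', 'B'],
})

# scatterplot() takes column names plus the DataFrame; hue colors each dot by 'group'
ax = sns.scatterplot(x='hours', y='score', hue='group', data=df)
ax.set_xlabel('Study Hours')
ax.set_ylabel('Exam Score')
plt.show()
```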
Box Plot
Example (ordered) dataset: [5, 7, 8, 12, 13, 14, 15, 18, 20, 22]
Median: The middle value of the dataset when it is sorted in ascending order. With 10 values, the
median is the average of the 5th and 6th values: (13 + 14) / 2 = 13.5.
o Q1 (First Quartile): The 25th percentile, or the median of the lower half of the
dataset. For this dataset, it’s the median of 5, 7, 8, 12, 13, which is 8.
o Q3 (Third Quartile): The 75th percentile, or the median of the upper half of the
dataset. For this dataset, it’s the median of 14, 15, 18, 20, 22, which is 18.
Interquartile Range (IQR): The range between Q1 and Q3 (Q3 - Q1), showing the spread
of the middle 50% of the data. IQR = Q3 - Q1 = 18 - 8 = 10
Whiskers: Lines extending from the quartiles to the most extreme data values that lie within
1.5 * IQR of Q1 and Q3.
Lower whisker limit = Q1 - 1.5 * IQR = 8 - 1.5 * 10 = -7. Since -7 < 5, the lower whisker ends at
the lowest value, 5.
Upper whisker limit = Q3 + 1.5 * IQR = 18 + 1.5 * 10 = 33. Since 33 > 22, the upper whisker ends at
the highest value, 22.
Outliers: Data points that fall outside the whisker limits; they are plotted individually beyond
the whiskers. This dataset has no outliers, because every value lies between -7 and 33.
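The quartile and whisker steps above can be checked with a short function. This sketch follows the median-of-halves rule used in the text, with statistics.median from the standard library; box_plot_summary is a hypothetical helper name. Note that Matplotlib's boxplot() computes quartiles with NumPy percentiles (linear interpolation by default), which for this dataset gives Q1 = 9 and Q3 = 17.25, so a drawn box may differ slightly from the hand calculation.

```python
from statistics import median


def box_plot_summary(values):
    """Box plot statistics using the median-of-halves rule for Q1 and Q3."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    # Lower and upper halves; for odd n the middle value is excluded from both
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]

    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    lower_limit, upper_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Whiskers end at the most extreme data values inside the limits
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": [x for x in data if x < lower_limit or x > upper_limit],
    }


summary = box_plot_summary([5, 7, 8, 12, 13, 14, 15, 18, 20, 22])
print(summary)
```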
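import matplotlib.pyplot as plt
data = [5, 7, 8, 12, 13, 14, 15, 18, 20, 22]  # sample data (the ordered dataset above)
plt.boxplot(data)  # whiskers use the 1.5 * IQR rule by default
plt.show()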
Heatmap
import matplotlib.pyplot as plt
import pandas as pd
import plotly.express as px
import seaborn as sns
# Example data
data = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [2, 3, 4, 5],
    'C': [3, 4, 5, 6]})
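# Create a geographic density heatmap with Plotly; the DataFrame needs
# 'Latitude', 'Longitude', and 'Value' columns (the points below are illustrative)
geo_data = pd.DataFrame({'Latitude': [37.77, 37.80, 37.75],
                         'Longitude': [-122.42, -122.41, -122.45], 'Value': [1, 3, 2]})
fig = px.density_mapbox(geo_data, lat='Latitude', lon='Longitude',
                        z='Value', radius=10, center=dict(lat=37.77, lon=-122.42),
                        zoom=3, mapbox_style="open-street-map")  # Stamen styles may not load in recent Plotly versions
fig.show()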
# Create a heatmap
sns.heatmap(data, cmap='YlGnBu', cbar=True)
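import numpy as np
# Hourly activity heatmap: expects a 7 x 24 table (days x hours);
# random counts stand in for real data here (placeholder values)
activity = np.random.default_rng(0).integers(0, 50, size=(7, 24))
plt.figure(figsize=(12, 4))
sns.heatmap(activity, cmap='YlGnBu', cbar=True)
# Customize labels
plt.title('Hourly Activity Over Days of the Week')
plt.xlabel('Hour of the Day')
plt.ylabel('Day of the Week')
# The + 0.5 offset centers each tick label on its cell
plt.xticks(ticks=np.arange(24) + 0.5, labels=[f'{h}:00' for h in range(24)], rotation=90)
plt.yticks(ticks=np.arange(7) + 0.5, labels=['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'], rotation=0)
plt.show()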
Best Practices for Data Visualization
Effective data visualization is essential for communicating insights clearly and compellingly: it
lets us interpret information quickly and make decisions based on complex data sets. Its
effectiveness, however, depends heavily on how it is executed. Following the best practices
below helps you create visualizations that are both visually appealing and informative.
1. Know Your Audience
The visualization should match the audience’s expertise and let viewers see and process the
data easily and quickly.
2. Choose the Right Type of Visualization (Chart Type)
Common types include bar charts for comparisons between categories, line graphs for trends
over time, and scatter plots for relationships between variables.
3. Keep the Visualization Simple
Avoid clutter by removing unnecessary elements. Focus on the key
message you want to convey. Use clear labels, titles, and legends.
4. Use Clear and Consistent Labeling
Label axes and data points: Clearly indicate what each axis represents
and the values of data points.
Use consistent units: Ensure that units are consistent throughout the
visualization.
5. Choose Appropriate Colors
Use a color palette that is visually appealing: Avoid overly bright or
contrasting colors that can be difficult to read.
Consider color blindness: Choose colors that are easily distinguishable
for people with color vision deficiencies.
6. Provide Context
Include context by adding annotations, reference lines, or background
information to help the audience interpret the data.
7. Use Consistent Scales
When comparing multiple visualizations, use the same scales and units
to prevent misinterpretation.
8. Highlight Key Insights
Draw attention to the most important data points or trends using
emphasis techniques like bold text, larger sizes, or contrasting colors.
9. Tell a Story
Organize your visualizations in a logical sequence: Guide the viewer
through the key points of your story.
Use annotations and explanations: Provide context and additional
information as needed.
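Several of these practices can be combined in one chart. A minimal sketch, assuming hypothetical quarterly values for two years: clear axis labels (practice 4), seaborn's built-in 'colorblind' palette (practice 5), a dashed reference line at the average plus an annotation for context (practice 6), a shared y-axis for side-by-side panels (practice 7), and a contrasting color for the peak bar (practice 8):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data: quarterly values for two years
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
year_a = [120, 135, 160, 150]
year_b = [110, 140, 185, 155]

# Colorblind-friendly palette: one color for regular bars, one for the highlight
palette = sns.color_palette('colorblind')
base, highlight = palette[0], palette[1]

# sharey=True puts both panels on the same y-axis scale
fig, axes = plt.subplots(1, 2, sharey=True, figsize=(9, 4))

for ax, title, values in zip(axes, ['Year A', 'Year B'], [year_a, year_b]):
    peak = values.index(max(values))
    # Highlight the largest bar with a contrasting color
    colors = [highlight if i == peak else base for i in range(len(values))]
    ax.bar(quarters, values, color=colors)
    # Reference line at this year's average, for context
    mean = sum(values) / len(values)
    ax.axhline(mean, linestyle='--', color='gray')
    # Annotate the peak value just above its bar
    ax.annotate(f'Peak: {max(values)}', xy=(peak, max(values)),
                xytext=(0, 5), textcoords='offset points', ha='center')
    ax.set_title(title)
    ax.set_xlabel('Quarter')

axes[0].set_ylabel('Value')
fig.suptitle('Quarterly Values: Peak Quarter Highlighted')
plt.show()
```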