Data Visualization Notes
ACADEMIC NOTES
III BSC CS(AI&DS)
Dr. Mohanasathiya KS
DATA VISUALIZATION
UNIT-I
1.1 Introduction
1.2 Context of data visualization
1.3 Define methodology
1.4 Visualization design objectives
1.5 Key factors-purpose
1.6 Visualization function and tone
1.7 Visualization design options-data representation
1.8 Data presentation
1.9 Seven stages of data visualization
1.10 Widgets
1.11 Data visualization tools
1.1 INTRODUCTION
Data visualization is the graphical representation of data and information. It uses visual
elements such as charts, graphs, and maps to help people easily understand and interpret patterns,
trends, and insights within data. The primary goal of data visualization is to make complex data
more accessible, understandable, and usable.
3. Maps
- Heatmaps
- Geographical Maps
4. Specialized Visuals
- Box Plot
- Tree Map
- Network Diagrams
Data visualization is a critical skill for analysts, researchers, and decision-makers across
industries, as it bridges the gap between raw data and actionable insights.
- Examples:
- Exploratory Visualizations: Help analysts uncover patterns (e.g., scatter plots to find
correlations).
- Explanatory Visualizations: Communicate specific insights to stakeholders (e.g., dashboards
summarizing key performance indicators).
2. Audience
- Definition: The group of people who will view or interact with the visualization.
- Why It Matters: Understanding the audience’s knowledge level, expectations, and preferences
ensures the visualization is understandable and relevant.
- Considerations:
- Technical Audience:
- Expect detailed metrics and raw data.
- May prefer complex visuals (e.g., box plots, heatmaps).
- Non-Technical Audience:
- Prefer simplified, high-level summaries.
- Visuals should focus on clarity (e.g., bar charts, pie charts).
3. Data Characteristics
- Definition: The type, structure, and source of data being visualized.
- Impact on Context:
- Numerical Data: Suitable for line charts, bar charts, and scatter plots.
- Categorical Data: Best represented by pie charts or stacked bar charts.
- Temporal Data: Requires time-series visualizations (e.g., trends over time).
- Example:
- Visualizing sales trends over time requires chronological order and consistent time intervals.
4. Medium of Delivery
- Definition: The platform or format used to share the visualization.
- Types:
- Static: Reports, PDFs, printed charts.
- Interactive: Dashboards (e.g., Tableau, Power BI).
- Live Presentations: Slides or infographics for meetings.
- Design Adaptations:
- For static reports: Use annotations and clear legends since interaction isn’t possible.
- For interactive dashboards: Include filters, tooltips, and drill-down capabilities.
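Returning to the Data Characteristics point on temporal data, the two requirements named there (chronological order and consistent time intervals) can be checked before plotting. The following is a minimal sketch, assuming monthly data; the helper names `is_chronological` and `missing_months` and the sample dates are hypothetical and chosen only for illustration.

```python
from datetime import date

def is_chronological(dates):
    """Return True if every date is strictly later than the one before it."""
    return all(a < b for a, b in zip(dates, dates[1:]))

def missing_months(dates):
    """List the (year, month) pairs skipped between the first and last date."""
    present = {(d.year, d.month) for d in dates}
    first, last = min(dates), max(dates)
    missing = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        if (year, month) not in present:
            missing.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return missing

# Hypothetical monthly sales dates; March is left out on purpose.
sales_dates = [date(2024, 1, 1), date(2024, 2, 1), date(2024, 4, 1), date(2024, 5, 1)]
print(is_chronological(sales_dates))
print(missing_months(sales_dates))
```

A gap reported by `missing_months` would normally be filled or marked on the chart, because a line drawn straight across it suggests a trend that the data does not actually contain.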
- Simplicity: Remove unnecessary elements to avoid clutter.
- Consistency: Use uniform colors, fonts, and scales.
- Clarity: Add labels, titles, and legends for better understanding.
- Highlight Key Points: Use color contrasts or annotations to emphasize important data.
- Example:
- In a presentation for senior management, bold the most significant data points or use color to
highlight trends.
2. Enhances Decision-Making:
- Contextualized visuals provide actionable insights aligned with goals and objectives.
3. Prevents Misinterpretation:
- By framing the data with appropriate background and explanations, it avoids misleading
conclusions.
Example of Context in Data Visualization
Scenario: A company wants to analyze its yearly sales data.
- Purpose: To identify trends and key revenue contributors.
- Audience: Senior management (non-technical).
- Data Characteristics:
- Numerical data (sales figures).
- Temporal data (monthly breakdown over the year).
- Medium: Interactive dashboard for periodic review.
- Design:
- Line chart for sales trends.
- Pie chart for sales distribution by product category.
- Use annotations to highlight the highest and lowest sales months.
- External Context: Include historical data from the previous year for comparison.
Conclusion:
Context in data visualization transforms raw data into meaningful insights by addressing the
purpose, audience, data, medium, and design. A well-contextualized visualization ensures that the
message is clear, impactful, and actionable, making it a vital aspect of effective data
communication.
This methodology ensures that visualizations are not only aesthetically appealing but also relevant,
accurate, and insightful for decision-making and analysis.
2. Data Collection: Gathering and consolidating data from relevant sources.
3. Data Preparation: Cleaning, transforming, and structuring the data to ensure quality and
usability.
4. Visualization Design: Selecting appropriate visual formats (e.g., bar charts, line graphs) and
designing for clarity and audience needs.
5. Validation: Verifying the accuracy of the data and ensuring the visualization aligns with
objectives.
6. Presentation: Sharing the visualization in a suitable medium, such as reports, dashboards, or
presentations.
3. Ensure Clarity
- Focus on making the visualization easy to interpret.
- Use clear labels, legends, and titles, and avoid unnecessary elements.
4. Maintain Accuracy
- Represent data truthfully without distortion.
- Example: Use proportional scales and avoid truncated axes.
5. Focus on Relevance
- Display only the most important metrics or insights for the audience.
- Example: For a marketing team, show traffic sources and growth trends.
6. Create Engagement
- Use visually appealing designs to capture attention.
- Choose appropriate colors, fonts, and layouts to enhance aesthetic appeal.
7. Ensure Efficiency
- Simplify visuals to allow users to extract insights quickly.
- Example: Summarize key metrics in a dashboard header.
8. Facilitate Comparability
- Design visuals to enable comparisons across categories, time, or data points.
- Example: Use side-by-side bar charts or include benchmarks.
9. Add Explorability
- Provide options for users to filter, drill down, or explore data.
- Example: Add dropdowns or sliders to view different time periods.
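The accuracy objective in this list warns against truncated axes. A small arithmetic sketch can show why: the function below computes how much taller one bar is drawn than another for a given axis starting point. The function name `drawn_height_ratio` and the sales figures are hypothetical.

```python
def drawn_height_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of the drawn bar heights when the value axis begins at axis_start."""
    if axis_start >= min(value_a, value_b):
        raise ValueError("axis_start must be below both values")
    return (value_a - axis_start) / (value_b - axis_start)

# Hypothetical figures: two products with sales of 105 and 100 units.
honest = drawn_height_ratio(105, 100)                     # axis starts at zero
truncated = drawn_height_ratio(105, 100, axis_start=95)   # axis starts at 95
print(f"zero-based axis: {honest:.2f}x, truncated axis: {truncated:.2f}x")
```

With a zero-based axis the taller bar is drawn 1.05 times as high, matching the real 5% difference. Starting the axis at 95 makes the same bar twice as high, which is the kind of distortion the objective is meant to prevent.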
1. Communicating Insights
- Objective: To convey key findings and trends from data to stakeholders.
- Examples:
- Highlighting sales growth trends over a year to management.
- Showing customer demographics to the marketing team.
- Design Approach:
- Use concise, focused visualizations like bar charts or line graphs to emphasize insights.
- Examples:
- Monitoring website traffic using a dashboard.
- Tracking project progress through a Gantt chart.
- Design Approach:
- Create interactive dashboards with KPIs and progress indicators.
5. Explaining Relationships
- Objective: To show how different variables are connected or influence each other.
- Examples:
- Displaying correlations between ad spend and sales revenue.
- Highlighting the impact of weather on customer footfall.
- Design Approach:
- Use scatter plots, bubble charts, or network diagrams.
6. Highlighting Outliers
- Objective: To identify and emphasize unusual or unexpected data points.
- Examples:
- Detecting anomalies in expense reports.
- Identifying products with exceptionally high or low performance.
- Design Approach:
- Use box plots, scatter plots, or annotated bar charts to call out outliers.
7. Telling a Story
- Objective: To narrate a data-driven story to guide decision-making.
- Examples:
- Presenting the impact of a new policy using before-and-after data visualizations.
- Showing how customer satisfaction evolved over time due to improved services.
- Design Approach:
- Combine multiple visuals into a cohesive story, using annotations and highlights.
8. Supporting Decision-Making
- Objective: To aid stakeholders in making informed decisions based on data.
- Examples:
- Providing risk analysis to investors.
- Offering product performance data to guide inventory decisions.
- Design Approach:
- Use dashboards or interactive reports that present actionable insights clearly.
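For the "Highlighting Outliers" purpose above, box plots conventionally flag points that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that convention to a hypothetical list of expense claims; the function name `iqr_outliers` is an assumption for illustration, and quartile values can differ slightly between tools depending on the interpolation method.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)   # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical expense claims; one entry is clearly unusual.
expenses = [120, 130, 125, 128, 132, 127, 900]
outliers = iqr_outliers(expenses)
print(outliers)
```

The flagged values are the ones a chart would then call out with a contrasting colour or an annotation.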
3. Focus on Key Metrics:
- Highlight the most relevant data to align with the objective.
4. Avoid Overloading Information:
- Keep the visualization simple and purposeful.
Conclusion
The purpose of data visualization serves as the guiding principle for creating effective and
meaningful visual representations. By clearly defining and aligning the visualization with its
purpose, you can ensure that it communicates the right message, meets audience needs, and drives
actionable insights.
- Example: Sales growth over months visualized using a line chart to reveal seasonality.
c. Comparing Metrics
- Allows comparisons between categories, groups, or periods.
- Example: A grouped bar chart showing revenue across multiple regions.
e. Supporting Decision-Making
- Provides actionable insights by summarizing key metrics and trends.
- Example: Dashboards showing real-time key performance indicators (KPIs) for a business.
f. Communicating Insights
- Translates data into stories that are easily understood by stakeholders.
- Example: Infographics summarizing the impact of a marketing campaign.
g. Enabling Interaction
- For dynamic and interactive visualizations, users can explore data by filtering, drilling down, or
customizing views.
- Example: A dashboard where users can filter sales data by region or product.
b. Emphatic Tone
- Purpose: To highlight critical insights or draw attention to key areas.
- Characteristics:
- Use of bright colors or bold labels to emphasize trends or outliers.
- Annotations to explain significant points.
- Example: A line chart with a highlighted peak representing a sales boost due to a promotional
event.
c. Persuasive Tone
- Purpose: To influence the audience toward a specific conclusion or decision.
- Characteristics:
- Strategic use of colors and comparisons to drive the intended message.
- Accompanied by annotations or narratives emphasizing benefits or risks.
- Example: A pie chart showing the market share of a company to convince investors of its
dominance.
d. Informative Tone
- Purpose: To educate the audience by providing comprehensive details.
- Characteristics:
- Data-rich visuals with additional explanations or legends.
- Balanced use of colors and details.
- Example: An infographic explaining the impact of climate change with data, maps, and
supporting text.
e. Engaging Tone
- Purpose: To captivate the audience and retain their attention.
- Characteristics:
- Vibrant and visually appealing design.
- Use of storytelling elements like sequences or progression.
- Example: An animated dashboard showcasing a company’s milestones over time.
Conclusion
The function of data visualization lies in simplifying data, identifying trends, and
supporting decision-making, while its tone shapes the audience’s perception and engagement.
Combining both effectively ensures that visualizations are not only insightful but also impactful
and aligned with the communication goals.
1.7 VISUALIZATION DESIGN OPTIONS-DATA REPRESENTATION
Data representation is a crucial aspect of data visualization. It refers to how data is visually
encoded using various chart types, designs, and formats to present the information in a meaningful
and comprehensible way. The choice of data representation depends on the type of data, the
message to be conveyed, and the audience’s needs.
- Pie Chart: Represents parts of a whole as slices of a circle. Suitable for showing proportions.
- Example: Market share distribution of different brands.
- Column Chart: Similar to bar charts but with vertical bars. Typically used for comparing
categories across time or other ordered variables.
- Example: Monthly sales across different regions.
- Line Chart: Ideal for showing trends or changes over time, with the x-axis representing time
and the y-axis representing the value.
- Example: Stock prices over a year.
- Stacked Bar Chart: Displays categorical data with multiple segments in a single bar to show
how categories contribute to a whole.
- Example: Revenue breakdown by product category over time.
Conclusion
Visualization design options for data representation offer a wide range of possibilities to
communicate information effectively. By selecting the appropriate visual format based on the type
of data, the intended message, and the audience, you can ensure that the visualization is both
engaging and informative. A well-designed visualization allows users to quickly understand
patterns, relationships, and insights in the data, making it a powerful tool for decision-making and
analysis.
1. Clarity and Simplicity
- Goal: The primary objective of data presentation is to ensure clarity. The visual representation
should be simple and easy to interpret at a glance.
- How to Achieve It:
- Use clear, legible fonts and labels for axes, titles, and data points.
- Avoid visual clutter by limiting the number of elements in the chart or graph.
- Focus on presenting only the most important data to highlight key insights.
- Goal: Colors and design elements should enhance comprehension, not overwhelm the
audience.
- How to Achieve It:
- Use color to highlight important data points or trends.
- Choose color schemes that are easy on the eyes and distinguishable (e.g., avoid using too
many bright or similar colors).
- Maintain consistency in color use across different charts or graphs to avoid confusion.
6. Consistency in Design
- Goal: Consistent design across multiple visuals makes it easier for the audience to compare
and interpret data.
- How to Achieve It:
- Use the same colors, fonts, and design layout for related visualizations.
- Ensure that chart styles (e.g., axis labeling, gridlines) are uniform to help the audience follow
the data easily.
7. Interactivity
- Goal: For certain data visualization projects, allowing users to interact with the data can provide
deeper insights and personalized exploration.
- How to Achieve It:
- Implement features like filtering, hovering for details, zooming in on specific areas, or
dynamic updates as users interact with the data.
- Provide users with the ability to customize the view to suit their needs (e.g., showing data for
specific time periods or regions).
Conclusion
Effective data presentation in data visualization is about making complex data easy to
understand and insightful for the audience. By focusing on clarity, choosing the right visuals,
providing proper labels and annotations, and emphasizing key insights, the data becomes a
powerful tool for decision-making and storytelling. The overall goal is to communicate the data's
meaning in an engaging and accessible way, ensuring that the audience can quickly absorb and act
on the information presented.
1. Define the Objective: Identify the purpose of the visualization and understand the audience's
needs to focus the message.
2. Collect and Prepare Data: Gather relevant data, clean it, and structure it for analysis.
3. Choose the Right Visualization Type: Select an appropriate chart or graph type based on the
data and the objective (e.g., bar charts for comparisons, line charts for trends).
4. Design and Build the Visualization: Create the visualization with clear layout, color schemes,
and data encoding to ensure clarity and engagement.
5. Analyze and Interpret the Data: Examine the visualization for trends, patterns, or insights,
and refine as necessary.
6. Present the Visualization: Share the visualization with the audience, providing context and a
clear narrative to guide understanding.
7. Iterate and Improve: Collect feedback from the audience, refine the design, and adjust the
visualization to improve clarity and impact.
These stages ensure that data is presented effectively, is easy to understand, and communicates the
intended message clearly.
1.10 WIDGETS
Widgets in data visualization refer to interactive elements or components that are used to
display data and allow users to interact with it, making the visualization more dynamic and
engaging. Widgets enable users to filter, zoom, or customize the displayed data, offering a more
personalized experience and deeper insights. They are typically used in dashboards, interactive
charts, and data-driven applications.
2. Dropdown Menus:
- Dropdown menus enable users to select from predefined categories or filters, such as choosing
a specific product, region, or metric to display in the visualization.
3. Checkboxes:
- Checkboxes allow users to select multiple categories or parameters for comparison. For
instance, users can check multiple boxes to view data from different regions or different time
periods.
4. Buttons:
- Buttons are used to trigger specific actions, such as switching between different views of the
data (e.g., switching from a bar chart to a pie chart), resetting filters, or submitting data for
processing.
5. Interactive Graphs/Charts:
- These widgets enable users to click, hover, or zoom to reveal more detailed data points or
annotations. This allows for deeper exploration of the data without overwhelming the user with
too much information at once.
6. Search Bars:
- Search bars let users quickly find specific data points or categories by typing keywords, such
as searching for a particular city or product in a dataset.
7. Tooltips:
- Tooltips display additional information when users hover over specific data points or sections
of the visualization. For example, hovering over a bar in a bar chart may reveal exact numbers,
trends, or related information.
8. Toggle Switches:
- Toggles allow users to switch between two different states or views, such as changing the
metric being displayed (e.g., from sales volume to revenue) or toggling between different types of
visualizations.
9. Maps:
- Interactive maps are a type of widget used in geospatial data visualization, where users can
click, zoom, or hover to explore data related to specific geographical regions.
Conclusion
Widgets in data visualization are essential for creating interactive, user-friendly dashboards
and reports. They enable users to filter, explore, and customize data views, making complex
datasets more accessible and actionable. By adding interactivity, widgets enhance the overall data
analysis process and improve decision-making.
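Behind the scenes, filter widgets such as dropdown menus and checkboxes usually reduce to selecting the rows that match the user's choices. The following is a minimal sketch of that idea without any dashboard library; the function `apply_widgets`, its parameters, and the sample rows are all hypothetical.

```python
# Hypothetical sales rows that a dashboard might display.
rows = [
    {"region": "North", "product": "Laptop", "sales": 120},
    {"region": "South", "product": "Laptop", "sales": 90},
    {"region": "North", "product": "Phone", "sales": 200},
    {"region": "East", "product": "Phone", "sales": 150},
]

def apply_widgets(rows, product=None, regions=None):
    """Filter rows by a dropdown choice (product) and checkbox choices (regions).

    None means the widget is left unset, so it does not filter anything.
    """
    selected = []
    for row in rows:
        if product is not None and row["product"] != product:
            continue
        if regions is not None and row["region"] not in regions:
            continue
        selected.append(row)
    return selected

# Dropdown set to "Phone", checkboxes ticked for North and East.
view = apply_widgets(rows, product="Phone", regions={"North", "East"})
print([row["sales"] for row in view])
```

A real dashboard tool would call a function like this each time a widget changes and redraw the chart from the returned rows.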
Types of Data Visualization Tools
1. Business Intelligence (BI) Tools:
- BI tools are designed to help businesses analyze and visualize large volumes of data, often for
decision-making and reporting. These tools are highly interactive and provide real-time
dashboards.
Examples:
- Tableau: A leading data visualization tool known for its ability to create interactive and
customizable dashboards, visualizations, and reports.
- Power BI: A Microsoft product that allows users to create reports and dashboards and share
insights across an organization. It integrates well with other Microsoft services and Excel.
- QlikView: A BI tool that provides interactive data discovery, analytics, and visualization
features with strong data association capabilities.
Examples:
- D3.js: A JavaScript library for creating dynamic and interactive data visualizations on the web.
It offers extensive customization and control over the visual output.
- Plotly: A graphing library that can be used with Python, R, and JavaScript to create interactive
plots, charts, and dashboards. It’s known for its high-quality visuals.
- RAWGraphs: An open-source tool that helps users convert data into a wide range of
visualizations. It’s great for beginners and easy to use.
Examples:
- Google Data Studio (now Looker Studio): A free tool that allows users to create customizable
reports and dashboards. It integrates seamlessly with other Google products like Google Analytics
and Google Sheets.
- Canva: While primarily a graphic design tool, Canva offers simple charts, graphs, and
infographics for data visualization. It’s user-friendly and ideal for beginners.
- Infogram: An online platform for creating interactive infographics, charts, and reports. It’s
suitable for non-technical users and offers an intuitive drag-and-drop interface.
Examples:
- ArcGIS: A powerful tool for geospatial data visualization and mapping, commonly used in
fields like urban planning, environmental science, and geography.
- Gephi: An open-source tool for visualizing and analyzing networks, especially useful for social
network analysis and graph theory.
- SPSS: A statistical software tool used for analyzing quantitative data. It has built-in
visualization options, including charts and plots.
Examples:
- Matplotlib: A Python library for creating static, interactive, and animated visualizations. It's
widely used for plotting data in Python-based analysis workflows.
- Seaborn: Built on top of Matplotlib, Seaborn is used for statistical data visualization in Python
and provides higher-level functions to create more complex visuals.
- ggplot2: A data visualization package in R that follows a grammar of graphics approach to
create complex visualizations with simple code.
Key Features of Data Visualization Tools
- Ease of Use: Many modern tools are designed for non-technical users, offering drag-and-drop
functionality and pre-built templates.
- Customization: Advanced tools like D3.js and Plotly allow deep customization, providing
control over design elements, interactivity, and data representation.
- Interactivity: Interactive features like filtering, zooming, and tooltips make data exploration
more engaging and insightful.
- Integration: Many tools integrate with other platforms (e.g., databases, cloud services, and
spreadsheets) to pull in data directly for visualization.
- Real-Time Data: Some tools allow for real-time data visualization, which is crucial for tracking
live performance metrics and monitoring ongoing changes.
- Cost: Some tools, like Google Data Studio and RAWGraphs, are free, while others, like Tableau
and Power BI, may require a subscription.
Conclusion
Data visualization tools are essential for turning raw data into insightful, actionable
visualizations. Whether you're a business professional looking for dashboards, a developer
building custom graphs, or a researcher analyzing geospatial data, the right tool can significantly
enhance data interpretation and decision-making.
UNIT – II
2.1 Visualizing data methods
2.2 Mapping
2.3 Time series
2.4 Connections and correlations
2.5 Scatter plot maps
2.6 Trees, hierarchies and recursion
2.7 Networks and graphs
2.8 Infographics
2.1 VISUALIZING DATA METHODS
Data visualization is essential for understanding patterns, trends, and insights in data. There
are multiple techniques available, each suited for different types of data and analysis. Below is a
detailed overview of various data visualization methods.
1. Basic Charts
These are the most common types of visualizations, used to present simple relationships and
distributions.
1.1 Bar Chart
• Purpose: Used for comparing categorical data.
• Example Use Case: Comparing sales across different regions.
• Variations:
o Grouped Bar Chart (compares multiple categories side-by-side).
o Stacked Bar Chart (segments within bars to show proportions).
o Horizontal Bar Chart (used when category labels are long).
Example: A bar chart showing the number of students in different courses.
2. Statistical Charts
These charts help in understanding distributions and relationships between variables.
2.1 Histogram
• Purpose: Shows the frequency distribution of a dataset.
• Example Use Case: Analysing exam score distribution.
• Difference from Bar Chart: Bars in a histogram touch each other, as they represent
continuous data.
Example: A histogram of customer ages in a store.
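The frequency distribution behind a histogram is a count of values per bin. A minimal sketch of that binning step follows, assuming equal-width bins; the function name `histogram_counts` and the ages are hypothetical, and bins with no values are simply omitted from the result.

```python
def histogram_counts(values, bin_width, start):
    """Count values per half-open bin [low, low + bin_width), starting at start."""
    counts = {}
    for value in values:
        index = int((value - start) // bin_width)
        low = start + index * bin_width
        counts[low] = counts.get(low, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical customer ages.
ages = [18, 22, 25, 31, 34, 35, 41, 47, 52, 29]
for low, count in histogram_counts(ages, bin_width=10, start=10).items():
    # One '#' per customer gives a rough text histogram.
    print(f"{low}-{low + 9}: {'#' * count}")
```

Because the bins cover a continuous range with no gaps between them, the bars of the resulting histogram touch each other, unlike the separate bars of a bar chart.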
3. Hierarchical Visualizations
These help in visualizing data that is organized in a hierarchy.
3.1 Tree Map
• Purpose: Uses nested rectangles to show proportions.
• Example Use Case: Visualizing sales contribution of different product categories.
Example: A tree map showing revenue from different product categories.
4. Geospatial Visualizations
Used for data with geographical components.
4.1 Choropleth Map
• Purpose: Uses colour shades to represent values in different geographical regions.
• Example Use Case: Mapping population density across countries.
Example: A choropleth map showing COVID-19 cases by country.
4.2 Heatmap
• Purpose: Uses colour intensities to represent values in a matrix.
• Example Use Case: Displaying correlation between different stock prices.
Example: A heatmap showing correlation between different subjects in an exam.
5. Network Graphs
Used to represent relationships and connections.
5.1 Node-Link Diagram
• Purpose: Represents entities as nodes and their relationships as links.
• Example Use Case: Visualizing social network connections.
Example: A node-link diagram showing LinkedIn connections.
6. Advanced Visualizations
Used for specialized analysis.
6.1 Word Cloud
• Purpose: Represents text data by varying word size based on frequency.
• Example Use Case: Analyzing common words in customer feedback.
Example: A word cloud of frequently used words in movie reviews.
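A word cloud is driven by word frequencies: the more often a word occurs, the larger it is drawn. The sketch below shows only that counting step, assuming plain English text; the stop-word list, the function name `word_frequencies`, and the reviews are hypothetical.

```python
import re
from collections import Counter

# A small, hand-picked stop-word list; an assumption for this sketch only.
STOP_WORDS = {"the", "a", "and", "was", "is", "but", "of", "it"}

def word_frequencies(texts):
    """Count how often each non-stop word appears across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts

# Hypothetical movie reviews.
reviews = [
    "The plot was great and the acting was great",
    "Great visuals but a weak plot",
    "Weak ending",
]
frequencies = word_frequencies(reviews)
print(frequencies.most_common(3))
```

A word cloud library would then map each count to a font size; that drawing step is not shown here.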
Data Type            Suggested Visualization
Distribution Data    Histogram, Box Plot
Relationship Data    Scatter Plot, Bubble Chart
Geographical Data    Choropleth Map, Heatmap
Hierarchical Data    Tree Map, Sunburst Chart
Network Data         Node-Link Diagram, Chord Diagram
Conclusion
Data visualization is a powerful tool that makes complex data easier to understand and
interpret. Choosing the right method depends on the data type and the insights we want to
communicate.
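The data-type guidance in this section can also be kept as a simple lookup, which is handy when a script has to pick a chart automatically. This is only a sketch: the suggestions are the ones listed in these notes, while the dictionary keys and the function name `suggest_charts` are hypothetical.

```python
# Suggestions taken from these notes; the lowercase keys are labels
# invented for this sketch.
SUGGESTED_CHARTS = {
    "distribution": ["Histogram", "Box Plot"],
    "relationship": ["Scatter Plot", "Bubble Chart"],
    "geographical": ["Choropleth Map", "Heatmap"],
    "hierarchical": ["Tree Map", "Sunburst Chart"],
    "network": ["Node-Link Diagram", "Chord Diagram"],
}

def suggest_charts(data_type):
    """Return the suggested chart types for a data type, ignoring case."""
    key = data_type.strip().lower()
    if key not in SUGGESTED_CHARTS:
        raise ValueError(f"no suggestion recorded for data type: {data_type!r}")
    return SUGGESTED_CHARTS[key]

print(suggest_charts("Hierarchical"))
```

The lookup covers only the data type; the insight to be communicated and the audience still have to be weighed by the designer.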
2.2 MAPPING
Mapping in data visualization is a technique used to represent data spatially on a
geographical map. It helps in identifying patterns, trends, and relationships in location-based
data. This method is widely used in fields such as business analytics, urban planning,
epidemiology, and climate science. Below is a step-by-step explanation of how mapping is done,
along with practical examples.
The first step in mapping is collecting geospatial data, which includes location-based
information such as latitude, longitude, or region-based datasets like country/state boundaries. This
data can be obtained from various sources such as OpenStreetMap, Google Maps API, Natural
Earth, government GIS databases, or Kaggle datasets. The data is usually stored as CSV (for
simple coordinate-based data), GeoJSON, or Shapefile (.shp) files.
Example: Suppose a company wants to visualize the locations of its stores across the country.
It collects store addresses and converts them into latitude and longitude coordinates using a
geolocation service like Google Maps API.
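Once coordinates are available, they can be written out in one of the formats mentioned above. The following is a minimal sketch that converts a list of stores into a GeoJSON FeatureCollection using only the standard library; the store names, the coordinates, and the function name `to_geojson` are hypothetical, and the geocoding step itself is assumed to have been done already.

```python
import json

# Hypothetical stores with coordinates already obtained from a geocoder.
stores = [
    {"name": "Store A", "lat": 28.6139, "lon": 77.2090},
    {"name": "Store B", "lat": 19.0760, "lon": 72.8777},
]

def to_geojson(stores):
    """Build a GeoJSON FeatureCollection with one Point feature per store."""
    features = []
    for store in stores:
        features.append({
            "type": "Feature",
            # GeoJSON positions are written as [longitude, latitude].
            "geometry": {"type": "Point",
                         "coordinates": [store["lon"], store["lat"]]},
            "properties": {"name": store["name"]},
        })
    return {"type": "FeatureCollection", "features": features}

collection = to_geojson(stores)
print(json.dumps(collection, indent=2))
```

Note the coordinate order: GeoJSON puts longitude first, whereas many mapping functions take latitude first, so swapping the two is a common source of misplaced markers.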
import folium

# Create a map centred on India
m = folium.Map(location=[20.5937, 78.9629], zoom_start=5)
# Create a map centred on San Francisco
m = folium.Map(location=[37.7749, -122.4194], zoom_start=12)
Conclusion
Mapping in data visualization is a powerful method to analyze spatial data, whether it's
tracking disease outbreaks, optimizing business operations, or understanding geographical trends.
import matplotlib.pyplot as plt
import pandas as pd
# Sample Data
data = {'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'], 'Sales': [100, 120, 150, 170, 200]}
df = pd.DataFrame(data)
# Plot
plt.plot(df['Month'], df['Sales'], marker='o', linestyle='-', color='b')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Monthly Sales Trend')
plt.grid(True)
plt.show()
3.4 Heatmap
A heatmap is useful for identifying patterns and seasonality in time series data by showing
intensity with colors.
Example: A hotel visualizing room occupancy rates across weeks and days of the week.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Sample Data: occupancy for each weekday over two weeks
heat_data = pd.DataFrame({'Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
                          'Week1': [5, 8, 6, 7, 9],
                          'Week2': [6, 7, 9, 5, 8]})
# Heatmap Plot: rows are days, columns are weeks
sns.heatmap(heat_data.set_index('Day'), cmap='coolwarm', annot=True)
plt.title("Hotel Occupancy Heatmap")
plt.show()
6. Conclusion
Time series visualization is essential for analyzing trends, seasonality, and anomalies in
data. Whether it's tracking stock prices, weather changes, sales performance, or website traffic,
choosing the right visualization technique (line charts, bar charts, heatmaps, candlestick charts) is
key to understanding time-dependent patterns.
Connections represent relationships between entities in a dataset. They help in
understanding interactions, dependencies, and flows of data.
Common Visualizations for Connections:
• Network Graphs: Show relationships between entities, such as social media connections.
• Sankey Diagrams: Visualize the flow of information, like website navigation paths.
Example: A LinkedIn connection graph where users (nodes) are linked by connections (edges).
Key Takeaways
• Connections help visualize relationships and flows.
• Correlations help identify patterns and dependencies.
• Used in business, healthcare, finance, and more for data-driven decision-making.
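The strength of a correlation is commonly summarized with the Pearson correlation coefficient, which ranges from -1 (perfect negative relationship) to +1 (perfect positive relationship). The sketch below computes it by hand for the ad spend and sales revenue pairing used as an example in Unit I; the figures and the function name `pearson` are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical monthly figures (in thousands).
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 43, 68, 80, 110]
print(f"r = {pearson(ad_spend, sales):.3f}")
```

A value close to +1 here would appear on a scatter plot as points lying near an upward-sloping line. A high coefficient shows that two variables move together; it does not by itself show that one causes the other.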
import pandas as pd
import plotly.express as px

# Sample Data
data = {'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami'],
        'Latitude': [40.7128, 34.0522, 41.8781, 29.7604, 25.7617],
        'Longitude': [-74.0060, -118.2437, -87.6298, -95.3698, -80.1918],
        'Sales': [1000, 800, 900, 750, 700]}
df = pd.DataFrame(data)
# Scatter Map: marker size scales with the Sales column
fig = px.scatter_mapbox(df, lat="Latitude", lon="Longitude", size="Sales",
                        hover_name="City", zoom=3, mapbox_style="open-street-map")
fig.show()
Explanation: This code plots store sales on a map using latitude and longitude, with dot sizes
representing sales volume.
4. Real-World Applications
• Retail – Visualizing store performance by region.
• Public Health – Mapping disease outbreaks.
• Tourism – Analyzing tourist hotspots.
• Logistics – Optimizing delivery routes based on order locations.
5. Conclusion
Scatter plot maps are powerful tools for spatial data analysis, helping businesses and
researchers make informed decisions. They provide insights into geographical patterns, customer
behavior, and market trends.
• Sunburst Charts – A circular way to display hierarchical data, where each ring represents
a deeper level.
• Treemaps – Represent hierarchical data with nested rectangles, often used in disk storage
analysis or financial data.
Example: A website’s structure, where the homepage is the root, followed by main categories
and subcategories.
3. Recursion in Data Visualization
Recursion is the process of breaking a problem down into smaller sub-problems of the same kind.
It is often used to build and traverse hierarchical data representations.
Example: In a corporate hierarchy, a manager can have multiple employees, each of whom
can also be a manager with more employees beneath them. This structure repeats at each level,
creating a recursive pattern.
Recursive Patterns in Visualizations
• Tree structures use recursion to define relationships at different levels.
• Organizational charts display roles in a company, where each manager has subordinates.
• File systems visualize folders containing subfolders and files.
Example: A country's administrative structure: Country → States → Cities → Districts →
Neighbourhoods.
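The recursive pattern described here translates directly into code: a function handles one node and then calls itself on that node's children. The following is a minimal sketch using nested dictionaries; the hierarchy contents and the function names `render_tree` and `count_nodes` are hypothetical.

```python
# Hypothetical administrative hierarchy stored as nested dictionaries.
hierarchy = {
    "Country": {
        "State A": {"City A1": {}, "City A2": {}},
        "State B": {"City B1": {}},
    }
}

def render_tree(tree, depth=0):
    """Return one indented line per node, visiting children recursively."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + name)
        # The same function handles the next level down.
        lines.extend(render_tree(children, depth + 1))
    return lines

def count_nodes(tree):
    """Count every node in the hierarchy, at any depth."""
    return sum(1 + count_nodes(children) for children in tree.values())

print("\n".join(render_tree(hierarchy)))
print(count_nodes(hierarchy))
```

The recursion stops on its own when a node has an empty dictionary of children, which plays the role of the base case. Tree maps and sunburst charts can be laid out with the same kind of traversal, one level per recursive call.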
4. Real-World Applications
• File Systems – Visualizing the folder structure on a computer.
• Business Structures – Displaying company hierarchies and reporting structures.
• Biology – Representing species classification through phylogenetic trees.
• Geography – Mapping regions, subregions, and local areas.
5. Conclusion
Trees, hierarchies, and recursion are essential for visualizing structured data. Whether
analyzing organizational structures, file systems, or classification systems, these methods help
simplify complex relationships and improve data interpretation.
can signify various types of relationships, such as social connections, transportation routes, or
communication pathways.
• Nodes (Vertices): Represent entities in the network (e.g., individuals, cities, devices).
• Edges (Links): Represent the relationships or connections between nodes (e.g.,
friendships, roads, data transfer).
Example: In a social network, individuals are represented as nodes, and friendships between them
are edges connecting those nodes.
Graphs
A graph is a mathematical representation of a network. It consists of nodes and edges, where:
• Undirected Graph: Edges have no direction (the relationship is mutual, like friendships).
• Directed Graph (Digraph): Edges have a direction (representing flow, such as
follower/following relationships).
Graphs are used to model relationships in many fields like computer science, biology, sociology,
and logistics.
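The difference between undirected and directed graphs can be sketched with an adjacency list, one common way to store a graph in code. This is an illustrative sketch only; build_graph and the names in the sample edges are assumptions, not part of the notes.

```python
from collections import defaultdict


def build_graph(edges, directed=False):
    """Build an adjacency list (node -> set of neighbours) from edge pairs."""
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)
        if directed:
            graph[target]  # make sure the target node exists in the graph
        else:
            graph[target].add(source)  # mutual relationship
    return graph


# Hypothetical friendships (undirected) and follow relationships (directed).
friendships = build_graph([("Asha", "Ben"), ("Ben", "Chitra")])
follows = build_graph([("Asha", "Ben"), ("Chitra", "Ben")], directed=True)

print(sorted(friendships["Ben"]))  # neighbours of Ben in the undirected graph
print(sorted(follows["Ben"]))      # accounts Ben follows in the directed graph
```

In the undirected graph every edge is stored in both directions, while the directed graph records only the source-to-target direction.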
2.3 Communication Networks
Communication networks represent connections between devices or communication nodes. Nodes
could be computers, phones, or servers, and edges are the communication links (e.g., wired or
wireless).
Example: The internet, where websites and servers are nodes, and hyperlinks or data
connections are edges.
Many networks have communities or clusters, which are groups of nodes that are more
densely connected to each other than to the rest of the network. Identifying these communities
helps uncover subgroups or regions of interest in large networks.
5. Conclusion
Networks and graphs provide powerful tools for representing and analyzing interconnected
data. Whether used for understanding social relationships, transportation routes, communication
flows, or organizational structures, these visualizations allow us to uncover patterns, optimize
systems, and make informed decisions.
Icons and illustrations are used to simplify and symbolize concepts. Instead of long
descriptions, small images or symbols can represent entire ideas, making the infographic more
engaging.
Example: A dollar sign icon to represent financial data or a suitcase to represent travel
statistics.
1.3 Colour
Colour plays an important role in infographics by:
• Drawing attention to important sections.
• Differentiating between categories or variables.
• Conveying meaning, such as using green for positive trends and red for negative trends.
2. Types of Infographics
2.1 Statistical Infographics
These infographics focus on displaying numerical data in an easy-to-understand format,
using charts and graphs. They are often used to highlight trends, comparisons, and statistics.
Example: A bar chart infographic comparing yearly sales figures.
The combination of colorful visuals and interactive elements (like clickable sections or
embedded videos) makes infographics more engaging and appealing, keeping the audience's
attention longer.
5. Conclusion
Infographics combine the power of data visualization, graphic design, and concise text to
simplify complex data and present it in a way that’s both visually appealing and easy to understand.
They are essential tools in fields ranging from marketing and business to education and healthcare.
UNIT-3
VISUALIZING DATA PROCESS
3.1 VISUALIZING DATA PROCESS
Data visualization is the graphical representation of information and data using visual
elements like charts, graphs, and maps. It helps in identifying trends, patterns, and insights in
data for better decision-making.
Steps in the Data Visualization Process
1. Define the Objective
o Identify the purpose of visualization (e.g., trend analysis, comparison,
distribution).
o Understand the target audience (technical, non-technical, decision-makers).
2. Collect and Prepare Data
o Gather raw data from sources (databases, APIs, spreadsheets, etc.).
o Clean and preprocess the data (handling missing values, removing duplicates,
formatting).
3. Choose the Right Visualization Type
o Line Chart: For trends over time.
o Bar Chart: For comparisons between categories.
o Pie Chart: For proportions.
o Scatter Plot: For relationships between variables.
o Heatmap: For intensity variations across a dataset.
o Box Plot: For distribution and outliers.
4. Use the Right Tools & Libraries
o Python: Matplotlib, Seaborn, Plotly, Bokeh.
o R: ggplot2, Shiny.
o BI Tools: Tableau, Power BI, Google Data Studio (now Looker Studio).
o Excel: Pivot tables, charts.
5. Design the Visualization
o Ensure clarity, simplicity, and readability.
o Use appropriate colours, labels, and legends.
o Avoid clutter and unnecessary elements.
6. Analyse & Interpret Insights
o Identify patterns, correlations, and outliers.
o Compare results with expectations.
o Highlight key takeaways for stakeholders.
7. Share & Iterate
o Present in reports, dashboards, or interactive applications.
o Gather feedback and refine visualizations if needed.
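Steps 2 and 3 of the process above can be sketched in plain Python. This is a minimal illustration, assuming the data arrives as a list of dictionaries; clean_records, CHART_FOR_OBJECTIVE, and the sample sales figures are hypothetical.

```python
# Mapping taken from step 3 above: analysis objective -> suggested chart type.
CHART_FOR_OBJECTIVE = {
    "trend": "line chart",
    "comparison": "bar chart",
    "proportion": "pie chart",
    "relationship": "scatter plot",
    "intensity": "heatmap",
    "distribution": "box plot",
}


def clean_records(records):
    """Drop rows with missing values and exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for row in records:
        if any(value is None for value in row.values()):
            continue  # handle missing values by skipping the row
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # remove duplicates
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Hypothetical raw data with one missing value and one duplicate row.
raw = [
    {"month": "Jan", "sales": 120},
    {"month": "Feb", "sales": None},
    {"month": "Jan", "sales": 120},
    {"month": "Mar", "sales": 150},
]

cleaned = clean_records(raw)
chart = CHART_FOR_OBJECTIVE["trend"]
print(cleaned)
print("Suggested chart:", chart)
```

Skipping incomplete rows is only one way to handle missing values; filling them in may suit other datasets better.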
3.2 ACQUIRING DATA
Acquiring data is an early and crucial step in the data visualization process. It involves
gathering, processing, and preparing data before visualizing it. High-quality data ensures accurate
and meaningful visualizations.
• Optimize dataset size for faster processing.
# Preview data
print(df.head())
3. Scientific & Research Data
For academic and scientific projects:
• NASA Earth Data – earthdata.nasa.gov (Climate, satellite imagery)
• Harvard Dataverse – dataverse.harvard.edu (Research datasets)
• UCI Machine Learning Repository – archive.ics.uci.edu/ml
• Scrapy – A powerful framework for large-scale web scraping.
• Selenium – Automates web browsers to extract dynamic content.
Example: Extracting Data Using BeautifulSoup
import requests
from bs4 import BeautifulSoup
# Download the page and parse its HTML
url = "https://example.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
# Fetch JSON from a web API (replace YOUR_API_KEY with your own key)
url = "https://api.openweathermap.org/data/2.5/weather?q=London&appid=YOUR_API_KEY"
response = requests.get(url)
data = response.json()
Example: Fetching Data from Google Sheets using Python
import pandas as pd
url = "https://docs.google.com/spreadsheets/d/your_sheet_id/export?format=csv"
df = pd.read_csv(url)
# Query Google BigQuery (requires the google-cloud-bigquery package and credentials)
from google.cloud import bigquery
client = bigquery.Client()
query = "SELECT * FROM `dataset.table` LIMIT 10"
df = client.query(query).to_dataframe()
print(df)
Example: Receiving Real-Time Data via WebSockets
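import websocket  # provided by the websocket-client package
def on_message(ws, message):
    # Called each time the server pushes a message
    print(message)
ws = websocket.WebSocketApp("wss://example.com/realtime-data", on_message=on_message)
ws.run_forever()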
Reading CSV in Python
import pandas as pd
df = pd.read_csv("data.csv")
print(df.head())
import json
with open("data.json") as f:
    data = json.load(f)  # Parse the file into Python objects
print(data)
<name>Alice</name>
<age>30</age>
<salary>50000</salary>
</employee>
</employees>
Parsing XML in Python
import xml.etree.ElementTree as ET
tree = ET.parse("data.xml")
root = tree.getroot()
for emp in root.findall("employee"):
    print(emp.find("name").text)  # Print each employee's name
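The same parsing approach can turn XML into rows that are ready for charting. This is a minimal sketch, assuming an inline document in the employees/employee layout; the xml_text variable and the sample values are illustrative.

```python
import xml.etree.ElementTree as ET

# Sample document in the employees/employee layout (values are illustrative).
xml_text = """
<employees>
  <employee><name>Alice</name><age>30</age><salary>50000</salary></employee>
  <employee><name>Bob</name><age>25</age><salary>45000</salary></employee>
</employees>
"""

root = ET.fromstring(xml_text)

# Convert each <employee> element into a dictionary with typed values.
rows = [
    {
        "name": emp.findtext("name"),
        "age": int(emp.findtext("age")),
        "salary": int(emp.findtext("salary")),
    }
    for emp in root.findall("employee")
]
print(rows)
```

A list of dictionaries like rows can be passed directly to pd.DataFrame for plotting.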
• Tools: Python (pyarrow, fastparquet), BigQuery, AWS Athena.
Reading Parquet in Python
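import pandas as pd
df = pd.read_parquet("data.parquet")  # Requires pyarrow or fastparquet
print(df.head())
# Reading GeoJSON with GeoPandas
import geopandas as gpd
gdf = gpd.read_file("map.geojson")
print(gdf.head())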
Conclusion
Files play a key role in data visualization by storing, structuring, and providing access to
raw data. The choice of file format depends on the data type, size, and visualization tool being
used.
df = pd.read_csv("data.csv")
print(df.head()) # Display first 5 rows
{"name": "Alice", "age": 30, "salary": 50000},
{"name": "Bob", "age": 25, "salary": 45000}
]
}
Reading JSON in Python
import json
with open("data.json") as f:
    data = json.load(f)
print(data["employees"])
Converting JSON to Pandas DataFrame for Visualization
df = pd.DataFrame(data["employees"])
print(df)
tree = ET.parse("data.xml")
root = tree.getroot()
print(name, age, salary)
import requests
url = "https://api.example.com/data"
response = requests.get(url)
data = response.json()  # Convert response to JSON
print(data)
Final Thoughts
• Structured data (CSV, JSON) is easier to load for visualization.
• Unstructured text (TXT, NLP data) requires additional processing.
• APIs are useful for real-time data collection; XML remains common for data exchange.
df = pd.read_csv("data.csv")
print(df.head()) # Shows the first 5 rows
Reading a JSON File (.json)
import json
B. Writing Files
Writing to a Text File (.txt)
with open("output.txt", "w") as file:
    file.write("Hello, this is a test file.")
Writing a CSV File (.csv)
df.to_csv("output.csv", index=False) # Saves DataFrame as CSV
Writing a JSON File (.json)
with open("output.json", "w") as file:
    json.dump(data, file, indent=4)
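A quick way to check that a file was written correctly is to read it back. This is a minimal sketch using only the standard library and a temporary folder; the records list and file names are illustrative.

```python
import csv
import json
import tempfile
from pathlib import Path

# Illustrative records; any list of flat dictionaries would work.
records = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
]

with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "output.json"
    csv_path = Path(tmp) / "output.csv"

    # Write and re-read JSON.
    with open(json_path, "w") as file:
        json.dump(records, file, indent=4)
    with open(json_path) as file:
        json_back = json.load(file)

    # Write and re-read CSV (every CSV value comes back as a string).
    with open(csv_path, "w", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=["name", "age"])
        writer.writeheader()
        writer.writerows(records)
    with open(csv_path, newline="") as file:
        csv_back = list(csv.DictReader(file))

print(json_back == records)
print(csv_back)
```

JSON keeps numbers as numbers, whereas CSV stores everything as text, so numeric columns need converting after a CSV file is loaded this way.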
2. Working with Folders (Directories)
Folders (directories) help organize files systematically.
A. Creating a Folder
Creating a New Folder (Directory)
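import os
import shutil
os.makedirs("new_folder", exist_ok=True)  # Create the folder if it does not exist
shutil.move("file.txt", "new_folder/file.txt")  # Move a file into the new folder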
4. Handling Large Files
For large files, reading line-by-line helps save memory.
Reading a Large File Efficiently
with open("large_file.txt", "r") as file:
    for line in file:
        print(line.strip())  # Process each line without loading the entire file
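Line-by-line reading also works for summarising a file rather than printing it. This is a minimal sketch in which a small temporary file stands in for a large log; count_matching_lines and the file contents are hypothetical.

```python
import os
import tempfile


def count_matching_lines(path, keyword):
    """Stream the file line by line and count lines containing keyword."""
    total = 0
    matches = 0
    with open(path, "r") as file:
        for line in file:  # only one line is held in memory at a time
            total += 1
            if keyword in line:
                matches += 1
    return total, matches


# Build a small stand-in for a large log file (contents are made up).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for i in range(1000):
        tmp.write(f"row {i} status={'error' if i % 100 == 0 else 'ok'}\n")
    sample_path = tmp.name

total, matches = count_matching_lines(sample_path, "error")
os.remove(sample_path)
print(total, matches)
```

Memory use stays roughly constant regardless of file size, because only the running counters are kept.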
Final Thoughts
• Organizing files and folders is crucial for data management.
• Handling different file formats enables smooth data processing.
• Automating tasks like renaming and moving files saves time.
import os
folder_path = "my_folder"
files = os.listdir(folder_path)
print(files) # Prints list of files and folders
B. Using os.scandir()
• Provides more details about each file (e.g., if it's a file or directory).
• Returns an iterator instead of a list, making it memory-efficient.
Example: List only files (excluding folders)
with os.scandir("my_folder") as entries:
    for entry in entries:
        if entry.is_file():  # Checks if it's a file
            print(entry.name)
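The extra detail that os.scandir() provides can be used, for example, to report file sizes. This is a minimal sketch in which a temporary folder stands in for my_folder; list_files_with_sizes and the file names are hypothetical.

```python
import os
import tempfile


def list_files_with_sizes(folder_path):
    """Return (name, size_in_bytes) for each regular file in folder_path."""
    result = []
    with os.scandir(folder_path) as entries:
        for entry in entries:
            if entry.is_file():
                # entry.stat() exposes metadata such as the file size
                result.append((entry.name, entry.stat().st_size))
    return sorted(result)


# Temporary folder standing in for "my_folder" (file names are made up).
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "notes.txt"), "w") as f:
        f.write("hello")
    os.mkdir(os.path.join(folder, "subfolder"))
    listing = list_files_with_sizes(folder)

print(listing)
```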
from pathlib import Path
# List only files using pathlib
folder = Path("my_folder")
files = [file.name for file in folder.iterdir() if file.is_file()]
print(files)
# List only .txt files
folder = Path("my_folder")
txt_files = [file.name for file in folder.glob("*.txt")]
print(txt_files)
Example: List multiple file types (.jpg and .png)
image_files = [file.name for file in folder.glob("*.jpg")]
image_files += [file.name for file in folder.glob("*.png")]
print(image_files)
A. Using os.walk()
Example: List all files inside a folder and its subfolders
for root, dirs, files in os.walk("my_folder"):
    for file in files:
        print(os.path.join(root, file))  # Prints full path of each file
B. Using pathlib.Path.rglob() (Recommended)
Example: List all .txt files in all subfolders
txt_files = [file for file in Path("my_folder").rglob("*.txt")]
print(txt_files)
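Recursive scanning is also handy for summarising a folder tree, for instance by counting files per extension. This is a minimal sketch in which a temporary tree stands in for my_folder; count_by_extension and the file names are hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_by_extension(folder):
    """Count files by suffix across folder and all of its subfolders."""
    return Counter(
        path.suffix.lower() for path in Path(folder).rglob("*") if path.is_file()
    )


# Temporary tree standing in for "my_folder" (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "reports").mkdir()
    (base / "a.txt").write_text("a")
    (base / "reports" / "b.txt").write_text("b")
    (base / "reports" / "chart.PNG").write_text("not a real image")
    counts = count_by_extension(base)

print(counts)
```

Lower-casing the suffix means that .PNG and .png are counted together.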
Final Thoughts
• Use pathlib.Path.iterdir() for simple file listing.
• Use os.walk() or pathlib.Path.rglob() for recursive folder scanning.
• Filter files by extension using glob() or rglob().
1. Understanding Asynchronous Execution
Asynchronous execution means that tasks run independently and do not block each other. In
contrast, synchronous execution processes tasks one by one, waiting for each to complete
before moving to the next.
Example:
• Synchronous: You order food, wait until it’s prepared, then eat.
• Asynchronous: You order food, do other tasks while waiting, and eat when it's ready.
For image downloading, asynchronous execution allows multiple images to be fetched at the
same time instead of waiting for each download to finish before starting the next.
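The idea can be sketched with Python's asyncio module. This is a minimal illustration in which asyncio.sleep stands in for the network wait, so no real images are fetched; fake_download, download_all, and the file names are hypothetical.

```python
import asyncio
import time


async def fake_download(name, delay):
    """Stand-in for a network request: waits without blocking other tasks."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def download_all(names, delay=0.2):
    # gather() starts every coroutine and waits for all of them together.
    return await asyncio.gather(*(fake_download(n, delay) for n in names))


start = time.perf_counter()
results = asyncio.run(download_all(["img1.jpg", "img2.jpg", "img3.jpg"]))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")  # close to one delay, not three
```

Because the three waits overlap, the total time is close to a single delay rather than the sum of all three.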
1. Multithreading (e.g., Python ThreadPoolExecutor) → Uses multiple threads to fetch
images in parallel.
2. Async I/O (e.g., Python asyncio) → Uses a single thread but switches between tasks
efficiently to keep execution non-blocking.
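Which is better?
Multithreading → Works well for I/O-bound tasks that rely on blocking libraries; in CPython
the global interpreter lock (GIL) limits its benefit for CPU-bound work, where multiprocessing
is usually the better choice.
Async I/O → Well suited to network-bound tasks like image downloads, especially with many
concurrent connections.
Since downloading images is limited mainly by network speed, Async I/O is usually the preferred method.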
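For comparison, the thread-based approach can be sketched with ThreadPoolExecutor. This is a minimal illustration in which time.sleep stands in for a blocking network call; blocking_fetch and the file names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_fetch(name):
    """Stand-in for a blocking download call (e.g. a synchronous HTTP client)."""
    time.sleep(0.2)  # the thread waits here while others keep running
    return f"{name} fetched"


names = ["img1.jpg", "img2.jpg", "img3.jpg"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() preserves input order even though the calls overlap in time.
    fetched = list(pool.map(blocking_fetch, names))
elapsed_threads = time.perf_counter() - start

print(fetched)
print(f"elapsed: {elapsed_threads:.2f}s")
```

Threads are a practical option when the download library only offers blocking calls, though each thread carries more overhead than an asyncio task.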
Conclusion
• Asynchronous image downloads improve efficiency by fetching multiple images at
once.
• They rely on non-blocking requests using event-driven programming.
• Async I/O (e.g., aiohttp in Python) is usually a good fit for downloading many images
over the internet.
C. Security Enhancements
• OAuth & JWT: Delegated authorization and token-based access.
• HTTPS & SSL/TLS: Encrypt data in transit.
• CSRF & XSS Protection: Guards against cross-site request forgery and cross-site
scripting attacks.
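Two of these protections can be sketched with Python's standard library. This is a simplified illustration, not a complete security setup: real web frameworks usually provide escaping and CSRF handling themselves, and is_valid_csrf and the sample values are hypothetical.

```python
import hmac
import html
import secrets

# XSS: escape user-supplied text before placing it in an HTML page.
user_comment = '<script>alert("hi")</script>'
safe_comment = html.escape(user_comment)

# CSRF: issue a random token with the form and compare it on submission.
csrf_token = secrets.token_hex(16)


def is_valid_csrf(submitted, expected):
    """Constant-time comparison avoids leaking information through timing."""
    return hmac.compare_digest(submitted, expected)


print(safe_comment)
print(is_valid_csrf(csrf_token, csrf_token))
print(is_valid_csrf("forged-token", csrf_token))
```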
Conclusion
• Advanced web techniques enhance performance and security (AJAX, caching, load
balancing).
• Databases improve data management (SQL vs. NoSQL, indexing, replication).
• Handling large files efficiently requires cloud storage, streaming, and batch processing.