Data Exploration and Visualization Unit 3

This document provides an introduction to data visualization, outlining its importance in data analysis and decision-making. It details the seven stages of visualizing data, including defining objectives, data collection, cleaning, transformation, analysis, visualization, and interpretation. Additionally, it covers tools for data processing, mapping techniques, and concepts related to time series analysis, correlations, and data acquisition.

Uploaded by

Dev Mane

Unit 3: Introduction to Data Visualization

Overview

Data visualization is a crucial process in data analysis, enabling us to present data in a way that is
accessible and interpretable. It allows analysts to identify trends, patterns, and outliers,
facilitating decision-making. This chapter covers the fundamental concepts of data visualization,
including its stages, methods for processing and mapping data, and various visualization
techniques.

1. The Seven Stages of Visualizing Data

Visualizing data involves a structured approach that can be broken down into seven essential
stages:

1.1 Define Objectives

 Description: This initial stage is crucial for determining what you aim to achieve with
the visualization. Clear objectives help to focus the analysis.
 Key Questions:
o What question am I trying to answer?
o Who is the audience for the visualization?
o What decisions will this visualization support?
 Example: A marketing team wants to visualize customer demographics to tailor their
advertising strategies. Their objective is to identify which age groups are purchasing
specific products.

1.2 Data Collection

 Description: Collecting relevant data is the foundation of effective visualization. This
may involve gathering data from multiple sources, including databases, APIs, surveys,
and existing datasets.
 Methods:
o Surveys: Directly collect data from users.
o Databases: Query existing databases using SQL.
o APIs: Use APIs (Application Programming Interfaces) to pull data from web
services.
 Example: A researcher collecting data from a public health database, customer feedback
forms, and sales reports.

1.3 Data Cleaning

 Description: Raw data often contains inaccuracies, missing values, or irrelevant
information. Data cleaning is the process of preparing this data for analysis.
 Techniques:
o Handling Missing Values: Options include deletion, imputation (replacing
missing values with statistical measures), or using algorithms that handle missing
data.
o Removing Duplicates: Ensuring no repeated entries distort the analysis.
o Standardization: Ensuring consistency in formats (e.g., date formats).
 Example: A dataset with customer entries might have missing phone numbers. The
analyst can either remove these entries or replace missing numbers with a placeholder.
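The three cleaning techniques above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the records, the field names (name, phone, signup), the two accepted date formats, and the "unknown" placeholder are all hypothetical choices, not part of the original notes.

```python
from datetime import datetime

# Hypothetical raw customer records; None marks a missing value.
raw_rows = [
    {"name": "Asha", "phone": "555-0101", "signup": "2024-01-05"},
    {"name": "Ben", "phone": None, "signup": "05/02/2024"},
    {"name": "Asha", "phone": "555-0101", "signup": "2024-01-05"},  # exact duplicate
]


def standardize_date(text):
    """Convert YYYY-MM-DD or DD/MM/YYYY text into ISO YYYY-MM-DD."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def clean(rows, phone_placeholder="unknown"):
    """Fill missing phones, standardize dates, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        fixed = {
            "name": row["name"],
            # Handling missing values: replace with a placeholder.
            "phone": row["phone"] if row["phone"] is not None else phone_placeholder,
            # Standardization: one consistent date format.
            "signup": standardize_date(row["signup"]),
        }
        key = tuple(fixed.items())
        if key in seen:  # Removing duplicates.
            continue
        seen.add(key)
        cleaned.append(fixed)
    return cleaned


cleaned_rows = clean(raw_rows)
```

Deleting incomplete rows instead of filling them is the other option mentioned above; which one is appropriate depends on how much data would be lost.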

1.4 Data Transformation

 Description: Data transformation involves converting data into a suitable format or
structure for analysis. This may include normalizing values, aggregating data, or creating
new calculated fields.
 Techniques:
o Aggregation: Summarizing data at a higher level (e.g., daily to monthly totals).
o Normalization: Adjusting values to a common scale.
o Pivoting: Restructuring data for easier analysis.
 Example: Converting daily sales figures into monthly sales by summing daily values.
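As a rough sketch of aggregation and normalization, the snippet below sums daily figures into monthly totals and then rescales the totals to the range 0 to 1. The sales values are hypothetical, and min-max scaling is only one of several ways to bring values onto a common scale.

```python
from collections import defaultdict

# Hypothetical daily sales as (ISO date, amount) pairs.
daily_sales = [
    ("2024-01-30", 200.0),
    ("2024-01-31", 300.0),
    ("2024-02-01", 150.0),
    ("2024-02-02", 50.0),
]


def monthly_totals(rows):
    """Aggregate daily amounts into one total per 'YYYY-MM' month."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[day[:7]] += amount  # the first 7 characters identify the month
    return dict(totals)


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values equal: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


totals = monthly_totals(daily_sales)
scaled = min_max_normalize(list(totals.values()))
```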

1.5 Data Analysis

 Description: Analyzing the data involves applying statistical techniques to extract
meaningful insights and patterns. This step is often iterative, requiring adjustments based
on findings.
 Methods:
o Descriptive Statistics: Summarizing data using mean, median, mode, and
standard deviation.
o Inferential Statistics: Making predictions or generalizations about a population
based on sample data.
o Regression Analysis: Exploring relationships between variables.
 Example: A company might perform a regression analysis to understand how changes in
advertising spend affect sales revenue.
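The descriptive statistics listed above are available in Python's built-in statistics module. The order values below are hypothetical, and note that statistics.stdev computes the sample standard deviation (dividing by n - 1).

```python
import statistics

# Hypothetical order values for a small sample of customers.
order_values = [20, 35, 35, 40, 70]

summary = {
    "mean": statistics.mean(order_values),      # arithmetic average
    "median": statistics.median(order_values),  # middle value when sorted
    "mode": statistics.mode(order_values),      # most frequent value
    "stdev": statistics.stdev(order_values),    # sample standard deviation
}
```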

1.6 Data Visualization

 Description: This stage is where you create visual representations of your data. The
choice of visualization depends on the data type and the insights you want to convey.
 Common Visualizations:
o Bar Charts: Useful for comparing categories.
o Line Charts: Ideal for showing trends over time.
o Heatmaps: Displaying data density across geographical locations or matrices.
 Example: Creating a bar chart to compare sales performance across different product
lines.

1.7 Interpretation and Presentation

 Description: After visualizing the data, the next step is to interpret the results and
prepare to communicate them effectively. This often involves creating reports or
presentations.
 Key Aspects:
o Highlighting Key Findings: Focus on the most significant insights.
o Storytelling: Use narrative techniques to guide the audience through the data.
o Visual Design: Ensure that visualizations are clear, accessible, and appealing.
 Example: Presenting a dashboard that includes key metrics and visualizations, explaining
trends in sales and marketing effectiveness.

2. Getting Started with Processing

Data processing is a critical preliminary step that involves organizing, cleaning, and preparing
data for visualization.

2.1 Tools for Data Processing

 Python:
o Libraries: Pandas for data manipulation, NumPy for numerical data operations.
o Example:

import pandas as pd

df = pd.read_csv('sales_data.csv')  # Load the raw sales data
df.dropna(inplace=True)  # Drop every row that contains at least one missing value

 R:
o Libraries: dplyr for data manipulation, tidyr for tidying data.
o Example:

library(dplyr)

cleaned_data <- sales_data %>%
  filter(!is.na(sales))  # Keeps only the rows where sales is not NA

 SQL: For querying databases and extracting data directly.
o Example:

SELECT * FROM sales WHERE sales > 1000;

2.2 Basic Data Operations

 Aggregation: Summarizing data to derive insights. Common functions include:
o Sum: Total of a numerical column.
o Mean: Average of a numerical column.
o Count: Number of entries in a column.
 Filtering: Selecting subsets based on certain criteria.
o Example: Filtering sales records where revenue exceeds a threshold.

filtered_df = df[df['revenue'] > 1000]

 Transformation: Modifying data, such as scaling numerical values or creating new
calculated fields.
o Example: Creating a profit margin column.

df['profit_margin'] = df['profit'] / df['revenue']

3. Mapping

Mapping techniques help visualize spatial relationships and distributions within the data.

3.1 Types of Maps

 Choropleth Maps:
o Description: These maps use colors to represent data values in specific
geographical regions.
o Example: A choropleth map showing unemployment rates across different states,
where darker shades indicate higher unemployment.
 Heatmaps:
o Description: Heatmaps indicate density or intensity of data points over a
geographical area.
o Example: A heatmap showing areas of high customer engagement in a city based
on social media activity.
 Dot Maps:
o Description: Represent individual data points as dots on a map, providing a
visual indication of data distribution.
o Example: A dot map displaying the location of all customers within a region.

3.2 Creating Maps

 Libraries and Tools:
o Tableau: User-friendly software for creating interactive maps.
o Python: Libraries like Folium for web-based maps and GeoPandas for
geographic data analysis.
o Example using Folium:

import folium

# Center the map on Portland, Oregon (latitude, longitude)
m = folium.Map(location=[45.5236, -122.6750], zoom_start=13)
folium.Marker([45.5236, -122.6750], popup='Portland').add_to(m)
m.save('map.html')  # Writes the interactive map to an HTML file

4. Data Exploration and Visualization - Detailed Notes

1. Time Series Analysis

Time series data consists of observations made sequentially over time. Examples include stock
prices, temperature readings, and daily sales data.

Key Concepts:

 Trend: The long-term direction of data (upward, downward, or flat).
 Seasonality: Repeating patterns at regular intervals (e.g., monthly sales spikes).
 Noise: Random variations or irregularities in the data.

Formulas & Calculations:

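 Moving Average: A method to smooth out fluctuations by averaging the n most recent
observations.

$$MA_t = \frac{1}{n} \sum_{i=0}^{n-1} x_{t-i}$$

Where $MA_t$ is the moving average at time $t$, $n$ is the window length, and $x_{t-i}$ is
the observation at lag $i$.

 Exponential Smoothing: Gives more weight to recent observations.

$$S_t = \alpha x_t + (1 - \alpha) S_{t-1}$$

Where $S_t$ is the smoothed value, $\alpha$ is the smoothing factor ($0 < \alpha \le 1$), and
$x_t$ is the current observation.

Example: Suppose we have monthly sales data: January: 100, February: 120, March:
130. Using a 3-month moving average:

$$MA_{\text{March}} = \frac{100 + 120 + 130}{3} = \frac{350}{3} \approx 116.67$$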
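Both smoothing methods can be sketched in plain Python using the January to March figures from the example. The window length of 3, the smoothing factor of 0.5, and the choice to start the smoothed series at the first observation are assumptions made for illustration.

```python
def moving_average(values, window):
    """Trailing moving average: the mean of the last `window` observations."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def exponential_smoothing(values, alpha):
    """Simple exponential smoothing: S_t = alpha * x_t + (1 - alpha) * S_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [values[0]]  # assumption: initialize with the first observation
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


monthly_sales = [100, 120, 130]  # January, February, March
ma = moving_average(monthly_sales, window=3)
es = exponential_smoothing(monthly_sales, alpha=0.5)
```

A larger alpha makes the smoothed series follow recent observations more closely; a smaller alpha smooths more heavily.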
2. Connections and Correlations

Covariance:

Covariance measures how two variables move together. If the covariance is positive, both
variables tend to increase together; if it's negative, one increases while the other decreases.

Key Concepts:

 Covariance: Measures how two variables change together.

$$\mathrm{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

Where:

 X and Y = variables
 $\bar{X}$ and $\bar{Y}$ = means of X and Y
 n = number of data points

This is the population form; the sample estimate divides by n - 1 instead of n.

Example:

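Consider height and weight data for 5 people, with means $\bar{X} = 180$ and $\bar{Y} = 70$.
To compute the covariance, subtract the mean from each observation, multiply the paired
deviations, and average the products.

A positive covariance suggests that as height increases, weight also tends to increase.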

 Correlation: A standardized version of covariance that measures the strength and
direction of the linear relationship between two variables. It ranges from -1 to 1,
where:

o 1 = perfect positive correlation,
o -1 = perfect negative correlation,
o 0 = no linear correlation.

$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

Where:

 $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y.

Example:

From the earlier covariance example, if $\sigma_X = 14.1$ and $\sigma_Y = 12.5$, the correlation
would be:

$$r = \frac{\mathrm{Cov}(X, Y)}{14.1 \times 12.5}$$

The resulting value indicates a moderately strong positive correlation between height and
weight.
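Covariance and correlation can be computed directly from their definitions. The sketch below uses the population form (dividing by n). The height and weight values are hypothetical, chosen only so that the means are 180 and 70; they are not the notes' original table, so the results differ from the figures quoted above.

```python
import math

# Hypothetical data for 5 people (means: 180 and 70).
heights = [160, 170, 180, 190, 200]
weights = [60, 55, 70, 85, 80]


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance: average product of paired deviations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    sigma_x = math.sqrt(covariance(xs, xs))  # Cov(X, X) is the variance of X
    sigma_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sigma_x * sigma_y)


cov_hw = covariance(heights, weights)
r_hw = correlation(heights, weights)
```

Unlike covariance, the correlation does not change if the heights are converted from centimeters to inches, which is why it is easier to compare across datasets.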
3. Scatterplot Maps

Scatterplots help visualize the relationship between two continuous variables. Each point on the
plot represents a pair of values for those two variables.

Trendline (Line of Best Fit):

A trendline is added to scatterplots to summarize the direction of the relationship between
variables. If the points are tightly clustered around the line, it suggests a strong relationship.

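Equation of a Straight Line:

$$y = mx + c$$

Where:

 m = slope of the line (rate of change of y with respect to x)
 c = intercept (the value of y when x = 0)

Calculation of Slope:

The slope of the least-squares line of best fit is calculated as:

$$m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$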

Example:

Let's use the height and weight data again. After plotting the points and fitting the
trendline, the slope is 0.36, indicating that each one-unit increase in height is associated
with an average increase of 0.36 units in weight.
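A least-squares trendline can be fitted with a few lines of plain Python. The data below is hypothetical (it is not the notes' original table), so the fitted slope differs from the 0.36 quoted above; the intercept formula c = mean(y) - m * mean(x) is the standard least-squares result.

```python
# Hypothetical height (x) and weight (y) data for 5 people.
heights = [160, 170, 180, 190, 200]
weights = [60, 55, 70, 85, 80]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + c."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return slope, intercept


m, c = fit_line(heights, weights)
predicted_weight = m * 175 + c  # value on the trendline at height 175
```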

4. Trees, Hierarchies, and Recursion

Trees:

A tree is a hierarchical structure where each node has a parent (except the root) and may have
children. It’s used in many areas of computer science, including data structures (binary trees),
decision making (decision trees), and database indexing.

 Nodes: Represent individual entities.
 Edges: Connect nodes and define relationships between them.

Hierarchies:

Hierarchies represent a parent-child relationship. Organizational charts, file systems, and
taxonomies are examples of hierarchies.

Recursion:

Recursion is a technique where a function calls itself to break down a problem into smaller
subproblems.

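Example:

Factorial calculation:

$$n! = n \times (n - 1)!, \quad 0! = 1$$

This problem can be broken down using recursion: $4! = 4 \times 3!$, then $3! = 3 \times 2!$,
and so on until the base case $0! = 1$ is reached.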
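A minimal recursive implementation of the factorial looks like this. The base case 0! = 1 is what stops the recursion; rejecting negative input is an added safeguard, not something the notes specify.

```python
def factorial(n):
    """Compute n! recursively: n! = n * (n - 1)!, with 0! = 1."""
    if n < 0:
        raise ValueError("factorial is not defined for negative integers")
    if n == 0:  # base case: stops the chain of recursive calls
        return 1
    return n * factorial(n - 1)  # recursive case: a smaller subproblem


result = factorial(4)  # 4 * 3 * 2 * 1
```

The same pattern (handle the base case, then recurse on smaller pieces) is how tree and hierarchy structures are usually traversed.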

5. Networks and Graphs

A graph is a collection of nodes (vertices) and edges (connections) used to model pairwise
relations. Graphs can represent various structures like social networks, roads, and the internet.

Types of Graphs:

 Directed Graph: Edges have a direction (e.g., Twitter followers).
 Undirected Graph: Edges have no direction (e.g., Facebook friends).
 Weighted Graph: Edges have weights to indicate the strength of relationships (e.g., road
distances).

Degree Centrality:

Degree centrality measures the importance of a node based on how many connections it has.

Formula: For an undirected graph:

$$C_D(v) = \deg(v)$$

Where $\deg(v)$ is the number of edges connected to node $v$.
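Degree centrality for an undirected graph can be computed by counting how often each node appears in the edge list. The friendship network below is hypothetical, and the sketch assumes each edge is listed once with no self-loops. Some libraries, such as NetworkX, report a normalized version divided by n - 1 rather than the raw count used here.

```python
# Hypothetical undirected friendship network, one tuple per edge.
friendships = [
    ("Ana", "Ben"),
    ("Ana", "Cy"),
    ("Ana", "Dee"),
    ("Ben", "Cy"),
]


def degree_centrality(edges):
    """Return {node: degree}, counting each undirected edge for both endpoints."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


centrality = degree_centrality(friendships)
most_connected = max(centrality, key=centrality.get)
```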

6. Acquiring Data

Data acquisition is the process of gathering data from various sources like sensors, databases,
APIs, or web scraping. Proper data acquisition is essential for ensuring quality and relevance.

Example:

To acquire stock price data, you can use a market-data API such as Alpha Vantage, or Yahoo
Finance data accessed through a third-party client library, to retrieve prices programmatically.

7. Parsing Data

Parsing involves processing raw data and converting it into a usable format. This can include
reading text, cleaning data, or extracting relevant parts of a dataset.

Example:

Web scraping using Python’s BeautifulSoup library can parse HTML pages and extract data
from specific tags.
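To show the idea without installing anything, the sketch below uses html.parser from the Python standard library instead of BeautifulSoup; the parsing step is the same in spirit (walk the HTML and collect the text inside a chosen tag). The TagTextExtractor class and the sample HTML string are hypothetical.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text content of every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # > 0 while we are inside the target tag
        self.current = []   # text fragments of the element being read
        self.results = []   # finished text, one entry per element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current).strip())
                self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


# Hypothetical page content standing in for a downloaded HTML document.
html = "<html><body><h2>Prices</h2><p>AAA: 10.5</p><p>BBB: 20.0</p></body></html>"

parser = TagTextExtractor("p")
parser.feed(html)
paragraphs = parser.results
```

With BeautifulSoup installed, the equivalent step is typically a single call that finds all matching tags, but the extracted strings would still need the same cleaning before analysis.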
