
BI - Visualization and Data Governance

This document discusses data visualization and its importance. It defines visualization as using tables, images and diagrams to understand data. Visualization is effective for presenting vast amounts of information and enabling complex analysis. It also allows for fast comprehension of trends and patterns. The document then discusses different levels of information comprehension, using context and numbers to highlight important parts, and considerations for visualization flow and dealing with large datasets.

Uploaded by Sarasi Yashodha
Copyright © All Rights Reserved

18/05/2023

How many “V”?


XCRSVDFXESDRHSVASEQZRHAWFAVOISVB
AQETHGADGEAXVRGTYSHKUAQETKAWRNA
VWRGHAFRDGHLQECYOJDETUPLACBWNCV
AQETYJKAFZSCXBZNMA
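The puzzle makes the point that scanning raw characters is slow for a human. As a small illustration that is not part of the slides, the Python sketch below counts the Vs and then "visualizes" the grid by blanking out every other letter so the Vs stand out at a glance. The `highlight` helper is a hypothetical name chosen for this example.

```python
GRID = [
    "XCRSVDFXESDRHSVASEQZRHAWFAVOISVB",
    "AQETHGADGEAXVRGTYSHKUAQETKAWRNA",
    "VWRGHAFRDGHLQECYOJDETUPLACBWNCV",
    "AQETYJKAFZSCXBZNMA",
]


def highlight(grid, target="V", blank="."):
    """Keep only the target letter and replace every other character with `blank`."""
    return ["".join(ch if ch == target else blank for ch in row) for row in grid]


count = sum(row.count("V") for row in GRID)  # the machine's answer: a plain count

for row in highlight(GRID):  # the human-friendly answer: the Vs pop out
    print(row)
print("Number of Vs:", count)
```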

Data Visualization
Supunmali Ahangama


Why Visualization?
Human analyst:
• pattern detection
• remembers context
• fantastic intuition
• can predict

vi·su·al·i·za·tion
1. Formation of mental visual images
2. The act or process of interpreting in visual terms or of putting into visible form
- The American Heritage Dictionary

An image is the meaningful visual form that a human can understand with the least effort and in the shortest time.


Visualization to…

Visualization
Visualization is the technique of creating tables, images, diagrams and other intuitive displays to understand data.

Visualization has proven effective not only for presenting the essential information in vast amounts of data but also for driving complex analyses.

It is not just a convenient feature; it is a must for Big Data.


What is the 4th V of Big Data?
Value

Visualization
• Discover the hidden trends and patterns
• Fast comprehension (quicker than reading spreadsheets or analyzing numerical tables)
• Without it, data becomes impossible to handle as data volumes grow exponentially
• Tools are interactive, allowing users to adjust the analysis.

https://www.datapine.com/blog/business-intelligence-trends/


Most important trends in BI
The BARC survey on the most important trends in BI for 2018 shows that data discovery/visualization is among the top three on the list, together with master data/data quality management and self-service BI.
5/18/2023 IS22225 - BI 11


When Choosing a Visualization Approach, consider:
Objectives
Type of data
Audience

3 Levels of Information Comprehension
The elementary level allows comprehension of the information at any given point in time.
The intermediate level helps analyse a given period.
The overall level enables comprehension of the entire picture over the whole period.
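A minimal sketch of the three levels of comprehension, assuming a hypothetical daily sales series (`DAILY`) and using the mean as the summary for a period; the function names are illustrative and do not come from the slides.

```python
from statistics import mean

# Hypothetical daily series: day number -> units sold
DAILY = {1: 10, 2: 12, 3: 8, 4: 15, 5: 14, 6: 20, 7: 19}


def elementary(series, day):
    """Elementary level: the value at a single point in time."""
    return series[day]


def intermediate(series, start, end):
    """Intermediate level: a summary (here the mean) over one period, inclusive."""
    return mean(v for d, v in series.items() if start <= d <= end)


def overall(series):
    """Overall level: the entire picture over the whole period (min, max, mean)."""
    values = list(series.values())
    return min(values), max(values), mean(values)


print(elementary(DAILY, 4))       # one day
print(intermediate(DAILY, 1, 3))  # the first three days
print(overall(DAILY))             # the whole week
```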


Show Context

Add Context
42 is just a number and means nothing without context.
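One simple way to add context is to report a number next to a baseline and the relative change. The sketch below is illustrative only: the baseline value and the "last month" label are hypothetical.

```python
def with_context(value, baseline, label="last month"):
    """Turn a bare number into a statement with context: baseline and relative change."""
    change = (value - baseline) / baseline * 100  # percentage change vs. the baseline
    return f"{value:g} ({change:+.1f}% vs. {baseline:g} {label})"


print(with_context(42, 35))
```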


Traffic flow analysis with context

Use Numbers to Highlight the Most Important Parts of Data


Aesthetic matters…
Boring!!!

More aesthetically pleasing…


More aesthetically pleasing…

Visualization Flow


Problem Space
The size of the dataset is part of the problem! On one PC, you may:
• Run out of screen space to draw each data point
• Take a long time to look at every data point
• Not be able to store all the data points


Data Processing Constraints
Solutions:
• Sample (or Stream)
• Divide & Conquer
• Restrict Data
• Index (OLAP, InMems, Nanocubes)

One-pass algorithms
Touch each data point once.
Streaming algorithm (the input is presented as a sequence of items).
Given any list as an input:
• Count the number of elements.
• Find the nth element from the start or end.
Given a list of numbers:
• Find the k largest or smallest elements, k given in advance.
• Find the sum, mean, variance and standard deviation of the elements of the list.
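The one-pass tasks listed above can be sketched in a single loop over a stream. This is an illustrative Python version, not the lecture's own code: it uses Welford's update for mean and variance, a fixed-size `deque` for the nth element from the end, and a size-k min-heap for the k largest values. The nth element from the start only needs a counter and an early stop, and the k smallest works the same way with a max-heap (for example by negating values), so both are omitted.

```python
import heapq
import math
from collections import deque


def one_pass_summary(stream, k=3, n_from_end=1):
    """Read `stream` once, keeping only O(k + n_from_end) items in memory.
    Returns count, sum, mean, population variance and standard deviation
    (Welford's update), the n-th element from the end, and the k largest values."""
    count, total, mean, m2 = 0, 0.0, 0.0, 0.0
    tail = deque(maxlen=n_from_end)  # ring buffer holding the last n_from_end items
    top_k = []                       # min-heap: its root is the smallest of the k largest

    for x in stream:
        count += 1
        total += x
        delta = x - mean              # Welford: update the mean and the sum of
        mean += delta / count         # squared deviations (m2) without a second pass
        m2 += delta * (x - mean)
        tail.append(x)
        if len(top_k) < k:
            heapq.heappush(top_k, x)
        elif x > top_k[0]:
            heapq.heapreplace(top_k, x)

    variance = m2 / count if count else 0.0
    return {
        "count": count,
        "sum": total,
        "mean": mean,
        "variance": variance,
        "std": math.sqrt(variance),
        "nth_from_end": tail[0] if len(tail) == n_from_end else None,
        "k_largest": sorted(top_k, reverse=True),
    }


print(one_pass_summary(iter([2, 4, 4, 4, 5, 5, 7, 9]), k=3, n_from_end=2))
```

Because the function only iterates once, it also works on generators or file readers whose data never fits in memory.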


Divide and conquer for exploration
K-means clustering algorithm for processing large data: different partitions are processed locally.

Data Communication
Solution:
• Aggregate: one visual point represents multiple data points
• Sample: show only some of the dataset. You’ll never know it all.
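The two data-communication solutions can be sketched in Python. Binning points into grid cells is one simple form of aggregation, and reservoir sampling is one standard way to sample a stream of unknown length. Both are illustrative choices rather than the methods used in the lecture, and the function names are hypothetical.

```python
import random
from collections import Counter


def aggregate_to_grid(points, cell):
    """Aggregate: snap each (x, y) point to a square grid cell of side `cell` and
    count points per cell, so one drawn mark can stand for many data points."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // cell), int(y // cell))] += 1
    return counts


def reservoir_sample(stream, k, seed=0):
    """Sample: keep a uniform random sample of k items from a stream of unknown
    length, touching each item once (reservoir sampling, 'Algorithm R')."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)        # fill the reservoir first
        else:
            j = rng.randint(0, i)      # uniform over 0..i, both ends inclusive
            if j < k:
                sample[j] = item       # item i enters with probability k / (i + 1)
    return sample


points = [(0.2, 0.3), (0.8, 0.1), (1.5, 0.4), (1.7, 1.2)]
print(dict(aggregate_to_grid(points, cell=1.0)))
print(reservoir_sample(range(1_000_000), k=5))
```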


Design Constraints
When the Y axis goes all the way to very high values, it is still very interesting to know which values are possible.
Streaming means your world can change:
• Categorical -> too many categories!
• Numerical -> changing bounds
• Any color map or scale can change

Use cases (2 dimensional Area)
2D area visualizations are not suitable if there is no geospatial connection.
Area or distance cartograms
• A distance cartogram is a diagram that visualizes the proximity indices between points in a network, such as time–distances between cities.
• Cartograms indicate additional parameters like demography, population size, travel times, etc.


Area Cartogram
Mosaic cartogram showing the distribution of the global population. Each of the 15,266 pixels represents the home country of 500,000 people (cartogram by Max Roser for Our World in Data).
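The arithmetic behind the mosaic is simple: 15,266 pixels × 500,000 people per pixel gives about 7.63 billion people. The sketch below shows how a country's cell count could be derived from its population. Rounding to the nearest whole cell is an assumption, because the slide does not say how remainders were handled.

```python
PEOPLE_PER_PIXEL = 500_000  # scale stated on the slide
TOTAL_PIXELS = 15_266

total_people = TOTAL_PIXELS * PEOPLE_PER_PIXEL  # population represented by the whole map


def pixels_for(population, per_pixel=PEOPLE_PER_PIXEL):
    """Cells a country gets in the mosaic (nearest whole cell; assumed rounding rule).
    Note: Python's round() resolves exact halves to the nearest even integer."""
    return round(population / per_pixel)


print(f"{total_people:,}")        # total population represented
print(pixels_for(1_400_000_000))  # a hypothetical country of 1.4 billion people
```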

Use cases (2 dimensional Area)
Choropleth
A map colored with different colors depending on the level of the examined variable, like the sales level per state or the biggest inventory stocks per state.

Use cases (2 dimensional Area)
Symbol Maps
This is a map with symbols of different sizes on it. The sizes of the symbols are used for comparison.
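A hedged sketch of both map encodings. Equal-interval classification is only one of several ways to bin a choropleth variable (quantiles and natural breaks are common alternatives), and scaling a symbol's area rather than its radius is a widely used convention for symbol maps. Region names and values are hypothetical.

```python
import math


def equal_interval_classes(values, n_classes):
    """Choropleth: give each region a colour class 0..n_classes-1 using equal-width
    intervals between the minimum and maximum value."""
    lo, hi = min(values.values()), max(values.values())
    width = (hi - lo) / n_classes or 1.0  # guard against all values being equal
    return {
        region: min(int((v - lo) / width), n_classes - 1)  # the maximum joins the top class
        for region, v in values.items()
    }


def symbol_radius(value, max_value, max_radius=20.0):
    """Symbol map: make the symbol's area proportional to the value, so the radius
    grows with the square root of the value."""
    return max_radius * math.sqrt(value / max_value)


sales_per_state = {"A": 10, "B": 20, "C": 30, "D": 40}  # hypothetical sales levels
print(equal_interval_classes(sales_per_state, 3))
print(symbol_radius(25, 100))  # a quarter of the maximum value gives half the radius
```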


World Bubble Map

Use cases (2 dimensional Area)
Dot distribution map
Uses dots to highlight the level of presence of the examined variable within the area.


Use Cases (Multidimensional data visualizations)
Pie chart
Histogram
Scatter plot
Not suitable when comparing two or more different sets.

Stacked Bar chart
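As an illustration of one of these chart types, the sketch below builds a histogram: values are counted into equal-width bins and drawn as text bars. The bin edges and the sample data are assumptions made for the example.

```python
def histogram(values, bins, lo, hi):
    """Count values into `bins` equal-width bins over [lo, hi]. A value equal to `hi`
    goes into the last bin; values outside [lo, hi] are ignored."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def render(counts, lo, width):
    """Draw one text bar per bin, labelled with the bin's lower edge."""
    return "\n".join(f"{lo + i * width:5.1f} | {'#' * c}" for i, c in enumerate(counts))


data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
counts = histogram(data, bins=4, lo=1, hi=5)
print(render(counts, lo=1, width=1.0))
```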


Most spoken language?
Where are the rich?

Hierarchical/Tree Data Visualization


Representation of a tree

Tree Data
Why Hierarchy?
To manage large and complex systems:
• Interrelated subsystems inside a hierarchy
• Semi-independent
• Only outcomes matter (what they do), not the process (how they do it)


Major challenges
Limited screen space
Visualizing the Tree of Life (yifanhu.net)

Major challenges
What do people need to know about a hierarchy?
• The location of a node
• Its surroundings
• Its higher or lower levels
• Its attributes


Visualization design requirements
• Show nodes and links clearly
• How to use space efficiently?
• Support user interaction with any node/link
• What tasks to support?
  Global context and local details
  Attributes of nodes and links
  Comparing nodes, links and subtrees

Classification of tree visualization methods
• Node link: more clarity
• Space-filling: space efficiency
• Hybrid: node link and space filling
• Adjacency diagrams


Node link
Layout of nodes
Orthogonal layout
Nodes lined up
Indent
Dendrogram
Easy to draw and read
Difficult to scale up and keep the context
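The indent layout is the easiest node-link layout to produce: a depth-first walk that writes one node per line, indented by depth. The dictionary-of-children tree format and the example tree below are assumptions for illustration.

```python
def indented_layout(tree, root, indent="  "):
    """Indent layout: depth-first traversal, one node per line, with the
    indentation showing each node's depth. `tree` maps parent -> list of children."""
    lines = []

    def visit(node, depth):
        lines.append(indent * depth + node)
        for child in tree.get(node, []):
            visit(child, depth + 1)

    visit(root, 0)
    return lines


files = {"project": ["docs", "src"], "src": ["main.py", "util.py"]}  # hypothetical tree
print("\n".join(indented_layout(files, "project")))
```

Recursion keeps the sketch short; very deep trees would need an explicit stack to avoid Python's recursion limit.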


Layout of nodes
Radial layout
Nodes arranged circularly
Simple radial layout
Hyperbolic layout
More efficient space use
Hard to compare and trace tree levels
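A simple radial layout can be sketched as follows, under the assumption that each subtree receives an angular wedge proportional to its number of leaves and that depth sets the circle radius. This is one common variant, not necessarily the one pictured on the slide; hyperbolic layouts are considerably more involved and are not shown.

```python
import math


def radial_layout(tree, root, ring=1.0):
    """Simple radial layout for a tree given as {parent: [children]}.
    A node at depth d sits on a circle of radius d * ring; each subtree gets an
    angular wedge proportional to its number of leaves, and a node is placed at
    the middle of its wedge. Returns {node: (x, y)}."""
    leaf_count = {}

    def leaves(node):
        if node not in leaf_count:
            kids = tree.get(node, [])
            leaf_count[node] = 1 if not kids else sum(leaves(k) for k in kids)
        return leaf_count[node]

    pos = {}

    def place(node, depth, start, end):
        mid = (start + end) / 2
        r = depth * ring
        pos[node] = (r * math.cos(mid), r * math.sin(mid))
        angle = start
        for kid in tree.get(node, []):
            span = (end - start) * leaves(kid) / leaves(node)
            place(kid, depth + 1, angle, angle + span)
            angle += span

    place(root, 0, 0.0, 2 * math.pi)
    return pos


print(radial_layout({"r": ["a", "b"], "b": ["b1", "b2"]}, "r"))
```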


Space filling method


Not only the hierarchy but also a specific node attribute can be indicated, e.g. file size in a file directory or employee salary in a firm.
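Slice-and-dice is the simplest space-filling (treemap) algorithm, shown here as an illustrative choice rather than the specific method on the slides. Leaf sizes play the role of the node attribute (for example file size), rectangles are `(x, y, width, height)` tuples, and the example directory is hypothetical. Sizes are assumed to be positive.

```python
def weight(tree, size, node):
    """Leaf weight comes from `size`; an internal node weighs the sum of its children."""
    kids = tree.get(node, [])
    return size[node] if not kids else sum(weight(tree, size, k) for k in kids)


def slice_and_dice(tree, size, node, rect, depth=0, out=None):
    """Slice-and-dice treemap: split `rect` = (x, y, w, h) among the node's children in
    proportion to their weights, cutting into vertical strips at even depths and
    horizontal strips at odd depths. Returns {node: rect} for every node."""
    if out is None:
        out = {}
    out[node] = rect
    x, y, w, h = rect
    kids = tree.get(node, [])
    total = sum(weight(tree, size, k) for k in kids)  # assumed > 0
    offset = 0.0
    for kid in kids:
        frac = weight(tree, size, kid) / total
        if depth % 2 == 0:
            child = (x + offset * w, y, frac * w, h)  # vertical strip
        else:
            child = (x, y + offset * h, w, frac * h)  # horizontal strip
        slice_and_dice(tree, size, kid, child, depth + 1, out)
        offset += frac
    return out


tree = {"/": ["docs", "src"], "src": ["a.py", "b.py"]}  # hypothetical directory
size = {"docs": 1, "a.py": 1, "b.py": 2}                # e.g. file sizes in MB
for name, rect in slice_and_dice(tree, size, "/", (0.0, 0.0, 4.0, 4.0)).items():
    print(name, rect)
```

Each leaf's area ends up proportional to its size, which is what lets the treemap show the attribute as well as the hierarchy.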


Hybrid methods
Combine node link and space filling
Adjacency diagrams
A natural extension of the node link method that includes size information in nodes


Other models

3D Tree


Network Visualization

Network data (Graph)


A very hot topic – social network analysis, knowledge graphs
But it is not a new area


Compared with hierarchical data
The relationships are more complicated:
• Between any two nodes
• With direction
• With weight
But the goals in visualization are the same or very similar:
• Showing nodes and links clearly
• Supporting user interaction based on needs
• Conveying the location of the nodes

Fake News
Zhao, Z., Zhao, J., Sano, Y., Levy, O., Takayasu, H., Takayasu, M., Li, D., & Havlin, S. (2020). Fake news propagates differently from real news even at early stages of spreading. EPJ Data Science, 9, 1-14.


Use Cases (Network data models)
Matrix diagram or chart
A Matrix Diagram is a table that allows sets of data to be compared in order to make better decisions. It displays the existence and strength of the relationship between pairs of items of two or more sets. The relationship is indicated by a number or symbol in each cell where the two items intersect in the matrix.

Use Cases (Network data models)
Alluvial diagram
Shows the changes in the data structure over time or under certain conditions.
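The matrix diagram described above can be sketched as a text table built from `(row item, column item, strength)` triples. Because the lookup key is an ordered pair, the same structure can hold directed, weighted network links, with rows as sources and columns as targets. The team and project names are hypothetical.

```python
def matrix_diagram(pairs, rows, cols, empty="."):
    """Render a matrix diagram as text. `pairs` holds (row item, column item, strength);
    each cell shows the strength where the two items intersect, or `empty` if the
    pair has no relationship. Keys are ordered pairs, so directed links stay directed."""
    strength = {(r, c): s for r, c, s in pairs}
    width = max([len(c) for c in cols] + [len(str(s)) for s in strength.values()])
    label_w = max(len(r) for r in rows)
    lines = [" " * label_w + " | " + " ".join(c.rjust(width) for c in cols)]
    for r in rows:
        cells = [str(strength.get((r, c), empty)) for c in cols]
        lines.append(r.ljust(label_w) + " | " + " ".join(x.rjust(width) for x in cells))
    return "\n".join(lines)


# Hypothetical relationships between team members and projects (strength 1-3)
links = [("Ann", "Web", 3), ("Ann", "BI", 1), ("Raj", "BI", 2)]
print(matrix_diagram(links, rows=["Ann", "Raj"], cols=["Web", "BI"]))
```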


Use Cases (Temporal visualizations)


Connected Scatter Plot
Time Series


Use Cases
Polar area diagram
A sharp sector stretched far away from the center might be more important than a blunt sector that does not reach far.

Use Cases (Word cloud - Word Count)
Word Count: indicates how often a word is used.
China vs US comparison
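The word count behind a word cloud can be sketched with `collections.Counter`. The stopword list, the regular expression used for tokenizing, and the linear font-size mapping are all simplifying assumptions; real word-cloud tools use richer tokenization and layout.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}  # illustrative, not complete


def word_counts(text, top=5):
    """How often each word is used (lower-cased, stopwords dropped), most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top)


def font_sizes(counts, min_pt=10, max_pt=48):
    """Map counts linearly onto font sizes, as a word cloud might."""
    lo = min(c for _, c in counts)
    hi = max(c for _, c in counts)
    span = (hi - lo) or 1
    return {w: min_pt + (c - lo) / span * (max_pt - min_pt) for w, c in counts}


counts = word_counts("Data data DATA visualization of data and governance", top=2)
print(counts, font_sizes(counts))
```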


Word Tree

NanoCube: 6.2M reported crimes
Datasets are running off of a single machine with 16GB of RAM.
https://nanocubes.net/


Hybrid-reality environments
They allow scalable visualization of heterogeneous datasets. These environments synergize the capabilities of VR and high-resolution tiled LCD walls, letting users juxtapose 2D and 3D datasets and create hybrid 2D/3D information spaces. E.g. Cyber-Commons and CAVE2.

Challenges
• Availability of visualization specialists: to select the best data sets and visualization styles to ensure the data is exploited to the maximum.
• Visualization hardware resources: visualization is a computing task.
• Data quality (inaccurate or out-of-date data)


Big Data Visualization Tools
Google Charts
Qlik Sense and QlikView
IBM Cognos Analytics
Tableau Desktop
Microsoft Power BI
Oracle Visual Analyzer
SAS Visual Analytics
SAP Lumira

Data Governance
Supunmali Ahangama


Outline
• Introduction
• Primary goals
• Data Governance Framework
• Challenges
• Best Practices

Why Data Matters!
FACT - Data is the most valuable asset in an organisation after its people.
FACT - Data is critical to the running of business functions and processes.
FACT - Without constant vigilance and effort to maintain order, data entropy or anarchy reigns!
Source: sciphilos.info


The Negative Impact of Poor Data Management
Economic: Revenues, Costs, Profits
Brand & Reputation
Customer Loyalty
Law & Regulation

So How Do You Get Started?
• Make data a business priority, not an IT function
• Undertake a data audit
• Implement a data strategy – embrace both improvement and exploitation
• Prepare and enforce a data policy to control access and usage rules
• Monitor, measure and control key datasets – reference (or core) data and master data
• Create and run data and process enhancement projects
• Implement a system of Data Governance


Data Governance
Definition: A process for managing and improving data for the benefit of all stakeholders.

What is Data Governance?
Data governance in BI refers to the process of managing and controlling an organization's data assets to ensure their quality, security, compliance, and usability.
It involves establishing a set of policies, procedures, and guidelines to govern the collection, storage, transformation, and usage of data within the BI environment.


Key terms
• Data Policies
  They outline rules and guidelines for data management.
  There are policies for data classification, data retention, and data privacy.
• Data Stewardship
  It is about assigning responsibilities for data management to individuals or teams.
  Data stewards maintain data integrity, resolve data-related issues, and ensure compliance with data policies.

Data Governance: What it is
➢ An active initiative within the state.
➢ A cross-organizational framework for:
▪ Securely sharing data.
▪ Data analysis across divisions.
▪ Stakeholder collaboration.
▪ Improving data quality.
➢ An on-going process.
➢ The mechanism for controlling and trusting data.
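Data classification and data retention policies become enforceable once they are written down as data. The sketch below is purely illustrative: the classification levels, roles and retention periods in `POLICY` are hypothetical values, not recommendations from the slides.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical policy table: classification -> (roles allowed to read, retention period)
POLICY = {
    "public": ({"guest", "analyst", "steward"}, timedelta(days=3650)),
    "internal": ({"analyst", "steward"}, timedelta(days=1825)),
    "confidential": ({"steward"}, timedelta(days=365)),
}


@dataclass
class Dataset:
    name: str
    classification: str  # must be one of the POLICY keys
    created: date


def can_read(role, ds):
    """Classification policy: may this role read the dataset?"""
    allowed_roles, _ = POLICY[ds.classification]
    return role in allowed_roles


def past_retention(ds, today):
    """Retention policy: has the dataset been kept longer than its class allows?"""
    _, keep_for = POLICY[ds.classification]
    return today - ds.created > keep_for


salaries = Dataset("salaries", "confidential", date(2022, 1, 1))
print(can_read("analyst", salaries), past_retention(salaries, date(2023, 5, 18)))
```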


Data Governance: What it isn't
➢ Technology:
▪ Data cleansing or ETL.
▪ Data warehousing.
▪ Database design.
➢ Technology or project oversight.
➢ A siloed initiative.
➢ A project or project management.
Data is an asset that MUST be managed from an agency-wide perspective.

Why Do We Need Data Governance?
Data Governance:
• Enables decision making
• Ensures transparency
• Reduces operational friction
• Protects the needs of the stakeholders
• Reduces cost and increases effectiveness
• Builds standards and processes
• Educates management and staff to adapt common approaches


Data Governance Goals
Data Quality: Ensuring that data is accurate, complete, and reliable for effective decision-making. This involves defining data quality standards, conducting data profiling and cleansing activities, and implementing data validation processes.
Data Security: Protecting data from unauthorized access, breaches, or misuse. Data governance includes defining data security policies, implementing access controls, and ensuring compliance with data protection regulations.

Data Governance Goals
Compliance: Ensuring that data usage and management practices align with regulatory requirements and industry standards. This includes establishing data privacy policies, managing consent and permissions, and adhering to regulations such as GDPR or HIPAA.
Usability: Making data easily accessible and understandable for users across the organization. Data governance includes defining data definitions and metadata, establishing data integration and transformation processes, and providing data documentation and lineage information.
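Data profiling, mentioned under Data Quality, can start as simply as measuring per-column completeness and distinct counts. The sketch below assumes records arrive as Python dictionaries with hashable values and treats `None` and empty strings as missing; both are assumptions, and the sample records are hypothetical.

```python
def profile(rows, columns):
    """Per column: completeness (share of rows with a non-missing value) and the
    number of distinct non-missing values. None and "" count as missing."""
    n = len(rows)
    report = {}
    for column in columns:
        present = [r.get(column) for r in rows if r.get(column) not in (None, "")]
        report[column] = {
            "completeness": len(present) / n if n else 0.0,
            "distinct": len(set(present)),
        }
    return report


customers = [  # hypothetical records
    {"id": 1, "email": "a@x.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "a@x.com"},
    {"id": 4},
]
print(profile(customers, ["id", "email"]))
```

A low completeness or an unexpectedly low distinct count (here, two customers sharing one email) is the kind of signal that would trigger cleansing or a root-cause investigation.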

Data Governance Goals
Data Consistency: Promoting consistency and standardization of data across different systems, departments, and business units. This involves establishing data governance committees or councils to define and enforce data standards, resolving data conflicts, and ensuring data integrity.

Data Governance Framework
• Data Governance Council/Committee: Responsible for establishing data governance strategy and policies.
• Data Stewards: Assigned to specific data domains or business units, responsible for data management and enforcement of data policies.
• Data Standards: Define the guidelines for data quality, data integration, and data security.
• Data Processes: Documented procedures for data collection, data storage, data transformation, and data access.
• Data Tools and Technologies: Include data governance tools that support data profiling, metadata management, and data lineage tracking.
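Data lineage tracking, listed among the framework's tools, is essentially a graph problem: each dataset records what it was derived from, and questions such as "which sources feed this dashboard?" become a graph traversal. The dataset names in `LINEAGE` are hypothetical, and real governance tools store far richer lineage metadata.

```python
# Hypothetical lineage metadata: dataset -> the datasets it is derived from
LINEAGE = {
    "sales_dashboard": ["sales_mart"],
    "sales_mart": ["crm_orders", "erp_invoices"],
    "erp_invoices": ["erp_raw"],
}


def upstream(dataset, lineage):
    """All datasets that `dataset` depends on, directly or indirectly
    (iterative depth-first search; the `seen` set also protects against cycles)."""
    seen = set()
    stack = list(lineage.get(dataset, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(lineage.get(current, []))
    return seen


print(sorted(upstream("sales_dashboard", LINEAGE)))
```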


A good data steward will focus on
• Creating clear and unambiguous definitions of data
• Defining a range of acceptable values, such as data types and length
• Monitoring data quality and starting a root cause investigation when problems arise
• Understanding the usage of data in the business units
• Reporting metrics and issues to the data governance council

Data Governance Tools
Collibra
Informatica Enterprise Data Catalog
IBM InfoSphere Information Governance Catalog
Apache Atlas (https://atlas.apache.org/#/)
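The steward's task of defining acceptable values, such as data types and length, can be expressed as machine-checkable rules. The columns, types and length limits in `RULES` are hypothetical examples chosen for illustration.

```python
# Hypothetical rules a steward might publish: column -> (expected type, max length or None)
RULES = {
    "customer_id": (int, None),
    "country_code": (str, 2),   # e.g. two-letter country codes
    "email": (str, 254),
}


def validate(record, rules=RULES):
    """Return the rule violations for one record; an empty list means it passes."""
    problems = []
    for column, (expected, max_len) in rules.items():
        value = record.get(column)
        if value is None:
            problems.append(f"{column}: missing")
        elif not isinstance(value, expected):
            problems.append(f"{column}: expected {expected.__name__}, got {type(value).__name__}")
        elif max_len is not None and len(value) > max_len:
            problems.append(f"{column}: length {len(value)} exceeds {max_len}")
    return problems


print(validate({"customer_id": 7, "country_code": "LK", "email": "a@b.lk"}))
print(validate({"customer_id": "7", "country_code": "LKA"}))
```

Counting violations per rule over time gives the steward the kind of metrics the slide suggests reporting to the governance council.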


Benefits for BI Initiatives


Improved Data Quality: Ensures accuracy, completeness, and consistency
of data.
Enhanced Data Security: Protects sensitive data from unauthorized
access or breaches.
Regulatory Compliance: Helps organizations comply with data
protection regulations and industry standards.
Increased Trust in Data: Builds confidence in data used for reporting and
decision-making.
Better Data Collaboration: Facilitates data sharing and collaboration
across departments and teams.


The Data Silo Problem
Sales, Dispatch, Operations, Finance

Data Management Plan
http://www.oceandatapractices.net/bitstream/handle/11329/275/IOC%20Manual_Guides_73.pdf


The Data ‘Centric’ Solution
CUSTOMER DATA, PRODUCT DATA, FINANCE DATA, EMPLOYEE DATA
Sales, Operations, Dispatch, Finance

Treat Data as Infrastructure!


Data Lifecycle
Stages: Plan, Create, Ingest, Store, Use, Archive

Principle of Data Sharing
Agree what data to share and how to do it!


Data in Context
Wisdom? ? ?
Adapted from the well-known Data-Information-Knowledge Triangle by OceanWise (2011)

Challenges
• Lack of Data Awareness: Many employees may not understand the importance of data governance or their roles and responsibilities.
• Cultural Resistance: Resistance to change or lack of buy-in from stakeholders.
• Resource Constraints: Insufficient budget, staff, or tools to implement a robust data governance program.
• Complex Data Ecosystems: Managing data across multiple systems, platforms, and sources can be challenging.
• Balancing Security and Usability: Striking the right balance between data security and data accessibility.

Best Practices for Data Governance
• Establish Clear Goals: Define the objectives and expected outcomes of data governance initiatives.
• Executive Sponsorship: Obtain support from senior leadership to ensure commitment and allocate necessary resources.
• Communication and Training: Educate employees about the importance of data governance and provide training on data management practices.
• Start Small, Scale Up: Begin with a pilot project or a focused area to demonstrate the value of data governance before expanding to the entire organization.
• Continuous Monitoring and Improvement: Regularly assess and refine data governance processes and policies to adapt to evolving business needs.

Question 1
The purpose of good data governance is:
A. to meet regulatory requirements
B. to prevent any use of organizational data without permission
C. to ensure that organizational data can be used effectively for analysis and decision making
D. to assert control of organizational priorities


Summary
• Data governance is a critical discipline that enables organizations to effectively manage their data assets, ensuring data quality, security, compliance, and usability.
• By implementing robust data governance practices, organizations can unlock the true value of their data and drive informed decision-making across the organization.


Types of Analytics
• Descriptive
• Predictive
• Prescriptive

Descriptive Analytics
Use data aggregation techniques to provide insight into the past and answer: “What has happened?”
E.g. total stock in inventory, average dollars spent per customer and annual change in sales.
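The three descriptive examples can each be computed with a simple aggregation. The sketch below uses small hypothetical inventory and sales tables purely for illustration.

```python
from collections import defaultdict

# Hypothetical data for the three examples on the slide
STOCK = {"p1": 40, "p2": 25, "p3": 10}  # product -> units in inventory
SALES = [  # (customer, year, amount in dollars)
    ("c1", 2021, 120.0), ("c2", 2021, 80.0),
    ("c1", 2022, 150.0), ("c2", 2022, 90.0), ("c3", 2022, 60.0),
]

total_stock = sum(STOCK.values())  # total stock in inventory


def avg_spent_per_customer(sales):
    """Average dollars spent per customer over all transactions."""
    per_customer = defaultdict(float)
    for customer, _, amount in sales:
        per_customer[customer] += amount
    return sum(per_customer.values()) / len(per_customer)


def annual_change(sales, year):
    """Percentage change in total sales from `year - 1` to `year`."""
    totals = defaultdict(float)
    for _, yr, amount in sales:
        totals[yr] += amount
    return (totals[year] - totals[year - 1]) / totals[year - 1] * 100


print(total_stock, round(avg_spent_per_customer(SALES), 2), annual_change(SALES, 2022))
```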


Predictive Analytics
• Use statistical models and forecasting techniques to understand the future and answer: “What could happen?”
• E.g. forecast total stock in inventory, predict credit fraud, forecast daily patient admissions.
• Note:
– Classification: The decision tree is a classification model, applied to existing data. If you apply it to new data, for which the class is unknown, you also get a prediction of the class.
– Sentiment analysis: Predicting data that we don't have, namely the sentiment label (whether the sentiment is positive or negative).

Prescriptive Analytics
• Use optimization and simulation algorithms to advise on possible outcomes and answer: “What should we do?”
• Quantify the effect of future decisions in order to advise on possible outcomes before the decisions are actually made.
• E.g. optimize production, scheduling and inventory in the supply chain to make sure that the right products are delivered at the right time.
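Both ideas can be sketched in a few lines, with deliberate simplifications. For prediction, the sketch learns a depth-one decision tree (a "decision stump") from labelled data and then applies it to a new row whose class is unknown; full decision-tree algorithms such as CART split recursively and usually choose splits with an impurity measure such as Gini, whereas this sketch just counts errors. For prescription, a brute-force search picks the most profitable production plan under a capacity limit; real prescriptive tools use proper optimization solvers. All data, products and parameters are hypothetical.

```python
from itertools import product


# --- Predictive: a depth-1 decision tree ("decision stump") -------------------
def fit_stump(X, y):
    """Learn one split `x[f] <= t` from labelled rows X with classes y, choosing the
    feature and threshold with the fewest training errors; each side predicts
    its majority class."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [c for row, c in zip(X, y) if row[f] <= t]
            right = [c for row, c in zip(X, y) if row[f] > t]
            l_cls = max(set(left), key=left.count) if left else 0
            r_cls = max(set(right), key=right.count) if right else 0
            errors = sum(c != l_cls for c in left) + sum(c != r_cls for c in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_cls, r_cls)
    return best[1:]  # (feature, threshold, class if <= t, class if > t)


def predict(stump, row):
    """Apply the learned split to a new row whose class is unknown."""
    f, t, l_cls, r_cls = stump
    return l_cls if row[f] <= t else r_cls


# --- Prescriptive: brute-force search for the most profitable production plan --
def best_plan(profit_a, profit_b, hours_a, hours_b, capacity, max_units):
    """Try every (units of A, units of B) within the machine-hour capacity and
    return (a, b, profit) for the plan with the highest profit."""
    best = (0, 0, 0)
    for a, b in product(range(max_units + 1), repeat=2):
        if a * hours_a + b * hours_b <= capacity:
            profit = a * profit_a + b * profit_b
            if profit > best[2]:
                best = (a, b, profit)
    return best


# Hypothetical labelled history: one feature (e.g. transaction amount), class 1 = fraud
X = [[1.0], [2.0], [3.0], [8.0], [9.0], [10.0]]
y = [0, 0, 0, 1, 1, 1]
stump = fit_stump(X, y)
print(stump, predict(stump, [7.0]))
print(best_plan(profit_a=30, profit_b=20, hours_a=3, hours_b=1, capacity=10, max_units=10))
```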

Thank you
