
Lecture 2 Data Governance

The document discusses objectives and principles of data governance. It aims to establish consistent data quality, improve data integrity, control data access, and address data security and retention. It describes data as a valuable corporate asset and outlines dangers of ungoverned data such as lost revenue and non-compliance. The document also discusses elements of data quality including accuracy, completeness, and validity. It provides several principles for effective data cleaning such as planning, organization, prevention, and prioritization.

Uploaded by

Mecheal Thomas
Copyright © All Rights Reserved

DATA GOVERNANCE

OBJECTIVES
 Data Governance Principles
 Data as an Asset
 Define Data Governance
 Assess Data Quality Dimensions
 Describe Data Cleaning Principles
 Discuss Security Threats to an Organization’s data
DATA IS A CORPORATE ASSET
 Data is getting bigger and faster, arrives in more shapes and formats
from more sources, and is more complex to control
 Data is more important for business, both for operational and
analytical purposes
 Data should be accepted as an enterprise asset
• Data quality should be part of everyone’s job description
• Data quality should be a parameter of performance evaluations
and incentive packages
 Employees should be assigned responsibility for data
 Data should be modeled like other assets
DATA GOVERNANCE
 Data Governance is the execution and enforcement of
authority over the definition, production & usage of data
and data-related resources.
 The aim is to establish consistent data quality, improve
data integrity, control data access and address data
security and retention
REALITY OF DATA GOVERNANCE
 Data Governance programs take a lot of time, money
and effort to get going. Companies often fail
several times, going back to basics and starting
over each time. Meanwhile, the rest of
the company grows frustrated waiting for Data
Governance to work.
DANGERS OF UNGOVERNED DATA
 According to Gartner’s former vice president, Andreas Bitterer:
“There is not a company on the planet that does not have a
data quality problem. Where a company does recognize
they have a problem, they often underestimate the size of
it.”
 Dirty data is at the core of bad marketing and sales decisions.
 Dangers of Dirty Data (video clip)
DANGERS OF UNGOVERNED DATA
 According to The Data Warehousing Institute (TDWI),
data quality problems cost U.S. businesses more than $600
billion a year.
DANGERS OF UNGOVERNED DATA
 Lost revenue
 Wasted resources

 Decreased productivity

 Damage to credibility

 Risk of failure for marketing initiatives

 Fines due to compliance issues

 Inability to reach a prospect by email, phone or mail.


DATA QUALITY
 GIGO: garbage in, garbage out
 ‘Because it’s in the computer, don’t mean it’s right’
“It’s not the things you don’t know that matter,
it’s the things you know that aren’t so.”
Will Rogers, Famous Okie GI specialist
“But there are also unknown unknowns: the ones we
don't know we don't know.” Donald Rumsfeld
DATA QUALITY
MANAGING DATA QUALITY
 Data in the real world is dirty
 Incomplete: lacking attribute values, lacking certain attributes
of interest, or containing only aggregate data
 Noisy: containing errors or outliers
 Inconsistent: containing discrepancies in codes or names
 Poor quality data, Poor quality mining results!
 Quality decisions must be based on quality data
 Required for both OLAP and Data Mining!
MANAGING DATA QUALITY
 Generally, you have a problem if the data doesn’t mean what
you think it does, or should
 Data not up to specification
 You don’t understand the specification: complexity, lack
of metadata.
 Many sources and manifestations
 Data quality problems are expensive and pervasive
 DQ problems cost hundreds of billions of dollars each year.
 Resolving data quality problems is often the biggest
effort in a data mining study.
EXAMPLE
T.Das|97336o8327|24.95|Y|-|0.0|1000
Ted J.|973-360-8779|2000|N|M|NY|1000

 Can we interpret the data?
 What do the fields mean?
 What is the key?
 What is the unit of measurement?
 Data glitches
 Typos, multiple formats, missing / default values
 Metadata and domain expertise
 Field three is Revenue. In dollars or cents?
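The glitches in the two records above can be surfaced mechanically. The following is a minimal sketch, not a definitive checker: the slide gives no metadata, so the assumption that field two is a 10-digit phone number and that "-" in field five is a placeholder for a missing value is ours.

```python
import re

RAW = [
    "T.Das|97336o8327|24.95|Y|-|0.0|1000",
    "Ted J.|973-360-8779|2000|N|M|NY|1000",
]

def phone_issues(phone):
    """Return a list of problems found in an assumed 10-digit phone field."""
    issues = []
    stripped = phone.replace("-", "")
    if not stripped.isdigit():
        issues.append("non-digit character")
    if len(stripped) != 10:
        issues.append("unexpected length")
    return issues

def check(record):
    """Report suspicious fields in one pipe-delimited record."""
    fields = record.split("|")
    report = {}
    problems = phone_issues(fields[1])
    if problems:
        report["phone"] = problems
    if fields[4] == "-":  # assumed placeholder for a missing value
        report["field5"] = ["placeholder or missing value"]
    return report

reports = [check(r) for r in RAW]

# Replace every digit with 9 to expose the format of each phone value;
# more than one distinct pattern means multiple formats are in use.
patterns = [re.sub(r"\d", "9", r.split("|")[1]) for r in RAW]
```

The first record is flagged for the letter "o" typed in place of a digit and for the placeholder; the two distinct entries in `patterns` show that the phone field is stored in more than one format. None of this answers the metadata questions (dollars or cents, which field is the key), which still need domain expertise.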
ELEMENTS OF DATA QUALITY

 Data quality is a perception or an assessment
of data's fitness to serve its purpose in a given context. The
elements of data quality include the following:
 Accuracy
 Completeness
 Timeliness
 Consistency
 Integrity
 Validity
ACCURACY
 Determines if data was accurately recorded. It refers to
whether the data values stored for an object are the correct
values. To be correct, a data value must be the right value
and must be represented in a consistent and unambiguous
form.
COMPLETENESS
 Completeness implies having all the necessary or
appropriate parts; being entire, finished, total.
 A data set is complete to the degree that it contains
required attributes and a sufficient number of records, and
to the degree that attributes are populated in accordance with
data consumer expectations.
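One way to put a number on completeness is the share of required attribute slots that are actually populated. This is a sketch under assumptions: the required attributes, the sample records and the set of values treated as "unpopulated" are all hypothetical.

```python
REQUIRED = ["name", "phone", "state"]  # hypothetical required attributes
MISSING = {None, "", "-"}              # values treated as unpopulated (assumption)

def completeness(records, required=REQUIRED):
    """Fraction of required attribute slots that are populated."""
    total = len(records) * len(required)
    if total == 0:
        return 0.0
    filled = sum(
        1 for rec in records for attr in required
        if rec.get(attr) not in MISSING
    )
    return filled / total

records = [
    {"name": "T.Das", "phone": "9733608327", "state": "-"},
    {"name": "Ted J.", "phone": "973-360-8779", "state": "NY"},
]
score = completeness(records)  # 5 of 6 slots are populated
```

This covers only attribute population. Whether the data set holds a sufficient number of records is a separate check against whatever the data consumer expects.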
TIMELINESS
 Timeliness is the degree to which data conforms to a
schedule for being updated and made available. For
data to be timely, it must be delivered according to
schedule.
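Timeliness reduces to comparing a delivery time against a schedule. A minimal sketch, assuming a hypothetical daily load that is due at 06:00 and an optional grace period:

```python
from datetime import datetime, timedelta

def is_timely(delivered_at, due_at, tolerance=timedelta(0)):
    """True if the data arrived no later than its due time plus tolerance."""
    return delivered_at <= due_at + tolerance

due = datetime(2024, 1, 2, 6, 0)  # hypothetical schedule
deliveries = [
    datetime(2024, 1, 2, 5, 45),  # early
    datetime(2024, 1, 2, 6, 30),  # half an hour late
]
on_time = [is_timely(d, due) for d in deliveries]
with_grace = [is_timely(d, due, timedelta(hours=1)) for d in deliveries]
```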
VALIDITY
 Validity is distinct from both accuracy and
correctness. Validity is the degree to which data conform to a
set of business rules, sometimes expressed as a standard,
within a defined data domain.
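Business rules of this kind can be written as one predicate per field. The rules below (allowed flags, a small list of state codes, a dashed phone format) are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical business rules: field name -> predicate returning True if valid.
RULES = {
    "flag": lambda v: v in {"Y", "N"},
    "state": lambda v: v in {"NJ", "NY", "PA"},
    "phone": lambda v: re.fullmatch(r"\d{3}-\d{3}-\d{4}", v) is not None,
}

def violations(record, rules=RULES):
    """Return the sorted names of fields that break their rule."""
    return sorted(
        name for name, is_valid in rules.items()
        if name in record and not is_valid(record[name])
    )

bad = violations({"flag": "Y", "state": "0.0", "phone": "97336o8327"})
good = violations({"flag": "N", "state": "NY", "phone": "973-360-8779"})
```

A record that passes every rule is valid but not necessarily accurate: "NY" is an allowed state code even if the customer actually lives in New Jersey.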
CONSISTENCY
 Consistency can be thought of as the absence of variety or
change.
 Consistency is the degree to which data conform to an
equivalent set of data, produced under similar conditions
or a set produced by the same process over time.
INTEGRITY
 Integrity is the degree to which data conform to data
relationship rules (as defined by the data model) that are
intended to ensure the complete, consistent, and valid
presentation of data.
 Integrity represents the internal consistency of a data set.
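A common relationship rule is that every child row must reference an existing parent row. The sketch below checks that rule for two hypothetical tables, `customers` and `orders`; the table and column names are illustrative only.

```python
def orphans(children, parents, foreign_key, primary_key):
    """Return child rows whose foreign key matches no parent primary key."""
    parent_keys = {row[primary_key] for row in parents}
    return [row for row in children if row[foreign_key] not in parent_keys]

customers = [{"id": 1, "name": "T. Das"}, {"id": 2, "name": "Ted J."}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},  # no customer with id 3
]
broken = orphans(orders, customers, "customer_id", "id")
```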

DATA CLEANING PRINCIPLES - 1
 Planning is essential
 Develop a vision, a policy and strategy
 Total Data Quality Management Cycle

DATA CLEANING PRINCIPLES - 2
 Organising Data improves efficiency
 The organization of data can improve efficiency and
considerably reduce the time and costs of data
cleaning.
 For example, by sorting records by collector and
date, it is possible to spot errors where a record may
have been incorrectly recorded by date.
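The sorting idea can be sketched as follows. The record layout (`collector`, a sequential `number`, and `date`) is an assumption made for illustration, as is the rule that a collector's dates should not go backwards as the record number increases.

```python
from datetime import date
from itertools import groupby

records = [
    {"collector": "Smith", "number": 101, "date": date(1999, 3, 1)},
    {"collector": "Smith", "number": 102, "date": date(1999, 3, 2)},
    {"collector": "Smith", "number": 103, "date": date(1989, 3, 3)},  # likely typo in year
    {"collector": "Smith", "number": 104, "date": date(1999, 3, 4)},
    {"collector": "Jones", "number": 7, "date": date(2001, 6, 10)},
    {"collector": "Jones", "number": 8, "date": date(2001, 6, 12)},
]

def out_of_sequence(records):
    """Flag records dated earlier than the previous record by the same collector."""
    ordered = sorted(records, key=lambda r: (r["collector"], r["number"]))
    suspects = []
    for _, group in groupby(ordered, key=lambda r: r["collector"]):
        previous = None
        for rec in group:
            if previous is not None and rec["date"] < previous["date"]:
                suspects.append((rec["collector"], rec["number"]))
            previous = rec
    return suspects

suspects = out_of_sequence(records)
```

Once the records are ordered, the 1989 date stands out among its 1999 neighbours. A flagged record is a candidate for review, not proof of an error.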
DATA CLEANING PRINCIPLES - 3

 Prevention is better than cure
 It is far cheaper and more efficient to prevent an error
from happening than to have to detect it and correct it
later.
 It is also important that, when errors are detected,
feedback mechanisms ensure that the error doesn’t
occur again during data entry, or that there is a much
lower likelihood of it recurring.
DATA CLEANING PRINCIPLES - 4

• Responsibility belongs to everyone (collector, custodian and user)
 The principal responsibility belongs to the data custodian.
 The collector has a responsibility to respond to the custodian’s
questions when the custodian finds errors or ambiguities that
may refer back to the original information supplied by the
collector.
 The user also has a key responsibility to feed back to custodians
information on any errors or omissions they may come across,
including errors in the documentation associated with the data.
DATA CLEANING PRINCIPLES - 5
 Partnerships Improve Data Efficiency
 By developing partnerships, data validation processes won’t be
duplicated, and errors are more likely to be documented and corrected.
 Partnerships should be created with:
 Other institutions that collect data
 Like-minded institutions developing tools, standards and
software
 Data users (good feedback mechanisms)
 Statisticians and data auditors
DATA CLEANING PRINCIPLES - 6

 Prioritisation
 Prioritisation helps reduce costs and improves efficiency. It
is important to concentrate on those records that offer
the most value.
• Ignore data that are not used or for which data quality
cannot be guaranteed
 Focus on cleaning lots of data at the lowest cost.
 For example, start with records that can be examined using batch
processing or automated methods, before working on
the more difficult records.
DATA CLEANING PRINCIPLES - 7
 Set targets and performance measures
 Performance measures are a valuable addition to quality
control procedures. They help an organization manage its
data cleaning processes.
 Performance measures may include statistical checks on
the data (for example, on the level of quality: 65% of all
records have been checked by a qualified taxonomist
within the previous 5 years)
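A measure like the one in the example can be computed directly from a "last checked" date on each record. This is a sketch only: the `last_checked` field, the sample dates and the reporting date are hypothetical, and records that were never checked count as unchecked.

```python
from datetime import date

def years_before(day, years):
    """Same calendar day `years` earlier; 29 February falls back to 28 February."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)

def pct_checked(records, today, years=5):
    """Percentage of records checked within the previous `years` years."""
    if not records:
        return 0.0
    cutoff = years_before(today, years)
    recent = sum(
        1 for rec in records
        if rec["last_checked"] is not None and rec["last_checked"] >= cutoff
    )
    return 100.0 * recent / len(records)

records = [
    {"id": 1, "last_checked": date(2023, 1, 1)},
    {"id": 2, "last_checked": date(2020, 5, 1)},
    {"id": 3, "last_checked": date(2018, 1, 1)},  # older than 5 years
    {"id": 4, "last_checked": None},              # never checked
]
measure = pct_checked(records, today=date(2024, 6, 1))
```

Tracking the same figure over time shows whether the cleaning process is meeting its target, for instance the 65% level mentioned above.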
DATA CLEANING PRINCIPLES - 8

 Education and training improve techniques
 Poor training, especially at the data collection and data
entry stages of the Information Quality Chain, is the
cause of a large proportion of the errors.
 Good training of data entry operators can considerably reduce the
errors associated with data entry, reduce data
entry costs and improve overall data quality.
DATA CLEANING PRINCIPLES - 9

 Accountability, Transparency and Auditability are important
 Haphazard and unplanned data cleaning exercises are very
inefficient and generally unproductive.
 Data cleaning processes need to be transparent and well
documented, with a good audit trail to reduce duplication and to
ensure that, once corrected, errors never recur.
 Within data quality policies and strategies, clear lines of
accountability for data cleaning need to be established.
DATA CLEANING PRINCIPLES - 10

• Documentation is the key to good data quality
 Without good documentation, it is difficult for users to
determine the fitness for use of the data and difficult for
custodians to know what data quality checks have been
carried out and by whom.
Data Security
 Data security refers to protective digital privacy measures that
are applied to prevent unauthorized access to computers,
databases and websites. Data security also protects data from
corruption.
Why is Data Vulnerable?
• Hardware problems
• Breakdowns, configuration errors, damage from improper
use or crime
• Software problems
• Programming errors, installation errors, unauthorized
changes
• Disasters
• Power failures, flood, fires, and so on
• Use of networks and computers outside of firm’s control
• E.g., with domestic or offshore outsourcing vendors
Data Security Risks
• Hackers versus crackers
• Why hack?
 Harassment
 Show-off
 Gain access to computer services without paying
 Obtain information to sell

Methods Used by Hackers
 Malware
• Viruses
• Rogue software program that attaches itself to other
software programs or data files in order to be executed
• Worms
• Independent computer programs that copy themselves
from one computer to other computers over a network
• Trojan horses
• Software program that appears to be benign but then
does something other than expected.
Methods Used by Hackers
 Spyware
• Small programs that install themselves surreptitiously on
computers to monitor user Web surfing activity and
serve up advertising
 Key loggers
• Record every keystroke on a computer to steal serial
numbers and passwords or to launch Internet attacks
Methods used by Hackers
 Sniffer
• Eavesdropping program that monitors information
traveling over network
• Enables hackers to steal proprietary information such as
e-mail, company files, and so on
 Denial-of-service attacks (DoS)
• Flooding server with thousands of false requests to crash
the network
 Distributed denial-of-service attacks (DDoS)
• Use of numerous computers to launch a DoS
Methods used by Hackers
 Identity theft
• Theft of personal information (social security ID, driver’s
license, or credit card numbers) to impersonate someone
else
 Phishing
• Setting up fake Web sites or sending e-mail messages that
look like legitimate businesses to ask users for confidential
personal data
 Evil twins
• Wireless networks that pretend to offer trustworthy Wi-Fi
connections to the Internet
Internal Threats: Employees

 Security threats often originate inside an organization
• Inside knowledge
• Sloppy security procedures
• User lack of knowledge
• Social engineering: Tricking employees into revealing their
passwords by pretending to be legitimate members of the
company in need of information
Mitigating Security Risks
 Intrusion detection systems:
• Monitor hot spots on corporate networks to detect and
deter intruders
• Examine events as they are happening to discover attacks
in progress
 Antivirus and antispyware software:
• Check computers for presence of malware and can often
eliminate it as well
• Require continual updating
 Firewall: Combination of hardware and software that
prevents unauthorized access to network
Mitigating Security Risks
A Corporate Firewall

Figure 7-5: The firewall is placed between the firm’s private
network and the public Internet or another distrusted network
to protect against unauthorized traffic.
MITIGATING INFORMATION SECURITY RISK

 Physical Mitigation Procedures
 Locked servers
 Removable hard drives that are locked when not in use
 Hard disk drives requiring special tools for detachment
 Physical cages around computers that prohibit access
 Authentication Procedures
 Password systems
 Smart cards
 Biometric authentication: Fingerprints, irises, voices
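As an illustration of what a password system involves on the storage side, the sketch below keeps a salted, slow hash instead of the password itself. It uses the Python standard library's PBKDF2 function; the iteration count and salt length are assumed values for illustration, not recommendations taken from this lecture.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; higher is slower to brute-force

def hash_password(password, salt=None):
    """Return (salt, digest) for a password, generating a random salt if needed."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest

def verify_password(password, salt, expected):
    """Re-derive the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("correct horse battery staple")
ok = verify_password("correct horse battery staple", salt, stored)
rejected = verify_password("guess", salt, stored)
```

Only `salt` and `stored` would be kept, so a stolen credentials table does not reveal the passwords directly.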

Hacking the US Power Grid


BENEFITS OF IMPLEMENTING DATA GOVERNANCE
 People understand their relationship to data and the impact
they have.
 People understand the rules associated with definition,
production & usage of data.
 People are held formally accountable for their actions with
data.
 People know when they need to be involved in data-related
processes.
 Communication with people is tailored to their
relationship to data.
DATA GOVERNANCE CHALLENGES

 Cultural barriers
 Lack of senior-level sponsorship
 Underestimating the amount of work involved
 Too much time spent on structure and policies but not
enough on action
 Lack of business commitment
 Lack of understanding that business definitions vary
 Trying to move too fast from no data governance to
enterprise-wide data governance
