Lesson 6: Six Dimensions
A.G. Tupaz cor. M.V. Farinas Streets, Brgy. 8 San Vicente, Laoag City
Learning Outcomes:
1. Identify the Six Dimensions of Data Quality.
2. Understand what a data source is.
3. Identify the different tools or methods of Data Consolidation.
4. Identify the challenges with Data Consolidation.
Data Consolidation - the process of combining data from wherever it lives, removing any
redundancies, and cleaning up any errors before storing it in one location, such as a data warehouse or data lake.
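The three steps in this definition (combine, remove redundancies, clean up errors) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems, the field names `customer_id` and `email`, and the rule that the first copy of a record wins are all hypothetical, not part of the lesson.

```python
# Hypothetical records from two source systems; all names and values are illustrative.
crm_rows = [
    {"customer_id": "C001", "email": " Ana@Example.com "},
    {"customer_id": "C002", "email": "ben@example.com"},
]
billing_rows = [
    {"customer_id": "C002", "email": "BEN@example.com"},  # redundant copy of C002
    {"customer_id": "C003", "email": "cara@example.com"},
    {"customer_id": "", "email": "nobody@example.com"},   # error: missing key
]


def clean(row):
    """Trim whitespace and lower-case the email so copies compare equal."""
    return {
        "customer_id": row["customer_id"].strip(),
        "email": row["email"].strip().lower(),
    }


def consolidate(*sources):
    """Combine all sources, drop rows without a key, keep one row per customer."""
    consolidated = {}
    for source in sources:
        for row in source:
            cleaned = clean(row)
            if not cleaned["customer_id"]:
                continue  # skip rows with errors that cannot be repaired here
            # Assumed rule: the first occurrence wins, later copies are redundant.
            consolidated.setdefault(cleaned["customer_id"], cleaned)
    return list(consolidated.values())


# The list stands in for the single storage location (warehouse or lake).
warehouse = consolidate(crm_rows, billing_rows)
for row in warehouse:
    print(row)
```

A real consolidation job would also need a rule for conflicting values between sources. The "first occurrence wins" choice here is only the simplest possible one.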
❖ Different tools or methods of Data Consolidation
1. Hand-coding or scripting.
➢ In this manual process, data scientists write custom scripts to combine and consolidate data from a
predetermined range of sources.
2. Open-source tools.
➢ Open-source software helps organizations combine and consolidate data with relatively little cost and
more flexibility, but requires a higher degree of expertise in coding and usually more manpower.
3. Cloud-based tools.
➢ A modern approach to data consolidation, cloud-based tools automate many data consolidation tasks
with speed, scalability, and security.
❖ Challenges with Data Consolidation
1. Limited time.
➢ IT teams already have their hands full configuring, maintaining, and monitoring on-site hardware and other
equipment, in addition to keeping up with the rest of their daily tasks.
2. Limited resources.
➢ Any data integration process usually requires the help of skilled data scientists. Yet many organizations
don't have the budget or internal buy-in to staff up with the right resources to get the job done.
3. Scattered locations.
➢ Many businesses operate with remote or branch locations, which means that data isn't available in a
single physical place but has to be secured and managed in multiple locations.
4. Security issues.
➢ Every place where data is stored opens up the potential for a hack or breach, and moving data to another
place during the data consolidation process only increases that potential. In addition, most businesses have
to adhere to some level of regulatory standards.
Modern Cloud-Based Data Consolidation - these tools are built for speed, security, scalability, and flexibility,
no matter where or in what form your data exists.
Data Center College of the Philippines of Laoag City, Inc.
Data Source - a set of fields that provides the data for a business unit for transfer into BI. From a technical
viewpoint, the Data Source is a set of logically related fields that are provided to transfer data into BI in a flat
structure (the extraction structure), or in multiple flat structures (for hierarchies).
➢ Master data represents the business objects that contain the most valuable, agreed-upon information
shared across an organization. It can cover relatively static reference data as well as
transactional, unstructured, analytical, hierarchical, and metadata. Master data may
contain information about customers, products, employees, materials, suppliers, and vendors.
➢ The data source of an attribute is the category in which the server should look for the attribute's value;
it can also allow the server to locate the value hierarchically.
❖ Disparate Data
• Disparate Data are any data that are essentially not alike, or are distinctly different in kind, quality, or
character. They are unequal and cannot be readily integrated to meet the business information
demand. They are low-quality, defective, discordant, ambiguous, heterogeneous data.
• They are the result of three major trends in data resource management: prolific hype-cycles, a large
lexical challenge, and the five horsemen.
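To make "cannot be readily integrated" concrete, the sketch below shows two hypothetical systems describing the same product with different field names, identifier styles, units, and date formats. Every name and value is an assumption made for illustration, including the guess that the sales system writes dates as month/day/year. The records only match after each one is mapped into a common structure.

```python
from datetime import datetime

# Two hypothetical systems describing the same product in different ways.
inventory_record = {"sku": "P-100", "weight_kg": 2.5, "received": "2024-03-01"}
sales_record = {"ProductCode": "p100", "WeightLb": 5.51, "Date": "03/01/2024"}

LB_PER_KG = 2.20462  # pounds in one kilogram


def from_inventory(rec):
    """Map an inventory record into the assumed common structure."""
    return {
        "product_id": rec["sku"].replace("-", "").upper(),
        "weight_kg": round(rec["weight_kg"], 1),
        "date": datetime.strptime(rec["received"], "%Y-%m-%d").date(),
    }


def from_sales(rec):
    """Map a sales record into the same structure (assumes month/day/year dates)."""
    return {
        "product_id": rec["ProductCode"].upper(),
        "weight_kg": round(rec["WeightLb"] / LB_PER_KG, 1),
        "date": datetime.strptime(rec["Date"], "%m/%d/%Y").date(),
    }


# Compared as-is, the records look unrelated; after mapping, they agree.
print(inventory_record == sales_record)
print(from_inventory(inventory_record) == from_sales(sales_record))
```

The date field also illustrates ambiguity: "03/01/2024" could mean 1 March or 3 January, and nothing in the record itself says which.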
Disparate Data Resource - a data resource that is substantially composed of disparate data that are
disintegrated and not subject-oriented.
Data Resource is a component of information technology infrastructure that represents all the data available to
an organization, whether they are automated or non-automated. Different business organizations may have
different needs.
❖ The Five Horsemen
1. Brute-Force-Physical
➢ Actions that go directly to the task of developing the physical database.
➢ Includes creating the database code without any formal analysis or design of the business needs.
2. Paralysis-by-Analysis
➢ Actions that keep an analysis and modelling effort going indefinitely to make sure everything is complete and
correct. Data analysts and data modelers are well known for analysing a situation and working the
problem forever before moving ahead.
3. Warping-the-Business
➢ Actions that warp the design of the organization’s data to the fixed data design of a purchased
application. Each organization has a data design that fits its perception of the business world where
it operates. That data design often does not match the data design of a purchased application.
4. Suck-and-Squirt
➢ Actions that designate a single record or system of reference, suck the data out of that record or
system of reference, perform superficial cleansing, and squirt the data into a target database. The
action is usually part of an extract, transform, and load (ETL) process where little attention is paid to
the conditional sourcing of data, the data integrity, or the data meaning.
5. Process-Structured-Data
➢ Actions that structure the data resource according to the processes using the data rather than
according to formal data design techniques.
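One way to picture the difference from Suck-and-Squirt is an ETL load step that checks data integrity before writing to the target instead of cleansing superficially. The sketch below is a minimal illustration only: the fields `order_id` and `quantity` and the two rules are hypothetical, and it covers integrity checks alone, not conditional sourcing or data meaning.

```python
def validate(row):
    """Return a list of integrity problems found in one extracted row."""
    problems = []
    if not row.get("order_id"):
        problems.append("missing order_id")
    quantity = row.get("quantity")
    if not isinstance(quantity, int) or quantity <= 0:
        problems.append("quantity must be a positive integer")
    return problems


def load(rows):
    """Split rows into those safe to load and those rejected, with reasons."""
    loaded, rejected = [], []
    for row in rows:
        problems = validate(row)
        if problems:
            rejected.append((row, problems))  # keep the reasons for follow-up
        else:
            loaded.append(row)
    return loaded, rejected


# Hypothetical rows extracted from a source system.
extracted = [
    {"order_id": "A1", "quantity": 2},
    {"order_id": "", "quantity": 1},
    {"order_id": "A3", "quantity": -4},
]

loaded, rejected = load(extracted)
print("loaded:", loaded)
for row, problems in rejected:
    print("rejected:", row, problems)
```

Keeping the rejected rows together with their reasons, rather than silently dropping them, gives the source system's owners something to act on.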
Data Governance
- Is a data management concept concerning the capability that enables an organization to ensure that high
data quality exists throughout the complete lifecycle of the data.
- Data governance decides what to do with the data, then follows up by monitoring data
management to make sure that everything is done correctly.
Data Governance Processes
- Processes document data definitions and the business context associated with business terminology.
- As data becomes increasingly valuable, the challenge of protecting customer data mounts, pushing more and
more businesses to embrace data governance.
- Data governance is a decision-making, monitoring, and enforcement body that has authority over data
management.
- Requires granular control over data so that it is easily accessible and malleable enough to
repurpose.