BA2, Module 2 Guide
2020-2021
MODULE 2: Data Warehouse Components and Gathering Business Requirements (4 hrs.)
Learning Objectives
At the end of this module, you must be able to:
Learning Evidence
1. Accomplished Assignment
2. Accomplished Quiz
Rubric/Evaluation Tool
The following rubrics shall be utilized in evaluating and grading your work:
LE1: Accomplished Assignment
Completeness (weight: 60%)
• Excellent: All required contents are present
• Above Average: 86-99% of required contents are present
• Average: 71-85% of required contents are present
• Passing: 50-70% of required contents are present
• Failure: <50% of required contents are present
Substance (weight: 40%)
• Excellent: Depth & elaboration are exemplary
• Above Average: Depth & elaboration are very good
• Average: Depth & elaboration are good
• Passing: Depth & elaboration are wanting in some parts
• Failure: Generally lacks depth & elaboration
LE2: Quiz
Number of Correct Answers
• Superior: 91-100% correct
• Above Average: 61-90% correct
• Average: 51-60% correct
• Below Average: 41-50% correct
• Poor: <40% correct
MODULE GUID
Flexible Learn
CONTENTS:
The core business processes of many organizations are becoming more dynamic and complex
because of globalization and evolving technology. Data warehousing is a system architecture,
not a software product or application. Building a data warehouse requires the integration of
many tasks and components and the coordination of the efforts of many people.
Organizing a data warehouse into its essential components helps in defining and
understanding data warehouses and supports a methodical approach to presenting
the data warehousing process.
The next step in building the warehouse is data preparation and data cleansing. It involves the
extraction of source data, transformation into new forms, and loading into the data warehouse
environment.
Managing and governing the data warehouse environment covers functions such as:
• query management
• access control
• disaster recovery
• tool integration
• directory management
• security request control
• capacity planning
• data usage auditing
• user administration
Effective governance is considered a key to data warehouse success.
Multiple levels of requirements are needed to build a successful data warehouse environment.
These range from high-level strategic planning to detailed data analysis. Each level represents
a different type of information that is gathered. The figure above shows the different layers,
each progressively more detailed.
Strategic requirements:
• Provide insight into the vision and overall goals of the organization
• Focus on the big picture and look at the entire enterprise
These requirements should stay at a high level rather than go into detail. They are used to
help define the charter and scope of the data warehouse project. More information can be found
in the next section about strategic requirements.
This is the basic bread-and-butter content that is needed as the foundation for design and
development of the data warehouse. It is still too soon to begin crawling through individual
reports, or tables and columns of data. A basic understanding of the business function itself is
needed first.
Explore with the interview group the challenges facing their part of the organization, how
success is measured, and what problems the group is facing. This discussion should also cover
what reporting and analytical needs exist.
Business Analysis - Depending on the audience, there may also be discussion about analyses
that are currently performed. The key is to be able to understand why these are done and what
happens with the results.
It is also useful to be able to tie a specific business analysis to a broad business theme
discussed earlier. This helps to clarify the underlying purpose for performing the analysis.
Business Data - At this point of the project, it is most important to get a complete picture of all
of the data that would be useful to the business community. This is a time for the business
representatives to share the realities of the data used regularly and to be able to share a vision
about other data that would be valuable, if available.
Before jumping into more detailed requirements gathering or design work, it is important to
take a moment to determine whether the results of the requirements gathering process so far
are aligned with the project charter and scope. If they are not, this is the opportunity to set
new priorities and then modify the project charter and scope as necessary.
Then, with the original project scope confirmed or a revised project scope in place, more
detailed requirements can be gathered. These are typically gathered in later stages of a data
warehouse project.
Actual Data Sources - These are the sources selected to supply the business data that must be
included, as defined by the business analyses.
All of these requirements must be collected for the first data warehouse project. The project
charter and scope will help determine the group of people needed to provide requirements
input. The project scoping and prioritization step will further narrow the focus for this project.
It's time to sit down face to face to collect the business requirements. The interview usually
flows from an introduction through structured questioning to a final wrap-up:
• Launch
• Interview Flow
• Wrap-up
Launch
• The designated kickoff person should script the primary points to be conveyed in the first
couple of minutes, which set the tone of the interview meeting. The introduction should
convey a crisp, business-centric message.
Interview Flow
• IT people will try to meet you on your own business turf
• They will ask about your key performance metrics
• How business people track progress and success translates directly into the dimensional
model
The objective of an interview is to get business users to talk about what they do and why they
do it. A simple, nonthreatening place to begin is to ask about their job responsibilities and
organizational fit. This is a lob ball that interviewees can respond to easily. From there, we
typically ask about their key performance metrics.
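To make the link between performance metrics and the dimensional model concrete, here is a minimal Python sketch. It assumes a business user says they track "sales amount by product, store, and month". The class names, the `model_from_metric` helper, and the metric itself are hypothetical illustrations, not part of the module material.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """Descriptive context for a metric, e.g. product or month."""
    name: str
    attributes: list


@dataclass
class FactTable:
    """Numeric measures recorded at the grain set by the dimensions."""
    name: str
    measures: list
    dimensions: list = field(default_factory=list)

    def grain(self) -> str:
        # The grain reads as "one row per <dim> by <dim> by <dim>".
        return " by ".join(d.name for d in self.dimensions)


def model_from_metric(metric: str, tracked_by: list) -> FactTable:
    """Turn 'metric tracked by X, Y, Z' into a first-draft star schema."""
    dims = [Dimension(name=d, attributes=[f"{d}_name"]) for d in tracked_by]
    return FactTable(name=f"fact_{metric}", measures=[metric], dimensions=dims)


# Hypothetical interview answer: "We track sales amount by product, store, and month."
sales = model_from_metric("sales_amount", ["product", "store", "month"])
print(sales.name, "| grain:", sales.grain())
```

The metric becomes a measure in a fact table, and each "by ..." in the user's answer becomes a candidate dimension. A real design would refine the attributes with the business users.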
Wrap-up
• The interviewer will ask about the success criteria for the project
• Each criterion should be measurable
• Try to articulate specifics
• Take advantage of this opportunity to manage expectations.
• As the interview is coming to a conclusion, we ask each interviewee about his or her
success criteria for the project. Of course, each criterion should be measurable. "Easy to
use" and "fast" mean something different to everyone, so you should get the interviewees
to articulate specifics, such as their expectations regarding the amount of training
required to run a predefined report.
• At this point in the interview we make a broad disclaimer. The interviewees must
understand that just because we discussed a capability in the meeting doesn’t guarantee
that it’ll be included in the first phase of the project.
Data-Centric Interviews by IT
• They will intersperse sessions with the source system data gurus or subject matter
experts
• They will evaluate the feasibility of supporting the business needs
• While we’re focused on understanding the requirements of the business, it is helpful to
intersperse sessions with the source system data gurus or subject matter experts to
evaluate the feasibility of supporting the business needs. These data-focused interviews
are quite different from the ones just described.
• The goal is to confirm that the necessary core data exists before momentum builds
behind the requirements.
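The feasibility check described above can be sketched as a simple comparison between the data elements the business asked for and the columns the source systems actually hold. This is a minimal illustration only. The table names, column names, and the `assess_feasibility` function are assumptions made for the example.

```python
def assess_feasibility(required, source_columns):
    """Report which required data elements exist in any source system."""
    # Flatten every source table's columns into one set of available elements.
    available = {col for cols in source_columns.values() for col in cols}
    found = sorted(e for e in required if e in available)
    missing = sorted(e for e in required if e not in available)
    return {"found": found, "missing": missing}


# Hypothetical source system catalog gathered from the data-centric interviews.
source_columns = {
    "orders": {"order_id", "customer_id", "order_date", "amount"},
    "customers": {"customer_id", "region"},
}

# Hypothetical data elements requested during the business interviews.
required = ["amount", "region", "order_date", "promotion_code"]

report = assess_feasibility(required, source_columns)
print(report)
```

Anything listed under `missing` is a requirement to raise with the business early, before expectations build around data that no source system can supply.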
Ground Rules for Effective Interviewing
If you don't learn what the business really needs from the data warehouse, you can't
provide it.
Business acceptance is the most critical measure of DW/BI success. If the business doesn't
embrace the DW/BI deliverables to support its decision making processes, then the DW/BI
initiative is an exercise in futility.
These are the requirements IT people want to uncover from business users. It is best to become
familiar with them to get the most out of the interview process.
• Business Needs
• Data Quality
• Security
• Data Integration
• Data Latency
• Archiving and Lineage
Business Needs
• The first task is gathering and understanding all the known requirements, realities, and
constraints affecting the ETL system.
• The list of requirements can be pretty overwhelming, but it's essential to lay them on the
table before launching into the development of your ETL system.
• Typical due diligence requirements for the data warehouse include the following:
  – Saving archived copies of data sources and subsequent data staging
  – Providing proof of the complete transaction flow that changed any data results
  – Fully documenting algorithms for allocations, adjustments, and derivations
  – Supplying proof of security of the data copies over time, both online and offline
• Some compliance issues will be outside the scope of the data warehouse system, but many
others will land squarely within its boundaries.
• Changing legal and reporting requirements have forced many organizations to seriously
tighten their reporting and provide proof that the reported numbers are accurate, complete,
and have not been tampered with.
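One common way to support a claim that archived data has not been tampered with is to record a cryptographic fingerprint when the copy is archived and to compare it later. The sketch below uses Python's standard `hashlib` module. The file name, the sample data, and the in-memory `ledger` are hypothetical, and a real system would also need to protect the ledger itself.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def archive(name: str, data: bytes, ledger: dict) -> None:
    """Record the fingerprint of an archived copy (ledger is a stand-in store)."""
    ledger[name] = fingerprint(data)


def is_untampered(name: str, data: bytes, ledger: dict) -> bool:
    """Check that the data still matches the fingerprint recorded at archive time."""
    return ledger.get(name) == fingerprint(data)


ledger = {}
extract = b"order_id,amount\n1,100.00\n2,250.50\n"  # hypothetical source extract
archive("orders_2021_01.csv", extract, ledger)

unchanged = is_untampered("orders_2021_01.csv", extract, ledger)
altered = is_untampered(
    "orders_2021_01.csv", extract.replace(b"100.00", b"900.00"), ledger
)
print(unchanged, altered)
```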
Data Quality
Three powerful forces have converged to put data quality concerns near the top of the list for
executives.
First, the long-term cultural trend that says "if only I could see the data, then I could manage
my business better" continues to grow; today every knowledge worker believes instinctively that
data is a crucial requirement for them to function in their jobs.
Second, most organizations understand that their data sources are profoundly distributed,
typically around the world, and that effectively integrating myriad disparate data sources is
required.
Third, the sharply increased demands for compliance mean that careless handling of data will
not be overlooked or excused.
Security
The basic rhythms of the data warehouse are at odds with the security mentality.
• The data warehouse seeks to publish data widely to decision makers
• Security interests assume that data should be restricted to those with a need to know.
• Security awareness has increased significantly in the past few years across IT, but often
remains an afterthought and an unwelcome burden to most data warehouse teams.
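One way to reconcile wide publication with need-to-know restrictions is to publish the same data to everyone but filter what each role can see. The following is a minimal sketch of that idea. The roles, column names, and the `publish_for` function are hypothetical, and production warehouses would normally rely on the database's own access controls.

```python
# Hypothetical roles and the columns each one has a need to know.
ROLE_COLUMNS = {
    "sales_analyst": {"region", "sales_amount"},
    "finance_manager": {"region", "sales_amount", "cost", "margin"},
}


def publish_for(role, rows):
    """Return the rows with only the columns the given role may see."""
    allowed = ROLE_COLUMNS.get(role, set())  # unknown roles see no columns
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


rows = [{"region": "North", "sales_amount": 1200, "cost": 700, "margin": 500}]

analyst_view = publish_for("sales_analyst", rows)
guest_view = publish_for("guest", rows)
print(analyst_view, guest_view)
```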
Data Latency
• The data latency requirement describes how quickly source system data must be delivered to
the business users via the system.
• Data latency has a huge effect on the ETL architecture. Clever processing algorithms,
parallelization, and potent hardware can speed up traditional batch-oriented data flows.
• But at some point, if the data latency requirement is sufficiently urgent, the ETL system's
architecture must convert from batch to streaming oriented. This switch isn't a gradual or
evolutionary change; it's a major paradigm shift in which almost every step of the data
delivery pipeline must be re-implemented.
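The difference between the two styles can be illustrated with a deliberately small Python sketch. The batch function waits for a whole window of records and processes them together, while the streaming function delivers each record as soon as it arrives. The record layout and function names are assumptions for illustration. A real streaming architecture involves far more (message queues, incremental loads, failure recovery), which is why the switch is a major redesign.

```python
def transform(record):
    """Shared transformation: convert a decimal amount to integer cents."""
    return {"id": record["id"], "amount_cents": round(record["amount"] * 100)}


def run_batch(accumulated):
    """Batch: wait until a window of records has accumulated, then process all."""
    return [transform(r) for r in accumulated]


def run_streaming(source):
    """Streaming: transform and deliver each record as soon as it arrives."""
    for record in source:
        yield transform(record)


records = [{"id": 1, "amount": 9.99}, {"id": 2, "amount": 20.0}]

# Batch delivers nothing until the whole list is available.
batch_result = run_batch(records)

# Streaming can hand over the first record before the second has been read.
stream = run_streaming(iter(records))
first_delivered = next(stream)
print(batch_result, first_delivered)
```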
Archiving and Lineage
• Staged (archived) data is needed either for comparison with new data to generate change
capture records, or for reprocessing.
• Each staged data set should have accompanying metadata describing the origins and
processing steps that produced the data.
• Tracking this lineage is explicitly required by certain compliance requirements, but it should
be part of every archiving scenario.
• It is recommended to stage the data (write it to storage media) after each major activity of
the ETL pipeline: after it has been extracted, cleaned and conformed, and delivered.
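The staging recommendation above can be sketched as a tiny pipeline that writes the data, together with lineage metadata, after each major step. This is a minimal illustration under stated assumptions: the `stage` helper, the JSON file layout, the sample rows, and the source labels are all hypothetical.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def stage(rows, step, source, staging_dir):
    """Write rows plus lineage metadata to a file named after the ETL step."""
    payload = {
        "lineage": {
            "step": step,
            "source": source,  # where this data came from
            "staged_at": datetime.now(timezone.utc).isoformat(),
            "row_count": len(rows),
        },
        "rows": rows,
    }
    path = Path(staging_dir) / f"{step}.json"
    path.write_text(json.dumps(payload, indent=2))
    return path


def extract():
    # Stand-in for reading from a source system.
    return [{"id": 1, "name": " alice "}, {"id": 2, "name": None}]


def clean(rows):
    # Drop rows with no name and tidy up the remaining names.
    return [{**r, "name": r["name"].strip().title()} for r in rows if r["name"]]


def deliver(rows):
    # Rename columns to the shape the warehouse tables expect.
    return [{"customer_key": r["id"], "customer_name": r["name"]} for r in rows]


staging_dir = tempfile.mkdtemp()

extracted = extract()
stage(extracted, "extracted", "crm_export (hypothetical)", staging_dir)

cleaned = clean(extracted)
stage(cleaned, "cleaned_conformed", "extracted.json", staging_dir)

delivered = deliver(cleaned)
final_path = stage(delivered, "delivered", "cleaned_conformed.json", staging_dir)
print(final_path.read_text())
```

Because every staged file records its step and its source, any delivered row can be traced back through the earlier files, and any step can be rerun from the previous staged copy.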
User Delivery
• The final step for the ETL system is the handoff to the analytics applications.
• The ETL team, working closely with the modeling team, must take responsibility for the content
and structure of the data, so that the analytics applications are simple and fast.
• The ETL team and data modelers need to work closely with the analytics application
developers to determine the exact requirements for the data handoff.
• It's irresponsible to hand off data to the analytics application in such a way as to increase
the complexity of the application, slow down the query or report creation, or make the data
seem unnecessarily complex to the business users.
• Some ETL system design decisions must be made on the basis of available
resources to build and manage the system.
• It is not advisable to move in an unfamiliar direction without seriously considering the
decision's long-term implications.
• You shouldn't build a system that depends on critical C++ processing modules if
those programming skills aren't in house or can't be reasonably acquired. Likewise, you
may be much more confident in building your ETL system around a major vendor's ETL tool
if you already have those skills in house and know how to manage such a project.
• You need to consider the big decision of whether to hand code your ETL system
or use a vendor's package.
LEARNING TASK
Answer the following questions in your OWN words. DO NOT copy verbatim from the material
given:
1. Enumerate the components of a data warehouse system. Briefly describe and give the
significance of each one.