Warehouse Complete

The document discusses different aspects of data warehousing including: 1. The differences between OLTP (Online Transaction Processing) systems and data warehouses, with OLTP designed for recording transactions and data warehouses designed for querying and analysis. 2. The data warehouse development life cycle which covers warehouse management and data management phases like requirements gathering, modeling, and designing the warehouse. 3. Metadata which maps data from operational systems to analytical systems and includes extraction, transformation, and access metadata. Metadata provides navigation and interfaces for business users.

Uploaded by

gurugabru

Bachelor of Computer Application (BCA) Semester 6 BC0058 Data Warehousing 4 Credits

Assignment Set 1 (60 Marks)

Ques 1 Explain the differences between OLTP and Data Warehouse.

Ans: Application databases are OLTP (On-Line Transaction Processing) systems, where every transaction has to be recorded as and when it occurs. Consider a bank ATM that has disbursed cash to a customer but was unable to record this event in the bank's records. If this happened frequently, the bank wouldn't stay in business for long. So the banking system is designed to make sure that every transaction is recorded within the time you stand before the ATM.

A Data Warehouse (DW), on the other hand, is a database (yes, you are right, it's a database) that is designed to facilitate querying and analysis. Often designed as OLAP (On-Line Analytical Processing) systems, these databases contain read-only data that can be queried and analyzed far more efficiently than regular OLTP application databases. In this sense an OLAP system is designed to be read-optimized. Separating the warehouse from the application database also makes the business intelligence solution scalable (your bank and ATMs don't go down just because the CFO asked for a report), better documented and better managed.

Creating a DW leads to a direct increase in the quality of analysis, because the table structures are simpler (you keep only the needed information in simpler tables), standardized (well-documented table structures), and often de-normalized (to reduce the linkages between tables and the corresponding complexity of queries). A well-designed DW is the foundation on which successful BI (Business Intelligence) and analytics initiatives are built.

Data Warehouses usually store many months or years of data to support historical analysis. OLTP systems usually store data from only a few weeks or months. An OLTP system keeps only as much historical data as it needs to meet the requirements of the current transaction.
OLTP vs Data Warehouse

Property                 OLTP                  Data Warehouse
Nature of the database   3NF                   Multidimensional
Indexes                  Few                   Many
Joins                    Many                  Some
Duplicate data           Normalized            Denormalized
Aggregate data           Rare                  Common
Queries                  Mostly predefined     Mostly ad hoc
Nature of queries        Mostly simple         Mostly complex
Updates                  All the time          Not allowed, only refreshed
Historical data          Often not available   Essential
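
The contrast above can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration under stated assumptions: the table names (account, withdrawal, dw_daily_withdrawals), the sample data and the two helper functions are hypothetical and do not come from the assignment. The OLTP side records each withdrawal as its own small transaction in normalized tables; the warehouse side is a denormalized summary table that is never updated row by row, only refreshed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- OLTP side (hypothetical schema): normalized, one row per transaction
    CREATE TABLE account (account_id INTEGER PRIMARY KEY, branch TEXT);
    CREATE TABLE withdrawal (
        txn_id INTEGER PRIMARY KEY,
        account_id INTEGER REFERENCES account(account_id),
        txn_date TEXT,
        amount REAL
    );
    -- Warehouse side (hypothetical schema): denormalized and pre-aggregated
    CREATE TABLE dw_daily_withdrawals (
        txn_date TEXT, branch TEXT, txn_count INTEGER, total_amount REAL
    );
""")

def record_withdrawal(account_id, txn_date, amount):
    """OLTP-style write: a small transaction committed as it occurs."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO withdrawal (account_id, txn_date, amount) VALUES (?, ?, ?)",
            (account_id, txn_date, amount),
        )

def refresh_warehouse():
    """Warehouse-style refresh: rebuild the summary table from the source tables."""
    with conn:
        conn.execute("DELETE FROM dw_daily_withdrawals")
        conn.execute("""
            INSERT INTO dw_daily_withdrawals
            SELECT w.txn_date, a.branch, COUNT(*), SUM(w.amount)
            FROM withdrawal w JOIN account a USING (account_id)
            GROUP BY w.txn_date, a.branch
        """)

with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, "North"), (2, "South")])
record_withdrawal(1, "2024-01-01", 100.0)
record_withdrawal(1, "2024-01-01", 50.0)
record_withdrawal(2, "2024-01-01", 20.0)
refresh_warehouse()

# Analytical query: reads the summary table directly, with no joins
summary = conn.execute(
    "SELECT branch, txn_count, total_amount FROM dw_daily_withdrawals ORDER BY branch"
).fetchall()
print(summary)
```

The analytical query needs no join because the branch name was copied into the summary table during the refresh, which is the trade-off the "Joins" and "Duplicate data" rows of the table describe.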

Ques 2 With the necessary diagram, explain the Data Warehouse Development Life Cycle.

Ans: The Data Warehouse project

As an IT professional, you have worked on application projects before. You know what goes on in these projects and are aware of the methods needed to build the applications from planning through implementation. You have been part of the analysis, the design, the programming, or the testing phases. If you have functioned as a project manager or a team leader, you know how projects are monitored and controlled. A project is a project. If you have seen one IT project, have you not seen them all?

The answer is not a simple yes or no; Data Warehouse projects are different from projects that build transaction processing systems. If you are new to Data Warehousing, your first Data Warehouse project will reveal the major differences. We will discuss these differences and also consider ways to react to them. We will also ask a basic question about the readiness of the IT and user departments to launch a Data Warehouse project.

How about the traditional system development life cycle (SDLC) approach? Can we use this approach for Data Warehouse projects as well? If so, what are the development phases in the life cycle?

Data Warehouse Development Life Cycle

The Data Warehouse development life cycle covers two vital areas. One is warehouse management and the other is data management. The former deals with defining the project activities and gathering requirements, whereas the latter deals with modeling and designing the warehouse.

Life Cycle of Data Warehouse Development

Life Cycle steps of a DWH (SDLC)

Ques 3 What is Metadata? What is its use in Data Warehouse Architecture?

Ans: Metadata is data about the data held in the warehouse. It comes in several types.

Acquisition metadata maps the translation of information from the operational system to the analytical system. This includes an extract history describing data origins, updates, algorithms used to summarize data, and the frequency of extractions from operational systems.

Transformation metadata includes a history of data transformations, changes in names, and other physical characteristics.

Access metadata provides navigation and graphical user interfaces that allow non-technical business users to interact intuitively with the contents of the warehouse.

On top of these three types of metadata, a warehouse needs basic operational metadata, such as procedures on how the data warehouse is used and accessed, procedures for monitoring the growth of the data warehouse relative to the available storage space, and authorizations stating who is responsible for and who has access to the data in the data warehouse and in the operational system.

Technical Metadata

Technical metadata is concerned with the characteristics of the information system, and it focuses on the granularity of the data. Technical metadata (also called ETL process metadata, back room metadata, or transformation metadata) is a representation of the ETL process. It stores the data mappings and transformations from the source systems to the data warehouse and is mostly used by data warehouse developers, specialists and ETL modelers. Most commercial ETL applications provide a metadata repository with an integrated metadata management system to manage the ETL process definition. The definition of technical metadata is usually more complex than that of business metadata, and it sometimes involves multiple dependencies.

The technical metadata can be structured in the following way:

- Source: database or system definition. It can be a source system database, another data warehouse, a file system, etc.
- Target Database: the Data Warehouse instance.
- Source Tables: one or more tables which are input to calculate the value of the field.
- Source Columns: one or more columns which are input to calculate the value of the field.
- Target Table: the target DW table. The target table and column are always single in a metadata repository.
- Target Column: the target DW column.
- Transformation: the descriptive part of a metadata entry. It usually contains a lot of information, so it is important to use a common standard throughout the organization to keep the data consistent.
- Field-to-field mappings between sources and target.
- Number of scanned reports and ad hoc reports.
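
As a rough illustration, one entry of such a technical metadata repository could be modeled as a small Python record. The class name TechnicalMetadataEntry, the lineage helper and all sample values (crm_db, dw_prod, dim_customer and so on) are hypothetical; only the field list follows the structure given above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TechnicalMetadataEntry:
    """One source-to-target mapping, following the structure listed above."""
    source: str                 # source database or system definition
    target_database: str        # Data Warehouse instance
    source_tables: List[str]    # one or more input tables
    source_columns: List[str]   # one or more input columns
    target_table: str           # always a single target table
    target_column: str          # always a single target column
    transformation: str         # descriptive part of the entry

    def lineage(self) -> str:
        """Return a one-line, human-readable description of the mapping."""
        inputs = ", ".join(self.source_columns)
        target = f"{self.target_database}.{self.target_table}.{self.target_column}"
        return f"{self.source}: {inputs} -> {target}"


# Hypothetical sample entry: two source columns feed one warehouse column
entry = TechnicalMetadataEntry(
    source="crm_db",
    target_database="dw_prod",
    source_tables=["customer"],
    source_columns=["customer.first_name", "customer.last_name"],
    target_table="dim_customer",
    target_column="full_name",
    transformation="Concatenate first_name and last_name with a single space.",
)
print(entry.lineage())
```

Keeping the target table and column as single values while the source tables and columns are lists mirrors the rule that many inputs may feed exactly one warehouse field.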

Ques 4 What is a Surrogate key? When do we need it in data warehouse implementation?

Ans: An important distinction between a surrogate and a primary key depends on whether the database is a current database or a temporal database. Since a current database stores only currently valid data, there is a one-to-one correspondence between a surrogate in the modelled world and the primary key of some object in the database. In this case the surrogate may be used as a primary key, resulting in the term surrogate key. In a temporal database, however, there is a many-to-one relationship between primary keys and the surrogate. Since there may be several objects in the database corresponding to a single surrogate, we cannot use the surrogate as a primary key; another attribute is required, in addition to the surrogate, to uniquely identify each object.

Although Hall et al. (1976) say nothing about this, others have argued that a surrogate should have the following characteristics:

1. the value is unique system-wide, hence never reused
2. the value is system generated
3. the value is not manipulable by the user or application
4. the value contains no semantic meaning
5. the value is not visible to the user or application
6. the value is not composed of several values from different domains

The main reason for building a Data Warehouse application is to make data available to business users. Users know the data best, and their participation in the testing effort is a key component of the success of a Data Warehouse implementation. User Acceptance Testing (UAT) typically focuses on the data loaded into the Data Warehouse and any views that have been created on top of the tables, not on the mechanics of how the ETL application works. Consider the following strategies:

- Use data that is either from production or as near to production data as possible. Users typically find issues once they see the "real" data, sometimes leading to design changes.
- Test database views by comparing the view contents to what is expected. It is important that users sign off and clearly understand how the views are created.
- Plan for the system test team to support users during UAT. The users will likely have questions about how the data is populated and will need to understand details of how the ETL works.
- Consider how the users would require the data to be loaded during UAT, and negotiate how often the data will be refreshed.

Ques 5 What is Data Loading? Explain the Full Refresh Loading.

Ans: Two distinct groups of tasks form the data loading function. When you complete the design and construction of the Data Warehouse and go live for the first time, you do the initial loading of the data into the Data Warehouse storage. The initial load moves large volumes of data and takes a substantial amount of time. As the Data Warehouse starts functioning, you continue to extract the changes to the source data, transform the data revisions, and feed the incremental data revisions on an ongoing basis. The figure below illustrates the common types of data movements from the staging area to the Data Warehouse storage.

Data Movements
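
The two kinds of data movement can be sketched with Python's sqlite3 module. This is a minimal sketch, not a production loader: the table dw_customer, its columns and the sample rows are hypothetical. It also assumes the common reading of a full refresh, namely that the contents of the target table are erased and completely reloaded from the staging area, which the answer above does not spell out. The incremental load applies only new or changed rows.

```python
import sqlite3


def full_refresh(conn, staged_rows):
    """Full refresh (assumed meaning): erase the target table and reload it completely."""
    with conn:  # one transaction, so readers never see a half-empty table
        conn.execute("DELETE FROM dw_customer")
        conn.executemany(
            "INSERT INTO dw_customer (customer_id, name, city) VALUES (?, ?, ?)",
            staged_rows,
        )


def incremental_load(conn, changed_rows):
    """Incremental load: apply only new or changed rows, leaving the rest untouched."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO dw_customer (customer_id, name, city) VALUES (?, ?, ?)",
            changed_rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dw_customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

# Initial load: the first population of the table is a refresh of everything
full_refresh(conn, [(1, "Asha", "Pune"), (2, "Ravi", "Delhi")])

# Ongoing feed: customer 2 moved, customer 3 is new
incremental_load(conn, [(2, "Ravi", "Mumbai"), (3, "Meena", "Chennai")])

rows = conn.execute(
    "SELECT customer_id, name, city FROM dw_customer ORDER BY customer_id"
).fetchall()
print(rows)
```

A full refresh is simpler to reason about but moves the whole data volume every time, while the incremental load moves less data at the cost of having to identify the changed rows first.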

Data Storage Component

The data storage for the Data Warehouse is a separate repository. The operational systems of your enterprise support the day-to-day operations. These are online transaction processing applications. The data repositories for the operational systems typically contain only the current data. These repositories also hold the data in highly normalized structures for fast and efficient processing. In contrast, in the data repository for a Data Warehouse, you need to keep large volumes of historical data for analysis. Further, you have to keep the data in the Data Warehouse in structures suitable for analysis, and not for quick retrieval of individual pieces of information. Therefore, the data storage for the Data Warehouse is kept separate from the data storage for operational systems.

In the databases supporting your operational systems, the updates to data happen as transactions occur. These transactions hit the databases in a random fashion. How and when the transactions change the data in the databases is not completely within your control. The data in the operational databases could change from moment to moment. When your analysts use the data in the Data Warehouse for analysis, they need to know that the data is stable and that it represents snapshots at specified periods. While they are working with the data, the data storage must not be in a state of continual updating. For this reason, Data Warehouses are read-only data repositories.

Generally, the database in your Data Warehouse must be open. Depending on your requirements, you are likely to use tools from multiple vendors, so the Data Warehouse must be open to different tools. Most Data Warehouses employ relational database management systems. Many also employ multidimensional database management systems. Data extracted from the Data Warehouse storage is aggregated in many ways, and the summary data is kept in multidimensional databases (MDDBs). Such multidimensional database systems are usually proprietary products.

Information Delivery Component

Who are the users that need information from the Data Warehouse? The range is fairly comprehensive. The new user comes to the Data Warehouse with no training and, therefore, needs prefabricated reports and preset queries. The casual user needs information once in a while, not regularly. This type of user also needs prepackaged information. The business analyst looks for the ability to do complex analysis using the information in the Data Warehouse. The power user wants to be able to navigate throughout the Data Warehouse, pick up interesting data, format his or her own queries, drill through the data layers, and create custom reports and ad hoc queries.

In order to provide information to the wide community of Data Warehouse users, the information delivery component includes different methods of information delivery. The figure below shows the different information delivery methods.

Information Delivery methods

Ad hoc reports are predefined reports primarily meant for novice and casual users. Provisions for complex queries, multidimensional (MD) analysis, and statistical analysis cater to the needs of the business analysts and power users. Information fed into Executive Information Systems (EIS) is meant for senior executives and high-level managers. Some Data Warehouses also provide data to data-mining applications. Data-mining applications are knowledge discovery systems in which the mining algorithms help you discover trends and patterns from the usage of your data.

In your Data Warehouse, you may include several information delivery mechanisms. Most commonly, you provide for online queries and reports. The users enter their requests online and receive the results online. You may set up delivery of scheduled reports through e-mail, or you may make use of your organization's intranet for information delivery. Recently, information delivery over the Internet has been gaining ground.

Ques 6 Which Data Quality factors affect a Data Warehouse? Explain them.

Ans: Data quality in Data Warehouse

Data Warehouse Components

The DWQ project will provide a neutral architectural reference model covering the design, the setting-up, the operation, the maintenance, and the evolution of data warehouses. Figure 6.1 illustrates the basic components and their relationships as seen in current practice. The terms used in this figure can be briefly explained as follows:

- Sources: any data store whose content is to be materialized in a data warehouse
- Wrappers: components that load the source data into the warehouse
- Destination databases: data warehouses and data marts
- Meta database: repository for information about the other components, e.g. the schema of the source data
- Agents for administration: data warehouse design, scheduler for initiating updates, etc.
- Clients: tools that display the data, for example statistical packages

Structure of a Data Warehouse

The Linkage to Data Quality

DWQ provides assistance to DW designers by linking the main components of the DW reference architecture to a formal model of data quality. The main differences from the initial model lie in the greater emphasis on historical as well as aggregated data. The key terms are:

- A data quality policy is the overall intention and direction of an organization with respect to issues concerning the quality of data products.
- Data quality management is the management function that determines and implements the data quality policy.
- A data quality system encompasses the organizational structure, responsibilities, procedures, processes and resources for implementing data quality management.
- Data quality control is a set of operational techniques and activities which are used to attain the quality required for a data product.
- Data quality assurance includes all the planned and systematic actions necessary to provide adequate confidence that a data product will satisfy a given set of quality requirements.

Quality Factors in Data Warehousing

Types of Data Quality Problems

The following quality problems occur during data warehouse creation. All of them have to be rectified during ETL processing.

- Dummy values in source system fields
- Absence of data in source system fields
- Multipurpose fields
- Cryptic data
- Contradicting data
- Improper use of name and address lines
- Violation of business rules
- Reused primary keys
- Non-unique identifiers
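
Three of the problems listed above (dummy values, absent data and non-unique identifiers) are easy to detect mechanically during ETL. The following Python sketch shows one possible check. The function name find_quality_problems, the DUMMY_VALUES set and the sample records are all hypothetical, and a real ETL tool would apply far richer rules; the other problems in the list, such as cryptic or contradicting data, need business knowledge and are not covered here.

```python
from collections import Counter

# Hypothetical placeholder values that source systems might use instead of real data
DUMMY_VALUES = {"N/A", "UNKNOWN", "000-000-0000", "XXXX"}


def find_quality_problems(records, key_field, required_fields):
    """Return (row_index, field, problem) tuples for three simple checks."""
    problems = []
    key_counts = Counter(rec.get(key_field) for rec in records)
    for i, rec in enumerate(records):
        for field in required_fields:
            value = rec.get(field)
            if value is None or str(value).strip() == "":
                problems.append((i, field, "absent"))
            elif str(value).strip().upper() in DUMMY_VALUES:
                problems.append((i, field, "dummy value"))
        # The same identifier appearing on more than one row is flagged on each row
        if key_counts[rec.get(key_field)] > 1:
            problems.append((i, key_field, "non-unique identifier"))
    return problems


# Hypothetical extract from a source system
records = [
    {"cust_id": 1, "name": "Asha", "phone": "020-555-0101"},
    {"cust_id": 2, "name": "", "phone": "000-000-0000"},
    {"cust_id": 2, "name": "Ravi", "phone": "N/A"},
]
problems = find_quality_problems(records, "cust_id", ["name", "phone"])
for problem in problems:
    print(problem)
```

Reporting the problems as plain tuples keeps the check separate from the decision about what to do with a bad row, which may be to reject it, correct it, or load it with a flag.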
