Assignment 1
1) State the difference between a Data Warehouse and a Data Mart.
Data Warehouse
A Data Warehouse (DW) is a single organizational repository of enterprise-wide data across many or all subject areas. It is the authoritative repository of all the fact and dimension data (which is also available in the data marts) at an atomic level.
A Data Warehouse is a relational database designed for analysis rather than for transaction processing. It is a collection of data marts and represents historical data.
A Data Warehouse is the target for historical data and is used for analysis at the enterprise level.
In a Data Warehouse, a staging area is used for mapping and integrating data collected from different sources. A Data Warehouse consists of many different types of data structures.
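The relationship between the two can be sketched in a few lines. This is only an illustration, assuming an in-memory SQLite database; the tables `fact_sales` and `fact_payroll` and the view `hr_mart` are hypothetical names chosen for the example. The warehouse holds atomic rows for several subject areas, while the mart exposes one subject area in a narrower, summarized form.

```python
import sqlite3

def build_warehouse() -> sqlite3.Connection:
    """Create a toy warehouse holding atomic rows for two subject areas."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE fact_sales   (sale_date TEXT, product TEXT, amount REAL);
        CREATE TABLE fact_payroll (pay_date TEXT, department TEXT, salary REAL);
    """)
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
        ("2024-01-05", "widget", 120.0),
        ("2024-01-06", "gadget", 80.0),
    ])
    conn.executemany("INSERT INTO fact_payroll VALUES (?, ?, ?)", [
        ("2024-01-31", "HR", 3000.0),
        ("2024-01-31", "HR", 3200.0),
        ("2024-01-31", "IT", 4000.0),
    ])
    return conn

def create_hr_mart(conn: sqlite3.Connection) -> None:
    """Expose a single subject area (payroll) as a summarized view: the 'mart'."""
    conn.execute("""
        CREATE VIEW hr_mart AS
        SELECT department, COUNT(*) AS headcount, SUM(salary) AS total_salary
        FROM fact_payroll
        GROUP BY department
    """)

conn = build_warehouse()
create_hr_mart(conn)
rows = conn.execute("SELECT * FROM hr_mart ORDER BY department").fetchall()
print(rows)
```

A view is used here only to keep the sketch short; a real data mart is often a physically separate database.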
Overall Architecture
The data warehouse architecture is based on a relational database management system server that functions as the central repository for informational data. Operational data and processing are completely separated from data warehouse processing. This central information repository is surrounded by a number of key components designed to make the entire environment functional, manageable, and accessible both to the operational systems that source data into the warehouse and to end-user query and analysis tools.
Data Staging 'Area'
The data staging area is where all the 'grooming' is done on data after it is pulled from the source systems. The end point of grooming is for the data to be loaded into the analysis or presentation server. Data staging covers most of the backbone activities of a data warehouse, which are typically also the biggest analytical and technical challenges of a project. These activities are extraction and transformation.
ETL-Data Extraction
Data extraction pulls data from various data sources. Most of these sources are production systems or are used for transaction-level work.
ETL-Data Transformation
If data extraction is mining the iron ore, transformation is creating the steel billets. Transformation makes sure that the transaction-level raw data is converted into a form (while still being detailed) that can be loaded into the presentation/loaded area.
ETL-Presentation/Loaded 'Area'
This is the repository where the data is finally loaded after going through extraction and transformation. It becomes the ultimate source of information for uses ranging from queries to advanced data modeling.
Dimensional Model
The presentation area has a data model that is different from that of the production system. This is called the dimensional model. It is the way data is organized in the data warehouse. This concept is dealt with in a fair degree of detail, as it is the engine of the data warehouse.
Meta Data
Meta data is covered in a separate section. It contains all the business and technical designs, rules, locations, etc. of all the data, from extraction to final data usage.
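The extraction, transformation, and loading steps described above, together with a very small dimensional model, can be sketched as follows. This is a minimal illustration under stated assumptions, not a production pipeline: the source rows, the cleaning rules, and the names `dim_customer`, `dim_product`, and `fact_sales` are all hypothetical, and an in-memory SQLite database stands in for the presentation area.

```python
import sqlite3

# Hypothetical raw transaction records, as they might come out of a source system.
SOURCE_ROWS = [
    {"order_id": 1, "customer": " alice ", "product": "Widget", "qty": "2", "unit_price": "9.50"},
    {"order_id": 2, "customer": "BOB", "product": "widget", "qty": "1", "unit_price": "9.50"},
    {"order_id": 3, "customer": "Alice", "product": "Gadget", "qty": "3", "unit_price": "20.00"},
]

def extract():
    """Extraction: pull raw rows from the source (here, an in-memory list)."""
    return list(SOURCE_ROWS)

def transform(rows):
    """Transformation: clean names and convert types, keeping transaction-level detail."""
    cleaned = []
    for r in rows:
        qty = int(r["qty"])
        cleaned.append({
            "order_id": r["order_id"],
            "customer": r["customer"].strip().title(),
            "product": r["product"].strip().title(),
            "qty": qty,
            "amount": qty * float(r["unit_price"]),
        })
    return cleaned

def load(conn, rows):
    """Load: write into a small star schema (two dimension tables, one fact table)."""
    conn.executescript("""
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_sales (
            order_id     INTEGER PRIMARY KEY,
            customer_key INTEGER REFERENCES dim_customer(customer_key),
            product_key  INTEGER REFERENCES dim_product(product_key),
            qty          INTEGER,
            amount       REAL
        );
    """)
    for r in rows:
        # Add each dimension member once; repeats are skipped by the UNIQUE constraint.
        conn.execute("INSERT OR IGNORE INTO dim_customer (name) VALUES (?)", (r["customer"],))
        conn.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (r["product"],))
        # Look up the surrogate keys and store the fact row.
        conn.execute(
            """INSERT INTO fact_sales
               SELECT ?, c.customer_key, p.product_key, ?, ?
               FROM dim_customer c, dim_product p
               WHERE c.name = ? AND p.name = ?""",
            (r["order_id"], r["qty"], r["amount"], r["customer"], r["product"]),
        )

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))

# A typical dimensional query: a measure from the fact table grouped by a dimension.
sales_by_customer = conn.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_customer c USING (customer_key)
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(sales_by_customer)
```

The fact table keeps one row per transaction, so the detail is preserved, while descriptive attributes live in the dimension tables.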
End User Tools and Applications
Data is prepared for consumption. There is a long list of applications to which the data can be put and of tools that make this possible, including reporting, publishing, analysis, modeling, and mining tools.
Data-Warehouse Administration and Tools
A data warehouse is a large platform with a large number of users, data sources, and data targets. Just like production systems, it has to be administered in terms of performance, timeliness, and availability. Administration also includes activity logging, data security, backup, and archiving.
Data-Marts
The entire section on the Data Warehouse is equally applicable to a Data-Mart. A Data-Mart is a data repository with a more restricted and short-term perspective. Please refer to De-Normalized Data Warehouse/Data Mart for similarities and differences between a Data Warehouse and a Data Mart.
OLAP Servers & Data Marts
While the Data Warehouse can be accessed by any end-user tool or application, it also feeds the downstream OLAP layer. For example, HR may want its own data mart on separate servers for confidentiality reasons. Similarly, people who are traveling may need their own offline data mart.
Access Tools
The principal purpose of data warehousing is to provide information to business users for strategic decision-making. These users interact with the data warehouse using front-end tools. Many of these tools require an information specialist, although many end users develop expertise in the tools. Tools fall into four main categories: query and reporting tools, application development tools, online analytical processing tools, and data mining tools.
Query and reporting tools can be divided into two groups: reporting tools and managed query tools. Reporting tools can be further divided into production reporting tools and report writers. Production reporting tools let companies generate regular operational reports or support high-volume batch jobs such as calculating and printing paychecks. Report writers, on the other hand, are inexpensive desktop tools designed for end users. Managed query tools shield end users from the complexities of SQL and database structures by inserting a metalayer between users and the database.
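The metalayer idea behind managed query tools can be sketched as a mapping from business terms to SQL expressions. The following is a minimal illustration, not how any particular product works: the `METALAYER` dictionary, the `build_query` function, and the `fact_sales` table are hypothetical names made up for the example.

```python
import sqlite3

# Hypothetical metalayer: business-friendly terms mapped to physical SQL expressions.
METALAYER = {
    "table": "fact_sales",
    "dimensions": {"Region": "region", "Product": "product"},
    "measures": {"Total Revenue": "SUM(amount)", "Order Count": "COUNT(*)"},
}

def build_query(dimension: str, measure: str) -> str:
    """Translate business terms into SQL; unknown terms raise KeyError."""
    dim_col = METALAYER["dimensions"][dimension]
    measure_expr = METALAYER["measures"][measure]
    return (
        f"SELECT {dim_col}, {measure_expr} FROM {METALAYER['table']} "
        f"GROUP BY {dim_col} ORDER BY {dim_col}"
    )

# A tiny stand-in database so the generated SQL can be run.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
    ("East", "Widget", 100.0),
    ("East", "Gadget", 50.0),
    ("West", "Widget", 70.0),
])

# The end user only picks "Region" and "Total Revenue"; the SQL is generated for them.
sql = build_query("Region", "Total Revenue")
result = conn.execute(sql).fetchall()
print(sql)
print(result)
```

Because only terms listed in the metalayer can reach the generated SQL, the user never needs to know table or column names.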
Independent Data Marts
An independent data mart is created without the use of a central data warehouse. This could be desirable for smaller groups within an organization. It is not, however, the focus of this Guide. See the Data Mart Suites documentation for further details regarding this architecture. Figure illustrates an independent data mart.
Hybrid Data Marts
A hybrid data mart allows you to combine input from sources other than a data warehouse. This could be useful in many situations, especially when you need ad hoc integration, such as after a new group or product is added to the organization. Figure illustrates a hybrid data mart.
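How a mart is populated from different kinds of sources can be sketched as below. This is an illustration only, assuming an in-memory SQLite database; `warehouse_sales`, `new_group_sales`, `mart_sales`, and `populate_mart` are hypothetical names. Passing only the warehouse table would correspond to a mart fed purely from the warehouse, passing only non-warehouse tables to an independent mart, and passing both to a hybrid mart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table plus an ad hoc source that lives outside the warehouse.
conn.executescript("""
    CREATE TABLE warehouse_sales (product TEXT, amount REAL);
    CREATE TABLE new_group_sales (product TEXT, amount REAL);
    INSERT INTO warehouse_sales VALUES ('Widget', 100.0), ('Gadget', 50.0);
    INSERT INTO new_group_sales VALUES ('Gizmo', 30.0);
""")

def populate_mart(conn, sources):
    """Create mart_sales and copy rows from each source table, tagging their origin."""
    conn.execute("CREATE TABLE mart_sales (product TEXT, amount REAL, origin TEXT)")
    for table in sources:
        # Table names come from our own fixed list, never from user input.
        conn.execute(
            f"INSERT INTO mart_sales SELECT product, amount, ? FROM {table}",
            (table,),
        )

# Hybrid case: the warehouse and one extra source both feed the mart.
populate_mart(conn, ["warehouse_sales", "new_group_sales"])
mart_rows = conn.execute(
    "SELECT product, origin FROM mart_sales ORDER BY product"
).fetchall()
print(mart_rows)
```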
Extraction, Transformation, and Transportation
The main difference between independent and dependent data marts is how you populate the data mart; that is, how you get data out of the sources and into the data mart. This step, called the Extraction-Transformation-Transportation (ETT)
7) Explain Virtual & Dynamic Warehouse.
A virtual warehouse gives retailers the opportunity to advertise online a whole new line of items that they would not otherwise have room for on their own shelves. Through the distributor's virtual warehouse services, customers can order products from the retailer's website. The order is sent to the distributor's warehouse, where it is picked, packed, and shipped directly to the customer.
The benefits:
Customers appreciate a fully stocked inventory, with multiple ordering options and fast shipments.
Taking advantage of a virtual warehouse lets retailers expand their customer base with new products while increasing customer loyalty through superior service. Because the distributors provide the inventory space, in addition to the picking, packing, and shipping labor, retailers can cut costs significantly while improving profits.
Distributors expand business while reducing inventories, and with the ability to continually update prices online, they no longer have to honor outdated prices that are often found in catalogs.
Dynamic warehousing systems are increasingly replacing simple static systems, and for good reason; the advantages are varied and convincing:
- better usage of space
- shorter transport routes
- less handling equipment
- lower costs