Fundamentals of Data Warehouses - Matthias Jarke
1 Data Warehouse Practice: An Overview
Matthias Jarke¹, Maurizio Lenzerini², Yannis Vassiliou³ and Panos Vassiliadis³
(1) Dept. of Computer Science V, RWTH Aachen, Ahornstraße 55, 52056 Aachen, Germany
(2) Dipartimento di Informatica e Sistemistica, Università di Roma La Sapienza, Via Salaria 113, 00198 Rome, Italy
(3) Dept. of Electrical and Computer Engineering, Computer Science Division, National Technical University of Athens, 15773 Zographou, Athens, Greece
Matthias Jarke
Email: [email protected]
Maurizio Lenzerini
Email: [email protected]
Yannis Vassiliou
Email: [email protected]
Since the beginning of data warehousing in the early 1990s, an informal consensus has been reached concerning the major terms and components involved in data warehousing. In this chapter, we first explain these main terms and components. Data warehouse vendors pursue different strategies in supporting this basic framework; we review a few of the major product families and the basic problem areas facing data warehouse practice and research today.
A data warehouse (DW) is a collection of technologies aimed at enabling the knowledge worker (executive, manager, analyst) to make better and faster decisions. The goal is to have the right information in the right place at the right time at the right cost in order to support the right decision. Traditional online transaction processing (OLTP) systems are inappropriate for decision support, and high-speed networks cannot, by themselves, solve the information accessibility problem. Data warehousing has therefore become an important strategy for integrating heterogeneous data sources and for enabling online analytic processing (OLAP).
A report from the META Group in 1996 predicted that data warehousing would be a $13 000 million industry within two years ($8000 million on hardware, $5000 million on services and systems integration), up from expenditures of $2000 million in 1995. In 1998, reality had exceeded these figures, reaching sales of $14 600 million. By 2000, the OLAP subsector alone exceeded $2500 million. Table 1.1 differentiates the trends by product sector.
Table 1.1. Estimated sales in millions of dollars [ShTy98] (* estimates are from [PeCr00])
The number and complexity of projects — with project sizes ranging from a few hundred thousand to many millions of dollars — are indicative of the difficulty of designing good data warehouses. Their expected duration highlights the need for documented quality goals and change management. The emergence of data warehousing was initially a consequence of the observation by W. Inmon and E. F. Codd in the early 1990s that operational-level online transaction processing (OLTP) and decision support applications (OLAP) cannot coexist efficiently in the same database environment, mostly due to their very different transaction characteristics. Meanwhile, data warehousing has taken on a much broader role, especially in the context of reengineering legacy systems or at least saving legacy data. Here, DWs are seen as a strategy to bring heterogeneous data together under a common conceptual and technical umbrella and to make them available for new operational or decision support applications.
A data warehouse caches selected data of interest to a customer group, so that access becomes faster, cheaper, and more effective (Fig. 1.1). As the long-term buffer between OLTP and OLAP, data warehouses face two essential questions: how to reconcile the stream of incoming data from multiple heterogeneous legacy sources, and how to customize the derived data storage to specific OLAP applications. The trade-off driving the design decisions concerning these two issues changes continuously with business needs. Therefore, design support and change management are of greatest importance if we do not want to run DW projects into dead ends.
Fig. 1.1. Data warehouses: a buffer between transaction processing and analytic processing
Vendors agree that data warehouses cannot be off-the-shelf products but must be designed and optimized with great attention to the customer situation. Traditional database design techniques do not apply since they cannot deal with DW-specific issues such as data source selection, temporal and aggregated data, and controlled redundancy management. Since the wide variety of product and vendor strategies prevents a low-level solution to these design problems at acceptable costs, serious research and development efforts continue to be necessary.
1.1 Data Warehouse Components
Figure 1.2 gives a rough overview of the usual data warehouse components and their relationships. Many researchers and practitioners view a data warehouse architecture as layers of materialized views stacked on top of each other. Since the research problems are largely formulated from this perspective, we begin with a brief summary description.
Fig. 1.2. A generic data warehouse architecture
A data warehouse architecture exhibits various layers of data in which data from one layer are derived from data of the lower layer. Data sources, also called operational databases, form the lowest layer. They may consist of structured data stored in open database systems and legacy systems, or of unstructured or semistructured data stored in files. The data sources can be either part of the operational environment of an organization or external, produced by a third party. They are usually heterogeneous, which means that the same data can be represented differently, for instance through different database schemata, in the sources.
The central layer of the architecture is the global data warehouse, sometimes called the primary or corporate data warehouse. According to Inmon [Inmo96], it is a collection of integrated, nonvolatile, subject-oriented databases designed to support the decision support system (DSS) function, in which each unit of data is relevant to some moment in time and which contains both atomic data and lightly summarized data.
The global data warehouse keeps a historical record of data. Each time it is changed, a new integrated snapshot of the underlying data sources from which it is derived is placed in line with the previous snapshots. The warehouse typically contains data that are many years old (a frequently cited average age is two years). Researchers often assume (realistically) that the global warehouse consists of a set of materialized relational views, defined in terms of other relations that are themselves constructed from the data stored in the sources.
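To make the materialized-view perspective concrete, the following SQL sketch defines a warehouse relation over two hypothetical source relations; all table and column names are invented for illustration, and the exact materialized-view syntax varies by DBMS (Oracle-style shown here).

    -- A warehouse relation as a materialized view over source data
    -- (illustrative names; Oracle-style syntax)
    CREATE MATERIALIZED VIEW dw_sales AS
    SELECT o.order_date,
           c.region,
           p.product_group,
           SUM(o.quantity * o.unit_price) AS revenue
    FROM   src_orders    o
    JOIN   src_customers c ON c.customer_id = o.customer_id
    JOIN   src_products  p ON p.product_id  = o.product_id
    GROUP  BY o.order_date, c.region, p.product_group;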
The next layer of views are the local warehouses, which contain highly aggregated data derived from the global warehouse and directly intended to support activities such as informational processing, management decisions, long-term decisions, historical analysis, trend analysis, or integrated analysis. There are various kinds of local warehouses, such as data marts or OLAP databases. Data marts are small data warehouses that contain only a subset of the enterprise-wide data warehouse. A data mart may be used only in a specific department and then contains only the data relevant to this department. For example, a data mart for the marketing department should include only customer, sales, and product information, whereas the enterprise-wide data warehouse could also contain information on employees, departments, etc. A data mart enables faster responses to queries because the volume of the managed data is much smaller than in the data warehouse and queries can be distributed between different machines. Data marts may use relational database systems or specific multidimensional data structures.
There are two major differences between the global warehouse and local data marts. First, the global warehouse results from a complex extraction-integration-transformation process, whereas the local data marts result from an extraction/aggregation process starting from the global warehouse. Second, data in the global warehouse are detailed, voluminous (since the warehouse keeps data from previous periods of time), and only lightly aggregated; in contrast, data in the local data marts are highly aggregated and less voluminous. This distinction has a number of consequences both in research and in practice, as we shall see throughout the book.
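As a sketch of that extraction/aggregation step, a marketing data mart might be populated from the hypothetical dw_sales relation above by aggregating away detail; names remain illustrative.

    -- Deriving a highly aggregated data-mart table from the
    -- detailed global warehouse (illustrative names)
    CREATE TABLE mart_revenue_by_region AS
    SELECT region,
           EXTRACT(YEAR FROM order_date) AS year,
           SUM(revenue) AS revenue
    FROM   dw_sales
    GROUP  BY region, EXTRACT(YEAR FROM order_date);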
In some cases, an intermediate layer, called an operational data store (ODS), is introduced between the operational data sources and the global data warehouse. An ODS contains subject-oriented, collectively integrated, volatile, current-valued, and detailed data. The ODS usually contains records that result from the transformation, integration, and aggregation of detailed data found in the data sources, just as for a global data warehouse; therefore, we can also consider the ODS to consist of a set of materialized relational views. The main differences with a data warehouse are the following. First, the ODS is subject to change much more frequently than a data warehouse. Second, the ODS holds only fresh and current data. Finally, aggregation in the ODS is of small granularity: the data are at most lightly summarized. The use of an ODS, according to Inmon [Inmo96], is justified for corporations that need collective, integrated operational data; it is a good support for activities such as collective operational decisions or immediate corporate information. Whether an ODS is needed usually depends on the size of the corporation, the need for immediate corporate information, and the status of integration of the various legacy systems. Figure 1.2 summarizes the different layers of data.
All the data warehouse components, processes, and data are — or at least should be — tracked and administered from a metadata repository. The metadata repository serves as an aid both to the administrator and the designer of a data warehouse. Since the data warehouse is a very complex system, its architecture (physical components, schemata) can be complicated; the volume of data is vast; and the processes employed for the extraction, transformation, cleaning, storage, and aggregation of data are numerous, sensitive to changes, and vary in time.
1.2 Designing the Data Warehouse
The design of a data warehouse is a difficult task, and there are several problems designers have to tackle. First of all, they have to achieve a semantic reconciliation of the information lying in the sources and produce an enterprise model for the data warehouse. Then, a logical structure of relations in the core of the data warehouse must be obtained, serving either as buffers for the refreshment process or as persistent data stores for querying or further propagation to data marts. This is not a simple task by itself, and it becomes even more complicated once the physical design problem arises: the designer has to choose the physical tables, processes, indexes, and data partitions that represent the logical data warehouse schema and facilitate its functionality. Finally, hardware selection and software development must also be planned by the data warehouse designer [AdVe98, ISIA97, Simo98].
It is evident that the schemata of all the data stores involved in a data warehouse environment change rapidly: changes in a corporation's business rules affect both the source schemata (of the operational databases) and the user requirements (and thereby the schemata of the data marts). Consequently, the design of a data warehouse is an ongoing process, performed iteratively throughout the lifecycle of the system [KRRT98].
There is quite a lot of discussion about the methodology for the design of a data warehouse. The two major methodologies are the top-down and the bottom-up approaches [Kimb96, KRRT98, Syba97]. In the top-down approach, a global enterprise model is constructed, which reconciles the semantic models of the sources (and later, their data). This approach is usually costly and time-consuming; nevertheless it provides a basis over which the schema of the data warehouse can evolve. The bottom-up approach focuses on the more rapid and less costly development of smaller, specialized data marts and their synthesis as the data warehouse evolves.
No matter which approach is followed, there seems to be agreement on the general idea concerning the final schema of a data warehouse. In a first layer, the ODS serves as an intermediate buffer for the most recent and detailed information from the sources; data cleaning and transformation are performed at this level. Next, a database under a denormalized star schema usually serves as the central repository of data. A star schema is a special-purpose schema in data warehouses that is oriented towards query efficiency at the cost of schema normalization (cf. Chap. 5 for a detailed description). Finally, more aggregated views on top of this star schema can also be precalculated. The OLAP tools can communicate either with the upper levels of the data warehouse or with the customized data marts; we shall detail this issue in the following sections.
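A minimal star schema sketch, with invented names, shows the idea: a central fact table whose foreign keys point to deliberately denormalized dimension tables.

    -- Minimal star schema: one fact table referencing denormalized
    -- dimension tables (all names are illustrative)
    CREATE TABLE dim_product (
        product_key   INTEGER PRIMARY KEY,
        product_name  VARCHAR(100),
        product_group VARCHAR(100)   -- denormalized: group stored inline
    );
    CREATE TABLE dim_time (
        time_key      INTEGER PRIMARY KEY,
        day           DATE,
        month         INTEGER,
        year          INTEGER        -- denormalized calendar hierarchy
    );
    CREATE TABLE fact_sales (
        product_key   INTEGER REFERENCES dim_product,
        time_key      INTEGER REFERENCES dim_time,
        quantity      INTEGER,
        revenue       DECIMAL(12,2)
    );

Queries then follow one short join path per dimension instead of a web of normalized relations, which is what buys the query efficiency mentioned above.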
1.3 Getting Heterogeneous Data into the Warehouse
Data warehousing requires access to a broad range of information sources:
Database systems (relational, object-oriented, network, hierarchical, etc.)
External information sources (information gathered from other companies, results of surveys)
Files of standard applications (e.g., Microsoft Excel, COBOL applications)
Other documents (e.g., Microsoft Word, World Wide Web)
Wrappers, loaders, and mediators are programs that bring the data of the information sources into the data warehouse. Wrappers and loaders are responsible for loading, transforming, cleaning, and updating the data from the sources to the data warehouse. Mediators integrate the data into the warehouse by resolving inconsistencies and conflicts between different information sources. Furthermore, an extraction program can examine the source data for conspicuous items that may indicate incorrect information [BaBM97].
These tools — in the commercial sector classified as Extract-Transform-Load (ETL) tools — try to automate or support tasks such as the following [Gree97] (a small cleaning-and-loading sketch in SQL follows the list):
Extraction (accessing different source databases)
Cleaning (finding and resolving inconsistencies in the source data)
Transformation (between different data formats, languages, etc.)
Loading (loading the data into the data warehouse)
Replication (replicating source databases into the data warehouse)
Analyzing (e.g., detecting invalid/unexpected values)
High-speed data transfer (important for very large data warehouses)
Checking data quality (e.g., correctness and completeness)
Analyzing metadata (to support the design of a data warehouse)
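The following sketch shows the kind of cleaning-and-loading step such tools automate, expressed in plain SQL; the table names, the gender codes, and the choice to map unexpected values to NULL are all assumptions made for illustration.

    -- Cleaning and loading customer records from a source relation
    -- into the warehouse (illustrative names and rules)
    INSERT INTO dw_customers (customer_id, name, gender)
    SELECT customer_id,
           TRIM(UPPER(name)),                -- transformation: normalize spelling
           CASE gender WHEN 'm' THEN 'M'     -- cleaning: unify encodings
                       WHEN 'f' THEN 'F'
                       ELSE NULL             -- unexpected value: flag as missing
           END
    FROM   src_customers
    WHERE  customer_id IS NOT NULL;          -- completeness check before loading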
1.4 Getting Multidimensional Data out of the Warehouse
Relational database management systems (RDBMS) are most flexible when they are used with a normalized data structure. Because normalized data structures are non-redundant, normalized relations are useful for daily operational work. The database systems used for this role, so-called OLTP systems, are optimized to support small transactions and queries using primary keys and specialized indexes.
While OLTP systems store only current information, data warehouses contain historical and summarized data. These data are used by managers to find trends and directions in markets and support them in decision making. OLAP is the technology that enables this exploitation of the information stored in the data warehouse.
Due to the complexity of the relationships between the involved entities, OLAP queries require multiple join and aggregation operations over normalized relations, thus overloading the normalized relational database.
Typical operations performed by OLAP clients include the following [ChDa97] (a SQL sketch of the first three follows the list):
Roll up (increasing the level of aggregation)
Drill down (decreasing the level of aggregation)
Slice and dice (selection and projection)
Pivot (reorienting the multidimensional view)
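Against the hypothetical fact_sales and dim_time tables sketched in Sect. 1.2, the first three of these operations map onto ordinary SQL as follows (a sketch under those assumptions, not the syntax of any particular OLAP client):

    -- Roll up: aggregate monthly revenue up to yearly revenue
    SELECT t.year, SUM(f.revenue) AS revenue
    FROM   fact_sales f JOIN dim_time t ON t.time_key = f.time_key
    GROUP  BY t.year;

    -- Slice (selection on one dimension), then drill down to months
    SELECT t.month, SUM(f.revenue) AS revenue
    FROM   fact_sales f JOIN dim_time t ON t.time_key = f.time_key
    WHERE  t.year = 1998               -- the "slice"
    GROUP  BY t.month;                 -- finer aggregation level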
Beyond these basic OLAP operations, other possible client applications on data warehouses include:
Report and query tools
Geographic information systems (GIS)
Data mining (finding patterns and trends in the data warehouse)
Decision support systems (DSS)
Executive information systems (EIS)
Statistics
The OLAP applications provide users with a multidimensional view of the data, which is somewhat different from the typical relational approach; thus their operations need special, customized support. This support is given by multidimensional database systems and relational OLAP servers.
The database management system (DBMS) used for the data warehouse itself and/or for data marts must be a high-performance system, which fulfills the requirements for complex querying demanded by the clients. The following kinds of DBMS are used for data warehousing [Weld97]:
Super-relational database systems
Multidimensional database systems
Super-relational database systems. To make RDBMS more useful for OLAP applications, vendors have added new features to the traditional RDBMS. These so-called super-relational features include support for extensions to storage formats, relational operations, and specialized indexing schemes. To provide fast response time to OLAP applications, the data are organized in a star or snowflake schema (see also Chap. 5).
The resulting data model might be very complex and hard to understand for end users. Vendors of relational database systems try to hide this complexity behind special engines for OLAP. The resulting architecture is called Relational OLAP (ROLAP). In contrast to predictions in the mid-1990s, ROLAP architectures have not been able to capture a large share of the OLAP market. Within this segment, one of the leaders is MicroStrategy [MStr97], whose architecture is shown in Fig. 1.3. The RDBMS is accessed through VLDB (very large database) drivers, which are optimized for large data warehouses.
Fig. 1.3. MicroStrategy solution [MStr97]
Fig. 1.4. MDDB in a data warehouse environment
The DSS Architect translates relational database schemas to an intuitive multidimensional model, so that users are shielded from the complexity of the relational data model. The mapping between the relational and the multidimensional data models is done by consulting the metadata. The system is controlled by the DSS Administrator. With this tool, system administrators can fine-tune the database schema, monitor the system performance, and schedule batch routines.
The DSS Server is a ROLAP server, based on a relational database system. It provides a multidimensional view of the underlying relational database. Other features include caching of query results, monitoring and scheduling of queries, and generation and maintenance of dynamic relational data marts. DSS Agent, DSS Objects, and DSS Web are interfaces to end users, programming languages, and the World Wide Web.
Other ROLAP servers are offered by Red Brick [RBSI97] (subsequently acquired by Informix, then passed on to IBM) and Sybase [Syba97]. The Red Brick system is characterized by its indexing and join technology for star schemas (Starjoin); it also includes a data mining option to find patterns, trends, and relationships in very large databases. Sybase argues that data warehouses need to be constructed in an incremental, bottom-up fashion; such vendors therefore focus on support of distributed data warehouses and data marts.
Multidimensional database systems (MDDB) directly support the way in which OLAP users visualize and work with data. OLAP requires the analysis of large volumes of complex and interrelated data and the viewing of those data from various perspectives [Kena95]. MDDB store data in n-dimensional cubes, where each dimension represents a user perspective. For example, the sales data of a company may have the dimensions product, region, and time. Because of the way the data are stored, no join operations are necessary to answer queries that retrieve sales data by one of these dimensions. Therefore, for OLAP applications, MDDB are often more efficient than traditional RDBMS [Coll96]. A problem with MDDB is that restructuring is much more expensive than in a relational database. Moreover, there is currently no standard data definition language and query language for the multidimensional data model.
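For contrast, a relational system has to compute such a cube rather than store it natively; the SQL:1999 CUBE operator (supported by several RDBMS) makes this explicit. A sketch against the illustrative star schema of Sect. 1.2:

    -- Relational approximation of a multidimensional cube: one query
    -- computes totals for every combination of product group and year
    SELECT p.product_group, t.year, SUM(f.revenue) AS revenue
    FROM   fact_sales f
    JOIN   dim_product p ON p.product_key = f.product_key
    JOIN   dim_time    t ON t.time_key    = f.time_key
    GROUP  BY CUBE (p.product_group, t.year);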
In practical multidimensional OLAP products, two market segments can be observed [PeCr00]. At the low end, desktop OLAP systems such as Cognos PowerPlay, Business Objects, or Brio focus on the efficient and user-friendly handling of relatively small data cubes on client systems. Here, the MDDB is implemented as a data retailer [Sahi96]: it gets its data from a (relational) data warehouse and offers analysis functionality to end users. As shown in Fig. 1.4, ad-hoc queries are sent directly to the data warehouse, whereas OLAP applications work on the more appropriate, multidimensional data model of the MDDB. Market leaders in this segment support hundreds of thousands of workplaces.
Fig. 1.5. Example of a DW environment for integrated financial reporting and planning
At the high end, hybrid OLAP (HOLAP) solutions aim to provide full integration of relational data warehouse solutions (aiming at scalability) and multidimensional solutions (aiming at OLAP efficiency) in complex architectures. Market leaders include Hyperion Essbase, Oracle Express, and Microsoft OLAP.
Application-oriented OLAP. As pointed out by Pendse and Creeth [PeCr00], only a few vendors can survive on generic server tools as mentioned above. Many more market niches can be found for specific application domains. Systems in this sector often provide substantial application-specific functionality in addition to (or on top of) multidimensional OLAP (MOLAP) engines. Generally speaking, application domains can be subdivided into four business functions:
Reporting and querying for standard controlling tasks
Problem and opportunity analysis (often called Business Intelligence)
Planning applications
One-of-a-kind data mining campaigns or analysis projects
Two very important application domains are sales analysis and customer relationship management on the one hand, and budgeting, financial reporting, and consolidation on the other. Interestingly, only a few of the tools on the market are able to integrate the reporting and analysis of available data with planning tasks for the future.
As an example, Fig. 1.5 shows the b2brain architecture by Thinking Networks AG [Thin01], a MOLAP-based environment for financial reporting and planning data warehouses. It exhibits typical features of advanced application-oriented OLAP environments: efficient, metadata-driven custom-tailoring to new applications within a domain; linkage to heterogeneous sources and clients, including via the Internet; and seamless integration of application-relevant features such as heterogeneous data collection, semantics-based consolidation, data mining, and planning. The architecture thus demonstrates the variety of physical structures encountered in high-end data warehousing as well as the importance of metadata, both discussed in the following subsections.
Fig. 1.6. Central architecture
1.5 Physical Structure of Data Warehouses
There are three basic architectures for a data warehouse [Weld97, Muck96]:
Centralized
Federated
Tiered
In a centralized architecture, there exists only one data warehouse, which stores all data necessary for business analysis. As shown in the previous section, the disadvantage is a loss of performance compared to distributed approaches: all queries and update operations must be processed by one database system.
On the other hand, access to data is uncomplicated because only one data model is relevant. Furthermore, building and maintaining a central data warehouse is easier than in a distributed environment. A central data warehouse is useful for companies whose existing operational framework is also centralized (Fig. 1.6).
Fig. 1.7. Federated architecture
A decentralized architecture is only advantageous if the operational environment is also distributed. In a federated architecture, the data are logically consolidated but stored in separate physical databases at the same or at different physical sites (Fig. 1.7). The local data marts store only the information relevant for a department. Because the amount of data is reduced in contrast to a central data warehouse, a local data mart may contain all levels of detail, so that detailed information can also be delivered by the local system.
Fig. 1.8. Tiered architecture
An important feature of the federated architecture is that the logical warehouse is only virtual. In contrast, in a tiered architecture (Fig. 1.8), the central data warehouse is also physical. In addition to this warehouse, there exist local data marts on different tiers, which store copies or summaries of the previous tier but not the detailed data found in a federated architecture.
Fig. 1.9. Distribution of data warehouse project costs [Inmo97]
There can also be different tiers at the source side. Imagine, for example, a supermarket company collecting data from its branches. This process cannot be done in one step because many sources have to be integrated into the warehouse. At the first level, the data of all branches in one region are collected; at the second level, the data from the regions are integrated into one data warehouse.
The advantages of the distributed architecture are (a) faster response time, because the data are located closer to the client applications, and (b) a reduced volume of data to be searched. Although several machines must be used in a distributed architecture, this may result in lower hardware and software costs because not all data must be stored in one place and queries are executed on different machines. A scalable architecture is very important for data warehousing: data warehouses are not static systems but evolve and grow over time. Because of this, the architecture chosen to build a data warehouse must be easy to extend and to restructure.
1.6 Metadata Management
Metadata play an important role in data warehousing. Before a data warehouse can be accessed efficiently, it is necessary to understand what data are available in the warehouse and where the data are located. In addition to locating the data that the end users require, metadata repositories may contain [AdCo97, MStr95, Micr96]:
Data dictionary: contains definitions of the databases being maintained and the relationships between data elements
Data flow: direction and frequency of data feed
Data transformation: transformations required when data is moved
Version control: changes to metadata are stored
Data usage statistics: a profile of data in the warehouse
Alias information: alias names for a field
Security: who is allowed to