Data Virtualization: Selected Writings
About this ebook
This book is a bundling of articles, blogs, and whitepapers Rick F. van der Lans wrote on data virtualization and related topics over the last ten years.
The author is a highly respected independent analyst, consultant, author, and internationally acclaimed lecturer specializing in data warehousing, business intelligence, big data, database technology, and data virtualization. In 2018 he was selected as the sixth most influential BI analyst worldwide by onalytica.com. He has been influential in introducing data virtualization worldwide. In 2012 he published “Data Virtualization for Business Intelligence Systems.”
Data Virtualization: Selected Writings
Author: Rick F. van der Lans
Edited by Diane Cools
Cover design by Rudy VanderLans
Copyright
Copyright © 2019 by Rick F. van der Lans
All rights reserved; no part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the author.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the author was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.
The author has taken care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.
First Printing: 2019
ISBN: 978-0-359-89461-1
Publisher: Lulu Press, Inc. – www.lulu.com
Website author: www.r20.nl
Preface
Introduction
Data virtualization technology allows organizations to quickly unlock the data they have stored in their multitude of databases and systems. Data virtualization can integrate data from multiple sources and can deliver that data in all kinds of forms and shapes to many different types of data consumption, ranging from simple dashboards and reports via Java apps running on a mobile device to advanced forms of analytics initiated by data scientists.
Now that organizations want to become more data driven and are on their digital transformation journey, being able to exploit the data is the key to success. Unfortunately, because most of the data is deeply buried in complex applications and stored in intricate database structures, it is often not available when organizations need it. This is where data virtualization comes in. It makes it possible to unlock all the data more easily and more quickly than with most other technologies.
This Book
In 2012, I published a book[1] on data virtualization entitled Data Virtualization for Business Intelligence Systems. It describes what data virtualization is, what the pros and cons are, how the tools work internally, and what possible use cases exist. Since the book was published, much has changed. Our insights on data virtualization have changed, the market has changed, we have much more practical experience with the tools, the tools themselves have matured even further, and the technology is being used in larger projects. In short, much more is known about data virtualization. Therefore, I felt it was time to write a book containing more up-to-date information, which can be seen as an addendum to my original book.
Instead of writing the book from scratch, I decided to use existing material. Therefore, the book is a bundling of articles, blogs, and whitepapers I wrote on the data virtualization topic over the last ten years. Some are included in their original form, some have been updated, and some are combined or shortened to fit this book.
[1] R.F. van der Lans, Data Virtualization for Business Intelligence Systems, Morgan Kaufmann Publishers, 2012
Why the Title ‘Selected Writings’?
Most likely, many musicians have heroes, book authors have heroes, and sports people have heroes. I have a hero as well, a database hero. From the first day I started in IT, my hero has been C.J. (Chris) Date. He is the author of numerous books on databases and he has had an enormous impact on the adoption of relational databases in the market. I have always admired his work. The first book I worked on was a translation of one of his early books, Database: A Primer[2]. Right away, it impressed me how clear, educational, and well-structured his writing was.
I have read many of his books and especially enjoyed the books in the Selected Writings series, such as Relational Database: Selected Writings, which was first published in 1986[3]. To honor his writing I decided to follow his example, hence the title Data Virtualization: Selected Writings.
[2] C.J. Date, Database: A Primer, Addison-Wesley, November 1983
[3] C.J. Date, Relational Database: Selected Writings, Addison-Wesley, first edition, March 1986
Original Articles and Blogs?
All the articles have been adapted to some degree. Because they were written over a period of ten years, the terminology was not always used consistently, so I changed that. In some articles passages were removed because they overlapped too much with others. Originally, all the articles had to stand on their own, so some contained introductory text. I removed most of those pieces of text to avoid repetition. Reading an introduction fifteen times is no fun. Some articles I enriched with extra notes to increase the value of the article. I have also added comments to the articles to explain why I incorporated them or to clarify some of the remarks made or concepts introduced in the article.
Why Now?
The first real article I wrote on data virtualization was published in 2009, exactly ten years ago. In the intervening years I have written countless articles, blogs, and whitepapers on this and related topics. As indicated, I even wrote a book on data virtualization. I thought that ten years was a good excuse to bundle a selection of these writings.
Moreover, it feels as if the market has adopted data virtualization. It has taken some time, but finally, companies are studying, testing, and deploying this technology. Hopefully, making many of these writings easily available helps adopters with some of the questions they struggle with.
The third reason I decided to publish this book now is that some of the older articles and blogs are no longer available online or otherwise; now they are.
My History with Data Virtualization
The first time I was confronted with data virtualization was in the second half of the 1990s. At that time I worked for a large oil company. In one of the projects we experimented with two tools that, with hindsight, can be regarded as the forerunners of today’s data virtualization products: Information Builders’ EDA/SQL and IBM’s DB2 Data Joiner. The two tools tried to accomplish the same thing: turn a heterogeneous set of independent databases into one logical database. In other words, they presented multiple databases to the developers as one integrated database, thus simplifying development work. In our project we tried to make it easy for developers of applications and reports to extract data from databases, such as Oracle on VAX/VMS and DB2 on the IBM mainframe. At the end of the project, the general conclusion was: the concept is great and the functionality is adequate, but the performance and scalability are unacceptable. It was too early for data virtualization.
Fast forward to 2005. In that year I visited the San Francisco Bay Area to meet with some vendors to discuss their new technologies. One of them was Composite Software, based in San Mateo. This was the time when everyone was into mashups, applications that could bring together data from several websites. To my pleasant surprise, Composite Information Server turned out to be a data virtualization product. So, almost ten years later I was again confronted with this type of integration technology. And it piqued my interest. I decided to study the market and discovered that more vendors, all for different reasons, had gone back to this idea of data virtualization. Besides Composite, vendors such as Denodo, Ipedo, and MetaMatrix all offered data virtualization products. This was the signal for me to investigate the maturity of these products. The conclusion was that, compared to the products I had worked with in the 1990s, they had made a giant step forward.
However, in 2005 the vendors were not really pushing their products yet. They waited until 2008/2009. Unfortunately, this coincided with the global economic downturn. Few organizations could afford to invest in new technology. Many were just trying to survive. The timing for introducing new technology was far from perfect.
Fortunately, the tide has turned. Currently, organizations know what data virtualization can mean for them. They acknowledge the advantages and are investing in the technology. Products are being selected, systems are being built, and the technology is being used by all kinds of organizations, commercial and non-commercial, small and large.
For Whom Is This Book Intended?
This book is recommended for the following groups of people:
Business intelligence specialists who are responsible for developing and managing a data warehouse and business intelligence environment, and who want to know how such systems can be simplified by applying data virtualization and how data virtualization can lead to a more agile business intelligence system.
Information management specialists who want to know what the effect of data virtualization is on their profession, and how it will impact activities such as information management, data governance, database design, data cleansing, and data profiling.
Master data management specialists who are responsible for setting up a master data management system and want to know how they can benefit from deploying data virtualization.
Data architects who are responsible for designing an overall architecture for data delivery to any part of the organization.
Designers, analysts, and consultants who need to deal, directly or indirectly, with data virtualization and want to know about its possibilities and impossibilities.
IT students who want to know what data virtualization is and what the differences are with other data-related technologies.
Prerequisite Knowledge
General knowledge of topics such as data warehousing, business intelligence, and database technology is required.
Structure of the Book
As indicated, this book is a collection of articles and blogs. They are not included in order of publication. They have been grouped based on their topic. The selected topics are:
Introduction to data virtualization
Use cases of data virtualization
The data delivery platform
The logical data warehouse
The unified data delivery platform
Thank You
There are many I need to thank for their help, contributions, ideas, comments, mental support, and patience.
I would like to thank the following data virtualization vendors who, over the years, invited me to write whitepapers for them: Denodo Technologies, fraXses, Informatica Corporation, Red Hat, Rocket Software, Stone Bond Technologies, and TIBCO Software. Writing whitepapers forces me to think harder about a technology, to analyze what the pros and cons are, to determine what the potential use cases are, and to structure my knowledge. This has definitely led to a better understanding of data virtualization. I would also like to thank them for answering my countless technical questions.
In my first book on data virtualization I thanked three vendors because they supported me extremely well in the beginning. I would like to thank them again, because I still benefit from their cooperation. These three vendors were Composite Software, Denodo Technologies, and Informatica Corporation. I am still especially grateful to the following specialists: David Besemer, Robert Eve, Kevin O’Brien, Ian Pestell, and Jean-Philippe Player of Composite Software; Suresh Chandrasekaran, Juan Lozano, and Alberto Pan of Denodo Technologies; and Diby Malakar, James Markarian, Bert Oosterhof, Ash Parikh, and Lalitha Sundaramurthy of Informatica. Note that some of these vendors and specialists are no longer involved in data virtualization; Informatica doesn’t really sell data virtualization technology any longer, and the Composite Information Server product is now being sold by TIBCO Software under the name TIBCO Data Virtualization.
One person stands out in this list, someone I want to thank separately: Robert Eve, Bob to his friends. Many of the people who were involved in data virtualization at the time when I got involved now do other things. A few have stayed; Bob is one of them. Especially in those first five years when data virtualization wasn’t that popular, he worked hard to sell the idea. Because of that we had numerous discussions on the topic and he has helped me with some of my writings. Thanks very much, Bob, for working together on this journey over all these years. But I don’t think we are done yet.
I am very grateful to the thousands of people across the world who, over the past ten years, attended my seminars and sessions at conferences on data virtualization and related topics, such as the logical data warehouse. Their comments and recommendations have been invaluable for writing my blogs and articles and indirectly for this book.
I would also like to thank the following online platforms for allowing me to republish my writings: TechTarget.com, Datamanager.it, Denodo.com, and IRMConnects.com. I would also like to thank Denodo Technologies, fraXses, Red Hat, Rocket Software, and TIBCO Software for allowing me to reuse material from the whitepapers I wrote for them. Thanks also to Antoine Stelma for helping me out with some of the screenshots.
Finally, the person I must thank is my ‘personal editor’, but foremost my wife, Diane Cools. Already, we’ve worked on more than ten books together and hundreds of other writings and it is still great to work with her after all these years. No publication goes out without her stamp of approval. As always, I’m very grateful, Diane!
I would also like to ask readers to send comments, opinions, ideas, and suggestions concerning the contents of the book to [email protected], referencing Data Virtualization: Selected Writings. Many thanks in anticipation of your cooperation. I hope you have as much fun reading this book as I had writing the articles, blogs, and this book.
Rick F. van der Lans
Lisse, The Netherlands, 3 September 2019
1 Introduction to Data Virtualization
This first chapter contains writings in which data virtualization is explained. The pros, cons, and internal workings are described; the differences between the term data virtualization and other related ones, such as data federation and data integration, are explained; and the impact on productivity and maintenance is clarified.
The chapter ends with an article entitled ‘Data virtualization and the Fulfilling of Ted Codd’s Dream.’ Edgar F. (Ted) Codd was the inventor of the relational model. In one of his groundbreaking articles he wrote that the primary objective of the relational model is the data independence objective. He argued for the development of IT systems in which data storage and data access aspects were decoupled from the application structure. A decoupled solution means that changes can be made to one without impacting the other. In other words, changes made to, for example, the data storage layer have no impact on the application layer, and vice versa. This decoupling leads to improved productivity and easier maintenance.
Often we refer to decoupling with the term abstraction. And that is what data virtualization is all about: abstraction. Data virtualization is about shielding applications and users from how and where data is stored and accessed. So, in a way, data virtualization fulfills a part of Ted Codd’s dream by supporting the data independence objective.
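The decoupling idea can be made concrete with a very small sketch. It is not taken from any data virtualization product; it merely uses an ordinary SQL view in Python's built-in sqlite3 module as a stand-in for a virtual table. The table names crm_customers and web_customers, the view name customers, and the function customer_names are all hypothetical and chosen for illustration only. The application function queries the logical name customers; the storage layout behind that name is then changed, and the application code stays untouched.

```python
import sqlite3


def customer_names(conn):
    """Application code: it knows only the logical name 'customers'."""
    rows = conn.execute("SELECT name FROM customers ORDER BY name")
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")

# Storage layout 1: a single physical table behind the logical view.
conn.execute("CREATE TABLE crm_customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO crm_customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
)
conn.execute("CREATE VIEW customers AS SELECT id, name FROM crm_customers")
before = customer_names(conn)

# Storage layout 2: the data is split across two tables.
# Only the view definition changes; customer_names() is not modified.
conn.execute("CREATE TABLE web_customers (id INTEGER, name TEXT)")
conn.execute(
    "INSERT INTO web_customers SELECT id, name FROM crm_customers WHERE id = 2"
)
conn.execute("DELETE FROM crm_customers WHERE id = 2")
conn.execute("DROP VIEW customers")
conn.execute(
    "CREATE VIEW customers AS "
    "SELECT id, name FROM crm_customers "
    "UNION ALL "
    "SELECT id, name FROM web_customers"
)
after = customer_names(conn)

print(before == after)
```

A real data virtualization server applies the same principle across separate databases and systems rather than within one database file, but the effect on the application is comparable: the query it issues does not change when the storage layer does.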
1.1 What is Data Virtualization?
I included this article because I wanted the book to start with a high-level introduction to data virtualization. It was published in August 2011 on IRMUK’s website called IRMconnects.com. The article has been adapted slightly. For example, some old product names were updated. But broadly speaking, it is still the same article.
Introduction to Virtualization
Data virtualization is receiving more and more attention in the IT industry, especially from those interested in data management and business intelligence. This increased interest is well-deserved, because data virtualization has a unifying impact on how applications manage, handle, and access the many disparate data sources in an organization. But what exactly is it and