Data Virtualization: Selected Writings
Ebook, 342 pages, 2 hours

About this ebook

With data virtualization, organizations can unlock the data stored in their multitude of databases and systems more easily and more quickly than with most other technologies. It can integrate data from multiple sources and can deliver that data in all kinds of forms and shapes to many different types of data consumption.
This book is a bundling of articles, blogs, and whitepapers Rick F. van der Lans wrote on data virtualization and related topics over the last ten years.
The author is a highly respected independent analyst, consultant, author, and internationally acclaimed lecturer specializing in data warehousing, business intelligence, big data, database technology, and data virtualization. In 2018 he was selected as the sixth most influential BI analyst worldwide by onalytica.com. He has been influential in introducing data virtualization worldwide. In 2012 he published “Data Virtualization for Business Intelligence Systems.”
Language: English
Publisher: Lulu.com
Release date: Sep 10, 2019
ISBN: 9780359894611

    Data Virtualization: Selected Writings

    Author: Rick F. van der Lans

    Edited by Diane Cools

    Cover design by Rudy VanderLans

    Copyright

    Copyright © 2019 by Rick F. van der Lans

    All rights reserved; no part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the author.

    Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the author was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

    The author has taken care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

    First Printing: 2019

    ISBN: 978-0-359-89461-1

    Publisher: Lulu Press, Inc. – www.lulu.com

    Website author: www.r20.nl

    Preface

    Introduction

    Data virtualization technology allows organizations to quickly unlock the data they have stored in their multitude of databases and systems. Data virtualization can integrate data from multiple sources and can deliver that data in all kinds of forms and shapes to many different types of data consumption, ranging from simple dashboards and reports via Java apps running on a mobile device to advanced forms of analytics initiated by data scientists.

    Now that organizations want to become more data driven and are on their digital transformation journey, being able to exploit the data is the key to success. Unfortunately, because most of the data is deeply buried in complex applications and stored in intricate database structures, it is often not available when organizations need it. This is where data virtualization comes in. It makes it possible to unlock all the data more easily and more quickly than with most other technologies.

    This Book

    In 2012, I published a book[1] on data virtualization entitled Data Virtualization for Business Intelligence Systems. It describes what data virtualization is, what the pros and cons are, how the tools work internally, and what possible use cases exist. Since the book was published, much has changed. Our insights on data virtualization have changed, the market has changed, we have much more practical experience with the tools, the tools themselves have matured even further, and the technology is being used in larger projects. In short, there is much more known about data virtualization. Therefore, I felt it was time to write a book containing more up-to-date information, which can be seen as an addendum to my original book.

    Instead of writing the book from scratch, I decided to use existing material. Therefore, the book is a bundling of articles, blogs, and whitepapers I wrote on the data virtualization topic over the last ten years. Some are included in their original form, some have been updated, and some are combined or shortened to fit this book.


    [1] R.F. van der Lans, Data Virtualization for Business Intelligence Systems, Morgan Kaufmann Publishers, 2012

    Why the Title ‘Selected Writings’?

    Most likely, many musicians have heroes, book authors have heroes, and sports people have heroes. I have a hero as well, a database hero. From the first day I started in IT, my hero has been C.J. (Chris) Date. He is the author of numerous books on databases and he has had an enormous impact on the adoption of relational databases in the market. I have always admired his work. The first book I worked on was a translation of one of his early books, Database: A Primer[2]. Right away, it impressed me how clear, educational, and well-structured his writing was.

    I have read many of his books and especially enjoyed the books in the Selected Writings series, such as Relational Database: Selected Writings, which was first published in 1986[3]. To honor his writing, I decided to follow his example, hence the title Data Virtualization: Selected Writings.


    [2] C.J. Date, Database: A Primer, Addison-Wesley, November 1983

    [3] C.J. Date, Relational Database: Selected Writings, Addison-Wesley, first edition, March 1986

    Original Articles and Blogs?

    All the articles have been adapted to some degree. Because they were written over a period of ten years, the terminology was not always used consistently, so I changed that. In some articles passages were removed because they overlapped too much with others. Originally, all the articles had to stand on their own, so some contained introductory text. I removed most of those pieces of text to avoid repetition. Reading an introduction fifteen times is no fun. Some articles I enriched with extra notes to increase the value of the article. I have also added comments to the articles to explain why I incorporated them or to clarify some of the remarks made or concepts introduced in the article.

    Why Now?

    The first real article I wrote on data virtualization was published in 2009, exactly ten years ago. In the intervening years I have written countless articles, blogs, and whitepapers on this and related topics. As indicated, I even wrote a book on data virtualization. I thought that ten years was a good excuse to bundle a selection of these writings.

    Moreover, it feels as if the market has adopted data virtualization. It has taken some time, but finally, companies are studying, testing, and deploying this technology. Hopefully, making many of these writings easily available helps the adopters with some of the questions they struggle with.

    The third reason I decided to publish this book now is that some of the older articles and blogs are no longer available online or otherwise; now they are.

    My History with Data Virtualization

    The first time I was confronted with data virtualization was in the second half of the 1990s. At that time I worked for a large oil company. In one of the projects we experimented with two tools that, with hindsight, can be regarded as the forerunners of today’s data virtualization products: Information Builders’ EDA/SQL and IBM’s DB2 Data Joiner. The two tools tried to accomplish the same thing: turn a heterogeneous set of independent databases into one logical database. In other words, they presented multiple databases to the developers as one integrated database, thus simplifying development work. In our project we tried to make it easy for developers of applications and reports to extract data from databases, such as Oracle on VAX/VMS and DB2 on the IBM mainframe. At the end of the project, the general conclusion was: the concept is great and the functionality is adequate, but the performance and scalability are unacceptable. It was too early for data virtualization.

    Fast forward to 2005. In that year I visited the San Francisco Bay Area to meet with some vendors to discuss their new technologies. One of them was Composite Software, based in San Mateo. This was the time when everyone was into mashups, applications that could bring together data from several websites. To my pleasant surprise, Composite Information Server turned out to be a data virtualization product. So, almost ten years later I was again confronted with this type of integration technology. And it piqued my interest. I decided to study the market and discovered that more vendors, all for different reasons, had gone back to this idea of data virtualization. Besides Composite, others such as Denodo, Ipedo, and MetaMatrix all offered data virtualization products. This was the signal for me to investigate the maturity of these products. The conclusion was that, compared to the products I had worked with in the 1990s, they had taken a giant step forward.

    However, in 2005 the vendors were not really pushing their products yet. They waited until 2008/2009. Unfortunately, it coincided with the global economic downturn. Few organizations could afford to invest in new technology. Many were just trying to survive. The timing for introducing new technology was far from perfect.

    Fortunately, the tide has turned. Currently, organizations know what data virtualization can mean for them. They acknowledge the advantages and are investing in the technology. Products are being selected, systems are being built, and the technology is being used by all kinds of organizations, commercial and non-commercial, small and large.

    For Whom Is This Book Intended?

    This book is recommended for the following groups of people:

    Business intelligence specialists who are responsible for developing and managing a data warehouse and business intelligence environment, and who want to know how such systems can be simplified by applying data virtualization and how data virtualization can lead to a more agile business intelligence system.

    Information management specialists who want to know what the effect of data virtualization is on their profession, and how it will impact activities such as information management, data governance, database design, data cleansing, and data profiling.

    Master data management specialists who are responsible for setting up a master data management system and want to know how they can benefit from deploying data virtualization.

    Data architects who are responsible for designing an overall architecture for data delivery to any part of the organization.

    Designers, analysts, and consultants who need to deal, directly or indirectly, with data virtualization and want to know about its possibilities and impossibilities.

    IT students who want to know what data virtualization is and what the differences are with other data-related technologies.

    Prerequisite Knowledge

    General knowledge of topics such as data warehousing, business intelligence, and database technology is required.

    Structure of the Book

    As indicated, this book is a collection of articles and blogs. They are not included in order of publication. They have been grouped based on their topic. The selected topics are:

    Introduction to data virtualization

    Use cases of data virtualization

    The data delivery platform

    The logical data warehouse

    The unified data delivery platform

    Thank You

    There are many I need to thank for their help, contributions, ideas, comments, mental support, and patience.

    I would like to thank the following data virtualization vendors who, over the years, invited me to write whitepapers for them: Denodo Technologies, fraXses, Informatica Corporation, Red Hat, Rocket Software, Stone Bond Technologies, and TIBCO Software. Writing whitepapers forces me to think harder about a technology, to analyze what the pros and cons are, to determine what the potential use cases are, and to structure my knowledge. This has definitely led to a better understanding of data virtualization. I would also like to thank them for answering my countless technical questions.

    In my first book on data virtualization I thanked three vendors because they supported me extremely well in the beginning. I would like to thank them again, because I still benefit from their cooperation. These three vendors were Composite Software, Denodo Technologies, and Informatica Corporation. I am still especially grateful to the following specialists: David Besemer, Robert Eve, Kevin O’Brien, Ian Pestell, and Jean-Philippe Player of Composite Software; Suresh Chandrasekaran, Juan Lozano, and Alberto Pan of Denodo Technologies; and Diby Malakar, James Markarian, Bert Oosterhof, Ash Parikh, and Lalitha Sundaramurthy of Informatica. Note that some of these vendors and specialists are no longer involved in data virtualization; Informatica doesn’t really sell data virtualization technology any longer, and the Composite Information Server product is now being sold by TIBCO Software under the name TIBCO Data Virtualization.

    One person stands out in this list, someone I want to thank separately: Robert Eve, Bob to his friends. Many of the people who were involved in data virtualization at the time when I got involved now do other things. A few have stayed, and Bob is one of them. Especially in those first five years when data virtualization wasn’t that popular, he worked hard to sell the idea. Because of that we had numerous discussions on the topic, and he has helped me with some of my writings. Thanks very much, Bob, for working together on this journey all these years. But I don’t think we are done yet.

    I am very grateful to the thousands of people across the world who, over the past ten years, attended my seminars and sessions at conferences on data virtualization and related topics, such as the logical data warehouse. Their comments and recommendations have been invaluable for writing my blogs and articles and indirectly for this book.

    I would also like to thank the following online platforms for allowing me to republish my writings: TechTarget.com, Datamanager.it, Denodo.com, and IRMConnects.com. I would also like to thank Denodo Technologies, fraXses, Red Hat, Rocket Software, and TIBCO Software for allowing me to reuse material from the whitepapers I wrote for them. Thanks also to Antoine Stelma for helping me out with some of the screenshots.

    Last but not least, the person I must thank is my ‘personal editor’ and, above all, my wife, Diane Cools. We have already worked on more than ten books together and hundreds of other writings, and it is still great to work with her after all these years. No publication goes out without her stamp of approval. As always, I’m very grateful, Diane!

    Finally, I would like to ask readers to send comments, opinions, ideas, and suggestions concerning the contents of the book to [email protected], referencing Data Virtualization: Selected Writings. Many thanks in anticipation of your cooperation. I hope you have as much fun reading this book as I had writing the articles, blogs, and this book.

    Rick F. van der Lans

    Lisse, The Netherlands, 3 September 2019

    1  Introduction to Data Virtualization

    This first chapter contains writings in which data virtualization is explained. The pros, cons, and internal workings are described; the differences between the term data virtualization and other related ones, such as data federation and data integration, are explained; and the impact on productivity and maintenance is clarified.

    The chapter ends with an article entitled ‘Data virtualization and the Fulfilling of Ted Codd’s Dream.’ Edgar F. (Ted) Codd was the inventor of the relational model. In one of his groundbreaking articles he wrote that the primary objective of the relational model is the data independence objective. He argued for the development of IT systems in which data storage and data access aspects were decoupled from the application structure. A decoupled solution means that changes can be made to one without impacting the other. In other words, changes made to, for example, the data storage layer, have no impact on the application layer, and vice versa. This decoupling leads to improved productivity and ease of maintenance.

    Often we refer to decoupling with the term abstraction. And that is what data virtualization is all about: abstraction. Data virtualization is about shielding applications and users from how and where data is stored and accessed. So, in a way, data virtualization fulfills a part of Ted Codd’s dream by supporting the data independence objective.
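
    The decoupling described above can be illustrated with a small sketch. It is not taken from the book and does not show how any data virtualization product works internally; it only uses a plain SQL view in an in-memory SQLite database as a stand-in for a virtual table. All table, view, and column names, as well as the sample rows, are hypothetical. The application function queries only the view, so it keeps returning the same result after the storage layer has been restructured.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")

# Storage layer, version 1: customer data lives in a single table.
conn.executescript("""
    CREATE TABLE customer_v1 (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO customer_v1 VALUES (1, 'Ann', 'Lisse'), (2, 'Ben', 'San Mateo');

    -- The abstraction layer: the only object the application knows about.
    CREATE VIEW customers AS SELECT id, name, city FROM customer_v1;
""")


def customer_report(connection):
    """Application code: queries the view and never names a stored table."""
    return connection.execute(
        "SELECT name, city FROM customers ORDER BY id"
    ).fetchall()


before = customer_report(conn)

# Storage layer, version 2: the same data is split over two tables.
# Only the view definition is changed; the application code is untouched.
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (person_id INTEGER, city TEXT);
    INSERT INTO person SELECT id, name FROM customer_v1;
    INSERT INTO address SELECT id, city FROM customer_v1;
    DROP VIEW customers;
    DROP TABLE customer_v1;
    CREATE VIEW customers AS
        SELECT p.id, p.name, a.city
        FROM person AS p JOIN address AS a ON a.person_id = p.id;
""")

after = customer_report(conn)
print(before == after)  # True: the storage change is invisible to the application
```

    The view plays the role of the abstraction layer here: the change to the storage layer is absorbed by redefining the view, and the function that represents the application layer is not modified.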

    1.1  What is Data Virtualization?

    I included this article because I wanted the book to start with a high-level introduction to data virtualization. It was published in August 2011 on IRMUK’s website called IRMconnects.com. The article has been adapted slightly. For example, some old product names were updated. But broadly speaking, it is still the same article.

    Introduction to Virtualization

    Data virtualization is receiving more and more attention in the IT industry, especially from those interested in data management and business intelligence. This increased interest is well-deserved, because data virtualization has a unifying impact on how applications manage, handle, and access the many disparate data sources in an organization. But what exactly is it and
