Measuring_the_Data_Universe_Data_Integration_Using..._----_(2. What_Does_Reality_Look_Like_)

The document discusses the challenges of data collection and integration, highlighting that despite an abundance of data, significant gaps remain due to a lack of organization and standards. It emphasizes the need for expertise in data analysis to effectively utilize complex data sets, as technology alone cannot bridge these gaps. The authors argue that a systematic approach to data management is essential for improving decision-making and addressing critical information needs.

Copyright © 2018. Springer. All rights reserved.

Ebook pages 16-21 | Printed page 1 of 4

© Springer International Publishing AG, part of Springer Nature 2018


Reinhold Stahl and Patricia Staab, Measuring the Data Universe
https://doi.org/10.1007/978-3-319-76989-9_2

2. What Does Reality Look Like?

Reinhold Stahl (1) and Patricia Staab (2)

(1) Dornburg, Germany


(2) Frankfurt, Germany

Reinhold Stahl (Corresponding author)

Patricia Staab

Abstract
Data are collected fervently but not wisely: often, what is gathered is not what is needed but whatever happens to arise.
Therefore, despite a tsunami of data, painful data gaps still exist.
When data sets do not fit together, the potential within them cannot be exploited. Nevertheless, the
information industry has neither a system of order nor any comprehensive standard for data. This
deficiency explains why firms launch countless data warehousing, business intelligence (BI) or Big Data
projects, and why they appoint multiple chief data officers whose primary task is usually to bring overall
order to the companies’ own data.
At the end of the day, the gap will not be closed by the use, however massive, of technology alone.
Data analysis on demand on data sets with dozens, or hundreds, of dimensions is not possible without
consciousness, intelligence and expertise.

2.1 Yawning Data Gaps Despite “Collectomania”


All over the world data are eagerly, zealously, collected at every opportunity. However, there is reason
to suspect that this effort is not concentrated where information is urgently needed but rather where it
can be collected easily. This is why, whenever crises erupt and analyses would be helpful, data gaps are
still being identified and criticised. During the 2016 refugee crisis in Germany, for example, data gaps
concerning the vacancy rates of houses and flats were lamented, despite numerous data collections
already existing on real estate purchases and rental prices.

Stahl, Reinhold, and Patricia Staab. Measuring the Data Universe : Data Integration Using Statistical Data and Metadata Exchange, Springer, 2018. ProQuest Ebook Central,
http://ebookcentral.proquest.com/lib/europaeu/detail.action?docID=5396658.
Created from europaeu on 2025-05-13 12:50:57.

Another example is the current debate on the possibly carcinogenic effect of the broad-spectrum
herbicide glyphosate used for weed control. During this discussion it emerged that knowledge about the
geographical distribution of diseases is incomplete. Therefore, no investigations can be carried out on
correlations of (frequent) incidents of disease with the location of potential sources of danger, such as
the application areas of hazardous substances in agriculture or on railway tracks, the sites of power
plants or emission-intensive factories and traffic points. A similar question is whether studies on the
prevalence of skin cancers could gain insight if data from medicine and meteorology (solar radiation and
intensity, hours of sunshine by geographic allocation, etc.) were combined.
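The kind of linkage described here presupposes that both data sets carry a common geographic key. The following is a minimal sketch in Python of such a join, assuming a shared region code. The region codes, the figures and the helper name `link_by_region` are invented for illustration and do not come from any real data collection.

```python
# Hypothetical figures keyed by a shared region code (illustration only).
cases_per_100k = {"R1": 12.0, "R2": 30.5, "R4": 18.2}  # disease incidence
hazard_sites = {"R1": 0, "R2": 3, "R3": 1}             # potential sources of danger


def link_by_region(left, right):
    """Join two region-keyed data sets and list regions that cannot be linked."""
    shared = sorted(left.keys() & right.keys())
    linked = {region: (left[region], right[region]) for region in shared}
    # Regions present in only one of the two sources are the data gaps.
    gaps = sorted(left.keys() ^ right.keys())
    return linked, gaps


linked, gaps = link_by_region(cases_per_100k, hazard_sites)
print(linked)
print(gaps)
```

Only the regions found in both sources can enter a correlation analysis. The regions listed in `gaps` are exactly the cases in which the investigation cannot be carried out.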
Finally, the new international institutions for monitoring financial stability created in the aftermath
of various financial crises, such as the G20 countries’ Financial Stability Board, have identified a whole
series of data gaps in the data sets for the financial and real economies. Efforts are currently being made
worldwide to fill those gaps: to be better informed is to be better prepared.

2.2 The Data Universe Lacks Order


The gospel of data analysts starts with the words “In the beginning was the data”. The data are the
original building blocks, the atoms in our universe and the starting point of our work. They can, at the
same time, be our greatest good and our most terrible curse. Because when they do not fit together, they
are worthless.
Thus, the bar is set high—and the exploding data world described in this chapter is very far from this
benchmark. The comparison with reality shows that the information industry lags behind other branches
of industry or scientific disciplines with regard to standardisation and organisation of their most precious
asset—the data. For there is neither a system of order for data and information nor a prominent
standardisation, and certainly no “periodic system of elements” as in the natural sciences. Nowhere do
we find anything approaching a unique identifier, a universal barcode for information. This could give
the impression that Google itself is the actual system of order. This lack of data organisation is deplored
within and across very different business branches or scientific fields, for example the lack of
standardisation of data in genetic plant research (see Goff et al. 2011).
One might argue that this dishevelled state is to be expected given the savage and uncivilised nature
of the Internet. But it also takes place in the otherwise much better managed area of industry. The lack of
order is evident in the data universes of almost all companies and justifies countless initiatives: data
integration projects, business intelligence (BI) projects, data warehouse projects or, lately, Big Data
projects. The enormous increase in the significance of data is also reflected in the frequent appointments
of Chief Information Officers (CIOs) whose main task seems to be to bring order into the company’s
data world.
While institutions’ repeated attempts to make their own (!) data landscape manageable are usually
only met with moderate success, the phenomenon of a lack of overall data organisation is even more
pronounced across an entire industry branch or country. The only exceptions to the rule seem to be
certain areas with specialised commercial interests—there, well-developed data worlds are available,
such as search portals for used cars, hotel rooms, flight connections and apartments. However, the well-
known scout websites and price comparison portals are not driven by artificial intelligence (AI). There
are no AI solutions which scan through the used car advertisements of the Internet by text mining and,
using Big Data technology and their intelligent networks, are able to magically determine the necessary
information. No, these data were painstakingly “put into order”, and thus are identifiable via a thorough
classification (e.g. brand, type, year of registration, postal code of the supplier, mileage) and a complete
set of attributes (e.g. presence of air conditioning, trailer coupling).
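Why data that have been put into order can be searched without any text mining can be illustrated with a minimal sketch in Python. The fields follow the classification and attributes named above. The class name `CarListing`, the helper `search` and the sample records are hypothetical and are not taken from any real portal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarListing:
    """One advertisement, described by classification fields and attributes."""
    brand: str
    car_type: str
    registration_year: int
    postal_code: str
    mileage_km: int
    air_conditioning: bool
    trailer_coupling: bool


# Hypothetical sample data, invented for illustration only.
LISTINGS = [
    CarListing("Brand A", "estate", 2014, "60311", 85_000, True, True),
    CarListing("Brand A", "estate", 2016, "60313", 40_000, True, False),
    CarListing("Brand B", "compact", 2015, "07774", 60_000, False, False),
]


def search(listings, **criteria):
    """Return the listings whose fields equal every given criterion."""
    return [
        item for item in listings
        if all(getattr(item, field) == value for field, value in criteria.items())
    ]


matches = search(LISTINGS, brand="Brand A", trailer_coupling=True)
print(len(matches))
```

Because every record carries the same fields, a query reduces to a plain comparison of values. The intelligence lies in the prior classification work, not in the search itself.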

2.3 Using Information Technology Not Possible Without Content-Related Expertise

The way the new high-quality data collections formed from micro data are processed and linked has also
changed significantly: given the sheer volume, diversity and complexity of the data, it is impossible to
determine a priori which questions should be answered with this data material at all. This results in a
much higher volatility of evaluation requirements. Information retrieval can no longer be depicted as
classical, straightforward statistical production of prescribed important indicators; instead, the
implementation of data analysis on demand is required.
For example, granular statistics on securities investments (“security-by-security”) were originally
designed with the objective of allowing standardised, periodically recurring evaluations. However, a
growing share of analysis requirements is devoted to issues that might literally have come up overnight;
for example, what does the international distribution of investors in the government bonds of a certain
European country look like?
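An ad hoc question of this kind can be answered from granular data by filtering and aggregating on demand. The following is a minimal sketch in Python under simplifying assumptions. The record layout, the country codes `XX` and `YY`, the amounts and the function name `investor_distribution` are all hypothetical and do not describe any actual security-by-security statistics.

```python
from collections import defaultdict

# Hypothetical holdings records, invented for illustration only:
# (issuer country, instrument type, investor country, amount in EUR million)
HOLDINGS = [
    ("XX", "government bond", "DE", 120.0),
    ("XX", "government bond", "FR", 80.0),
    ("XX", "government bond", "DE", 30.0),
    ("XX", "corporate bond", "US", 50.0),
    ("YY", "government bond", "US", 70.0),
]


def investor_distribution(holdings, issuer_country, instrument):
    """Share of each investor country in the selected issuer's instrument."""
    totals = defaultdict(float)
    for issuer, kind, investor, amount in holdings:
        if issuer == issuer_country and kind == instrument:
            totals[investor] += amount
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing matched the request
    return {investor: amount / grand_total for investor, amount in totals.items()}


shares = investor_distribution(HOLDINGS, "XX", "government bond")
print(shares)
```

The same records could serve a different question tomorrow simply by changing the filter and the grouping key, which is what distinguishes analysis on demand from the production of prescribed indicators.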
The linking of several data sources allows for the creation of a new style of data collections with a
lot of dimensions (meaning identification features for a data point), which offer a gigantic variety of
evaluation possibilities. But imagine a data cube of more than 30 dimensions: which analyst or scientist
can (and would like to) handle 30 dimensions and formulate ad hoc analyses on that? In practice they
will usually manage to turn three, four or five “adjusting screws” on their report (which will feel the
same as juggling the corresponding number of balls), but the limit will soon be reached.
The information systems must, therefore, provide the experienced data experts with suitable data
analysis products instead of dropping the expert in a labyrinth of formally fitting data that is nevertheless
no longer manageable. Otherwise, there is the danger that we operate strictly within the rules but still
end up comparing apples and oranges. This applies even more to the frequently used technique of data
mining. Imagine this highly complex data jungle being searched through by bots, and all possible
permutations of the 30 dimensions being formulated and examined for significant values or correlations
of observed variables. What to make of the outcome of this technical tour de force? In this case, as well
as in the case of the Big Data technique described in Chap. 3, professional expertise is of the utmost
importance for evaluating and interpreting the results.
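A short calculation gives a feeling for the scale of such an exhaustive search. The sketch below, in Python, counts selections of dimensions without regard to order. It assumes a conventional 5 % significance level and, as a deliberate simplification, treats each pair of dimensions as one candidate relationship to be tested. Neither assumption is taken from the text.

```python
from math import comb

DIMENSIONS = 30       # size of the data cube discussed in the text
SIGNIFICANCE = 0.05   # assumed conventional 5 % significance level


def subsets_of_size(n, k):
    """Number of ways to choose k of n dimensions, ignoring order."""
    return comb(n, k)


pairs = subsets_of_size(DIMENSIONS, 2)
triples = subsets_of_size(DIMENSIONS, 3)
all_subsets = 2 ** DIMENSIONS - 1  # every non-empty selection of dimensions

# If none of the pairwise relationships were real, a 5 % test would still
# be expected to flag this many of them purely by chance.
expected_false_alarms = pairs * SIGNIFICANCE

print(pairs, triples, all_subsets, expected_false_alarms)
```

With 30 dimensions there are 435 pairs, 4,060 triples and more than a billion non-empty selections. Under the stated assumptions, roughly 22 of the pairwise tests would appear significant by chance alone, which is why the results of such a search cannot be interpreted without professional expertise.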

Reference
Goff SA, Vaughn M, McKay S, Lyons E, Stapleton AE, Gessler D, Matasci N, Wang L, Hanlon M, Lenards A,
Muir A (2011) The iPlant collaborative: cyberinfrastructure for plant biology. Front Plant Sci 2:34.
https://doi.org/10.3389/fpls.2011.00034. http://journal.frontiersin.org/article/10.3389/fpls.2011.00034/full
