Measuring the Data Universe: Data Integration Using... (2. What Does Reality Look Like?)
Patricia Staab
Abstract
Data are collected fervently but not wisely: what is gathered is often not what is needed but simply
whatever arises. Therefore, despite a tsunami of data, painful data gaps still exist.
When data sets do not fit together, the potential within them cannot be exploited. Yet the
information industry has neither a system of order nor any comprehensive standard for data. This
deficiency explains why firms launch countless data warehousing, business intelligence (BI) or Big Data
projects, and why they keep appointing chief data officers whose primary task is usually to bring overall
order to the companies’ own data.
At the end of the day, the gap will not be closed by technology alone, however massively it is deployed.
On-demand analysis of data sets with dozens, or even hundreds, of dimensions is not possible without
awareness, intelligence and expertise.
Stahl, Reinhold, and Patricia Staab. Measuring the Data Universe : Data Integration Using Statistical Data and Metadata Exchange, Springer, 2018. ProQuest Ebook Central,
http://ebookcentral.proquest.com/lib/europaeu/detail.action?docID=5396658.
Created from europaeu on 2025-05-13 12:50:57.
Copyright © 2018. Springer. All rights reserved. Ebook pages 16-21 | Printed page 2 of 4
Another example is the current debate on the possibly carcinogenic effect of the broad-spectrum
herbicide glyphosate used for weed control. During this discussion it emerged that knowledge about the
geographical distribution of diseases is incomplete. As a result, it is not possible to investigate
correlations between (frequent) occurrences of disease and the location of potential sources of danger,
such as the areas where hazardous substances are applied in agriculture or along railway tracks, the
sites of power plants or emission-intensive factories, and traffic hotspots. A similar question is whether
studies on the prevalence of skin cancers could yield new insights if data from medicine and meteorology
(solar radiation and its intensity, hours of sunshine by geographic area, etc.) were combined.
Finally, the new international institutions for monitoring financial stability created in the aftermath
of various financial crises, such as the G20 countries’ Financial Stability Board, have identified a whole
series of data gaps in the data sets for the financial and real economies. Efforts are currently being
made worldwide to fill those gaps: to be better informed is to be better prepared.
projects. The enormous increase in the significance of data is also reflected in the frequent appointments
of Chief Information Officers (CIOs) whose main task seems to be to bring order into the company’s
data world.
While institutions’ repeated attempts to make their own (!) data landscape manageable usually meet
with only moderate success, the lack of overall data organisation is even more pronounced across an
entire industry or country. The only exceptions to the rule seem to be certain areas with specialised
commercial interests; there, well-developed data worlds are available, such as search portals for used
cars, hotel rooms, flight connections and apartments. However, the well-known scout websites and
price comparison portals are not driven by artificial intelligence (AI). There are no AI solutions that
scan the used-car advertisements on the Internet by text mining and, using Big Data technology and
their intelligent networks, magically determine the necessary information. No, these data were
painstakingly “put into order”, and are thus identifiable via a thorough classification (e.g. brand, type,
year of registration, postal code of the supplier, mileage) and a complete set of attributes (e.g. presence
of air conditioning, trailer coupling).
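The difference between such an ordered data world and a pile of free-text advertisements can be
sketched in a few lines of Python. The class `CarOffer`, its field names, the `search` helper and the
sample offers are all hypothetical, modelled only on the classification and attributes listed above; real
portals are certainly organised differently. The sketch assumes that every offer carries explicit
classification values and a complete attribute set, in which case a search is nothing more than a filter,
with no AI involved.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CarOffer:
    # Classification values (hypothetical field names based on the text's examples).
    brand: str
    model_type: str
    registration_year: int
    supplier_postcode: str
    mileage_km: int
    # Complete set of attributes: the features present on this car.
    features: frozenset[str] = field(default_factory=frozenset)


def search(offers, *, brand=None, min_year=None, max_mileage=None, required=frozenset()):
    """Filter offers on classification values and required attributes."""
    hits = []
    for offer in offers:
        if brand is not None and offer.brand != brand:
            continue
        if min_year is not None and offer.registration_year < min_year:
            continue
        if max_mileage is not None and offer.mileage_km > max_mileage:
            continue
        if not required <= offer.features:  # all required attributes must be present
            continue
        hits.append(offer)
    return hits


# Hypothetical sample offers, invented for illustration.
offers = [
    CarOffer("BrandA", "Estate", 2015, "12345", 80_000,
             frozenset({"air_conditioning", "trailer_coupling"})),
    CarOffer("BrandA", "Hatchback", 2012, "54321", 150_000,
             frozenset({"air_conditioning"})),
    CarOffer("BrandB", "Estate", 2016, "67890", 60_000,
             frozenset({"trailer_coupling"})),
]

hits = search(offers, brand="BrandA", min_year=2014, required=frozenset({"trailer_coupling"}))
print(hits)
```

The attribute check only works because the attribute set is assumed to be complete: an offer whose
trailer coupling was simply never recorded would be indistinguishable from one without it.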
will usually manage to turn three, four or five “adjusting screws” on their report (which will feel the
same as juggling the corresponding number of balls), but the limit will soon be reached.
Information systems must, therefore, provide experienced data experts with suitable data analysis
products instead of dropping them into a labyrinth of data that fit together formally but are no longer
manageable. Otherwise, there is a danger that we operate strictly within the rules but still end up
comparing apples and oranges. This applies even more to the frequently used technique of data
mining. Imagine this highly complex data jungle being searched by bots, with all possible combinations
of the 30 dimensions being formulated and examined for significant values or correlations between
observed variables. What is one to make of the outcome of this technical tour de force? In this case, as
in the case of the Big Data technique described in Chap. 3, professional expertise is of the utmost
importance for evaluating and interpreting the results.
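A quick count shows why the output of such a brute-force search needs expert interpretation. The
Python sketch below takes the 30 dimensions mentioned in the text, counts the non-empty subsets of
dimensions a bot could slice by and the pairs of dimensions it could test for correlation, and then applies
the standard multiple-comparisons argument. It assumes, for illustration only, that each pair is tested at
a 5% significance level and that no real relationship exists anywhere; the 5% level is not a figure from
the text. Under that assumption about 22 pairs would still be flagged as "significant" purely by chance.

```python
from math import comb

N_DIMENSIONS = 30   # the number of dimensions used in the text's example
ALPHA = 0.05        # assumed significance level per test (illustrative only)

# Every non-empty subset of dimensions is one possible way to slice the data.
subsets = 2 ** N_DIMENSIONS - 1

# Number of distinct pairs of dimensions that could be tested for correlation.
pairs = comb(N_DIMENSIONS, 2)

# If no real relationship exists, each test still has probability ALPHA of a
# false positive; by linearity of expectation the expected count is pairs * ALPHA.
expected_false_positives = pairs * ALPHA

print(f"{subsets:,} subsets of dimensions")  # 1,073,741,823 subsets of dimensions
print(f"{pairs} pairs of dimensions")        # 435 pairs of dimensions
print(f"~{expected_false_positives:.0f} spurious 'significant' pairs expected")
```

The expected count holds whether or not the individual tests are independent; deciding which of the
flagged results make sense at all is left to the expert.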