Data Longevity and Accessibility
An important concern in computer science, technology, and library science is the longevity of data.
Scientific research generates huge amounts of data, especially in genomics and astronomy, but
also in the medical sciences, for example in medical imaging. In the past, scientific data was
published in papers and books and stored in libraries; more recently, practically all data is stored
on hard drives or optical discs. However, unlike paper, these storage devices may become
unreadable after a few decades. Scientific publishers and libraries have been struggling with this
problem for decades, and there is still no satisfactory solution for storing data over
centuries, let alone indefinitely.
Data accessibility. Another problem is that much scientific data is never published or deposited
in data repositories such as databases. In a recent survey, data was requested from 516 studies
that had been published between 2 and 22 years earlier, but fewer than one in five of these studies
were able or willing to provide the requested data. Overall, the likelihood of retrieving data
dropped by 17% each year after publication.[21] Similarly, a survey of 100 datasets in Dryad
found that more than half lacked the details needed to reproduce the research results of the
corresponding studies.[22]
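To give a feel for what a 17% annual drop means over the 2 to 22 year span of the survey, the sketch below compounds it year by year. It is a minimal illustration under one stated assumption: the drop is treated as multiplicative, so after n years the relative likelihood is (1 − 0.17)^n. The function names are hypothetical, and this is not the statistical model used in the cited study.

```python
import math

# Assumption: the reported 17% annual drop compounds multiplicatively,
# so after n years the relative likelihood is (1 - 0.17) ** n.
ANNUAL_DROP = 0.17


def relative_likelihood(years: int, drop: float = ANNUAL_DROP) -> float:
    """Likelihood of retrieving data `years` after publication,
    relative to the likelihood at publication (year 0)."""
    return (1.0 - drop) ** years


def half_life(drop: float = ANNUAL_DROP) -> float:
    """Years until the relative likelihood falls to one half."""
    return math.log(0.5) / math.log(1.0 - drop)


if __name__ == "__main__":
    # Print the relative likelihood at a few points in the surveyed range.
    for n in (2, 10, 22):
        print(f"{n:>2} years after publication: {relative_likelihood(n):.3f}")
    print(f"half-life: {half_life():.1f} years")
```

Under this assumption the relative likelihood falls to about 0.155 after ten years, and it halves roughly every 3.7 years.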
These findings show how poor access is to scientific data that is either unpublished or not
documented in enough detail to be reproduced.
One proposed solution to the problem of reproducibility is to require FAIR data, that is, data that
is Findable, Accessible, Interoperable, and Reusable. Data that meets these requirements can be
used in subsequent research and thus helps advance science and technology.[23]
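As a rough illustration of how the four FAIR properties can translate into concrete metadata, the sketch below checks a dataset record for a few fields. Everything in it is hypothetical: the `DatasetRecord` fields, the mapping of each principle to those fields, and the `fair_gaps` function are assumptions chosen for illustration. Real FAIR assessments are considerably more detailed than a check for empty fields.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata record for a deposited dataset."""
    identifier: str = ""   # persistent identifier, e.g. a DOI (Findable)
    access_url: str = ""   # where the data can be retrieved (Accessible)
    file_format: str = ""  # documented, widely readable format (Interoperable)
    license: str = ""      # terms of reuse (Reusable)
    provenance: str = ""   # how the data was produced (Reusable)


# Hypothetical mapping from each FAIR principle to the fields checked here.
FAIR_CHECKS = {
    "Findable": ["identifier"],
    "Accessible": ["access_url"],
    "Interoperable": ["file_format"],
    "Reusable": ["license", "provenance"],
}


def fair_gaps(record: DatasetRecord) -> dict[str, list[str]]:
    """Return, per principle, the checked fields that are empty."""
    gaps: dict[str, list[str]] = {}
    for principle, names in FAIR_CHECKS.items():
        missing = [name for name in names if not getattr(record, name).strip()]
        if missing:
            gaps[principle] = missing
    return gaps


if __name__ == "__main__":
    record = DatasetRecord(identifier="10.0000/example", file_format="CSV")
    print(fair_gaps(record))
    # {'Accessible': ['access_url'], 'Reusable': ['license', 'provenance']}
```

A record that lacks a license or provenance description would be flagged as not reusable. This is the same kind of gap the Dryad survey describes: data that exists but cannot be used to reproduce the original results.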
In other fields
Although data is also increasingly used in other fields, it has been suggested that the highly
interpretive nature of these fields might be at odds with the ethos of data as "given". Peter Checkland
introduced the term capta (from the Latin capere, "to take") to distinguish between an immense
number of possible data and the subset of them to which attention is oriented.[24] Johanna Drucker
has argued that since the humanities affirm knowledge production as "situated, partial, and
constitutive," using data may introduce counterproductive assumptions, for example that
phenomena are discrete or observer-independent.[25] The term capta, which emphasizes the
act of observation as constitutive, is offered as an alternative to data for visual representations in
the humanities.
The term data-driven is a neologism applied to an activity that is driven primarily by data rather
than by other factors.[citation needed] Data-driven applications include data-driven programming and
data-driven journalism.
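To make data-driven programming concrete, the sketch below processes text lines according to a table of pattern/action rules rather than hard-coded branches. This is similar in spirit to languages such as AWK. The rule table, action names, and `process` function are hypothetical and serve only as illustration. Because the rules are plain data, they could be loaded from a configuration file without changing the program.

```python
# Hypothetical rule table: (line prefix, action name). The first matching
# rule wins; the empty prefix matches every line and acts as the default.
RULES = [
    ("#", "skip"),
    ("ERROR", "highlight"),
    ("", "keep"),
]

# Actions the rule table may refer to; returning None drops the line.
ACTIONS = {
    "skip": lambda line: None,
    "highlight": lambda line: "!! " + line,
    "keep": lambda line: line,
}


def process(lines: list[str]) -> list[str]:
    """Apply the first matching rule from RULES to each line."""
    out = []
    for line in lines:
        for prefix, action_name in RULES:
            if line.startswith(prefix):
                result = ACTIONS[action_name](line)
                if result is not None:
                    out.append(result)
                break  # stop at the first matching rule
    return out


if __name__ == "__main__":
    print(process(["# comment", "ERROR disk full", "all good"]))
    # ['!! ERROR disk full', 'all good']
```

Changing the program's behaviour, for example to also highlight warnings, only requires adding a row such as `("WARN", "highlight")` to `RULES` before the default rule.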
See also
Biological data
Computer data processing
Computer memory
Dark data
Data (computer science)
Data acquisition
Data analysis
Data bank
Data cable
Data curation
Data domain
Data element
Data farming
Data governance
Data integrity
Data maintenance
Data management
Data mining
Data modeling
Data point
Data preservation
Data protection
Data publication
Data remanence
Data science
Data set
Data structure
Data visualization
Data warehouse
Database
Datasheet
Data-driven programming
Data-driven journalism
Data-driven testing
Data-driven learning
Data-driven science
Data-driven control system
Data-driven marketing
Digital privacy
Environmental data rescue
Fieldwork
Information engineering
Machine learning
Open data
Scientific data archiving
Secondary data
Statistics
Digital data
Data aggregation
References
1.