Data - Wikipedia
Data can be seen as the smallest units of factual information that can be used as a basis for
calculation, reasoning, or discussion. Data can range from abstract ideas to concrete measurements,
including, but not limited to, statistics. Thematically connected data presented in some relevant
context can be viewed as information. Contextually connected pieces of information can then be
described as data insights or intelligence. The stock of insights and intelligence that accumulates over
time from the synthesis of data into information can then be described as knowledge. Data
has been described as "the new oil of the digital economy".[4][5] Data, as a general concept, refers to
the fact that some existing information or knowledge is represented or coded in some form suitable
for better usage or processing.
Advances in computing technologies have led to the advent of big data, which usually refers to very
large quantities of data, typically at the petabyte scale. Working with such large (and growing)
datasets using traditional data analysis methods and computing is difficult, even impossible.
(Theoretically speaking, infinite data would yield infinite information, which would render extracting
insights or intelligence impossible.) In response, the relatively new field of data science uses machine
learning and other artificial intelligence (AI) methods that allow analytic methods to be applied
efficiently to big data.
When "data" is used more generally as a synonym for "information", it is treated as a mass noun in
singular form. This usage is common in everyday language and in technical and scientific fields such
as software development and computer science. One example of this usage is the term "big data".
When used more specifically to refer to the processing and analysis of sets of data, the term retains its
plural form. This usage is common in the natural sciences, life sciences, social sciences, software
development and computer science, and grew in popularity in the 20th and 21st centuries. Some style
guides do not recognize the different meanings of the term and simply recommend the form that best
suits the target audience of the guide. For example, APA style as of the 7th edition requires "data" to
be treated as a plural form.[7]
Meaning
Data, information, knowledge, and wisdom are closely related concepts, but each has its own role in
relation to the others, and each term has its own meaning. According to a common view, data is collected
and analyzed; data only becomes information suitable for making decisions once it has been analyzed
in some fashion.[8] One can say that the extent to which a set of data is informative to someone
depends on the extent to which it is unexpected by that person. The amount of information contained
in a data stream may be characterized by its Shannon entropy.
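As a rough illustration of the entropy measure mentioned above, the following sketch estimates the Shannon entropy of a short symbol sequence from its observed symbol frequencies. The function name `shannon_entropy` and the sample strings are illustrative assumptions, and the estimate treats symbols as independent, which real data streams often are not.

```python
import math
from collections import Counter


def shannon_entropy(stream):
    """Return the empirical Shannon entropy of a sequence, in bits per symbol."""
    counts = Counter(stream)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = -sum(p * log2(p)) over the observed symbol frequencies
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical sample streams, chosen only to show the contrast.
predictable = "aaaaaaaa"   # a single repeated symbol: nothing is unexpected
mixed = "abababab"         # two equally frequent symbols
varied = "abcdefgh"        # eight equally frequent symbols

entropy_predictable = shannon_entropy(predictable)
entropy_mixed = shannon_entropy(mixed)
entropy_varied = shannon_entropy(varied)
```

Under these assumptions the fully predictable stream carries 0 bits per symbol, the two-symbol stream 1 bit, and the eight-symbol stream 3 bits, which matches the idea that less expected data is more informative.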
Knowledge is the awareness of its environment that some entity possesses, whereas data merely
communicates that knowledge. For example, the entry in a database specifying the height of Mount
Everest is a datum that communicates a precisely measured value. This measurement may be
included in a book along with other data on Mount Everest to describe the mountain in a manner
useful for those who wish to decide on the best method to climb it. Awareness of the characteristics
represented by this data is knowledge.
Data is often assumed to be the least abstract concept, information the next least, and knowledge the
most abstract.[9] In this view, data becomes information by interpretation; e.g., the height of Mount
Everest is generally considered "data", a book on Mount Everest's geological characteristics may be
considered "information", and a climber's guidebook containing practical information on the best way
to reach Mount Everest's peak may be considered "knowledge". It has also been argued, however, that
this hierarchy runs in reverse: data emerges from information, and information from knowledge.[10]
"Information" bears a diversity of meanings that range from everyday usage to technical use. Generally
speaking, the concept of information is closely related to notions of constraint, communication,
control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and
representation. Beynon-Davies uses the concept of a sign to differentiate between data and
information; data is a series of symbols, while information occurs when the symbols are used to refer
to something.[11][12]
Mechanical computing devices are classified according to how they represent data. An analog
computer represents a datum as a voltage, distance, position, or other physical quantity. A digital
computer represents a piece of data as a sequence of symbols drawn from a fixed alphabet. The most
common digital computers use a binary alphabet, that is, an alphabet of two characters typically
denoted "0" and "1". More familiar representations, such as numbers or letters, are then constructed
from the binary alphabet. Some special forms of data are distinguished. A computer program is a
collection of data that can be interpreted as instructions. Most computer languages make a
distinction between programs and the other data on which programs operate, but in some languages,
notably Lisp and similar languages, programs are essentially indistinguishable from other data. It is
also useful to distinguish metadata, that is, a description of other data. A similar yet earlier term for
metadata is "ancillary data." The prototypical example of metadata is the library catalog, which is a
description of the contents of books.
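The way familiar representations are built from a binary alphabet can be sketched in a few lines. The helper names below (`to_bits`, `text_to_bits`, `bits_to_text`) are hypothetical, and the choice of 8-bit groups and UTF-8 is an assumption made for the example; other widths and encodings are equally valid.

```python
def to_bits(value: int, width: int = 8) -> str:
    """Render a non-negative integer as a fixed-width string of '0' and '1'."""
    return format(value, f"0{width}b")


def text_to_bits(text: str) -> list:
    """Encode text as UTF-8 and show each byte as eight binary digits."""
    return [to_bits(byte) for byte in text.encode("utf-8")]


def bits_to_text(bits: list) -> str:
    """Reverse text_to_bits: parse each 8-bit group and decode as UTF-8."""
    return bytes(int(group, 2) for group in bits).decode("utf-8")


number_bits = to_bits(5)                          # the number 5 as binary digits
letter_bits = text_to_bits("A")                   # the letter "A" as one byte
round_trip = bits_to_text(text_to_bits("data"))   # decode back to the same text
```

The same sequence of symbols can stand for a number or a letter; which one it means depends on the convention used to interpret it.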
Data documents
Whenever data needs to be registered, it exists in the form of a data document. Kinds of data
documents include:
data repository
data study
data set
software
data paper
database
data handbook
data journal
Some of these data documents (data repositories, data studies, data sets, and software) are indexed in
Data Citation Indexes, while data papers are indexed in traditional bibliographic databases, e.g.,
Science Citation Index.
Data collection
Gathering data can be accomplished through a primary source (the researcher is the first person to
obtain the data) or a secondary source (the researcher obtains data that has already been collected
by other sources, such as data disseminated in a scientific journal). Data analysis methodologies vary
and include data triangulation and data percolation.[14] The latter offers a structured method of
collecting, classifying, and analyzing data using five possible angles of analysis, of which at least
three are applied: qualitative and quantitative methods, literature reviews (including scholarly
articles), interviews with experts, and computer simulation. The aim is to maximize the research's
objectivity and to permit as complete an understanding as possible of the phenomena under
investigation. The data is thereafter "percolated" using a series of pre-determined steps so as to
extract the most relevant information.
Data accessibility. Much scientific data is never published or deposited in data repositories such as
databases. In a survey published in 2014, data was requested from 516 studies that had been published
between 2 and 22 years earlier, but the authors of fewer than one in five of these studies were able or
willing to provide the requested data. Overall, the likelihood of retrieving data dropped by 17% each
year after publication.[15] Similarly, a survey of 100 datasets in Dryad found that more than half
lacked the details needed to reproduce the research results from these studies.[16] These findings
illustrate how limited access is to scientific data that is unpublished or that lacks the detail
needed for reproduction.
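To get a feel for what a 17% yearly drop implies over time, the following sketch compounds the decline year over year. It assumes the same proportional drop applies every year and expresses the result relative to the likelihood at publication; the function name `relative_likelihood` is hypothetical, and the model is a simplification rather than the method used in the cited survey.

```python
ANNUAL_DROP = 0.17  # yearly decline quoted in the text


def relative_likelihood(years: int, annual_drop: float = ANNUAL_DROP) -> float:
    """Likelihood of retrieving data after `years`, relative to year 0,
    assuming the same proportional drop compounds every year."""
    return (1.0 - annual_drop) ** years


# Print the remaining fraction for a few illustrative article ages.
for years in (1, 5, 10, 20):
    fraction = relative_likelihood(years)
    print(f"{years:>2} years: {fraction:.3f} of the initial likelihood")
```

Under this assumption, roughly 39% of the initial likelihood remains after five years, and the fraction keeps shrinking geometrically after that.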
One proposed solution to the problem of reproducibility is to require FAIR data, that is, data that is
Findable, Accessible, Interoperable, and Reusable. Data that fulfills these requirements can be used in
subsequent research and thus advances science and technology.[17]
In other fields
Although data is also increasingly used in other fields, it has been suggested that their highly
interpretive nature might be at odds with the ethos of data as "given". Peter Checkland
introduced the term capta (from the Latin capere, "to take") to distinguish between the immense
number of possible data and the subset of them to which attention is oriented.[18] Johanna Drucker
has argued that since the humanities affirm knowledge production as "situated, partial, and
constitutive," using data may introduce assumptions that are counterproductive, for example that
phenomena are discrete or are observer-independent.[19] The term capta, which emphasizes the act of
observation as constitutive, is offered as an alternative to data for visual representations in the
humanities.
The term data-driven is a neologism applied to an activity which is primarily compelled by data over
all other factors. Data-driven applications include data-driven programming and data-driven
journalism.
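As one possible reading of data-driven programming, the sketch below keeps the program's behavior in a table of rules and uses a small generic loop to apply them. The rule table `RULES`, the function `process`, and the sample input are all hypothetical and chosen only for illustration.

```python
# Each rule is data: (description, test deciding if it applies, action to run).
RULES = [
    ("blank line", lambda line: not line.strip(), lambda line: None),
    ("comment", lambda line: line.startswith("#"), lambda line: None),
    ("number", lambda line: line.strip().lstrip("-").isdigit(), lambda line: int(line)),
    ("text", lambda line: True, lambda line: line.strip()),
]


def process(lines, rules=RULES):
    """Apply the first matching rule to each line; drop lines mapped to None."""
    results = []
    for line in lines:
        for _name, matches, action in rules:
            if matches(line):
                value = action(line)
                if value is not None:
                    results.append(value)
                break
    return results


# Hypothetical input: a comment, two numbers, a blank line, and some text.
parsed = process(["# header", "42", "", "-7", "hello "])
```

Changing what the program does then means editing the rule table rather than the control flow, which is the sense in which the data drives the activity.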
See also
Biological data
Computer data processing
Computer memory
Dark data
Data (computer science)
Data acquisition
Data analysis
Data bank
Data cable
Data curation
Data domain
Data element
Data farming
Data governance
Data integrity
Data maintenance
Data management
Data mining
Data modeling
Data point
Data preservation
Data protection
Data publication
Data remanence
Data science
Data set
Data structure
Data visualization
Data warehouse
Database
Datasheet
Data-driven programming
Data-driven journalism
Data-driven testing
Data-driven learning
Data-driven science
Data-driven control system
Data-driven marketing
Digital privacy
Environmental data rescue
Fieldwork
Information engineering
Machine learning
Open data
Scientific data archiving
Secondary Data
Statistics
Digital data
Data aggregation
References
1. OECD Glossary of Statistical Terms. OECD. 2008. p. 119. ISBN 978-92-64-025561.
2. "Statistical Language - What are Data?"
(https://abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language+-+what+are+data).
Australian Bureau of Statistics. 2013-07-13. Archived
(https://web.archive.org/web/20190419010315/http://abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language+-+what+are+data)
from the original on 2019-04-19. Retrieved 2020-03-09.
3. "Data vs Information - Difference and Comparison | Diffen"
(https://diffen.com/difference/Data_vs_Information). www.diffen.com. Retrieved 2018-12-11.
4. Yonego, Joris Toonders (July 23, 2014). "Data Is the New Oil of the Digital Economy"
(https://wired.com/insights/2014/07/data-new-oil-digital-economy/). Wired – via www.wired.com.
5. "Data is the new oil"
(https://web.archive.org/web/20180716224058/https://spotlessdata.com/blog/data-new-oil).
July 16, 2018. Archived from the original (https://spotlessdata.com/blog/data-new-oil) on 2018-07-16.
6. "data | Origin and meaning of data by Online Etymology Dictionary"
(https://etymonline.com/word/data). www.etymonline.com.
7. American Psychological Association (2020). "6.11". Publication Manual of the American
Psychological Association: the official guide to APA style. American Psychological Association.
ISBN 9781433832161.
8. "Joint Publication 2-0, Joint Intelligence"
(https://web.archive.org/web/20180718055308/http://www.jcs.mil/Portals/36/Documents/Doctrine/pubs/jp2_0.pdf)
(PDF). Joint Chiefs of Staff, Joint Doctrine Publications. Department of Defense. 23 October 2013.
pp. I-1. Archived from the original (https://jcs.mil/Portals/36/Documents/Doctrine/pubs/jp2_0.pdf)
(PDF) on 18 July 2018. Retrieved July 17, 2018.
9. Akash Mitra (2011). "Classifying data for successful modeling"
(https://web.archive.org/web/20171107030817/https://dwbi.org/data-modelling/dimensional-model/16-classifying-data-for-successful-modeling).
Archived from the original
(https://dwbi.org/data-modelling/dimensional-model/16-classifying-data-for-successful-modeling)
on 2017-11-07. Retrieved 2017-11-05.
10. Tuomi, Ilkka (2000). "Data is more than knowledge". Journal of Management Information
Systems. 16 (3): 103–117. doi:10.1080/07421222.1999.11518258
(https://doi.org/10.1080%2F07421222.1999.11518258).
11. P. Beynon-Davies (2002). Information Systems: An introduction to informatics in organisations.
Basingstoke, UK: Palgrave Macmillan. ISBN 0-333-96390-3.
12. P. Beynon-Davies (2009). Business information systems. Basingstoke, UK: Palgrave.
ISBN 978-0-230-20368-6.
13. Sharon Daniel. The Database: An Aesthetics of Dignity.
14. Mesly, Olivier (2015), Creating Models in Psychological Research, Springer Psychology: 126
pages. ISBN 978-3-319-15752-8
15. Vines, Timothy H.; Albert, Arianne Y. K.; Andrew, Rose L.; Débarre, Florence; Bock, Dan G.;
Franklin, Michelle T.; Gilbert, Kimberly J.; Moore, Jean-Sébastien; Renaut, Sébastien; Rennison,
Diana J. (2014-01-06). "The availability of research data declines rapidly with article age"
(https://doi.org/10.1016%2Fj.cub.2013.11.014). Current Biology. 24 (1): 94–97. arXiv:1312.5670
(https://arxiv.org/abs/1312.5670). doi:10.1016/j.cub.2013.11.014
(https://doi.org/10.1016%2Fj.cub.2013.11.014). ISSN 1879-0445
(https://search.worldcat.org/issn/1879-0445). PMID 24361065
(https://pubmed.ncbi.nlm.nih.gov/24361065). S2CID 7799662
(https://api.semanticscholar.org/CorpusID:7799662).
16. Roche, Dominique G.; Kruuk, Loeske E. B.; Lanfear, Robert; Binning, Sandra A. (2015). "Public
Data Archiving in Ecology and Evolution: How Well Are We Doing?"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4640582). PLOS Biology. 13 (11): e1002295.
doi:10.1371/journal.pbio.1002295 (https://doi.org/10.1371%2Fjournal.pbio.1002295). ISSN 1545-7885
(https://search.worldcat.org/issn/1545-7885). PMC 4640582
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4640582). PMID 26556502
(https://pubmed.ncbi.nlm.nih.gov/26556502).
17. Eisenstein, Michael (April 2022). "In pursuit of data immortality"
(https://doi.org/10.1038%2Fd41586-022-00929-3). Nature. 604 (7904): 207–208.
Bibcode:2022Natur.604..207E (https://ui.adsabs.harvard.edu/abs/2022Natur.604..207E).
doi:10.1038/d41586-022-00929-3 (https://doi.org/10.1038%2Fd41586-022-00929-3). ISSN 1476-4687
(https://search.worldcat.org/issn/1476-4687). PMID 35379989
(https://pubmed.ncbi.nlm.nih.gov/35379989). S2CID 247954952
(https://api.semanticscholar.org/CorpusID:247954952).
18. P. Checkland and S. Holwell (1998). Information, Systems, and Information Systems: Making
Sense of the Field. Chichester, West Sussex: John Wiley & Sons. pp. 86–89. ISBN 0-471-95820-4.
19. Johanna Drucker (2011). "Humanities Approaches to Graphical Display"
(https://digitalhumanities.org/dhq/vol/5/1/000091/000091.html). Digital Humanities Quarterly. 005 (1).
External links
Data is a singular noun (https://purl.org/nxg/note/singular-data) (a detailed assessment)