Organizing research data

Article in Acta Veterinaria Scandinavica · June 2011

DOI: 10.1186/1751-0147-53-S1-S2 · Source: PubMed


Sestoft Acta Veterinaria Scandinavica 2011, 53(Suppl 1):S2
http://www.actavetscand.com/content/53/S1/S2

PROCEEDINGS Open Access

Organizing research data

Peter Sestoft
From Databases in veterinary medicine: validation, harmonisation and application. The 24th Symposium of
the Nordic Committee for Veterinary Scientific Cooperation (NKVet)
Copenhagen, Denmark. 19-20 April 2010

Abstract
Research relies on ever larger amounts of data from experiments, automated production equipment,
questionnaires, time series such as weather records, and so on. A major task in science is to combine, process
and analyse such data to obtain evidence of patterns and correlations.
Most research data are in digital form, which in principle ensures easy processing and analysis, easy long-term
preservation, and easy reuse in future research, perhaps in entirely unanticipated ways. However, in practice,
obstacles such as incompatible or undocumented data formats, poor data quality and lack of familiarity with
current technology prevent researchers from making full use of available data.
This paper argues that relational databases are excellent tools for veterinary research and animal production;
provides a small example to introduce basic database concepts; and points out some concerns that must be
addressed when organizing data for research purposes.

Database concepts
A database is an organized collection of data. This section presents the most common tool for storing and
processing data in modern society: the relational database.

Motivating example
Assume we want to keep records of multiple farms (with address), each with multiple cows (with cow
identifier and birth date), and for each cow multiple milking events (with date, time, amount of milk, and
possibly somatic cell count). From such data, one can compute many different quantities, such as total milk
production or total milk production in each postcode or average number of cows per farm and much more.
A simple spreadsheet style solution would use a single table containing all these data, as shown in Figure 1.
However, this is a poor solution for several reasons:
• The address of a farm is repeated for every cow, and the birth date of a cow is repeated for every milking
event belonging to that cow. Such redundancy typically leads to inconsistency (e.g. two different addresses
recorded for the same farm) and to update problems (e.g. if the street name of a farm is changed).
• If one needs to register a farm before it has a cow, or register a cow before it has a milking event, one
must leave some fields blank, which is likely to confuse later processing and analysis.
A better solution is to use a relational database [1]; since 1985 this has been the dominant technology for
organizing and handling large data sets in production, commerce, finance, research and so on.

Tables in relational databases
In a relational database the example from Figure 1 would be broken into three separate tables called Farm, Cow
and Milk, as shown below. The tables would all be stored in the same database inside a database system. The
database system may simply be Microsoft Access, which is part of the Microsoft Office suite, or it may be the SAS
statistical analysis system, and hence the database may reside on the researcher’s normal computer. However,
if the database is to be shared with others, it is more sensible to keep it on a separate server.
In the Farm table in Figure 2, each line describes a single farm by its unique farm id, address and postcode.

Correspondence: [email protected]
IT University of Copenhagen, Rued Langgaards Vej 7, DK-2300 Copenhagen S, Denmark
Full list of author information is available at the end of the article

© 2011 Sestoft; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.

address      | postcode | cowId      | birth      | time             | amount | cellCount
-----------------------------------------------------------------------------------------
Lillegade 11 | 4230     | 1216000002 | 2004-06-15 | 2006-04-02 06:17 | 6.8    |
Lillegade 11 | 4230     | 1216000002 | 2004-06-15 | 2006-04-02 17:45 | 5.7    | 210000
Lillegade 11 | 4230     | 1216000002 | 2004-06-15 | 2007-04-03 05:58 | 7.3    | 195000
Lillegade 11 | 4230     | 3417400019 | 2006-04-01 | 2010-03-21 18:21 | 8.1    |
Lillegade 11 | 4230     | 3417400019 | 2006-04-01 | 2010-03-22 06:34 | 9.4    |
Egholmvej 4  | 4230     |            |            |                  |        |
Risøvej 134  | 4000     |            |            |                  |        |

Figure 1 Flat list of farms, cows and milking events
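The inconsistency that redundancy invites can be illustrated with a small sketch. The Python code below uses the standard sqlite3 module; the table name Flat and the conflicting birth date in the third row are assumptions introduced for illustration and are not data from Figure 1. It loads a few rows shaped like the flat list and asks which cows have been recorded with more than one birth date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Flat (address TEXT, postcode TEXT, cowId INTEGER, "
    "birth TEXT, time TEXT, amount REAL, cellCount INTEGER)"
)
rows = [
    ("Lillegade 11", "4230", 1216000002, "2004-06-15", "2006-04-02 06:17", 6.8, None),
    ("Lillegade 11", "4230", 1216000002, "2004-06-15", "2006-04-02 17:45", 5.7, 210000),
    # Hypothetical typing error: same cow, but a different birth date.
    ("Lillegade 11", "4230", 1216000002, "2004-06-16", "2007-04-03 05:58", 7.3, 195000),
    ("Lillegade 11", "4230", 3417400019, "2006-04-01", "2010-03-21 18:21", 8.1, None),
]
conn.executemany("INSERT INTO Flat VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Cows whose repeated birth date does not agree across their rows.
inconsistent = conn.execute(
    "SELECT cowId, COUNT(DISTINCT birth) AS versions "
    "FROM Flat GROUP BY cowId HAVING COUNT(DISTINCT birth) > 1"
).fetchall()
print(inconsistent)
```

Nothing in the flat list prevents such a conflict from being stored; it can only be detected afterwards. When each cow's birth date is stored exactly once, the conflict cannot arise in the first place.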

The heading lists the attributes or columns (id, address and postcode) of the table. Each line below it is called a
record or row of the table. The unique farm id is a key; a given key must appear at most once in the table. Those
database keys are the reason everything (people, cows, supermarket goods) has a number in modern society.
In the Cow table in Figure 3, each record describes a cow: the cow id is the key in the table, the farmId says
which farm the cow belongs to, and the birth attribute is the cow’s birthdate. A cow’s farmId attribute is
intended to refer to some farm’s id, which is the key in the Farm table; hence the farmId in the Cow table is
called a foreign key.
In the Milk table in Figure 4, each record describes a milking event: the cow id and the date-and-time (the
“when” attribute) together constitute the key of the table, and the remaining attributes give the amount of
milk obtained and possibly the cell count.
Missing observations, such as those in the cellCount column of the Milk table, are said to be null. We may
require, and the database system may enforce, that all values must be non-null, except possibly in the
cellCount column. This requirement would not work in the original flat list in Figure 1, because it would
prevent us from creating a farm record before the farm has a cow, which would be illogical. Furthermore, the
splitting of the flat list into separate Farm, Cow and Milk tables means that there is no redundancy and hence
less risk of inconsistency: the address of a farm is stated only once per farm, and the farm to which a cow
belongs is stated only once per cow.

Queries in relational databases
The beneficial splitting of the flat list of farm, cow and milk data into three separate tables introduces a
challenge, though: How does one combine the tables to obtain useful information, such as the total milk
production in each postcode? In a relational database this is done using queries, expressed in the language SQL,
or Structured Query Language. All modern database systems, including the open source systems MySQL and
PostgreSQL and the commercial systems DB2, Oracle, Microsoft SQL Server and Microsoft Access, understand
some variant of SQL and can execute queries involving millions of records in a few seconds. Although the
complete SQL language is rather complex, an introduction can be found in any database book, such as [2]. Here
we shall just consider some examples of SQL queries, from very simple to moderately complex.
The simplest possible query is to list all cows. Figure 5 shows an SQL query that extracts all columns (denoted
by the asterisk *) and all rows of the Cow table; the result, shown to the right of the query, is a “table” very
similar to the Cow table itself.
To see only the cow’s id and its birth date, we may specify the id and birth columns after SELECT as shown in
Figure 6; the result is a table that has only two of the Cow table’s columns, but all its rows.
To see only the cows belonging to farm number 12160, we use a WHERE-clause as in Figure 7; the result is a
table that has all of the Cow table’s columns, but only those of its rows where the cow’s farmId equals 12160.

id    | address      | postcode
-------------------------------
12160 | Lillegade 11 | 4230
12169 | Egholmvej 3  | 4230
13400 | Risøvej 134  | 4000

Figure 2 The Farm table

id         | farmId | birth
--------------------------------
1216000002 | 12160  | 2004-06-15
3417400019 | 12160  | 2006-04-01
3417400021 | 12169  | 2007-12-19

Figure 3 The Cow table
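A minimal sketch of how the Farm, Cow and Milk tables of Figures 2 through 4 might be declared, here using Python's standard sqlite3 module. The column types, the ISO date strings and the choice of SQLite are assumptions; the paper does not prescribe a particular database system or schema syntax. The sketch shows a database system enforcing the key, foreign key and non-null requirements described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only on request
conn.executescript("""
CREATE TABLE Farm (
    id       INTEGER PRIMARY KEY,                  -- key: at most once in the table
    address  TEXT NOT NULL,
    postcode TEXT NOT NULL
);
CREATE TABLE Cow (
    id     INTEGER PRIMARY KEY,
    farmId INTEGER NOT NULL REFERENCES Farm(id),   -- foreign key
    birth  TEXT NOT NULL
);
CREATE TABLE Milk (
    cowId     INTEGER NOT NULL REFERENCES Cow(id),
    "when"    TEXT NOT NULL,
    amount    REAL NOT NULL,
    cellCount INTEGER,                             -- the only column that may be null
    PRIMARY KEY (cowId, "when")                    -- two columns together form the key
);
""")
conn.execute("INSERT INTO Farm VALUES (12160, 'Lillegade 11', '4230')")
conn.execute("INSERT INTO Cow VALUES (1216000002, 12160, '2004-06-15')")
conn.execute("INSERT INTO Milk VALUES (1216000002, '2006-04-02 06:17', 6.8, NULL)")


def rejected(sql):
    """Return True if the database system refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# A second farm with an existing key, a cow on a non-existent farm,
# and a milking event without an amount are all refused.
duplicate_key = rejected("INSERT INTO Farm VALUES (12160, 'Egholmvej 3', '4230')")
dangling_ref = rejected("INSERT INTO Cow VALUES (3417400021, 99999, '2007-12-19')")
missing_value = rejected(
    "INSERT INTO Milk VALUES (1216000002, '2006-04-02 17:45', NULL, 210000)")
print(duplicate_key, dangling_ref, missing_value)
```

Note that a farm can be inserted before it has any cows, and a cow before it has any milking events, without leaving fields blank.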

cowId      | when           | amount | cellCount
------------------------------------------------
1216000002 | 02-04-06 06:17 | 6.8    |
1216000002 | 02-04-06 17:45 | 5.7    | 210000
1216000002 | 03-04-07 05:58 | 7.3    | 180000
3417400019 | 21-03-10 18:21 | 8.1    |
3417400019 | 22-03-10 06:34 | 9.4    |

Figure 4 The Milk table

SELECT *      id         | farmId | birth
FROM Cow      --------------------------------
              1216000002 | 12160  | 2004-06-15
              3417400019 | 12160  | 2006-04-01
              3417400021 | 12169  | 2007-12-19

Figure 5 Query to get all columns and rows of the Cow table

SELECT id, birth      id         | birth
FROM Cow              -----------------------
                      1216000002 | 2004-06-15
                      3417400019 | 2006-04-01
                      3417400021 | 2007-12-19

Figure 6 Query to get some columns and all rows of the Cow table

To see just the number of farms (rather than the list of all farms) we use aggregation by the COUNT function
as in Figure 8; the result is still a “table” albeit with a single column and a single row.
Similarly, we may compute the total amount of milk by aggregation with the SUM function as shown in
Figure 9.
To list each farm (by address) and its cows, we need both the Farm table and the Cow table, but for each
farm we are interested only in the cows belonging to that farm. This is called a join of the two tables and is
illustrated in Figure 10. The join operation in principle considers each combination of a Farm (call it f) and a
Cow (call it c) and then the WHERE-clause says that we want only those combinations where the farm’s id (that
is, f.id) equals the cow’s farmId (that is, c.farmId). This may sound cumbersome but can be done very fast in a
database system.
To compute the total amount of milk for each farm (given by farm id) we again need a join, now between
the Cow table and the Milk table. We group the combined records by farm id (using GROUP BY) and use
SUM to compute the amount of milk within each group, as shown in Figure 11.
To compute the total amount of milk for each postcode we use a three-way join between the Farm, Cow
and Milk tables, group the records by postcode, and compute the sum within each group by aggregation, as
shown in Figure 12.
The above small examples give a taste of some common SELECT queries. Hopefully it transpires that SQL
is a very powerful language once one understands how to combine the operations into larger queries. Moreover,
relational databases and SQL can be used from inside standard desktop tools such as Excel spreadsheets or the
statistical packages R and SAS. Thus large data sets may be stored in a relational database and may be extracted
and preprocessed using SQL, and then visualization, statistical analysis and data mining or pattern discovery
may be performed using tools that researchers are already familiar with.

Database design and documentation
The result of a database design is a database schema: a list of the database’s tables; and for each table, a list of
its columns, the type (e.g. number or text) of values in each column, information about which column holds
the table’s key, which columns are allowed to hold null values, and so on.
The database schema is part of the metadata, that is, data about the data. Other kinds of metadata that are
often neglected, but that are very important for scientific use, are the units of measurement (e.g. liter, kilogram,
gram, percentage by volume, percentage by weight), the precision of measurements, time zone information (local
time, universal time, daylight savings time), and the exact interpretation of “codes” such as clinical
observations (see the section on terminology and ontology) or answer categories of questionnaires. All of this
must be documented and the documentation preserved and kept up-to-date for the data to be of any future value.
A central concept in database design is normal form, which basically stipulates that tables do not have certain
kinds of redundancies. We shall not go into further details here, except to note that the Farm, Cow and
Milk tables shown in Figure 2 through Figure 4 are in the so-called Boyce-Codd normal form. Normalization is
amply covered in any database book, such as [2].

Temporal and spatial databases
Our farm-cow-milk database example is highly simplified. In particular, it assumes that a cow belongs forever
to the same farm, whereas in reality it may be sold from one farm to another. To solve this problem the Cow
table could be made temporal, by adding a validFrom and a validTo column. Then each record describes the
period in which a given cow belongs to a given farm, which allows for much more detailed queries, such as
what is the number of cows for each farm on 30 July
2010, or what is the total milk production per postcode in each of the months of 2010. Unfortunately, the SQL
queries become a good deal more complex. The theory of temporal databases is well-developed; a good
introduction is provided by [3].
Moreover, much data is spatial: a farm or field is located at a particular place, which may be described by
UTM coordinates or longitude and latitude. Knowing where objects are, and when, allows for queries such as
“at what times was this cow near Gelsted” or “find all pairs of cows that were within 8 km of each other at
some time”, as well as epidemiological analyses and easy visualization.

Terminology and ontology
Here we shall consider a problem that is often overlooked in database books: the design of categories or
“codes”. Assume that we want to extend our farm-cow-milk database with veterinarians’ observations of various
diseases of cows. For this purpose we might introduce two more tables. Table Clinical in Figure 13 contains
clinical observations about a given cow, made by a given veterinarian at a given time, recording a clinical
observation such as joint infection by a code, here 38.
Another table, called ClinicalTerm and shown in Figure 14, associates a description with each clinical code.
However, there are some potential problems with the clinical term codes in Figure 14. First of all, codes 81
and 140 appear to have the same meaning, so there is a risk that two people may use different codes for the
same observation, which may later produce misleading results (e.g. statistics) when queries are made to the
database. Second, no distinction is made between findings (e.g. 88 will not drink), diagnoses (e.g. 11 udder
infection) and procedures (e.g. 80 hoof trimming); whether or not this leads to problems depends on the
discipline and consistency with which veterinarians register clinical observations. Finally, some codes
correspond to subcategories or specializations of others; for instance 11 udder infection and 38 joint infection
are both special cases of 42 infection; should one then always use the most specific code available (e.g. 11 or
38) or alternatively always register a more general code (e.g. 42) along with more specific ones (e.g. 11 or 38)?
In the former case, will somebody who queries the Clinical table in Figure 13 for all cases of infection
remember to also query for the more specific ones (e.g. 11 and 38)? This example illustrates some problems
with designing category codes for use in databases, and in classifying observations in general.
A suitable system of “codes”, including a consideration about how “codes” relate to each other, is often called a
terminology, a controlled vocabulary, or an ontology. An ontology reflects the domain that it describes, such
as the domain of animal disease symptoms discussed above. One must first decide what parts of reality to
model (for instance, this cow has an infection), and what parts of reality to ignore (such as where the infection
is located). Similarly, in a database of clinical observations one must make clear whether one records symptoms
(e.g. diarrhea) or diagnosis (e.g. enteritis) or cause (e.g. Salmonella) or all of these. One must also decide how
to relate the various parts of reality to each other. For instance, pneumonia is a special case of infection.
Moreover, it affects the lungs, which is part of the anatomy. A good domain model should be able to express
both forms of hierarchical relationship.
It takes domain experts, technological understanding, and good taste to arrive at adequate domain models
that are not too complex.

SELECT *                  id         | farmId | birth
FROM Cow                  --------------------------------
WHERE farmId = 12160      1216000002 | 12160  | 2004-06-15
                          3417400019 | 12160  | 2006-04-01

Figure 7 Query to get all columns and some rows of the Cow table
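The queries of Figures 5 through 7 can be tried out in any SQL database. As a sketch, the following Python code uses the standard sqlite3 module with an in-memory database; the column types are assumptions, and an ORDER BY clause has been added to each query because SQL does not otherwise guarantee the order of result rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cow (id INTEGER PRIMARY KEY, farmId INTEGER, birth TEXT)")
conn.executemany("INSERT INTO Cow VALUES (?, ?, ?)", [
    (1216000002, 12160, "2004-06-15"),
    (3417400019, 12160, "2006-04-01"),
    (3417400021, 12169, "2007-12-19"),
])

# Figure 5: all columns, all rows.
all_cows = conn.execute("SELECT * FROM Cow ORDER BY id").fetchall()

# Figure 6: some columns, all rows.
ids_and_births = conn.execute("SELECT id, birth FROM Cow ORDER BY id").fetchall()

# Figure 7: all columns, some rows. The farm number is passed as a parameter
# rather than pasted into the SQL text.
cows_on_farm = conn.execute(
    "SELECT * FROM Cow WHERE farmId = ? ORDER BY id", (12160,)
).fetchall()

for row in cows_on_farm:
    print(row)
```

Each result is a list of rows, that is, again a small “table” that can be processed further.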

SELECT COUNT(*)      COUNT
FROM Farm            -----
                     3

Figure 8 Query to count number of rows in the Farm table
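The aggregation and join queries (Figures 8 through 12) can be run in the same way. The sketch below again uses Python's standard sqlite3 module; the column types and ISO date strings are assumptions, and the helper function query is hypothetical. The last query is an addition not found in the figures: it uses LEFT JOIN so that farms without any recorded milk also appear, with a total of zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Farm (id INTEGER PRIMARY KEY, address TEXT, postcode TEXT);
CREATE TABLE Cow  (id INTEGER PRIMARY KEY, farmId INTEGER, birth TEXT);
CREATE TABLE Milk (cowId INTEGER, "when" TEXT, amount REAL, cellCount INTEGER);
""")
conn.executemany("INSERT INTO Farm VALUES (?, ?, ?)", [
    (12160, "Lillegade 11", "4230"),
    (12169, "Egholmvej 3", "4230"),
    (13400, "Risøvej 134", "4000"),
])
conn.executemany("INSERT INTO Cow VALUES (?, ?, ?)", [
    (1216000002, 12160, "2004-06-15"),
    (3417400019, 12160, "2006-04-01"),
    (3417400021, 12169, "2007-12-19"),
])
conn.executemany("INSERT INTO Milk VALUES (?, ?, ?, ?)", [
    (1216000002, "2006-04-02 06:17", 6.8, None),
    (1216000002, "2006-04-02 17:45", 5.7, 210000),
    (1216000002, "2007-04-03 05:58", 7.3, 180000),
    (3417400019, "2010-03-21 18:21", 8.1, None),
    (3417400019, "2010-03-22 06:34", 9.4, None),
])


def query(sql):
    """Run a query; round floating-point values to one decimal for display."""
    return [tuple(round(v, 1) if isinstance(v, float) else v for v in row)
            for row in conn.execute(sql)]


farm_count = query("SELECT COUNT(*) FROM Farm")                      # Figure 8
total_milk = query("SELECT SUM(amount) FROM Milk")                   # Figure 9
farms_with_cows = query(                                             # Figure 10
    "SELECT address, postcode, c.id FROM Farm f, Cow c "
    "WHERE f.id = c.farmId ORDER BY c.id")
milk_per_farm = query(                                               # Figure 11
    "SELECT c.farmId, SUM(m.amount) AS milk FROM Cow c, Milk m "
    "WHERE c.id = m.cowId GROUP BY c.farmId")
milk_per_postcode = query(                                           # Figure 12
    "SELECT f.postcode, SUM(m.amount) AS milk FROM Farm f, Cow c, Milk m "
    "WHERE f.id = c.farmId AND c.id = m.cowId GROUP BY f.postcode")

# Addition: keep farms that have no cows or no milking events.
milk_all_farms = query(
    "SELECT f.id, COALESCE(SUM(m.amount), 0) AS milk "
    "FROM Farm f LEFT JOIN Cow c ON c.farmId = f.id "
    "LEFT JOIN Milk m ON m.cowId = c.id "
    "GROUP BY f.id ORDER BY f.id")

print(milk_per_farm)
print(milk_all_farms)
```

Farm 12169 is absent from the per-farm result because its only cow has no milking events, so no combination of a cow and a milk record satisfies the WHERE-clause. Whether such farms should be listed with zero or left out is a design decision that the query has to make explicit.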

SELECT SUM(amount)      SUM
FROM Milk               -----
                        37.3

Figure 9 Query to compute total amount of milk over all farms

SELECT address, postcode, c.id      address      | postcode | id
FROM Farm f, Cow c                  ------------------------------------
WHERE f.id = c.farmId               Lillegade 11 | 4230     | 1216000002
                                    Lillegade 11 | 4230     | 3417400019
                                    Egholmvej 3  | 4230     | 3417400021

Figure 10 Query to list farms with associated cows

SELECT c.farmId, SUM(m.amount) AS milk      farmId | milk
FROM Cow c, Milk m                          --------------
WHERE c.id = m.cowId                        12160  | 37.3
GROUP BY c.farmId

Figure 11 Query to compute total amount of milk for each farm

SELECT f.postcode, SUM(m.amount) AS milk      postcode | milk
FROM Farm f, Cow c, Milk m                    ---------------
WHERE f.id=c.farmId AND c.id=m.cowId          4230     | 37.3
GROUP BY f.postcode

Figure 12 Query to compute total amount of milk for each postcode

An example of a well-designed (but complex) domain model is SNOMED/CT, which stands for Systematized
Nomenclature of Medicine Clinical Terms. This is a set of standard terms for use in hospitals, electronic patient
records, and so on [4]. There are three components of SNOMED/CT:
• Concepts, used to describe disorders (e.g. 128139000 Inflammatory disorder and 233604007 Pneumonia),
procedures (e.g. 11466000 Cesarean section), findings (e.g. 62315008 Diarrhea and 55184003 Infectious
enteritis), causative organisms (e.g. 110378009 Salmonella enterica), anatomy, and more.
• Descriptions, used primarily for synonyms, e.g. 497137013 Infective enteritis (synonym for concept
55184003 Infectious enteritis).
• Relationships, used to describe how concepts relate to each other, e.g. Pneumonia IS_A Inflammatory
disorder and Pneumonia FINDING SITE Lung structure.
Note how each concept and each description has a unique numeric key. Also note how relationships can be
used to relate one concept (pneumonia) both to a disease category and to anatomy, that is, to place the concept in
different hierarchies.
SNOMED/CT is maintained by an international organization whose member countries include the
United States, United Kingdom, Germany, The Netherlands, Spain, Sweden, Denmark, and many more. In
Denmark and most other places, electronic patient records are still based on older and less powerful
classification systems, but SNOMED/CT is expected to replace those in the future [5].
Full SNOMED/CT is very complex, with 311,000 concepts, 800,000 descriptions and 1,360,000 relations
as of April 2010. A smaller subset for veterinary use is being maintained by Virginia Terminology Services [6].

Data stewardship, standards, and sharing
Sometimes a whole discipline manages to agree on an ontology, as in the case of SNOMED/CT. Such
standardization requires considerable effort, but also offers huge synergistic benefits, especially when databases
are made available to all interested parties in a standard format. For instance, within bioinformatics this has led
to tremendous advances in research on animals, microorganisms, plants and medicine. Important steps were the
development in the 1980s of standard formats [7] that enable free interchange of DNA sequence data between US,
Japanese and European institutions, and the requirement that any sequence data used as basis for a scientific
publication must be published, free of any restrictions on further research, in the joint international
databases [8].
While the development of standard formats and ontologies is important and enables much better utilization of
research investments, it looks more like infrastructure development than research, which means that it appears
less exciting and that it may be difficult to obtain funding for it. As a consequence, it may be more tempting to
propose new organizations, web sites and portals than to lay the foundation for them, which caused a Nature
editorial to admonish that “Initiatives for digital research infrastructure should focus more on making
standardized data openly available, and less on developing new portals” [9].
Thanks to lab automation, sensor development and computerized instruments, research produces new data
on a scale never seen before. Yet in many cases the required efforts to document, check and preserve all
these data lag behind researchers’ ability to generate the data in the first place [10].
This problem is the subject of a report from the US National Academies [11] on integrity, accessibility and
stewardship of digital data, encouraged and sponsored in part by leading journals [12,13]. The report’s three
main concerns are integrity of data (preventing accidental or willful tampering), sharing of data (to allow others
to check accuracy, verify analyses and build on previous work), and stewardship (long-term preservation) of data.
Some of the problems have simple technological solutions; for instance, fingerprinting with cryptographic
checksums promotes integrity by proving that data has not been tampered with. For the most part, however,
solutions are organizational and come down to policies and proper documentation. Neither sharing nor
long-term preservation is very useful if there is confusion about the meaning of code 114, or if some recordings
in the same column are in kilograms, others in liters.

id | cowId      | vetId | when             | code
-------------------------------------------------
1  | 1216000002 | 6512  | 2007-03-12 08:02 | 88
2  | 1216000002 | 6809  | 2007-03-23 09:12 | 38
3  | 3417400019 | 6512  | 2008-05-18 13:30 | 42

Figure 13 The Clinical table

code | meaning
-----------------------------
11   | udder infection
38   | joint infection
42   | infection
80   | hoof trimming
81   | hoof trimming ++
88   | will not drink
140  | hoof trimming ++

Figure 14 The ClinicalTerm table

To further give a flavour of the report we quote a few of the recommendations:
• Recommendation 1: Researchers should design and manage their projects so as to ensure the integrity
of research data, adhering to the professional standards [...]
• Recommendation 6: In research fields that currently lack standards for sharing research data, such standards
should be developed [...]
• Recommendation 9: Researchers should establish data management plans at the beginning of each
research project that include appropriate provisions for the stewardship of research data.
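The fingerprinting with cryptographic checksums mentioned above can be sketched as follows, using SHA-256 from Python's standard hashlib module. The file name and its contents are hypothetical, and the choice of SHA-256 is an assumption; the text does not name a particular checksum algorithm. The idea is that a checksum recorded when a data set is archived can later be compared with a freshly computed one.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so that large data files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a throwaway file with hypothetical content.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "milk.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write("cowId,when,amount\n1216000002,2006-04-02 06:17,6.8\n")

    recorded = sha256_of_file(path)          # stored alongside the archived data

    unchanged = sha256_of_file(path) == recorded

    with open(path, "a", encoding="utf-8") as f:
        f.write("1216000002,2006-04-02 17:45,5.7\n")   # any later modification
    changed = sha256_of_file(path) != recorded

print(unchanged, changed)
```

A matching checksum is evidence of integrity only if the recorded value is itself kept where it cannot be altered together with the data, for instance in a publication or a separate archive.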
In short, modeling the domain of one’s research and
designing a database is only the beginning. Researchers
must also consider how to preserve and eventually share
raw data to enable replication of experiments and statis-
tical analyses as well as future research that may use the
data in unanticipated ways.

Acknowledgements
This article has been published as part of Acta Veterinaria Scandinavica
Volume 53 Supplement 1, 2011: Databases in veterinary medicine: validation,
harmonisation and application. Proceedings of the 24th Symposium of the
Nordic Committee for Veterinary Scientific Cooperation (NKVet). The full
contents of the supplement are available online at http://www.actavetscand.com/supplements/53/S1.

Competing interests
The authors declare that they have no competing interests.

Published: 20 June 2011

References
1. Codd EF: A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 1970,
13(6):377-387, doi:10.1145/362384.362685.
2. Churcher C: Beginning Database Design. Apress 2007.
3. Snodgrass R: Developing Time-Oriented Database Applications in SQL. Morgan Kaufmann; 1999, Full text at
http://www.cs.arizona.edu/people/rts/tdbbook.pdf.
4. SNOMED Clinical Terms User Guide. International Health Terminology Standards Development Organisation;
2009, At http://www.ihtsdo.org/snomed-ct/.
5. Lippert S: IT University of Copenhagen, personal communication. 2010.
6. Wilcke R: Veterinary adaptation of SNOMED-CT. Presentation at Talbot Symposium, AVMA Convention 2009,
At http://snomed.vetmed.vt.edu/.
7. International Nucleotide Sequence Database Collaboration: The DDBJ/EMBL/GenBank Feature Table,
version 8.3. 2010, At http://www.insdc.org/.
8. Brunak S, Danchin A, Hattori M, Nakamura H, Shinozaki K, Matise T, Preusset D: Nucleotide Sequence
Database Policies. Science 2002, 298:1333.
9. Data for the masses. Editorial. Nature 2009, 457:129-129, doi:10.1038/457129a.
10. Gray J, Liu DT, Nieto-Santisteban M, Szalay AS: Scientific Data Management in the Coming Decade.
Microsoft Research Technical Report MSR-TR-2005-10 2005, At http://arxiv.org/pdf/cs/0502008.
11. National Academy of Sciences: Ensuring the integrity, accessibility, and stewardship of research data in
the digital age. National Academies Press 2009.
12. Information overload. Editorial. Nature 2009, 460:551-551, doi:10.1038/460551a.
13. Kleppner D, Sharp PA: Research Data in the Digital Age. Editorial. Science 2009, 325:368-368,
doi:10.1126/science.1178927.

doi:10.1186/1751-0147-53-S1-S2
Cite this article as: Sestoft: Organizing research data. Acta Veterinaria Scandinavica 2011 53(Suppl 1):S2.

100% (1)
Immersion Attendance Sheet
2 pages
Module 8: Emotional Intelligence
No ratings yet
Module 8: Emotional Intelligence
19 pages
DBMS Notes Class 10
No ratings yet
DBMS Notes Class 10
12 pages
5th Sem Syllabus Iem 2023
No ratings yet
5th Sem Syllabus Iem 2023
17 pages
Advanced Database Systems: Module Title
No ratings yet
Advanced Database Systems: Module Title
17 pages
Chapter 4 LAU
No ratings yet
Chapter 4 LAU
73 pages
Hands-On Lab: Create Tables and Load Data in Postgresql Using Pgadmin
No ratings yet
Hands-On Lab: Create Tables and Load Data in Postgresql Using Pgadmin
25 pages
6C - Schneider - Expressive Arts With New Moms1
No ratings yet
6C - Schneider - Expressive Arts With New Moms1
23 pages
Revising The Paper, Selecting The Samples, Planning The Procedure
No ratings yet
Revising The Paper, Selecting The Samples, Planning The Procedure
26 pages
Results and Discussion: Presented By: Shalini Pandey M.Sc. Previous Deptt. of HECM
No ratings yet
Results and Discussion: Presented By: Shalini Pandey M.Sc. Previous Deptt. of HECM
17 pages
Quizno 1
No ratings yet
Quizno 1
2 pages
ST Solution
No ratings yet
ST Solution
29 pages
How To: Setup Up of Oracle Streams Replication
No ratings yet
How To: Setup Up of Oracle Streams Replication
8 pages
Chapter 3
No ratings yet
Chapter 3
4 pages
Performance
No ratings yet
Performance
13 pages
Creates A New Table, A View of A Table, or Other Object in Database
No ratings yet
Creates A New Table, A View of A Table, or Other Object in Database
6 pages
Session Rundown - 2024-02-03 Evening
No ratings yet
Session Rundown - 2024-02-03 Evening
4 pages
Accommodation EVLNScale
No ratings yet
Accommodation EVLNScale
4 pages
Guided Activity For Issues in Contempo
No ratings yet
Guided Activity For Issues in Contempo
2 pages
I.T (402) Model Test Paper 1
No ratings yet
I.T (402) Model Test Paper 1
4 pages
Assignment 3
No ratings yet
Assignment 3
2 pages
Foundations of Data Intensive Applications: Large Scale Data Analytics under the Hood
From Everand
Foundations of Data Intensive Applications: Large Scale Data Analytics under the Hood
Supun Kamburugamuve
No ratings yet
Hallo Microsoft Excel: Mastering Data Analytics
From Everand
Hallo Microsoft Excel: Mastering Data Analytics
Agus Kurniawan
No ratings yet

Organizing Research Data

Uploaded by

Organizing Research Data

Uploaded by

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

Organizing research data

Article in Acta Veterinaria Scandinavica · June 2011


Database concepts

leads to inconsistency (e.g. two different addresses

address | postcode | cowId | birth | time | amount | cellCount

id | address | postcode

id | farmId | birth

statistical packages R and SAS. Thus large data sets may

SELECT id, birth
Result columns: id | birth

SELECT *
Result columns: id | farmId | birth

SELECT COUNT(*)
Result columns: COUNT

Figure 8 Query to count number of rows in the Farm table

SELECT SUM(amount)
Result columns: SUM

Figure 9 Query to compute total amount of milk over all farms

SELECT address, postcode, c.id
Result columns: address | pcode | id

Figure 10 Query to list farms with associated cows

SELECT c.farmId, SUM(m.amount) AS milk

Figure 11 Query to compute total amount of milk for each farm

SELECT f.postcode, SUM(m.amount) AS milk
Result columns: postcode | milk

Figure 12 Query to compute total amount of milk for each postcode
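
Only the first line of each query survives in the figures above, so the following is a hedged sketch of how such queries could be written in full rather than a reproduction of the original figures. It assumes three tables, Farm, Cow and Milk, with the column names listed earlier; the table names Cow and Milk, every FROM, JOIN, GROUP BY and ORDER BY clause, and all sample rows are assumptions made for illustration. The sketch uses Python's built-in sqlite3 module so that it can be run without installing a database server.

```python
import sqlite3

# In-memory database. The schema follows the column names shown in the text;
# the table names Cow and Milk and all sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Farm (id INTEGER PRIMARY KEY, address TEXT, postcode INTEGER);
CREATE TABLE Cow  (id INTEGER PRIMARY KEY,
                   farmId INTEGER REFERENCES Farm(id), birth TEXT);
CREATE TABLE Milk (cowId INTEGER REFERENCES Cow(id), time TEXT,
                   amount REAL, cellCount INTEGER);
""")
conn.executemany("INSERT INTO Farm VALUES (?, ?, ?)", [
    (1, "Farm road 1", 1000), (2, "Farm road 2", 1000), (3, "Hill lane 7", 2000),
])
conn.executemany("INSERT INTO Cow VALUES (?, ?, ?)", [
    (10, 1, "2008-03-01"), (11, 1, "2009-05-12"),
    (12, 2, "2007-11-30"), (13, 3, "2008-08-19"),
])
conn.executemany("INSERT INTO Milk VALUES (?, ?, ?, ?)", [
    (10, "2010-04-19 06:00", 12.0, 100), (11, "2010-04-19 06:05", 10.0, 120),
    (12, "2010-04-19 06:10", 8.0, 90), (13, "2010-04-19 06:15", 15.0, 110),
])

# Count the rows in the Farm table (compare Figure 8).
farm_count = conn.execute("SELECT COUNT(*) FROM Farm").fetchone()[0]

# Total amount of milk over all farms (compare Figure 9).
total_milk = conn.execute("SELECT SUM(amount) FROM Milk").fetchone()[0]

# Farms with their associated cows (compare Figure 10).
farm_cows = conn.execute("""
    SELECT address, postcode, c.id
    FROM Farm f JOIN Cow c ON c.farmId = f.id
    ORDER BY c.id
""").fetchall()

# Total amount of milk for each farm (compare Figure 11).
milk_per_farm = conn.execute("""
    SELECT c.farmId, SUM(m.amount) AS milk
    FROM Cow c JOIN Milk m ON m.cowId = c.id
    GROUP BY c.farmId
    ORDER BY c.farmId
""").fetchall()

# Total amount of milk for each postcode (compare Figure 12).
milk_per_postcode = conn.execute("""
    SELECT f.postcode, SUM(m.amount) AS milk
    FROM Farm f
    JOIN Cow c ON c.farmId = f.id
    JOIN Milk m ON m.cowId = c.id
    GROUP BY f.postcode
    ORDER BY f.postcode
""").fetchall()

print(farm_count, total_milk)
print(farm_cows)
print(milk_per_farm)
print(milk_per_postcode)
```

Because each fact is stored once (a farm's address only in Farm, a cow's birth date only in Cow), the per-farm and per-postcode totals are obtained by joining on the id columns and grouping, rather than by repeating address data in every milking record.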

main concerns are integrity of data (preventing acciden-

• Recommendation 9: Researchers should establish

Published: 20 June 2011
