Metadata Definitions - V01
There are some variations on the theme, e.g. claims that metadata should
(or must) be structured or formalised. Perhaps unexpectedly, the sources
related to statistics give definitions that are even shorter and vaguer than
some of the general-purpose sources. The OECD definition of statistical
metadata, for example, is simply:
This definition obviously covers all kinds of documentation that refer to
any type of statistical data. It applies to metadata referring to data stored
in a statistical data warehouse as well as to any other type of data store.
552060849.docx
21-10-25 07.42
Draft
ESSnet on Data Warehousing Memo 2(8)
Statistics Sweden 2011-05-1306-10
Lars-Göran Lundell
Metadata categories
Metadata may describe many different aspects of data. Hence metadata can
be categorised in a number of ways, along overlapping dimensions.
Consequently, each metadata item normally belongs to several categories.
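As a minimal illustration of overlapping dimensions, each metadata item can be tagged with one category per dimension, so that a single item belongs to several categories at once. The dimension names, the `MetadataItem` class and the example item are hypothetical; the category pairs are the ones discussed in this memo (passive/active, free-form/structured, reference/structural).

```python
from dataclasses import dataclass

# Hypothetical classification dimensions, using the category pairs in this memo.
DIMENSIONS = {
    "role": {"passive", "active"},
    "form": {"free-form", "structured"},
    "content": {"reference", "structural"},
}


@dataclass
class MetadataItem:
    name: str
    categories: dict  # dimension name -> category within that dimension

    def __post_init__(self):
        # Each item must be placed in exactly one valid category per dimension.
        for dim, allowed in DIMENSIONS.items():
            if self.categories.get(dim) not in allowed:
                raise ValueError(f"{self.name}: invalid or missing value for {dim!r}")

    def belongs_to(self, category: str) -> bool:
        """True if the item falls into the given category on any dimension."""
        return category in self.categories.values()


# A hypothetical item: a machine-readable code list used by a validation step.
code_list = MetadataItem(
    "region code list",
    {"role": "active", "form": "structured", "content": "structural"},
)
print(sorted(code_list.categories.values()))  # ['active', 'structural', 'structured']
```

The same item is simultaneously active, structured and structural; no single category describes it fully.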
Passive metadata become more active when they are used as input for
planning, e.g. of a new survey round or of a new, similar statistical product.
The term active metadata should, however, be reserved for metadata that are
operational. Active metadata may be regarded as an intermediate layer
between the user and the data, which humans or computer programs can use
to search, link, retrieve or perform other operations on data.
Thus active metadata may contain rules or code (algorithmic metadata).
Some authors reserve the term active for the latter kind only, i.e. metadata
that can be interpreted or executed at runtime to support metadata-driven
processes, and call all other non-passive metadata semi-active.
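A minimal sketch of algorithmic (active) metadata, assuming that a validation rule is stored as data alongside the variable description and interpreted at runtime by a generic program rather than hard-coded. The variable names, the rule format (`min`, `max`, `allowed`) and the `validate` function are all hypothetical.

```python
# Active metadata: each variable description carries a rule that a generic
# program interprets at runtime. The rule format here is hypothetical.
VARIABLE_METADATA = {
    "age": {"description": "Age in completed years", "rule": {"min": 0, "max": 120}},
    "sex": {"description": "Sex of person", "rule": {"allowed": ["F", "M"]}},
}


def check(value, rule):
    """Return True if the value satisfies the rule stored in the metadata."""
    if "allowed" in rule and value not in rule["allowed"]:
        return False
    if "min" in rule and value < rule["min"]:
        return False
    if "max" in rule and value > rule["max"]:
        return False
    return True


def validate(record, metadata=VARIABLE_METADATA):
    """Return the names of variables in the record that break their metadata rule."""
    return [
        name
        for name, value in record.items()
        if name in metadata and not check(value, metadata[name]["rule"])
    ]


print(validate({"age": 134, "sex": "F"}))  # ['age']
```

Changing a rule in the metadata changes the program's behaviour without touching its code; the `description` entries alone would be passive metadata.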
Suggested definitions:
Strictly structured metadata are obviously well suited for use in an active
role, but there is no simple, unambiguous mapping between active and
structured metadata, nor between passive and free-form metadata.
Suggested definitions:
In the “statistical sources” the terms business and technical metadata are
rarely used. Several different synonyms can be found for business metadata,
e.g. conceptual or logical metadata. The most commonly used, however, is
reference metadata. Instead of technical metadata you will often find the
term structural metadata.
In the “statistical sources” the terms reference metadata and structural
metadata are preferred to business and technical metadata. The
definitions remain the same.
Suggested definitions:
Process metadata
Information on an operation, such as start and end times, result status code,
number of records processed, resources used, etc., is a specific type of
metainformation. This kind of metadata is known under several names, such
as process metadata, process data, process metrics or paradata. These data
may contain either expected values or actual outcomes. In both cases they are
primarily intended for planning; in the latter case by evaluating finished
processes in order to improve recurring or similar ones. Process metadata
should be structured to facilitate computer-aided evaluation.
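To illustrate why structured process metadata support computer-aided evaluation, here is a minimal sketch in which the expected values and the actual outcome of one run are stored in the same hypothetical `ProcessRun` structure and compared field by field. The field set, the status-code convention and the example figures are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ProcessRun:
    """Structured process metadata for one run of an operation (illustrative fields)."""
    operation: str
    start: datetime
    end: datetime
    status_code: int  # hypothetical convention: 0 = success
    records_processed: int

    @property
    def duration_seconds(self) -> float:
        return (self.end - self.start).total_seconds()


def compare(expected: ProcessRun, actual: ProcessRun) -> dict:
    """Actual minus expected for numeric measures, used to evaluate a finished run."""
    return {
        "duration_seconds": actual.duration_seconds - expected.duration_seconds,
        "records_processed": actual.records_processed - expected.records_processed,
        "status_changed": actual.status_code != expected.status_code,
    }


# Hypothetical planned values versus the recorded outcome of the same load.
planned = ProcessRun("load", datetime(2011, 6, 1, 8, 0), datetime(2011, 6, 1, 9, 0), 0, 10_000)
outcome = ProcessRun("load", datetime(2011, 6, 1, 8, 0), datetime(2011, 6, 1, 9, 30), 0, 9_500)
print(compare(planned, outcome))
# {'duration_seconds': 1800.0, 'records_processed': -500, 'status_changed': False}
```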
Suggested definition:
Keeping track of, maintaining and perhaps raising the quality of the data in
the warehouse is an important governance task that requires support from
metadata. Quality information should be available in different forms and
serve several purposes: to describe the quality achieved (e.g. how a survey
was carried out, or what the outcome was), or to measure the outcome (a
Suggested definition:
Metadata structures
Several sources claim that the data warehouse needs a central system, a
metadata registry, in which its metadata are registered and logically stored.
Such a registry makes it easier to handle identification, check for duplicates,
ensure consistency, etc. This is, however, a logical matter; a centralised
metadata registry does not imply that metadata are physically stored in a
centralised system.
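The distinction between logical and physical centralisation can be sketched as a registry that owns identifiers and performs the checks, but only records where each item is physically stored. The `MetadataRegistry` class, the identifiers and the location strings below are hypothetical, and the duplicate check (case-insensitive name comparison) is a deliberately simple assumption.

```python
class MetadataRegistry:
    """Logical register of metadata items.

    Each entry only records where the metadata are physically stored, so the
    stores themselves may be distributed across several systems.
    """

    def __init__(self):
        self._items = {}  # identifier -> {"name": ..., "location": ...}

    def register(self, identifier: str, name: str, location: str) -> None:
        # Identification: an identifier may be registered only once.
        if identifier in self._items:
            raise KeyError(f"identifier already registered: {identifier}")
        # Duplicate check: reject a second item with the same name (case-insensitive).
        for other_id, item in self._items.items():
            if item["name"].casefold() == name.casefold():
                raise ValueError(f"{name!r} duplicates {other_id}")
        self._items[identifier] = {"name": name, "location": location}

    def locate(self, identifier: str) -> str:
        """Return the physical location recorded for an identifier."""
        return self._items[identifier]["location"]


# Hypothetical items kept in two different physical stores.
registry = MetadataRegistry()
registry.register("V001", "Household income", "sqlite:///reference.db#variables/17")
registry.register("V002", "Number of employees", "file:///metadata/business_register.xml")
print(registry.locate("V002"))  # file:///metadata/business_register.xml
```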
1. Collection
Metadata should be captured as early as possible in the production process.
2. Maintenance
Metadata must be up to date at all times. Processes must be in place
to capture changes and to synchronise metadata with the changing
architecture.
3. Deployment
Metadata must be available to users in the right form and with the
right tools.
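The maintenance step can be illustrated by a check that compares the registered structural metadata of a table with the table's current structure. This sketch assumes the table lives in SQLite and reads its structure with `PRAGMA table_info`; the table, its columns and the registered types are hypothetical, and a real warehouse would query its own catalogue instead.

```python
import sqlite3


def actual_columns(conn: sqlite3.Connection, table: str) -> dict:
    """Read column names and declared types of a table from SQLite."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    # The table name is interpolated directly, so it must come from a trusted source.
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}


def structural_drift(registered: dict, actual: dict) -> dict:
    """Compare registered structural metadata (column -> type) with the live table."""
    return {
        "missing_in_table": sorted(set(registered) - set(actual)),
        "unregistered_columns": sorted(set(actual) - set(registered)),
        "type_changed": sorted(
            col for col in set(registered) & set(actual) if registered[col] != actual[col]
        ),
    }


# Hypothetical example: the table has changed since its metadata were registered.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE income (region TEXT, year INTEGER, amount REAL, currency TEXT)")
registered = {"region": "TEXT", "year": "INTEGER", "amount": "INTEGER"}
drift = structural_drift(registered, actual_columns(conn, "income"))
print(drift)
# {'missing_in_table': [], 'unregistered_columns': ['currency'], 'type_changed': ['amount']}
```

Any non-empty list in the result signals that the metadata are no longer synchronised with the architecture.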
Metadata standards
Standards for metadata have been discussed for many years but have still not
developed very far. The most successful effort is probably ISO/IEC 11179,
Metadata registries, which is a standard at the conceptual level. Several
NSIs (national statistical institutes) have based their metadata systems on
that standard.
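ISO/IEC 11179 describes a data element as the combination of a data element concept (an object class and a property) with a value domain that gives its permitted representation. The sketch below models only that core idea; the class and field names are simplifications chosen here, not the standard's full metamodel, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str   # the set of things described, e.g. "Person"
    property_term: str  # the characteristic described, e.g. "sex"


@dataclass(frozen=True)
class ValueDomain:
    datatype: str
    permissible_values: frozenset = frozenset()  # empty means not enumerated


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept
    domain: ValueDomain

    @property
    def name(self) -> str:
        return f"{self.concept.object_class} {self.concept.property_term}"


# One concept, two hypothetical coded representations.
sex = DataElementConcept("Person", "sex")
sex_letters = DataElement(sex, ValueDomain("string", frozenset({"F", "M"})))
sex_digits = DataElement(sex, ValueDomain("string", frozenset({"1", "2"})))
print(sex_letters.name, sex_letters.concept == sex_digits.concept)  # Person sex True
```

Separating the concept from its representation is what lets a registry recognise that two differently coded data elements describe the same thing.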
1. the contents of the data warehouse, their location and their structure
2. the processes that take place in the data warehouse
3. the implicit semantics of data along with any other kind of data that
aids the end-user exploit the information of the warehouse
4. the infrastructure and physical characteristics of components and the
sources of the data warehouse
5. security, authentication, and usage statistics that aids the
administrator tune the operation of the data warehouse as appropriate
The metadata categories described earlier in this paper are general. Some
sources mention metadata categories specific to the data warehouse
environment, e.g. ETL metadata (for the “Extract–Transform–Load”
process), but these all appear to be subsets or mere renamings of the
categories already defined.
This does not mean that the remaining metadata categories should be
disregarded, but that they are used and needed in a statistical data warehouse
in the same way as in any statistics production environment.
Annex 1
Sources
Wikipedia: direct quotations from Wikipedia and from its sources
http://en.wikipedia.org/wiki/Metadata
http://en.wikipedia.org/wiki/Data_warehouse
ISO (International Organization for Standardization), ISO/IEC 11179 Metadata registries (MDR), http://metadata-stds.org/11179/
NISO (National Information Standards Organization), Understanding Metadata, http://www.niso.org/publications/press/UnderstandingMetadata.pdf
UNECE, Metadata Common Vocabulary, MCV (Draft, March 2006), http://circa.europa.eu/Public/irc/dsis/metadata/library?l=/metadata_forces/force_meeting_092007/mtf-6-mcv-anxpdf/_EN_1.0_&a=d
UNECE, Terminology on Statistical Metadata (2000), http://www.unece.org/stats/publications/53metadaterminology.pdf
UNECE, Guidelines for the modeling of statistical data and metadata (1995), http://www.unece.org/stats/publications/metadatamodeling.pdf
OECD, Glossary of Statistical Terms, http://stats.oecd.org/glossary/