
Metadata: Difference between revisions

From Wikipedia, the free encyclopedia
ClueBot (talk | contribs)
m Reverting possible vandalism by 59.96.20.197 to version by 160.39.101.115. False positive? Report it. Thanks, ClueBot. (788717) (Bot)
added section about metadata and the law
=== Meta-metadata ===
Since metadata are also data, it is possible to have metadata of metadata, or "meta-metadata". Machine-generated meta-metadata, such as the inverted index created by a free-text search engine, is generally not considered metadata, though.

== Metadata and the Law ==
=== United States ===

Problems involving metadata in [[litigation]] in the [[United States]] are becoming widespread. Courts have looked at various questions involving metadata, including the discoverability of metadata by parties. Although the [[Federal Rules of Civil Procedure]] have only specified rules about electronic documents, subsequent case law has elaborated on the requirement of parties to reveal metadata.


== See also ==

Revision as of 01:28, 14 October 2009

Metadata (meta data, or sometimes metainformation) is "data about data", of any sort in any media. Metadata is text, voice, or image that describes what the audience wants or needs to see or experience. The audience could be a person, group, or software program. Metadata is important because it aids in clarifying and finding the actual data.[1] An item of metadata may describe an individual datum, or content item, or a collection of data including multiple content items and hierarchical levels, such as a database schema. In data processing, metadata provides information about, or documentation of, other data managed within an application or environment. This commonly defines the structure or schema of the primary data.

For example, metadata would document data about data elements or attributes (name, size, data type, etc.), data about records or data structures (length, fields, columns, etc.), and data about data (where it is located, how it is associated, ownership, etc.). Metadata may include descriptive information about the context, quality and condition, or characteristics of the data. It may be recorded with high or low granularity.

An example of metadata occurs within file systems. Associated with every file on the storage medium is metadata that records the date the file was created, the date it was last modified and the date the file (or indeed the metadata itself) was last accessed.
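The file system timestamps described above can be inspected programmatically. The following is a minimal sketch in Python using the standard `os.stat` call; the function name `file_timestamps` and the dictionary keys are illustrative assumptions, not part of any standard. Note that the meaning of `st_ctime` differs by platform, so a true creation date is not always available.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_timestamps(path):
    """Return size and timestamp metadata for a file (hypothetical helper)."""
    st = os.stat(path)

    def to_datetime(seconds):
        # Convert seconds since the epoch into a timezone-aware datetime.
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    return {
        "size_bytes": st.st_size,
        "modified": to_datetime(st.st_mtime),
        "accessed": to_datetime(st.st_atime),
        # st_ctime is the creation time on Windows, but on Unix it is the
        # time of the last change to the metadata itself.
        "changed_or_created": to_datetime(st.st_ctime),
    }


if __name__ == "__main__":
    # Create a throwaway file so the example does not depend on any path.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(b"hello")
    try:
        for key, value in file_timestamps(handle.name).items():
            print(key, value)
    finally:
        os.remove(handle.name)
```

None of these values is stored in the file's contents; they are kept by the file system alongside the file.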

Purpose

Metadata provides context for data.

Metadata is used to facilitate the understanding, usage, and management of data, by both humans and computers. Thus metadata can describe the data conceptually so that others can understand them; it can describe the data syntactically so others can use them; and the two types of descriptions together can facilitate decisions about how to manage the data.

The metadata required to effectively work with data varies with the type of data, their context of use, and their purpose. Often data providers will provide users access to a variety of metadata fields, which can be used individually or in combinations, and applied by different users to achieve different goals. These users can be human 'end users', or other computing systems.

See also the Use section below for more details about the use of metadata.

Hierarchies

When structured into a hierarchical arrangement, metadata is more properly called an ontology or schema. Both terms describe "what exists" for some purpose or to enable some action. For instance, the arrangement of subject headings in a library catalog serves not only as a guide to finding books on a particular subject in the stacks, but also as a guide to what subjects "exist" in the library's own ontology and how more specialized topics are related to or derived from the more general subject headings.

Metadata is frequently stored in a central location and used to help organizations standardize their data. This information is typically stored in a metadata registry.

Examples

These examples list metadata that describe particular digital entities. For clarity and consistency with some definitions of metadata, these examples are expressed with respect to digitized form of each entity, not data that is represented solely in a physical object like a book. (For information resources that are not in digital form, metadata is only the information that describes the information content, not the information about the physical representation.)

In most cases, the examples illustrate the use of metadata to describe the entity's content (conceptually), how the entity came to be (provenance), and information necessary for the system to use it. The last set of information about system-related details is typically hidden from the user, but includes the internal file name, location, and creation/access times for the digital entity.

Because the concept of metadata is specific to each situation—"one person's data is another person's metadata"—examples should be considered illustrative rather than absolute.

Video Recording

The television show or movie recorded on a digital video recorder has extensive metadata. These may include the title, director, actors, summary of the contents, length of the recording, critical rating, and the date and source of this recording. System use metadata includes the file name and current status (viewing status, 'save until' date).

Book

Examples of metadata regarding a book would be the title, author(s), date of publication, subject, a unique identifier (such as International Standard Book Number (ISBN)), number of pages, and the language of the text. Metadata unique to the electronic format includes usage (last opened, current page, times read) and other user-provided data (ranking, tags, annotations). System use metadata might include purchase and digital rights information for the content.

Image

Digital images include both digital photographs and images that have been created or modified on a computer. Metadata for a digital photograph typically includes the date and time at which it was created and details of the camera settings (such as focal length, aperture, exposure). Many digital cameras record metadata in their digital images, in formats like exchangeable image file format (Exif), which is embedded in image files such as JPEG. Some cameras can automatically include extended metadata such as the location the picture was taken (e.g., from a GPS). Most image editing software includes at least some metadata in the digital image, and can include content about the image's provenance and licensing.

Audio

Audio recordings may also be labeled with metadata. When audio formats moved from analogue to digital, it became possible to embed this metadata within the digital content itself. (Without any metadata, the digital content is simply a file containing the audio waveform.)

Metadata can be used to name, describe, catalogue and indicate ownership or copyright for a digital audio file, as well as allow user characterizations of the audio content (ratings, tags, and other auxiliary metadata). Its presence simplifies locating a specific audio file within a group, through use of a search engine that accesses the metadata. The typical audio player or audio application on a computer relies heavily on metadata to provide user features.

As different digital audio formats were developed, it was agreed that a standardized and specific location would be set aside within the digital files where this information could be stored. As a result, almost all digital audio formats, including mp3, broadcast wav and AIFF files, have similar standardized locations that can be populated with metadata. This "information about information" has become one of the great advantages of working with digital audio files, since the catalogue and descriptive information that makes up the metadata is built right into the audio file itself, ready for easy access and use.

Web page

The HTML format used to define web pages allows for the inclusion of a variety of types of metadata, from simple descriptive text, dates and keywords to highly-granular information such as the Dublin Core and e-GMS standards. Pages can be geotagged with coordinates. Metadata may be included in the page's header or in a separate file. Microformats allow metadata to be added to on-page data in a way that users don't see, but computers can readily access.
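As an illustration of header metadata, the sketch below collects `<meta>` name/content pairs from a page using Python's standard `html.parser` module. The class `MetaTagCollector`, the function `extract_meta`, and the sample page (including its author name) are hypothetical and exist only for this example; the `DC.creator` name follows the common convention for embedding Dublin Core elements in HTML.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.metadata[name] = content


def extract_meta(html_text):
    """Return a dict of the named <meta> entries found in html_text."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.metadata


# A made-up page: the metadata lives in the header, not in the visible text.
SAMPLE_PAGE = """<html><head>
<title>Example page</title>
<meta name="description" content="A hypothetical page used for illustration">
<meta name="keywords" content="metadata, example">
<meta name="DC.creator" content="A. N. Author">
</head><body><p>Visible text.</p></body></html>"""

if __name__ == "__main__":
    for name, content in extract_meta(SAMPLE_PAGE).items():
        print(f"{name}: {content}")
```

A reader of the rendered page sees only "Visible text.", while a program reading the header has access to the description, keywords and creator.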

Levels

The hierarchy of metadata descriptions can go on forever, but usually context or semantic understanding makes extensively detailed explanations unnecessary.

The role played by any particular datum depends on the context. For example, when considering the geography of London, "E8 3BJ" would be a datum and "Post Code" would be a metadatum. But, when considering the data management of an automated system that manages geographical data, "Post Code" might be a datum and then "data item name" and "6 characters, starting with A–Z" would be metadata.

In any particular context, metadata characterizes the data it describes, not the entity described by that data. So, in relation to "E8 3BJ", the datum "is in London" is a further description of the place in the real world which has the post code "E8 3BJ", not of the code itself. Therefore, although it is providing information connected to "E8 3BJ" (telling us that this is the post code of a place in London), this would not normally be considered metadata, as it is describing "E8 3BJ" as a place in the real world and not as data.

Definitions

Etymology

Meta is a classical Greek preposition (μετ’ αλλων εταιρων) and prefix (μεταβασις) conveying the following senses in English, depending upon the case of the associated noun: among; along with; with; by means of; in the midst of; after; behind.[2] In epistemology, the word means "about (its own category)"; thus metadata is "data about the data".

Varying definitions

The term was introduced intuitively, without a formal definition. Because of that, today there are various definitions. The most common one is the literal translation:

  • "Data about data are referred to as metadata."[3]

Example: "12345" is data, and with no additional context is meaningless. When "12345" is given a meaningful name (metadata) of "ZIP code", one can understand (by further placing "ZIP code" within the context of a postal address) that "12345" refers to the General Electric plant in Schenectady, New York.

As for most people the difference between data and information is merely a philosophical one of no relevance in practical use, other definitions are:

  • Metadata is information about data.
  • Metadata is information about information.
  • Metadata contains information about that data or other data.

There are more sophisticated definitions, such as:

  • "Metadata is structured, encoded data that describe characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities."[4]
  • "[Metadata is a set of] optional structured descriptions that are publicly available to explicitly assist in locating objects."[5]

These are used more rarely because they tend to concentrate on one purpose of metadata — to find "objects", "entities" or "resources" — and ignore others, such as using metadata to optimize compression algorithms, or to perform additional computations using the data.

The metadata concept has been extended into the world of systems to include any "data about data": the names of tables, columns, programs, and the like. Different views of this "system metadata" are detailed below, but beyond that is the recognition that metadata can describe all aspects of systems: data, activities, people and organizations involved, locations of data and processes, access methods, limitations, timing and events, as well as motivation and rules.

Fundamentally, then, metadata is "the data that describe the structure and workings of an organization's use of information, and which describe the systems it uses to manage that information". To do a model of metadata is to do an "Enterprise model" of the information technology industry itself.[6]

Markup

In the context of the web and the work of the W3C in providing markup technologies of HTML, XML and SGML the concept of metadata has specific context that is perhaps clearer than in other information domains. With markup technologies there is metadata, markup and data content. The metadata describes characteristics about the data, while the markup identifies the specific type of data content and acts as a container for that document instance. This page in Wikipedia is itself an example of such usage, where the textual information is data, how it is packaged, linked, referenced, styled and displayed is markup and aspects and characteristics of that markup are metadata set globally across Wikipedia.

In the context of markup the metadata is architected to allow optimization of document instances to contain only a minimum amount of metadata, while the metadata itself is likely referenced externally such as in a schema definition (XSD) instance. Markup also provides specialised mechanisms that handle referential data, again avoiding confusion over what is metadata or data, and allowing optimizations. The reference and ID mechanisms in markup allow reference links between related data items, and links to data items that can then be repeated about a data item, such as an address or product details. These are then all themselves simply more data items and markup instances rather than metadata.

Similarly there are concepts such as classifications, ontologies and associations for which markup mechanisms are provided. A data item can then be linked to such categories via markup and hence provide a clean delineation between what is metadata, and actual data instances. Therefore the concepts and descriptions in a classification would be metadata, but the actual classification entry for a data item is simply another data instance.

Some examples can illustrate the points here. Items in bold are data content, in italic are metadata, normal text items are all markup.

The two examples show in-line use of metadata within markup relating to a data instance (XML) compared to simple markup (HTML).

A simple HTML instance example:

<span style="normalText">Example</span>

And then an XML instance example with metadata:

<PersonMiddleName nillable="true">John</PersonMiddleName>

Where the inline assertion that a person's middle name may be an empty data item is metadata about the data item. Such definitions however are usually not placed inline in XML. Instead these definitions are moved away into the schema definition that contains the metadata for the entire document instance. This again illustrates another important aspect of metadata in the context of markup. The metadata is optimally defined only once for a collection of data instances. Hence repeated items of markup are rarely metadata, but rather more markup data instances themselves.
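The three roles in the XML instance above can be separated mechanically. The following is a small sketch using Python's standard `xml.etree.ElementTree` module; the function name `split_element` and the labels in the returned dictionary are assumptions chosen to mirror the terminology of this section.

```python
import xml.etree.ElementTree as ET

XML_INSTANCE = '<PersonMiddleName nillable="true">John</PersonMiddleName>'


def split_element(xml_text):
    """Separate an element into markup, inline metadata and data content."""
    element = ET.fromstring(xml_text)
    return {
        "markup": element.tag,                      # the container name
        "inline_metadata": dict(element.attrib),    # assertions about the item
        "data": element.text,                       # the data content itself
    }


if __name__ == "__main__":
    for role, value in split_element(XML_INSTANCE).items():
        print(role, "->", value)
```

In practice, as the text notes, such an assertion would normally be stated once in the schema definition rather than repeated on every instance.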

Difference between data and metadata

Usually it is not possible to distinguish between (plain) data and metadata because:

  • Something can be data and metadata at the same time. The headline of an article is both its title (metadata) and part of its text (data).
  • Data and metadata can change their roles. A poem, as such, would be regarded as data, but if there is a song that uses it as lyrics, the whole poem could be attached to an audio file of the song as metadata. Thus, the labeling depends on the point of view.

These considerations apply no matter which of the above definitions is considered, except where explicit markup is used to denote what is data and what is metadata.

Use

Metadata has many different applications; this section lists some of the most common.

Metadata is used to speed up and enrich searching for resources. In general, search queries using metadata can save users from performing more complex filter operations manually. It is now common for web browsers (with the notable exception of Mozilla Firefox), P2P applications and media management software to automatically download and locally cache metadata, to improve the speed at which files can be accessed and searched.[citation needed]

Metadata may also be associated with files manually. This is often the case with documents which are scanned into a document storage repository such as FileNet or Documentum. Once the documents have been converted into an electronic format, a user brings the image up in a viewer application, manually reads the document and keys values into an online application to be stored in a metadata repository.

Metadata provides additional information to users of the data it describes. This information may be descriptive ("These pictures were taken by children in the school's third grade class.") or algorithmic ("Checksum=139F").
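Algorithmic metadata of this kind can be computed from the data and later used to verify it. The sketch below uses Python's standard `zlib` and `hashlib` modules; the choice of CRC-32 and SHA-256 is an assumption made for illustration (the "139F" value quoted above does not correspond to either), and the function names are hypothetical.

```python
import hashlib
import zlib


def describe_payload(data: bytes) -> dict:
    """Compute algorithmic metadata (length and checksums) for some data."""
    return {
        "length": len(data),
        # CRC-32 rendered as eight uppercase hexadecimal digits.
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08X"),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def verify(data: bytes, metadata: dict) -> bool:
    """Return True if the data still matches its recorded metadata."""
    return describe_payload(data) == metadata


if __name__ == "__main__":
    original = b"These pictures were taken by children."
    recorded = describe_payload(original)
    print(recorded)
    print("unchanged copy verifies:", verify(original, recorded))
    print("altered copy verifies:", verify(original + b"!", recorded))
```

Unlike descriptive metadata, these values cannot be chosen freely by a person: they are determined entirely by the data they describe.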

Metadata helps to bridge the semantic gap. By telling a computer how data items are related and how these relations can be evaluated automatically, it becomes possible to process even more complex filter and search operations. For example, if a search engine understands that "Van Gogh" was a "Dutch painter", it can answer a search query on "Dutch painters" with a link to a web page about Vincent Van Gogh, although the exact words "Dutch painters" never occur on that page. This approach, called knowledge representation, is of special interest to the semantic web and artificial intelligence.
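The Van Gogh example can be sketched with a very small set of subject, predicate, object statements. This is a toy illustration in Python, not a description of how any real search engine works; the fact set, the `is_a` predicate, the `example.org` URLs and the `search` function are all hypothetical.

```python
# Hypothetical knowledge base: (subject, predicate, object) statements.
FACTS = {
    ("Vincent van Gogh", "is_a", "Dutch painter"),
    ("Rembrandt", "is_a", "Dutch painter"),
    ("Claude Monet", "is_a", "French painter"),
}

# Hypothetical pages; none of them needs to contain the words being searched.
PAGES = {
    "Vincent van Gogh": "https://example.org/van-gogh",
    "Rembrandt": "https://example.org/rembrandt",
    "Claude Monet": "https://example.org/monet",
}


def search(category):
    """Return pages about every subject recorded as belonging to category."""
    subjects = sorted(
        subject
        for subject, predicate, obj in FACTS
        if predicate == "is_a" and obj == category
    )
    return [PAGES[subject] for subject in subjects if subject in PAGES]


if __name__ == "__main__":
    print(search("Dutch painter"))
```

The query is answered from the relations in the metadata alone, so the matching pages are found even if the phrase "Dutch painter" never appears in their text.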

Certain metadata is designed to optimize lossy compression. For example, if a video has metadata that allows a computer to tell foreground from background, the latter can be compressed more aggressively to achieve a higher compression rate.

Some metadata is intended to enable variable content presentation. For example, if a picture has metadata that indicates the most important region (the one where there is a person), an image viewer on a small screen, such as a mobile phone's, can narrow the picture to that region and thus show the user the most interesting details. A similar kind of metadata is intended to allow blind people to access diagrams and pictures, by converting them for special output devices or reading their description using text-to-speech software.

Other descriptive metadata can be used to automate workflows. For example, if a "smart" software tool knows content and structure of data, it can convert it automatically and pass it to another "smart" tool as input. As a result, users save the many copy-and-paste operations required when analyzing data with "dumb" tools.

Metadata is becoming an increasingly important part of electronic discovery. [1] Application and file system metadata derived from electronic documents and files can be important evidence. Recent changes to the Federal Rules of Civil Procedure make metadata routinely discoverable as part of civil litigation. Parties to litigation are required to maintain and produce metadata as part of discovery, and spoliation of metadata can lead to sanctions.

Metadata has become important on the World Wide Web because of the need to find useful information from the mass of information available. Manually-created metadata adds value because it ensures consistency. If a web page about a certain topic contains a word or phrase, then all web pages about that topic should contain that same word or phrase. Metadata also ensures variety, so that if a topic goes by two names each will be used. For example, an article about "sport utility vehicles" would also be tagged "4 wheel drives", "4WDs" and "four wheel drives", as this is how SUVs are known in some countries.

Examples of metadata for an audio CD include the MusicBrainz project and All Media Guide's Allmusic. Similarly, MP3 files have metadata tags in a format called ID3.
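As a concrete case of a standardized metadata location inside an audio file, the original ID3 layout (ID3v1) occupies the last 128 bytes of an MP3 file and begins with the marker "TAG". The sketch below reads such a tag with Python's standard `struct` module. It is a simplified illustration that assumes ID3v1 only; the later ID3v2 format is stored at the start of the file and is considerably more complex. The function name `read_id3v1` is hypothetical.

```python
import struct

# ID3v1: "TAG" marker, title, artist, album, year, comment, genre index.
ID3V1_LAYOUT = struct.Struct("<3s30s30s30s4s30sB")  # 128 bytes in total


def read_id3v1(data: bytes):
    """Return the ID3v1 fields found at the end of data, or None if absent."""
    if len(data) < ID3V1_LAYOUT.size:
        return None
    fields = ID3V1_LAYOUT.unpack(data[-ID3V1_LAYOUT.size:])
    marker, title, artist, album, year, comment, genre = fields
    if marker != b"TAG":
        return None

    def clean(raw):
        # Fields are fixed width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": clean(title),
        "artist": clean(artist),
        "album": clean(album),
        "year": clean(year),
        "comment": clean(comment),
        "genre_index": genre,
    }


if __name__ == "__main__":
    # Build a fake file: some audio bytes followed by a made-up tag.
    fake_tag = ID3V1_LAYOUT.pack(
        b"TAG", b"Example Song", b"Example Artist", b"Example Album",
        b"2009", b"", 17,
    )
    print(read_id3v1(b"\xff\xfb" * 100 + fake_tag))
```

Because the tag sits at a fixed offset from the end of the file, a player can read the title and artist without decoding any audio.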

Types

Metadata can be classified by:

  • Content. Metadata can either describe the resource itself (for example, name and size of a file) or the content of the resource (for example, "This video shows a boy playing football").
  • Mutability. With respect to the whole resource, metadata can be either immutable (for example, the "Title" of a video does not change as the video itself is being played) or mutable (the "Scene description" does change).
  • Logical function. There are three layers of logical function: at the bottom the subsymbolic layer that contains the raw data itself, then the symbolic layer with metadata describing the raw data, and on the top the logical layer containing metadata that allows logical reasoning using the symbolic layer.

Types of metadata include:

  1. Descriptive metadata
  2. Administrative metadata
  3. Structural metadata
  4. Technical metadata
  5. Use metadata

To successfully develop and use metadata, several important issues should be treated with care:

Risks

Microsoft Office files include metadata beyond their printable content, such as the original author's name, the creation date of the document, and the amount of time spent editing it. Unintentional disclosure can be awkward or even, in professional practices requiring confidentiality, raise malpractice concerns. Some of a Microsoft Office document's metadata can be seen by clicking File then Properties from the program's menu. Other metadata is not visible except through external analysis of a file, such as is done in forensics. The author of the Microsoft Word-based Melissa computer virus in 1999 was caught due to Word metadata that uniquely identified the computer used to create the original infected document.

Lifecycle

Even in the early phases of planning and designing it is necessary to keep track of all metadata created. It is not economical to start attaching metadata only after the production process has been completed. For example, if metadata created by a digital camera at recording time is not stored immediately, it may have to be restored afterwards manually with great effort. Therefore, it is necessary for different groups of resource producers to cooperate using compatible methods and standards.

  • Manipulation. Metadata must adapt if the resource it describes changes. It should be merged when two resources are merged. These operations are seldom performed by today's software; for example, image editing programs usually do not keep track of the Exif metadata created by digital cameras.
  • Destruction. It can be useful to keep metadata even after the resource it describes has been destroyed, for example in change histories within a text document or to archive file deletions due to digital rights management. None of today's metadata standards consider this phase.

Storage

Metadata can be stored either internally, in the same file as the data, or externally, in a separate file. Metadata that is embedded with content is called embedded metadata. A data repository typically stores the metadata detached from the data. Both ways have advantages and disadvantages:

  • Internal storage allows transferring metadata together with the data it describes; thus, metadata is always at hand and can be manipulated easily. However, this method creates high redundancy and does not allow metadata to be held together in one place.
  • External storage allows bundling metadata, for example in a database, for more efficient searching. There is no redundancy and metadata can be transferred simultaneously when using streaming. However, as most formats use URIs for that purpose, the method of how the metadata is linked to its data should be treated with care. What if a resource does not have a URI (resources on a local hard disk or web pages that are created on-the-fly using a content management system)? What if metadata can only be evaluated if there is a connection to the Web, especially when using RDF? How can a system detect that a resource has been replaced by another with the same name but different content?

Moreover, there is the question of data format: storing metadata in a human-readable format such as XML can be useful because users can understand and edit it without specialized tools. On the other hand, these formats are not optimized for storage capacity; it may be useful to store metadata in a binary, non-human-readable format instead to speed up transfer and save memory.
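The trade-off between a human-readable and a binary format can be made concrete with a small comparison. The sketch below encodes the same three metadata fields once as XML and once as a fixed binary layout, using Python's standard `xml.etree.ElementTree` and `struct` modules. The field names, the sample values and the binary layout are assumptions invented for this example and do not correspond to any existing standard.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical metadata record for an image.
RECORD = {"width": 1920, "height": 1080, "created": 1255483680}

# Assumed binary layout: two unsigned 16-bit integers and one unsigned
# 32-bit integer, little-endian, with no field names stored at all.
BINARY_LAYOUT = struct.Struct("<HHI")


def to_xml(meta: dict) -> bytes:
    """Encode the record as XML: readable and editable without special tools."""
    root = ET.Element("metadata")
    for key, value in meta.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root)


def to_binary(meta: dict) -> bytes:
    """Encode the record compactly; meaningless without knowing the layout."""
    return BINARY_LAYOUT.pack(meta["width"], meta["height"], meta["created"])


def from_binary(blob: bytes) -> dict:
    """Decode a blob produced by to_binary back into a record."""
    width, height, created = BINARY_LAYOUT.unpack(blob)
    return {"width": width, "height": height, "created": created}


if __name__ == "__main__":
    xml_form = to_xml(RECORD)
    binary_form = to_binary(RECORD)
    print(xml_form.decode("ascii"))
    print("XML size in bytes:", len(xml_form))
    print("binary size in bytes:", len(binary_form))
```

The binary form is several times smaller, but a reader needs the layout definition to interpret it, whereas the XML form carries its field names with it.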

Types

In general, there are two distinct classes of metadata: structural or control metadata and guide metadata.[7] Structural metadata is used to describe the structure of computer systems such as tables, columns and indexes. Guide metadata is used to help humans find specific items and is usually expressed as a set of keywords in a natural language.

Metadata can be divided into 3 distinct categories:

  • Administrative
  • Descriptive
  • Structural

Information Technology and Software Engineering metadata

General IT metadata

In contrast, David Marco, another metadata theorist, defines metadata as "all physical data and knowledge from inside and outside an organization, including information about the physical data, technical and business processes, rules and constraints of the data, and structures of the data used by a corporation."[8] Others have included web services, systems and interfaces. In fact, the entire Zachman Framework (see Enterprise Architecture) can be represented as metadata.[9]

Notice that such definitions expand metadata's scope considerably, to encompass most or all of the data required by the Management Information Systems capability. In this sense, the concept of metadata has significant overlaps with the ITIL concept of a Configuration Management Database (CMDB), and also with disciplines such as Enterprise Architecture and IT portfolio management.

This broader definition of metadata has precedent. Third generation corporate repository products (such as those eventually merged into the CA Advantage line) not only store information about data definitions (COBOL copybooks, DBMS schema), but also about the programs accessing those data structures, and the Job Control Language and batch job infrastructure dependencies as well. These products (some of which are still in production) can provide a very complete picture of a mainframe computing environment, supporting exactly the kinds of impact analysis required for ITIL-based processes such as Incident and Change Management. The ITIL Back Catalogue includes the Data Management volume which recognizes the role of these metadata products on the mainframe, posing the CMDB as the distributed computing equivalent. CMDB vendors however have generally not expanded their scope to include data definitions, and metadata solutions are also available in the distributed world. Determining the appropriate role and scope for each is thus a challenge for large IT organizations requiring the services of both.

Since metadata is pervasive, centralized attempts at tracking it need to focus on the most highly leveraged assets. Enterprise Assets may only constitute a small percentage of the entire IT portfolio.

Some practitioners have successfully managed IT metadata using the Dublin Core metamodel.[10]

IT metadata management products

First generation data dictionary/metadata repository tools would be those only supporting a specific DBMS, such as IDMS's IDD (integrated data dictionary), the IMS Data Dictionary, and ADABAS's Predict.

Second generation would be ASG's DATAMANAGER product which could support many different file and DBMS types.

Third generation repository products became briefly popular in the early 1990s along with the rise of widespread use of RDBMS engines such as IBM's DB2.

Fourth generation products link the repository with more extract, transform, load (ETL) tools and can be connected with architectural modeling tools.

Fifth generation products are taking things to a new level by integrating distributed computing, specialized hardware, extreme visualization, and analytics, in a sense that now allows vertical uses of metadata in all sorts of things such as applications, messaging buses etc.

Relational database metadata

Each relational database system has its own mechanisms for storing metadata. Examples of relational-database metadata include:

  • Tables of all tables in a database, their names, sizes and number of rows in each table.
  • Tables of columns in each database, what tables they are used in, and the type of data stored in each column.

In database terminology, this set of metadata is referred to as the catalog. The SQL standard specifies a uniform means to access the catalog, called the INFORMATION_SCHEMA, but not all databases implement it, even if they implement other aspects of the SQL standard. For an example of database-specific metadata access methods, see Oracle metadata. Programmatic access to metadata is possible using APIs such as JDBC, or SchemaCrawler[11].
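As an example of database-specific catalog access, the sketch below queries SQLite through Python's standard `sqlite3` module. SQLite does not provide INFORMATION_SCHEMA; instead it exposes the `sqlite_master` table and the `PRAGMA table_info` statement. The `book` table and the function name `describe_schema` are hypothetical.

```python
import sqlite3


def describe_schema(connection):
    """Return {table name: [(column name, declared type), ...]} from the catalog."""
    catalog = {}
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        columns = connection.execute(
            f'PRAGMA table_info("{table_name}")'
        ).fetchall()
        catalog[table_name] = [(column[1], column[2]) for column in columns]
    return catalog


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT, pages INTEGER)"
    )
    for table, columns in describe_schema(connection).items():
        print(table, columns)
    connection.close()
```

The queries return information about the tables and columns rather than any of the rows stored in them, which is what distinguishes the catalog from the primary data.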

Data warehouse metadata

Data warehouse metadata systems are sometimes separated into two sections:

  1. back room metadata that are used for Extract, transform, load functions to get OLTP data into a data warehouse
  2. front room metadata that are used to label screens and create reports

Kimball[12] lists the following types of metadata in a data warehouse (See also [2]):

Michael Brackett defines metadata (what he calls "Data resource data") as "any data about the organization's data resource".[13] Adrienne Tannenbaum defines metadata as "the detailed description of instance data. The format and characteristics of populated instance data: instances and values, dependent on the role of the metadata recipient".[14] These definitions are characteristic of the "data about data" definition.

Business Intelligence metadata

Business Intelligence is the process of analyzing large amounts of corporate data, usually stored in large databases such as a Data Warehouse, tracking business performance, detecting patterns and trends, and helping enterprise business users make better decisions. Business Intelligence metadata describes how data is queried, filtered, analyzed, and displayed in Business Intelligence software tools, such as Reporting tools, OLAP tools, Data Mining tools.

Examples:

  • Data Mining metadata: The descriptions and structures of DataSets, Algorithms, Queries
  • OLAP metadata: The descriptions and structures of Dimensions, Cubes, Measures (Metrics), Hierarchies, Levels, Drill Paths
  • Reporting metadata: The descriptions and structures of Reports, Charts, Queries, DataSets, Filters, Variables, Expressions

Business Intelligence metadata can be used to understand how corporate financial reports reported to Wall Street are calculated, how the revenue, expense and profit are aggregated from individual sales transactions stored in the data warehouse. A good understanding of Business Intelligence metadata is required to solve complex problems such as compliance with corporate governance standards, such as Sarbanes Oxley (SOX) or Basel II.

File system metadata

Nearly all file systems keep metadata about files out-of-band. Some systems keep metadata in directory entries; others in specialized structures such as inodes, or even in the name of a file. Metadata can range from simple timestamps, mode bits, and other special-purpose information used by the implementation itself, to icons and free-text comments, to arbitrary attribute-value pairs.

With more complex and open-ended metadata, it becomes useful to search for files based on the metadata contents. The Unix find utility was an early example, although it is inefficient when scanning hundreds of thousands of files on a modern computer system. Apple Computer's Mac OS X operating system supports cataloguing and searching for file metadata through a feature known as Spotlight, as of version 10.4. Microsoft developed similar functionality with the Instant Search system in Windows Vista, and it is also present in SharePoint Server. Linux implements file metadata using extended file attributes.
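A search in the style of the find utility can be sketched in a few lines of Python using the standard `os.walk` and `os.stat` calls. The function name `find_files` and its parameters are hypothetical. Like find, it visits every file under the starting directory, which is the source of the inefficiency mentioned above; indexed systems avoid this by cataloguing metadata in advance.

```python
import os
import tempfile
import time


def find_files(root, min_size=0, modified_within_seconds=None):
    """Return sorted paths under root whose metadata matches the criteria."""
    now = time.time()
    matches = []
    for directory, _subdirectories, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(directory, filename)
            try:
                st = os.stat(path)
            except OSError:
                continue  # the file vanished or is unreadable; skip it
            if st.st_size < min_size:
                continue
            if (modified_within_seconds is not None
                    and now - st.st_mtime > modified_within_seconds):
                continue
            matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    # Build a small throwaway directory tree to search.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "small.txt"), "wb") as handle:
            handle.write(b"x")
        with open(os.path.join(root, "large.bin"), "wb") as handle:
            handle.write(b"x" * 1000)
        print(find_files(root, min_size=500))
```

The selection uses only size and modification time, so no file is ever opened to read its contents.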

Program metadata

Metadata is casually used to describe the controlling data used in software architectures that are more abstract or configurable. Most executable file formats include what may be termed "metadata" that specifies certain, usually configurable, behavioral runtime characteristics. However, it is difficult if not impossible to precisely distinguish program "metadata" from general aspects of stored-program computing architecture; if the machine reads it and acts upon it, it is a computational instruction, and the prefix "meta" has little significance.

In Java, the class file format contains metadata used by the Java compiler and the Java virtual machine to dynamically link classes and to support reflection. The Java Platform, Standard Edition since J2SE 5.0 has included a metadata facility to allow additional annotations that are used by development tools.

In MS-DOS, the COM file format does not include metadata, while the EXE file and Windows PE formats do. These metadata can include the company that published the program, the date the program was created, the version number and more.

In the Microsoft .NET executable format, extra metadata is included to allow reflection at runtime.

Existing software metadata

The Object Management Group (OMG) has defined a metadata format for representing entire existing applications for the purposes of software mining, software modernization and software assurance. This specification, called the OMG Knowledge Discovery Metamodel (KDM), is the OMG's foundation for "modeling in reverse". KDM is a common language-independent intermediate representation that provides an integrated view of an entire enterprise application, including its behavior (program flow), data, and structure. One of the applications of KDM is Business Rules Mining.

The Knowledge Discovery Metamodel includes a fine-grained low-level representation (called "micro KDM"), suitable for performing static analysis of programs.

Document metadata

Most programs that create documents, including Microsoft SharePoint, Microsoft Word and other Microsoft Office products, save metadata with the document files. These metadata can contain the name of the person who created the file (obtained from the operating system), the name of the person who last edited the file, how many times the file has been printed, and even how many revisions have been made on the file. Other saved material, such as deleted text (saved in case of an undelete command), document comments and the like, is also commonly referred to as "metadata", and the inadvertent inclusion of this material in distributed files has sometimes led to undesirable disclosures.
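To make this concrete, the sketch below reads author-related fields from a document, assuming the Office Open XML (.docx) format, in which the file is a ZIP package and these properties are stored in the part docProps/core.xml. The function names and the sample values are hypothetical, and the in-memory package built by make_sample_package is only a stand-in for a real document, not a file Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(docx_file):
    """Return author-related metadata stored in a .docx package."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))

    def text_of(tag):
        element = root.find(tag, NS)
        return element.text if element is not None else None

    return {
        "creator": text_of("dc:creator"),
        "last_modified_by": text_of("cp:lastModifiedBy"),
        "revision": text_of("cp:revision"),
    }


def make_sample_package():
    """Build a tiny in-memory stand-in for a .docx (hypothetical content)."""
    core_xml = (
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s">'
        "<dc:creator>Alice Example</dc:creator>"
        "<cp:lastModifiedBy>Bob Example</cp:lastModifiedBy>"
        "<cp:revision>7</cp:revision>"
        "</cp:coreProperties>"
    ) % (NS["cp"], NS["dc"])
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(read_core_properties(make_sample_package()))
```

Passing the path of a real .docx file to read_core_properties instead of the sample package would show what a recipient of that file could learn about its authors.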

Document metadata is particularly important in legal environments, where parties to litigation can request this information, which may include private or damaging details. Such metadata has figured in multiple lawsuits that have caused legal complications for corporations.

Many law firms use metadata removal tools, which clean documents before they are sent outside the firm. This process partially protects law firms from potentially harmful leaks of sensitive data through electronic discovery. Removing metadata is only one aspect of redaction, a technique that is effective only when performed thoroughly and completely.

For a list of executable formats, see object file.

Digital library metadata

There are three categories of metadata that are frequently used to describe objects in a digital library:[15]

  1. descriptive - Information describing the intellectual content of the object, such as MARC cataloguing records, finding aids or similar schemes. It is typically used for bibliographic purposes and for search and retrieval.
  2. structural - Information that ties each object to others to make up logical units (e.g., information that relates individual images of pages from a book to the others that make up the book).
  3. administrative - Information used to manage the object or control access to it. This may include information on how it was scanned, its storage format, copyright and licensing information, and information necessary for the long-term preservation of the digital objects.

Standards for metadata in digital libraries include Dublin Core, METS, PREMIS schema, and OAI-PMH.
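The three categories can be pictured as separate parts of one record. The sketch below is a simplified illustration only: the class DigitalObjectMetadata, the field names, and the sample values are hypothetical and do not follow any of the standards named above.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObjectMetadata:
    """Hypothetical record grouping metadata by category."""
    descriptive: dict = field(default_factory=dict)     # what the object is about
    structural: dict = field(default_factory=dict)      # how its parts fit together
    administrative: dict = field(default_factory=dict)  # how it is managed


def page_order(record):
    """Use structural metadata to list page images in reading order."""
    pages = record.structural.get("pages", [])
    return [page["file"] for page in sorted(pages, key=lambda p: p["sequence"])]


# A digitized book whose page images were scanned out of order.
book = DigitalObjectMetadata(
    descriptive={"title": "An Example Book", "creator": "A. Author"},
    structural={"pages": [
        {"sequence": 2, "file": "page_002.tif"},
        {"sequence": 1, "file": "page_001.tif"},
    ]},
    administrative={"format": "image/tiff", "rights": "Public domain"},
)

if __name__ == "__main__":
    print(book.descriptive["title"], page_order(book))
```

Here search and retrieval would rely on the descriptive part, reassembling the book relies on the structural part, and access control or preservation decisions rely on the administrative part.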

Image metadata

Examples of image files containing metadata include Exchangeable image file format (EXIF) and Tagged Image File Format (TIFF).

Having metadata about images embedded in TIFF or EXIF files is one way of acquiring additional data about an image. Tagging pictures with subjects, related emotions, and other descriptive phrases helps Internet users find pictures easily rather than having to search through entire image collections. A prime example of an image tagging service is Flickr, where users upload images and then describe the contents. Other patrons of the site can then search for those tags. Flickr uses a folksonomy: a free-text keyword system in which the community defines the vocabulary through use rather than through a controlled vocabulary.
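A free-text tagging scheme of this kind can be sketched as a mapping from each tag to the set of images that carry it. The class below is a hypothetical illustration, not a description of how Flickr is implemented; the only normalization it assumes is trimming whitespace and lowercasing, and no controlled vocabulary is enforced.

```python
from collections import defaultdict


class TagIndex:
    """Free-text tags mapped to the images they were applied to."""

    def __init__(self):
        self._images_by_tag = defaultdict(set)

    def tag(self, image_id, *tags):
        """Attach any number of free-text tags to an image."""
        for raw in tags:
            normalized = raw.strip().lower()
            if normalized:
                self._images_by_tag[normalized].add(image_id)

    def search(self, *tags):
        """Return the images that carry all of the given tags."""
        sets = [self._images_by_tag.get(t.strip().lower(), set()) for t in tags]
        return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    index = TagIndex()
    index.tag("img1", "Sunset", "beach")
    index.tag("img2", "sunset", "city")
    print(sorted(index.search("sunset")))
    print(sorted(index.search("sunset", "beach")))
```

Because any string is accepted as a tag, the vocabulary emerges from what users type, which is the defining property of a folksonomy.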

Users can also tag photos for organization purposes using Adobe's Extensible Metadata Platform (XMP) language, for example.

Digital photography is increasingly making use of technical metadata tags describing the conditions of exposure. Photographers shooting Camera RAW file formats can use applications such as Adobe Bridge or Apple Computer's Aperture to work with camera metadata for post-processing.

Geospatial metadata

Metadata that describe geographic objects (such as datasets, maps, features, or simply documents with a geospatial component) have a history going back to at least 1994 (see the MIT Library page on FGDC Metadata). This class of metadata is described more fully on the Geospatial metadata page.

Meta-metadata

Since metadata are also data, it is possible to have metadata about metadata, or "meta-metadata". Machine-generated meta-metadata, such as the inverted index created by a free-text search engine, is generally not considered metadata, however.

Metadata and the Law

United States

Problems involving metadata in litigation in the United States are becoming widespread. Courts have looked at various questions involving metadata, including the discoverability of metadata by parties. Although the Federal Rules of Civil Procedure have only specified rules about electronic documents, subsequent case law has elaborated on the requirement of parties to reveal metadata.

References

  1. ^ Hoberman, Steve, Data Modeling Made Simple, 2nd Edition, Technics Publications, LLC, 2009, page 313
  2. ^ Liddell & Scott, An Intermediate Greek-English Lexicon, OUP, pp. 500ff.
  3. ^ James Martin, Strategic Data Planning Methodologies, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1982, p.127
  4. ^ American Library Association, Task Force on Metadata Summary Report, June 1999
  5. ^ D. C. A. Bulterman, Is It Time For a Moratorium on Metadata?, IEEE MultiMedia, Oct-Dec 2004
  6. ^ William R. Durrell, Data Administration: A Practical Guide to Data Administration, McGraw-Hill, 1985
  7. ^ Bretherton, F. P. and Singley, P. T. 1994, Metadata: A User's View, Proceedings of the International Conference on Very Large Data Bases (VLDB), 1091-1094
  8. ^ David Marco, Building and Managing the Meta Data Repository: A Full Lifecycle Guide, Wiley, 2000, ISBN 0-471-35523-2
  9. ^ David C. Hay, Data Model Patterns: A Metadata Map, Morgan Kaufmann, 2006, ISBN 0-12-088798-3
  10. ^ R. Todd Stephens (2003). Utilizing Metadata as a Knowledge Communication Tool. Proceedings of the International Professional Communication Conference 2004. Minneapolis, MN: Institute of Electrical and Electronics Engineers, Inc.
  11. ^ Sualeh Fatehi. "SchemaCrawler". SourceForge.
  12. ^ Ralph Kimball, The Data Warehouse Lifecycle Toolkit, Wiley, 1998, ISBN 0-471-25547-5
  13. ^ Guy V Tozer, Metadata Management for Information Control and Business Success, Artech House, 1999, ISBN 0-89006-280-3
  14. ^ Adrienne Tannenbaum, Metadata Solutions: Using Metamodels, Repositories, XML, and Enterprise Portals to Generate Information on Demand, Addison-Wesley, 2002, ISBN 0-201-71976-2
  15. ^ http://www.odl.ox.ac.uk/metadata.htm http://www.cs.cornell.edu/wya/DigLib/MS1999/Chapter4.html