T5-Digital Thesauri
CONTENT INDEX
1. Thesaurus concept
2. Thesaurus composition
3. Thesaurus typology
4. Standards for thesauri
5. Methodology for the preparation of thesauri
6. Maintenance and updating of thesauri
7. Digital thesauri
THESAURUS CONCEPT
A FIRST DEFINITION
- A thesaurus (plural: thesauri), also known as a synonym dictionary, is a reference work for finding synonyms,
and sometimes antonyms, of words.
- It is a tool aimed at finding the words that most accurately and appropriately express an idea.
- Forms of organization:
o Systematic presentation: hierarchical taxonomy of concepts.
o Alphabetical presentation.
o Graphical presentation: tree, network or arrow diagram.
- A thesaurus is a controlled and formally structured vocabulary, made up of terms that have semantic and
generic relationships between them: equivalence, hierarchical and associative. It is an instrument of
terminological control that allows converting the natural language of documents into a controlled language,
thus univocally representing the content of documents, in order to serve for indexing and document retrieval
(LAMARCA, 2013).
- LEVÉRY (1976) formulated one of the most concise definitions of the thesaurus, asserting that it is "a
bridge between the language of the informed (the documentalist) and the language of the uninformed (the
user)".
THESAURUS COMPOSITION
Lexical units
o Descriptors (preferred terms): are words or a group of words retained in the thesaurus and
chosen from a group of equivalent terms. These are authorized and formalized terms in a thesaurus, which
are used to unambiguously represent the concepts contained in documents and in information retrieval
requests.
• Single terms: used when the concept is clear in itself, without the need to add
any other words. Ex.: photography.
• Compound terms: used when several words are needed (e.g.
adjective + noun) to narrow or specify a concept. Ex.: digital photography.
o Non-descriptors (non-preferred terms): words included in the thesaurus that belong to a
list of synonyms, quasi-synonyms and related terms linked to the descriptors by a semantic equivalence
relationship. They are likely to appear in documents or in requests, but they are not used to
formulate the query to the system. Their purpose is to improve the consistency of the representation of
documents and queries by pointing to the indexing term (the descriptor).
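The descriptor / non-descriptor distinction can be sketched as a simple lookup. This is a minimal illustration, not a standard implementation: the vocabulary below and the names USE, DESCRIPTORS and to_descriptor are hypothetical, built around the "photography" examples above.

```python
from typing import Optional

# Hypothetical mini-vocabulary (illustrative only).
# Descriptors are the authorized terms used for indexing and retrieval.
DESCRIPTORS = {"photography", "digital photography"}

# Non-descriptors (entry terms) point to the descriptor to use instead.
USE = {
    "photo": "photography",
    "photos": "photography",
    "digital photo": "digital photography",
}


def to_descriptor(term: str) -> Optional[str]:
    """Return the preferred term for `term`, or None if it is not in the vocabulary."""
    key = term.strip().lower()
    if key in DESCRIPTORS:
        return key           # already a preferred term
    return USE.get(key)      # redirect a non-descriptor, or None if unknown


print(to_descriptor("Photos"))       # photography
print(to_descriptor("photography"))  # photography
print(to_descriptor("painting"))     # None
```

Both the indexer and the user pass through the same lookup, which is how the non-descriptors keep document representations and queries consistent.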
Semantic relationships
o Associative: indicates a link between the meanings of two descriptors. These are
symmetrical relationships between two descriptors that are likely to evoke each other by reciprocal
association of ideas. Represented by RT (related term).
Example: Scientific information RT Open science
o Hierarchical: It is the vertical relationship between all the descriptors of the same class,
expressed in terms of the subordination of the concepts (one term is superior or generic to another).
Represented by BT (broader term), NT (narrower term). They can be categorized as:
o Generic/specific (a type of, or a class of). Example: Vertebrates (BT) →
Birds (NT)
o Whole/part (a part of). Example: Spain (BT) → Region of Murcia (NT)
o Enumerative (a case of). Example: Operating System (BT) → Microsoft
Windows (NT)
o Polyhierarchy (a term that belongs under more than one broader term). Example: Wind
instruments (BT) → Organ (NT) | Keyboard instruments (BT) → Organ (NT)
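The relationships above can be sketched as a small data structure. This is an illustrative sketch under simple assumptions, not a prescribed model: the class name Thesaurus and its methods are hypothetical. It stores BT/NT as reciprocal links, RT as a symmetrical link, and allows polyhierarchy by letting a term have several broader terms.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical container for BT/NT/RT links between descriptors."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its broader terms (BT)
        self.narrower = defaultdict(set)  # term -> its narrower terms (NT)
        self.related = defaultdict(set)   # term -> its related terms (RT)

    def add_hierarchy(self, broader_term, narrower_term):
        # BT and NT are reciprocal: recording one always records the other.
        self.broader[narrower_term].add(broader_term)
        self.narrower[broader_term].add(narrower_term)

    def add_related(self, term_a, term_b):
        # RT is symmetrical: each term points to the other.
        self.related[term_a].add(term_b)
        self.related[term_b].add(term_a)


t = Thesaurus()
t.add_hierarchy("Vertebrates", "Birds")             # generic/specific
t.add_hierarchy("Spain", "Region of Murcia")        # whole/part
t.add_hierarchy("Wind instruments", "Organ")        # polyhierarchy:
t.add_hierarchy("Keyboard instruments", "Organ")    # "Organ" has two BTs
t.add_related("Scientific information", "Open science")

print(sorted(t.broader["Organ"]))  # ['Keyboard instruments', 'Wind instruments']
```

Storing each link in both directions keeps the display consistent: whenever a descriptor lists a BT, that broader term lists it back as an NT.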
ISO 25964-1:2011. Information and documentation. Thesauri and interoperability with other
vocabularies. Part 1: Thesauri for information retrieval (will be replaced by ISO/AWI 25964-1, now
under development).
- It gives recommendations for the development and maintenance of thesauri (monolingual and multilingual)
intended for information retrieval applications. It is applicable to vocabularies used for retrieving information
about all types of information resources, irrespective of the media used (text, sound, still or moving image,
physical object or multimedia) including knowledge bases and portals, bibliographic databases, text,
museum or multimedia collections, and the items within them.
- It also provides a data model and recommended format for the import and export of thesaurus data.
ISO 25964-2:2013. Information and documentation. Thesauri and interoperability with other
vocabularies. Part 2: Interoperability with other vocabularies.
- It is applicable to thesauri and other types of vocabulary that are commonly used for information retrieval.
It describes, compares and contrasts the elements and features of these vocabularies that are implicated
when interoperability is needed. It gives recommendations for the establishment and maintenance of
mappings between multiple thesauri, or between thesauri and other types of vocabularies.
ANSI/NISO Z39.19-2005 (R2010). Guidelines for the Construction, Format, and Management of
Monolingual Controlled Vocabularies.
- “Presents guidelines and conventions for the contents, display, construction, testing, maintenance, and
management of monolingual controlled vocabularies. It focuses on controlled vocabularies that are used
for the representation of content objects in knowledge organization systems including lists, synonym
rings, taxonomies, and thesauri”.
METHODOLOGY FOR THE PREPARATION OF THESAURI
How do you build a thesaurus? (I)
• Gather terms from as many sources as possible (e.g., users, subject experts, documents, other existing
knowledge organization systems, etc.).
• Entry terms should include synonyms and abbreviations, acronyms, and alternative spellings for all of the
important concepts in your document collection.
• Define the preferred terms.
• Create guidelines for selecting preferred terms. For example, in a collection of health-related documents
that include terms such as cancer, oncology, skin, and dermatology, decide whether to select medical
terminology or plain English as the preferred terms, according to your primary audience.
• Whichever terminology you choose, it is important to be consistent in how you define the
preferred terms.
• Link synonyms and near-synonyms. This is where you map the synonyms, abbreviations, acronyms, and
alternate spellings as "variant terms" to the preferred terms.
• The more entry terms you have, the easier it will be for indexers and users to find the preferred terms.
• Group preferred terms by subject. This forms the foundation of your thesaurus hierarchy.
• The definition of the subject hierarchy should be informed by a balance of top-down considerations (e.g.,
mission, vision, intended audiences) and bottom-up content analysis.
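The steps above (gather terms, choose preferred terms by a consistent rule, link variants, group by subject) can be sketched as follows. This is only a rough illustration: the sample terms, the "plain" / "medical" register labels, the subject names and the function build_thesaurus are all hypothetical and are not taken from any real vocabulary.

```python
# Hypothetical gathered terms: one record per concept, with a plain-English
# form, a medical form, other variants (abbreviations), and a subject group.
GATHERED = [
    {"plain": "heart attack", "medical": "myocardial infarction",
     "variants": ["MI"], "subject": "Cardiology"},
    {"plain": "high blood pressure", "medical": "hypertension",
     "variants": ["HBP"], "subject": "Cardiology"},
    {"plain": "hives", "medical": "urticaria",
     "variants": [], "subject": "Dermatology"},
]


def build_thesaurus(gathered, register):
    """Apply one selection rule (`register`) consistently to every concept."""
    other = "medical" if register == "plain" else "plain"
    entry_terms = {}  # variant term (lowercased) -> preferred term
    by_subject = {}   # subject -> list of preferred terms
    for concept in gathered:
        preferred = concept[register]
        # Link synonyms, abbreviations and alternate forms as variant terms.
        for variant in [concept[other], *concept["variants"]]:
            entry_terms[variant.lower()] = preferred
        # Group preferred terms by subject: the seed of the hierarchy.
        by_subject.setdefault(concept["subject"], []).append(preferred)
    return entry_terms, by_subject


# Assumed audience: the general public, so plain English is preferred.
entry_terms, by_subject = build_thesaurus(GATHERED, register="plain")
print(entry_terms["hypertension"])  # high blood pressure
print(by_subject["Cardiology"])     # ['heart attack', 'high blood pressure']
```

Passing register="medical" instead flips every choice at once, which is the point of writing the selection guideline down as a single rule rather than deciding term by term.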
DIGITAL THESAURI
THESAURUS.COM, UNESCO THESAURUS, VISUAL THESAURUS, VISUWORDS
SKOS (Simple Knowledge Organization System) is a W3C initiative in the form of an RDF application that
provides a model for representing the basic structure and content of conceptual schemas such as subject
heading lists, taxonomies, classification schemes, thesauri and any kind of controlled vocabulary.
SKOS is a W3C standard that provides a set of terms, classes and properties to describe concepts and
relationships between them, enabling the creation of interoperable controlled vocabularies. Some of the key
terms in SKOS include "Concept," "Label," "Broader," "Narrower," "Related," and "Exact Match," among
others. Because SKOS is based on the Resource Description Framework (RDF) these representations are
machine-readable and can be exchanged between software applications and published on the World Wide
Web.
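As a rough illustration of what such a machine-readable representation looks like, the sketch below writes one concept in Turtle, a common RDF text syntax, using the SKOS properties skos:prefLabel, skos:altLabel, skos:broader and skos:related. The base URI http://example.org/thesaurus/, the concept identifiers and the helper concept_to_turtle are assumptions made for the example. It builds the text by hand and does not escape quotes inside labels, so a real project would normally rely on an RDF library instead.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"  # SKOS namespace
BASE = "http://example.org/thesaurus/"            # hypothetical base URI

PREFIXES = f"@prefix skos: <{SKOS_NS}> .\n@prefix ex: <{BASE}> .\n"


def concept_to_turtle(concept_id, pref_label, alt_labels=(),
                      broader=(), related=(), lang="en"):
    """Return a Turtle description of one skos:Concept (no label escaping)."""
    statements = [f'skos:prefLabel "{pref_label}"@{lang}']            # descriptor
    statements += [f'skos:altLabel "{alt}"@{lang}' for alt in alt_labels]  # non-descriptors
    statements += [f"skos:broader ex:{b}" for b in broader]           # BT
    statements += [f"skos:related ex:{r}" for r in related]           # RT
    body = " ;\n    ".join(statements)
    return f"ex:{concept_id} a skos:Concept ;\n    {body} ."


turtle = PREFIXES + "\n" + concept_to_turtle(
    "digitalPhotography",
    "digital photography",
    alt_labels=["digital photo"],
    broader=["photography"],
)
print(turtle)
```

Here the descriptor becomes the preferred label, non-descriptors become alternative labels, and the BT link becomes skos:broader, so the thesaurus structure described earlier carries over directly.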