Aristotle Data Model
Matthew Lawler
/conversion/tmp/activity_task_scratch/576228854.docx 1 of 20
Matthew Lawler [email protected] Aristotle Data Model
Introduction
Licence
Warranty
Purpose
Audience
Approach
By
Acronyms
References
Metadata Repository (MDR)
How does an MDR provide economic benefits?
Audit: Systems/database support or practicing what is preached
Audit: Open Government
Aristotle Metadata Repository (MDR)
Grammatical Quality
Additional Invariants and checks
Building up the Aristotle MDR
Other ideas
Aristotle Metadata Views
Aristotle Any Item Metadata
API
MDR Definitions
Introduction
Licence
This document is released under the Creative Commons Zero licence or CC0.
Warranty
The author does not make any warranty, express or implied, that any statements in this document
are free of error, or are consistent with a particular standard of merchantability, or they will meet
the requirements for any particular application or environment. They should not be relied on for
solving a problem whose incorrect solution could result in injury or loss of property. If you do use
this material in such a manner, it is at your own risk. The author disclaims all liability for direct or
consequential damage resulting from its use.
Purpose
This document describes the data model underlying the Aristotle Metadata Registry (MDR).
Audience
The primary audience for this document is metadata designers, especially designers who need to
integrate Aristotle with other metadata tools. The reader is assumed to understand metadata
concepts.
Approach
This document presents an analysis of one instance of the Aristotle MDR. The analysis is based on
profiling the JSON records extracted from the cloud instance. It is not a comprehensive document:
each Aristotle instance is different, so the population and usage of the metadata will also differ.
By
This was written by Matthew Lawler.
Acronyms
This is a list of acronyms used in the document.
References
By         For         Path                                                               Full
Aristotle  API         https://aristotle.cloud/api/v4/
Aristotle  Cloud       https://www.aristotlemetadata.com/
Aristotle  Source      https://github.com/aristotle-mdr/aristotle-metadata-registry
Aristotle  wiki        https://en.wikipedia.org/wiki/Aristotle_Metadata_Registry
IASSIST    Conference  https://iassistdata.org/
ISO        11179       https://www.iso.org/obp/ui/#iso:std:iso-iec:11179:-1:ed-3:v1:en
W3         Standard    https://www.w3.org/TR/owl-ref/#sameAs-def                          OWL Matching standard
Wiki       Standard    https://www.wikiwand.com/en/Data_element_definition                Wikiwand
API
Haskell was used to access the API. All code is here: https://github.com/lawlermj1/Aristotle-JSON
These can be
1. Definition source
2. Initial Requirements
The first 3 won't be examined as they are self-evident use cases. The Auditing use cases will be
looked at below.
If Supported % is high, then there is a good fit between documents and databases. That is, the
database is justified because it supports the relevant Act.
If Unsupported % is high, then there is a database or systems capability gap, which could be used to
justify additional projects.
If Additional % is high, then these words are either hidden or represent too much systems capability.
If hidden, the information can be made public; if excessive, the unneeded capability can be reduced.
Additional words could also lead to the discovery of legislative or regulatory gaps. The information
could then be added to the legislation, or the capability turned off on efficiency grounds.
Obviously, this is not a detailed capability assessment. It is just a check to see if the language used in
the systems is consistent with the primary documents. Further, more detailed analysis is required.
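The three percentages can be illustrated with a minimal sketch. The definitions used here are assumptions, since this document does not spell them out: Supported is taken as document words that also appear in the systems, Unsupported as document words missing from the systems, and Additional as system words missing from the documents. The function name and the sample words are hypothetical.

```python
def coverage(document_words, system_words):
    """Compare two vocabularies and return the three percentages.

    Assumed definitions (not taken from the source document):
      Supported %   = document words also found in the systems
      Unsupported % = document words not found in the systems
      Additional %  = system words not found in the documents
    """
    docs = {w.lower() for w in document_words}
    systems = {w.lower() for w in system_words}

    def pct(part, whole):
        # Guard against an empty vocabulary.
        return round(100 * len(part) / len(whole), 1) if whole else 0.0

    return {
        "supported_pct": pct(docs & systems, docs),
        "unsupported_pct": pct(docs - systems, docs),
        "additional_pct": pct(systems - docs, systems),
    }


# Hypothetical sample vocabularies.
act_words = ["licence", "applicant", "fee", "penalty"]
db_words = ["Licence", "Applicant", "Fee", "Batch", "Session"]
result = coverage(act_words, db_words)
print(result)
```

Matching is by exact lower-cased word, which fits the stated aim of a language consistency check and not a detailed capability assessment.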
Aristotle has three core objects: Object Class, Property and Data Element Concept. However, the
definitions of these three are complicated and hard to understand. They are a case study in
accidental complexity, which leads to spaghetti metadata. A key insight is that these core objects
are really Nouns, Adjectives and Phrases. That is, the model is as simple as basic grammar. In
effect, SA is building a Corpus of its words. To restate, an Object Class is a Noun or group of
Nouns, a Property is an Adjective, Adverb or Verb, and a Data Element Concept is a Phrase formed by
its parent Noun and Adjective. The word Adnominal is used here to mean any of Adjective, Adverb or
Verb. That is, an Adnominal is a modifier that converts a Noun into a Noun Phrase.
As with other parts of the ISO 11179 standard, the grammatical specification is incomplete.
However, even with a limited number of Parts of Speech (POS), it can still be useful.
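The grammatical reading of the three core objects can be sketched as a small data structure. This is an illustration only, not the Aristotle schema: the class names mirror the concepts, while the field names, the word order of the phrase and the sample values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectClass:
    name: str  # a Noun or group of Nouns


@dataclass(frozen=True)
class Property:
    name: str  # an Adnominal: Adjective, Adverb or Verb


@dataclass(frozen=True)
class DataElementConcept:
    object_class: ObjectClass  # the parent Noun
    prop: Property             # the parent Adnominal

    @property
    def phrase(self):
        # Assumed English word order: modifier first, then the Noun.
        return f"{self.prop.name} {self.object_class.name}"


# Hypothetical example values.
dec = DataElementConcept(ObjectClass("Licence"), Property("Expired"))
print(dec.phrase)
```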
Grammatical Quality
See POS Tagging.
In a typical corpus, the ratio of Nouns to Adnominals is about 3 to 1. In this example, the ratio
is roughly 1 to 9 (about 1 to 8.5), as the Object Class count is 899 and the Property count is
7,674. So, what is awry? The next step was to parse names into words and attach a Part of Speech
tag (Noun, Adjective, etc.) to each word. This reveals that each Object Class uses on average 3
Nouns for each Adnominal, which is tolerable. However, each Property uses on average 2 Nouns for
each Adnominal. Simply stated, there are too many Nouns in the Property set, and those Nouns belong
in the Object Class set. In addition, about 4% of Property words are misspelt, which makes them
difficult to find by searching. Finally, this is quite a small corpus. It is reasonable to expect
a Corpus to have at least 10,000 nouns.
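As a worked check of the ratio, the arithmetic can be written out using the two counts quoted above. The variable names are illustrative, and the "typical" 3 to 1 ratio is taken from the text as given.

```python
object_class_count = 899   # treated as the Noun count
property_count = 7674      # treated as the Adnominal count

# Adnominals for each Noun in this instance.
adnominals_per_noun = round(property_count / object_class_count, 1)

# How far the observed Noun to Adnominal ratio falls below the typical 3 to 1.
typical_nouns_per_adnominal = 3
observed_nouns_per_adnominal = object_class_count / property_count
shortfall_factor = round(typical_nouns_per_adnominal / observed_nouns_per_adnominal, 1)

print(f"1 Noun to {adnominals_per_noun} Adnominals")
print(f"Nouns are under-represented by a factor of about {shortfall_factor}")
```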
% of Nouns in Object Class (OC) names vs non-Nouns
% of Adjectives in Property (P) names vs non-Adjectives
% of missing implied words, especially base words from Phrases, not yet included.
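The first two checks can be sketched as follows. The lexicon here is a tiny hand-made stand-in, and the names and tags are hypothetical; a real run would use a proper POS tagger over the extracted names. Words the lexicon does not recognise are counted as UNKNOWN, which is only a rough pointer to misspelt or missing words.

```python
from collections import Counter

# Hypothetical, hand-made lexicon standing in for a real POS tagger.
LEXICON = {
    "person": "NOUN", "licence": "NOUN", "date": "NOUN", "fee": "NOUN",
    "expired": "ADJ", "annual": "ADJ", "paid": "VERB",
}


def tag_counts(names):
    """Split each name into words and count the POS tag of every word."""
    counts = Counter()
    for name in names:
        for word in name.lower().split():
            counts[LEXICON.get(word, "UNKNOWN")] += 1
    return counts


def pct(counts, tag):
    """Percentage of all counted words that carry the given tag."""
    total = sum(counts.values())
    return round(100 * counts[tag] / total, 1) if total else 0.0


# Hypothetical item names.
object_classes = ["Person", "Licence Fee", "Annual Licence"]
properties = ["Expired", "Paid Date", "Expiry Date"]

oc = tag_counts(object_classes)
p = tag_counts(properties)
print("% Nouns in OC:", pct(oc, "NOUN"))
print("% Adjectives in P:", pct(p, "ADJ"))
print("% Nouns in P:", pct(p, "NOUN"))
print("% unrecognised words in P:", pct(p, "UNKNOWN"))
```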
Currently, this is done manually, without automated checks, which has produced very mixed quality.
1. Collect all words from all published documents on the web site and from the governing Act. These
should represent the full set of Nouns, Adjectives and Phrases of interest to the organisation.
2. Insert or POST these into the Object Class, Property and Data Element Concept corpus.
The Distribution object can be used to populate the file, web or document sources.
3. Collect definitions of words from authoritative sources, such as the governing Act, the OED,
the Macquarie Dictionary, etc. The purpose of an Australian Government agency is defined in the
relevant enabling Acts of Parliament. These Acts include definitions that apply to the Act and to
anybody administering it. So, by definition, these definitions are superior to all others.
4. Insert or POST these into the Object Class, Property and Data Element Concept corpus.
5. Add a spell check and grammar check to these, and allow misspellings with a link back to the
correct form.
6. Collect metadata from all available systems and populate it into a separate corpus, along with
the distribution.
7. Match the 2 corpora to determine overlap. The overlap can be used to determine the metadata
completeness of the systems supporting the organisation corpus.
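Step 5 can be sketched with the Python standard library. This is one possible approach, not a description of any Aristotle feature: `difflib.get_close_matches` stands in for a real spell checker, and the function name, cutoff value and sample words are assumptions.

```python
import difflib


def link_to_canonical(word, canonical_words, cutoff=0.8):
    """Return the canonical form a word should link back to, or None.

    An exact match (ignoring case) links to itself; otherwise the closest
    canonical word above the similarity cutoff is used.
    """
    lowered = word.lower()
    if lowered in canonical_words:
        return lowered
    matches = difflib.get_close_matches(lowered, canonical_words, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical canonical corpus and words collected from systems.
canonical = ["licence", "applicant", "penalty"]
collected = ["Licence", "aplicant", "penality", "batch"]

links = {word: link_to_canonical(word, canonical) for word in collected}
print(links)
```

Words that link to None are candidates for manual review: they may be new terms, or misspellings too far from any canonical form to be matched safely.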
Other ideas
A. Acronyms can be up to 20% of the words used in a namespace. These can be treated as a
Phrase type and placed into the Data Element Concept.
B. Phrases can be made up of other Phrases. So, there is a need to provide a recursive link on this
object, as an additional relation. This will allow Nouns and Adnominals to remain primitive words,
and not compounds. These relations should conform to standard English grammar. Where they do
not conform, this could be a means to identify errors.
C. Additional relations will be needed from Value Domains and Distributions directly to
Object Class and Property as a way of providing traceability. The HTML layer does not support this,
but it could easily be implemented in the graph DB layer.
D. Historical, superseded or archaic words: these are words that were used previously and have since
been replaced. There needs to be some traceability of these terms.
E. Namespaces: words that are unique to a particular area. These are often not understood outside
a particular area of expertise. These need to be captured, and clearly placed into a domain. See
diagram.
F. In the implementation, there are too many workgroups and not enough visibility of objects like
relations.
G. Aristotle does not clearly support the standard security model of UNCLASSIFIED, OFFICIAL,
UNOFFICIAL and RESTRICTED. Rules should be defined that apply to all Aristotle items.
H. Item State (candidate, recorded, etc.) is not available to an API user with standard permissions.
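Idea B, the recursive link between Phrases, can be sketched as a small recursive structure. The class and field names are hypothetical and do not reflect the Aristotle schema. The point is only that a Phrase may contain other Phrases while Nouns and Adnominals stay primitive.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Word:
    text: str
    pos: str  # Part of Speech tag, e.g. "NOUN" or "ADJ"


@dataclass(frozen=True)
class Phrase:
    # A Phrase is built from primitive Words and/or other Phrases.
    parts: Tuple[Union["Word", "Phrase"], ...]


def primitive_words(item):
    """Flatten a Word or a nested Phrase into its primitive Words, in order."""
    if isinstance(item, Word):
        return [item]
    words = []
    for part in item.parts:
        words.extend(primitive_words(part))
    return words


# Hypothetical example: a Phrase nested inside another Phrase.
annual = Word("annual", "ADJ")
licence = Word("licence", "NOUN")
fee = Word("fee", "NOUN")
annual_licence = Phrase((annual, licence))
annual_licence_fee = Phrase((annual_licence, fee))

texts = [w.text for w in primitive_words(annual_licence_fee)]
tags = [w.pos for w in primitive_words(annual_licence_fee)]
print(texts, tags)
```

The flattened tag sequence is what a grammar check would inspect: a sequence that does not fit an expected English pattern could flag the Phrase for review.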
MDR Definitions
As a general comment, this document describes the implied data structure of a tool which manages
metadata. This leads immediately into a language confusion trap, and the classic ‘name
collision’ problem. The MDR implements ISO 11179, which is incomplete. The gaps are filled
with OOP terminology, so some confusion is inevitable. The major name collisions are Attribute,
Class, Object, Property and the woefully named Object Class.
Profiling helped to clarify the definitions, as examples are always useful in understanding abstract
ideas such as metadata. Aristotle has not provided definitions for all JSON formats, so I derived
these from profiling. These definitions have a Type of JSON. The Count column indicates the number
of JSON records extracted in late 2021. The Unused column indicates Aristotle concepts that are not
used in the sample instance; these will not be discussed further.
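The Count and Unused columns could be derived along the following lines. This is a sketch under assumptions and not the author's Haskell code: the `concept_type` field, the concept names and the sample records are hypothetical stand-ins for whatever the extracted JSON actually contains.

```python
import json
from collections import Counter


def profile(records, known_concepts):
    """Count records per concept and list known concepts with no records."""
    counts = Counter(r.get("concept_type", "unknown") for r in records)
    unused = sorted(set(known_concepts) - set(counts))
    return counts, unused


# Hypothetical extract; a real one would be read from the exported JSON files.
raw = """[
  {"concept_type": "objectclass", "name": "Person"},
  {"concept_type": "objectclass", "name": "Licence"},
  {"concept_type": "property", "name": "Expired"},
  {"concept_type": "dataelementconcept", "name": "Expired Licence"}
]"""
records = json.loads(raw)

# Hypothetical list of concepts the tool is known to support.
known = ["objectclass", "property", "dataelementconcept", "valuedomain"]

counts, unused = profile(records, known)
print(dict(counts))
print("Unused:", unused)
```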