File Metadata Tool Use Case
Introduction
The Tool was designed for preservation processes and activities, but can also be used for
other tasks, such as the extraction of metadata for resource discovery.
The Metadata Extract Tool includes a number of 'adapters' that extract metadata from
specific file types.
For more information about any of these file formats see the Solution Architecture or
Software Architecture documents for this extraction tool. Note: The POC output
types of demta.dtd and pmeta.dtd have been deprecated and are not supported in the
production tool.
To start the tool, change into the Metadata Harvester’s directory and run metadata.bat
(Windows) or metadata.sh (Linux/Unix).
Graphical User Interface
The software will be driven by a graphical user interface and a command line interface.
Both will access the underlying extraction tool in an identical way (using the same
configuration). The graphical user interface will allow the user to select files for
processing and process them according to a predefined configuration. The predefined
configurations (see: configuration) will specify the output format and the output
directory that will hold the extracted files. The user interface is designed to be fully
compliant with current Win32 best practice for GUIs.
One of the features of the user interface is the capability to alert the user to errors that
may have occurred during the extraction process.
The dialog you will be presented with looks like the image shown here. You can edit
any property of an object using this dialog. Some fields may turn red while you are
typing. This means that field validation has determined that the field contents are
incorrect.
The Process button processes all files currently in the files list. The results of the
processing are written to the directory specified by the currently selected Config.
Any file that produced an error is marked with a red exclamation point icon. Files that
were processed without any errors are given a ‘tick’ icon, and files that are yet to be
processed are given a plain icon. The Graphical User Interface is multithreaded, which
means that the UI does not appear to “lock up” while processing takes place.
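The icon states and background processing described above can be sketched roughly as follows. This is only an illustration in Python, not the tool's actual implementation: the names FileStatus, process_all and start_processing are hypothetical, and the extract argument stands in for whatever the extraction tool does to a single file.

```python
import threading
from enum import Enum


class FileStatus(Enum):
    """Hypothetical icon states for entries in the files list."""
    PENDING = "plain"        # not yet processed
    OK = "tick"              # processed without errors
    ERROR = "exclamation"    # an error occurred while processing


def process_all(files, extract, statuses):
    """Run extract on each file and record one status per file."""
    for path in files:
        try:
            extract(path)
            statuses[path] = FileStatus.OK
        except Exception:
            # An error is isolated to this file; carry on with the rest.
            statuses[path] = FileStatus.ERROR


def start_processing(files, extract):
    """Start processing on a worker thread so the caller is not blocked."""
    statuses = {path: FileStatus.PENDING for path in files}
    worker = threading.Thread(target=process_all, args=(files, extract, statuses))
    worker.start()
    return statuses, worker
```

Running the work on a separate thread is what lets a user interface keep responding while files are processed. A caller that needs the final statuses can wait with worker.join().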
Extraction Configuration
The available configurations are listed in a drop-down box labelled “Config”. Each
configuration can have a different output directory, which is indicated
immediately to the right of the drop-down box.
Destination Folder
The destination folder is where output files will be sent. You can change this setting by
clicking on the folder icon to the right of the text field. The field itself is not editable.
Profile Used
The profile is the set of parameters to use. Profiles are created and maintained using
the administration tool and include settings such as which adapters are current and the
log directory.
Critical.
The application has had a critical failure. Harvesting should be considered unstable and
the application should be restarted.
Error.
An error is an application problem or a problem while harvesting metadata that is isolated
to the object being harvested. Chances are other objects were unaffected and harvesting
can continue.
Debug messages
Information about program behaviour. There should be very few of these
messages in the production system.
Information message
Superfluous information about application behaviour. This includes things
like usage reporting.
Program Workings
Similar to debug, these messages are closely related to system functions and
may not be very meaningful to most operators. There should be very few of these
messages in the production system.
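The message types above could be modelled along the following lines. This is a sketch only: the numeric ordering of the levels and the production threshold are assumptions made for illustration, and the names LogLevel, should_continue and visible_in_production are hypothetical rather than taken from the tool.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    """Message types from the text; the numeric ordering is an assumption."""
    PROGRAM_WORKINGS = 0
    DEBUG = 1
    INFORMATION = 2
    ERROR = 3
    CRITICAL = 4


def should_continue(level):
    """Harvesting can continue after anything short of a critical failure."""
    return level < LogLevel.CRITICAL


def visible_in_production(level, threshold=LogLevel.INFORMATION):
    """Hide messages below a hypothetical production threshold."""
    return level >= threshold
```

With this ordering an Error leaves harvesting running while a Critical message does not, and debug and program workings messages are filtered out of a production log.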
Administration
The administration screens allow configuration of all aspects of the harvesting
environment. Administration screens are divided into a number of tabs. While
changes are generally saved when OK is selected, some administration tasks are
implemented immediately due to their complex nature. This is explained in each tab
description below:
General
Create Profile – Simply type the new profile name in the combo box and
select Create and a new profile of that name will be created with the normal
default values.
Delete Profile – Select the profile to delete and click the Delete button.
It is not possible to delete the last (default) profile.
Input Directory – Select the directory used as the standard directory to
look for files to harvest. It is therefore the starting point of any file selection.
Admin Tabs
Some of the less common administration tabs are grouped under an Admin tab. An
incorrect configuration can cause the Metadata Harvester to fail.
Defaults
Mappings
Delete Mapping – Clicking the Delete button deletes the currently
selected mapping.
Configuration
Add Configuration – Click Add and enter the name, harvester class, and output
directory for the new configuration.
Delete Configuration – Clicking the Delete button deletes the currently selected
configuration.
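A configuration as described here carries three pieces of information: a name, a harvester class and an output directory. A minimal sketch of such a record is shown below. The type and function names are hypothetical and nothing here reflects how the tool stores its configurations internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionConfig:
    """Hypothetical record holding the three fields entered under Add."""
    name: str
    harvester_class: str
    output_directory: str


def add_configuration(configs, config):
    """Add a configuration, refusing to overwrite an existing name."""
    if config.name in configs:
        raise ValueError(f"configuration already exists: {config.name}")
    configs[config.name] = config


def delete_configuration(configs, name):
    """Delete the configuration with the given name."""
    del configs[name]
```

Keying the collection by name matches the way a configuration is selected by name elsewhere in the tool.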
Adapter Maintenance
NOTE: Choosing Cancel on the admin screen itself will not prevent the adapter from
being installed!
Delete Adapter – Clicking the Delete button uninstalls the currently
selected adapter from the system. A backup of the original jar
installation file will still be found in the system jar directory.
Help
Help is very simple. It is sufficient for basic usage but is neither comprehensive nor
context sensitive.
Command Line
The command line is executed by running the extract.bat file (under Windows) or
extract.sh (under Linux/Unix), which can be found in the installation directory.
See the Java Runtime Environment documentation for arguments that can be used to
allocate additional memory or to select a different JIT compiler.
1. Help: The help mode produces a screen of information describing the modes
and parameters that are available.
2. Self-discovery: Self-discovery can produce a screen that informs the user about
the configurations installed in the extraction tool, as well as information on
available XSLT maps and the types of adapters installed.
3. Extraction: Extraction is the mode that actually causes the tool to parse files
and output results. You must tell the extraction tool which configuration to
use (by name, using quotes for names with spaces) and the directory in which to
find the digital master files to be processed.
The command line interface is not multithreaded, meaning it will block until finished.
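A script that drives the command line needs to pick the right launcher for the platform and to quote configuration names that contain spaces. The sketch below shows both steps in Python. The helper names are hypothetical, the quoting shown is for POSIX shells only (Windows command prompts use double quotes instead), and the tool's actual argument syntax is deliberately not shown because it is not given here.

```python
import platform
import shlex


def launcher_script(system=None):
    """Return the command line launcher file name for a platform."""
    system = system or platform.system()
    return "extract.bat" if system == "Windows" else "extract.sh"


def quote_config_name(name):
    """Quote a configuration name for a POSIX shell if it needs quoting."""
    return shlex.quote(name)
```

Because the command line interface blocks until finished, a script that runs the launcher with subprocess.run will simply wait for the extraction to complete before continuing.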