White Paper

System Architecture

Version 1.2

February 2010

DocuWare AG

Therese-Giehse-Platz 2

82110 Germering, Germany


Legal notice:

DocuWare AG

Therese-Giehse-Platz 2

82110 Germering, Germany

Telephone: +49.89.89 4433-0

Fax: +49.89.841 9966

E-mail: [email protected]

Disclaimer:

This document was compiled to the best of our knowledge and with great care. All
references are to DocuWare products starting with DocuWare version 5.1c.
Essentially, this white paper sets out to describe the basic technical structure of the
DocuWare products. There may be small or temporary differences with respect to
individual functions in a particular version.

© Copyright 2010 DocuWare AG. All rights reserved.


Contents
1. Objectives of This White Paper

2. Future Requirements

3. System Architecture - Overview
3.1. Design Requirements
3.1.1. Requirements from the perspective of the provider
3.1.2. Requirements from the perspective of the user
3.2. N-Tier Architecture
3.3. DocuWare System Architecture
3.4. Operating Systems and System Requirements
3.4.1. Client systems
3.4.2. DocuWare Servers
3.4.3. Infrastructure components
3.4.4. Terminal server
3.5. Summary

4. Authentication Server
4.1. Passwords
4.2. Login to LAN/VPN
4.3. Login via Internet
4.4. Authorization Concept
4.4.1. Roles
4.4.2. Profiles
4.4.3. Users and groups

5. Content Server
5.1. File Cabinet
5.2. File Structure
5.3. The "Disk" Concept
5.4. Supported File Storage Media
5.4.1. Hard disks, RAID
5.4.2. Optical removable disks
5.4.3. Jukeboxes
5.4.4. Content Addressed Storage (CAS)
5.4.5. NetApp Storage
5.5. Header File
5.6. Metadata
5.7. Document

6. Databases
6.1. Database Structure
6.2. Integrated Database
6.3. Direct Database Connection
6.4. Database Administration

7. Web-Based Applications
7.1. Document access via Web Client
7.1.1. Web Client Server
7.1.2. Imaging Server
7.1.3. Thumbnail Server
7.1.4. Web instances
7.1.5. Web Client
7.1.6. ClickOnce applications
7.1.7. Silverlight Plug-In for Web baskets
7.1.8. Integration of Web Client in other applications
7.2. Web-Based Administration

8. Management Framework Process
8.1. Workflow Server
8.2. Pre-defined Batch Processes

9. Full-Text Index
9.1. Functional Principle
9.2. Full-Text Tables and Files

10. Distributed and Redundant Archives
10.1. Satellite Archives
10.2. Mobile Users
10.3. Autonomous File Cabinets

11. Integration

12. Scalability
12.1. Clustering and load distribution
12.2. Other performance measures

13. Glossary


1. Objectives of This White Paper


This White Paper is the first in a series of documents describing the architecture of the
DocuWare system for the benefit of readers who are interested in the underlying
technologies and the way they are used by the DocuWare system.

This will enable the technically minded reader to form an opinion about the DocuWare
system and to assess its power in terms of flexibility, scalability and performance when
handling current requirements. The paper includes a discussion of the measures undertaken
to achieve access security and to prevent down-times – or at least to minimize their adverse
effects on users. Another topic we will cover is integration. This will give the reader an idea of
how the DocuWare system behaves within an IT environment that it shares with other
systems, and to what extent customizations may be required in order to ensure maximum
return on investment and minimum administrative costs (total cost of ownership).

The White Paper is aimed at clients (users), consultancy companies, IT magazines and
distribution partners. It assumes a certain level of technical knowledge about the structure of
modern software applications, ideally of document management systems. Detailed
knowledge of current or previous DocuWare systems is not required.

As this White Paper is the first in a series, it attempts to provide an overview of the total
architecture. There are other White Papers on the subjects of Security and Integrations.


2. Future Requirements
DocuWare is one of the leading developers of document management systems, not just in
Germany, but also worldwide. This undoubtedly is the most important and best proof of the
quality and performance of the company's systems. One of the critical success factors is the
simplicity of the system’s installation, operation and administration.

Thanks to this success, DocuWare systems are increasingly being used in larger and more
complex installations. This White Paper will show that the DocuWare system is technically
well suited to larger and more complex environments and as such constitutes a solid
foundation for future needs.

In the competition for larger and more complex installations, DocuWare is measuring up
against a different set of rivals, who mostly "externalize" this complexity. DocuWare, on the
other hand, is intent on retaining its proven success factors and on continuing to be a leader
in terms of simplicity of installation, operation and administration.

Even though the overriding need continues to be for conventional archiving systems, market
trends are inevitably moving towards "Integrated Document Management" (IDM), and in
the longer term even towards "Enterprise Content Management" (ECM) systems.

While we may not yet have arrived at a precise definition of Enterprise Content Management
(ECM), the needs of IDM are by now largely established, and the DocuWare systems
already go a long way towards covering them.

Figure 1: Areas of application and implementation of IDM requirements in DocuWare

Integrated document management must be independent of "time and space." This means it
must be available everywhere and at all times, regardless of whether the user is at company
headquarters, at a branch office, at a client site, or in a home office. It also means that
documents do not necessarily have to be stored at the location where they originate and
that they are available irrespective of the location at which they were archived.


Additionally, clients may have very different needs. Some of them are faced with enormous
volumes of documents that may need to be captured and stored, even though they may
seldom be accessed. At the other end of the spectrum there may be clients with relatively
small volumes of documents that are accessed by large numbers of users from various
locations on a constant basis.

It follows that systems for large or complex installations must be differentiated and assessed
for suitability above all on the basis of their system architecture. With this in mind, the
following evaluation criteria are important:
• Administration
Simple and coherent administration for the entire system in order to reduce maintenance
costs, which are part of the Total Cost of Ownership (TCO).
• Scalability
In order to meet the requirements it may be necessary to implement one large system
spanning several sites, or it may be that several smaller installations are better suited to
particular organizational and technical needs. Whatever the case may be, mobile users
must have the option of transporting subsets of the archive on their notebooks.
Clearly, the intention is not to cater for such different requirements by providing different
systems, but to cater for different needs with different expansion stages of the same
technology.
• Security
In the context of an archiving system ("file cabinet"), security in all its facets is a
critical consideration. For one thing, the basic need for revision-proof archiving brings
with it the necessity to prevent data loss in case of system failures. If the client depends
on the availability of the system, which is increasingly the case, continuity becomes
ever more important.
In addition, the ability to map organizational competencies and permissions is of great
importance. To safeguard security it must be possible to restrict user access to
functionalities and to data in a flexible manner that matches organizational needs.
• Integration capability
The all-important criteria in terms of integration in today's complex and heterogeneous IT
landscapes are the availability of interfaces, the possibility of integrating existing IT
infrastructures, conformity to standards and openness towards system internals.
• Migration capability
As is well known, information technology has extremely short innovation cycles, while
archives ("file cabinets") often have very long life spans. Consequently, migration is very
much part of the above-mentioned integration. Additionally, there are compatibility
requirements in terms of system generations and migration tools that need to be
considered.

Although very important, these topics fall outside the scope of this White Paper, which
concentrates on the system architecture as a whole; they are covered in greater detail
by other White Papers on the subjects of Security and Integration Capability.


3. System Architecture - Overview


3.1. Design Requirements
The architecture of the DocuWare system was specifically designed to provide a stable
foundation, both for the current functionality and for future requirements.

In order to be able to deliver the full IDM functional spectrum in large and complex
installations and to meet the listed requirements in terms of scalability, integration capability
etc., the design criteria were broken down into
• requirements from the perspective of the provider
• requirements from the perspective of the user.

3.1.1. Requirements from the perspective of the provider

Providers want a system that
• Offers multi-client capability and support for ASP (Application Service Provider) models
• Is suitable for operation in external computer centers
• Supports browser-based Web access
• Supports multilingual regions
• Supports complex installations across multiple sites
• Supports very large, revision-proof document storage systems
• Offers openness vis-à-vis system and storage technologies
• Consolidates and extends workflow and automation features within a process
management framework
• Provides administrative support for all modules with a single tool
• Functions optimally in the Microsoft environment
• Optimally integrates database technologies of leading developers, independent of the
operating system
As these requirements were instrumental for the design of the system architecture, they are
described in somewhat more detail below.

Multi-client capability and ASP support

DocuWare systems are fully multi-client enabled so that an outsourcing provider can run one
system for a number of clients. Users, storage locations, functional modules, language
support, etc. can all be defined independently for each client, without having to take into
consideration the settings of the other clients. This also means that several totally different
archives for different clients, using different storage structures and/or different storage
technologies, can all be run on one and the same DocuWare system.

To this end, separate "organizations" are defined in Administration, where each
"organization" represents one particular client. In addition, for each client you can define a
separate administrator with appropriate rights. In other words, it is possible to set up client-
specific configurations in parallel.
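
As an illustration only, the separation between organizations can be pictured as independent configuration records held by one system. The following Python sketch is not DocuWare code: the class names, fields and sample values (`Organization`, `DocuWareSystem`, `storage_location` and so on) are hypothetical and merely mirror the per-client settings named above.

```python
from dataclasses import dataclass, field


@dataclass
class Organization:
    """One client ("organization") with its own, independent settings."""
    name: str
    administrator: str        # separate administrator per client
    language: str             # language support chosen per client
    storage_location: str     # storage structure/technology per client
    users: set = field(default_factory=set)


@dataclass
class DocuWareSystem:
    """Hypothetical container: one system hosting several organizations."""
    organizations: dict = field(default_factory=dict)

    def add_organization(self, org: Organization) -> None:
        if org.name in self.organizations:
            raise ValueError(f"organization {org.name!r} already exists")
        self.organizations[org.name] = org

    def add_user(self, org_name: str, user: str) -> None:
        # A change is scoped to one organization; the others are untouched.
        self.organizations[org_name].users.add(user)


# Sample values are invented for the example.
system = DocuWareSystem()
system.add_organization(Organization("Client A", "admin_a", "de", "raid:/archive/a"))
system.add_organization(Organization("Client B", "admin_b", "en", "cas://pool/b"))
system.add_user("Client A", "alice")
```

In this sketch, adding a user to "Client A" leaves the record for "Client B" unchanged, which is the property the text describes as setting up client-specific configurations in parallel.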

A synchronized procedure is required only for changes in the hardware and the basic
configuration, which affect the entire system, for example the type and number of servers.
This is handled by a separate "system administrator" who is authorized to carry out such
system changes but whose rights might be restricted when it comes to the clients' individual
databases. A log allows usage to be monitored and helps to safeguard security.

Additionally, the built-in log provides a basis for invoicing when operating the ASP model.
DocuWare providers thus have the option of running several different DocuWare
configurations for multiple clients.

Web access

Web Client provides the option of accessing DocuWare file cabinets via the
Internet/Intranet/Extranet. All essential features, such as opening documents, marking them
with notes and stamps, changing index words and storing documents, are available
irrespective of the workstation and without installation on the client computer.

Multi-language support

For DocuWare AG as an international company, support for the most widely used Romance,
Germanic and Slavic languages, as well as languages such as Japanese and Arabic, is of
major importance. This doesn't stop at localizing the user interface; it also involves adding
support for various number and date formats.

DocuWare uses Unicode on its servers and on new client components. Unicode (in the UTF-8
encoding) is used both for the interface and for data management. This makes it
possible to manage documents and their associated index data in various languages
(including Asian ones) within one and the same archive. The same applies for the full-text
index feature.
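
The following short Python sketch illustrates why a single Unicode encoding is sufficient for mixed-language index data. The sample index words are invented for the example and do not come from a DocuWare system; only standard Python string methods are used.

```python
# Index words in several scripts, as they might appear side by side in
# one archive (sample values are invented for illustration).
index_words = ["Rechnung", "facture", "счет", "請求書", "فاتورة"]

for word in index_words:
    encoded = word.encode("utf-8")      # bytes as stored or transmitted
    decoded = encoded.decode("utf-8")   # text as shown in the interface
    assert decoded == word              # the round trip is lossless
    print(f"{word}: {len(word)} characters, {len(encoded)} bytes")
```

The number of bytes per character varies by script, but every word survives the round trip unchanged, so index data in different languages can share one storage format.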

Location independence

To be usable within large organizations, systems must be capable of
operating across site boundaries. As a consequence, cross-site archives and sub-archives,
communication between components via WAN technology, remote administration and
synchronization mechanisms are all critical components of the architecture.

Large, revision-proof installations

Today's hardware makes it possible to store huge volumes of information. Many customers
have therefore amassed very large archives on which they do not want to impose any
software-induced restrictions. This is why every attempt was made to enable DocuWare
systems to handle any volume of documents while allowing full functionality, including
security features.

Offers openness vis-à-vis system and storage technologies

In order to comply with the requirement for optimal integration of the system into a
heterogeneous IT landscape, great emphasis was put on supporting and using existing de-
facto standards. This includes the use of existing directory, database and mail servers, but
also openness toward different storage technologies.


Process Management Framework

There are many tasks within the daily operation of a DMS that are completely routine, for
example copying documents from data sources. Similarly, users, especially those in
administrative roles, handle a range of repetitive processes. This called for a powerful tool
that would automate both system-internal and user-oriented processes.

A modern document management system is expected to deliver a great deal more than just
collecting and providing information. To achieve a high degree of automation, it must be
possible to define the processes for capturing and processing documents in a flexible
manner. In addition, documents these days often control many of the workflows within
organizations. It is part of the remit of a document management system to map
this process control electronically by means of document workflow functions.

DocuWare provides the Workflow server for this purpose. This controls all automation
processes and acts as the workflow engine for the document workflow.

Provides administrative support for all modules with a single tool

Complex systems that consist of many modules and interfaces and provide a multitude of
options tend to generate an exponentially increasing amount of administrative work. By
contrast, DocuWare stands for systems that are easy to install, operate and manage, and we
have every intention of not letting our clients down in this respect. Hence it was necessary to
provide a central administrative tool that could handle even large and complex installations.

Database technologies of leading developers

In order to embed the DocuWare system transparently and seamlessly into existing
infrastructures, it must be capable of being integrated with the database technologies of
leading developers independent of the operating system. This is not simply a question of
protecting a company's investment, but plays an important role in terms of administrative
efficiency.

Openness vis-à-vis different storage technologies

A number of different storage technologies have been competing with each other for a while.
This means that their relative strengths and weaknesses are undergoing constant changes.
By contrast, archiving systems are expected to provide efficient and secure storage facilities
over long periods. This is why openness vis-à-vis storage technologies, independent of
operating systems, is crucial in order to achieve continuity regarding security and efficiency
throughout the entire storage cycle.

3.1.2. Requirements from the perspective of the user

Even if the requirements from the perspective of the provider are essential to the system
architecture, the real benefit to the user depends on the functionality of the document
management system. The following table summarizes the features to give an overview of the
requirements. For more detailed descriptions please consult the product literature and the
materials available on the DocuWare website (www.docuware.com).


Category: Current coverage

Imaging
• Fully integrated scan client
• Flexible integration of network scanners and digital copiers, with some direct connectors
• Integrated functions for image enhancement
• Barcode recognition, zone-based and full-text OCR
• Document classification
• Document viewer offering high-quality display, a full feature set and support for many
formats, both in Web and Windows Client
• Integration of high-end imaging and classification tools (Ascent Capture, VRS, AnyDoc, etc.)

COLD/ERM
• High Performance COLD with efficient storage format
• Flexible adaptation of spool files to classification rules
• Quick import of spool files
• Tiff printer as a powerful imaging component

Integration
• Powerful connector for SAP R/3
• Archiving and research connector for Notes/Domino
• Connector to SharePoint
• Integration for mail archiving from Outlook and Exchange
• Intuitive, menu-driven integration into practically all Windows applications
• Integration of Web Client or individual elements of it in other applications via URL
• Connectors for direct integration of multi-function copiers from several providers
• Various programming interfaces for controlling DocuWare from other applications
• The Integrations White Paper provides you with a detailed description of the integration
options from DocuWare

Document management
• Full Viewer support for many formats of "coded information (CI)", including extensive
comments features
• Automatic import and classification of CI files, including email and Office files
• Secure locking mechanisms for processing CI files with their source programs
• Monitoring manipulation of CI files by checksum functions and electronic signatures
• Check-in / Check-out for simple version management
• Downloading or sending documents in original or PDF format

Records management
• Revision-proof archiving of all kinds of CI and NCI documents (NCI = Non Coded
Information)
• Export and migration functions
• Management and monitoring of retention periods
• Flexible access provision by integrating network user directories
• Logging user access to documents

Repository
• Support for all types of storage technologies
• Universal database support, incl. Fulltext
• Open, standard-based architecture
• Fully scalable
• Flexible access control
• Electronic Signatures
• Extensive administrative tools
• Support for leading security and backup technologies

Windows Client and Web Client
• Both clients easy to use and manage
• Covers all imaging and document management functions with a coherent user interface
• Web client integrates seamlessly with practically all Web applications

Workflow
• Intuitive, rule-based document workflow for ad-hoc and production workflow
• Outstanding user-friendliness achieved by using "stamp technology"
• Automatic document batch processing

Other ECM
• Seamless integration of IDM functions into Web pages
• Control of document delivery on the Web by means of standard IDM functions in DocuWare
• Open architecture of the document repository enables accommodation of all types of
digital assets

The next section describes how the above requirements were integrated into the system
architecture.

3.2. N-Tier Architecture


The architecture of the DocuWare system conforms to the N-Tier concept, which has evolved
from the client-server principle. Its main characteristics are:
• Features on the workstations are strongly dialog-oriented
• Application logic is located on one or more central DocuWare servers
• Several applications share common resources on one or more central background
servers
As in the classic client-server concept, the term server here refers to a software service, not
to a piece of hardware. A DocuWare system therefore invariably consists of several
(software) servers, all of which can – in extreme cases – simultaneously run on one
hardware system.
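
The distinction between software servers and hardware can be sketched as a simple mapping. The Python snippet below is an illustrative assumption, not a DocuWare configuration format: the host names are hypothetical, and the server names are those used in this White Paper.

```python
from collections import defaultdict

# Software servers (services) mapped to the hardware system that runs them.
# Extreme case from the text: every server runs on the same machine.
single_host = {
    "Authentication Server": "host-1",
    "Content Server": "host-1",
    "Workflow Server": "host-1",
}

# The same services distributed over several hardware systems.
distributed = {
    "Authentication Server": "host-1",
    "Workflow Server": "host-1",
    "Content Server": "host-2",
}


def services_per_host(deployment: dict) -> dict:
    """Group service names by the hardware system they run on."""
    hosts = defaultdict(list)
    for service, host in deployment.items():
        hosts[host].append(service)
    return dict(hosts)


print(services_per_host(single_host))
print(services_per_host(distributed))
```

The set of services is identical in both mappings; only the assignment to hardware changes, which is the sense in which "server" denotes a software service here.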


Figure 2: Basic product architecture

3.3. DocuWare System Architecture


This section gives an overview of the architecture components, which are described in
more detail in the following sections.

A DocuWare system contains at least the following software components:

• Windows Client
Dialog-intensive features are integrated in the client component on each workstation.
This provides optimum use of the advantages that the N-Tier architecture offers in terms
of user convenience and performance.
The Windows Client always includes a scan client in order to provide this functionality
at the workstation itself (provided a scanner is available).
Providing scan functionality at each client workstation is a response to the trend
towards decentralized scanning. The aim is to make it as easy as possible for individual
users to capture information.
• Authentication Server
The authentication server manages all resources and users. It is the central "control
station", which accepts logins, verifies authorizations, releases functions and resources
and allocates resources such as servers to users.
• Content Server
The content server manages the logical file cabinets. It uses the database to manage
index data and other comments associated with the documents. The documents
themselves are stored with the header file in the file system (see section 5 Content
Server).


Figure 3: Minimum system architecture
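
The division of labor in the minimum system can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not DocuWare's implementation: SQLite stands in for the database, a temporary directory stands in for the file store, and every name (`authenticate`, `store_document`, `find_documents`, the table layout, the sample user) is hypothetical. The header file mentioned above is omitted for brevity.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical user directory; plain-text passwords for brevity only.
USERS = {"alice": "secret"}
file_store = Path(tempfile.mkdtemp())    # stands in for the file store
db = sqlite3.connect(":memory:")         # stands in for the database
db.execute("CREATE TABLE idx (doc_id INTEGER PRIMARY KEY, keyword TEXT, path TEXT)")


def authenticate(user: str, password: str) -> bool:
    """Authentication server role: accept or reject a login."""
    return USERS.get(user) == password


def store_document(user: str, password: str, keyword: str, content: bytes) -> int:
    """Content server role: index data goes to the database,
    the document itself goes to the file system."""
    if not authenticate(user, password):
        raise PermissionError("login rejected")
    cur = db.execute("INSERT INTO idx (keyword, path) VALUES (?, '')", (keyword,))
    doc_id = cur.lastrowid
    path = file_store / f"{doc_id}.bin"
    path.write_bytes(content)
    db.execute("UPDATE idx SET path = ? WHERE doc_id = ?", (str(path), doc_id))
    db.commit()
    return doc_id


def find_documents(keyword: str) -> list:
    """Search the index data, then read the matching documents from the file store."""
    rows = db.execute("SELECT path FROM idx WHERE keyword = ?", (keyword,)).fetchall()
    return [Path(p).read_bytes() for (p,) in rows]


doc_id = store_document("alice", "secret", "invoice", b"scanned page")
print(find_documents("invoice"))
```

Keeping only index data in the database and the document bytes in the file system means either store can be moved to separate hardware without changing the other, which is the basis for the expansion steps described next.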

With this basic system as the starting point, the DocuWare system is expandable and
scalable in discrete steps. The next figure shows an example of how
• a process server can be added to give extra functionality
• Web Clients can be integrated using Web Client Server
• separate hardware systems can be used for
  ▪ authentication and workflow servers on the one hand, and
  ▪ content servers on the other
  ▪ database and file store

Figure 4: Functionally expanded and optimized system

• Workflow Server
The workflow server controls all automation and workflow processes. Automation
processes include, for example, document import/export, file cabinet synchronization,
migration and fulltext indexing.
• Web Client Server and Imaging Server
Web Clients can be integrated via the Web Client Server, which in turn accesses the
Imaging Server. Users of these clients need a browser (Internet Explorer or Firefox) and
can store, search for, display and annotate (with notes, stamps, etc.) documents in
DocuWare file cabinets.


Communication between components occurs via standard protocols such as TCP/IP and
HTTP. This allows systems to be implemented across different sites using Internet
technology. If security is an important consideration, communication can also be realized via
VPN (Virtual Private Network).

Figure 5: File cabinets spanning several sites with Master (m) and Satellite (s).

This architecture not only allows reciprocal access to remote file cabinets but also the
creation of redundant file cabinets in order to be able to work on the same file cabinets
(archives) regardless of site and transmission capacity. Regardless of the file cabinet type
("master" or "satellite"), the full DocuWare functionality can be used at both sites, including
copying any documents. Synchronization between "master" and "satellite" takes place via
Workflow Server (see 8.1). The selected architecture thus primarily serves the requirement
for scalability across site boundaries, covering organizations that have a number of
branches in geographically different locations.

3.4. Operating Systems and System Requirements

3.4.1. Client systems

On the client side, all Microsoft Windows versions starting with Windows XP are supported.
This means that a Rich Client exists for these versions making available the full range of
features provided by the DocuWare system.

Users can access DocuWare using a Web Browser via the Web Client. To have full use of all
the features of this Web Client, you need the following: Windows XP or higher and Internet
Explorer from version 6 or Firefox from version 2. The Web Client can also be used with
Firefox on Mac or Linux systems. (For restrictions to the functional scope, see 7.1.6
ClickOnce applications and 7.1.7 Silverlight Plug-In for Web baskets.)

3.4.2. DocuWare Servers

The servers of the DocuWare system are implemented on the basis of Microsoft's .NET
architecture. Since their optimization for the Microsoft platform, both installation and
administration have become much easier, and performance has soared.


Although the DocuWare system comes with a number of its own servers, it does not require
a Windows server license, but only the "Windows engine." This makes the system very
economical, including for set-ups with several DocuWare servers.

DocuWare servers can therefore be run on all platforms supporting one of the Windows
versions XP/2003 or higher.

3.4.3. Infrastructure components

The basics of DocuWare are a database and a file cabinet. MySQL provides a powerful
database within the basic system. You can then use any Windows filing system as your file
cabinet, for example the one on the content server platform. Both tasks are typically
handled by dedicated, existing hardware systems, which may also reside on non-Windows
platforms. DocuWare can take advantage of such resources.

Figure 6: Open systems integration (labels: DocuWare Server; File Store; User Directory;
Database; LDAP; Active Directory; NT Domain)

3.4.4. Terminal server

Extensive tests were carried out to ensure that the DocuWare system runs on the Microsoft
terminal server and the Citrix Metaframe extensions. This means that using elementary
Windows stations in this environment is a perfectly viable option.

3.5. Summary
DocuWare systems can be integrated into existing IT landscapes without the need for
redundant installation and administrative expenditure, for example in terms of additional
databases or user management.

To recapitulate: a DocuWare system always consists of:
• one or more DocuWare clients
• an authentication server
• one or more additional DocuWare server modules.
There must be at least one content server. This content server always accesses
• one or more databases (which may reside on third-party servers)
• one or more file cabinets
For Web Clients and automated processes additional server modules are available for
functional expansions. The server functionality can be distributed across a number of
hardware units. Integration with non-Windows systems is possible.


4. Authentication Server
Authentication Server manages all users and resources within the entire system. Before you
can use the system, you must always log on to Authentication Server.

Authentication Server handles the following tasks:
• user login
• license management
• administration of user-specific settings
In order for DocuWare to be multi-client enabled, users are allocated to "organizations,"
which are managed by the Authentication Server. An "organization" in this sense is a logical
structure comprising:
• users and user groups
• logical archives, incl. the associated hard disks
• processes
• templates for stamps, recognition schemes for OCR (Optical Character Recognition) and
bar codes, select lists
• logging
For each DocuWare system there is one and only one authentication server, which works
across all "organizations." To avoid down times or to better serve a very large number of user
requests, Authentication Server may be installed redundantly. This means that the
authentication server is used by
• one or more organizations, each with
• one or more users
DocuWare uses internal user IDs rather than the login user names. Only these user IDs are
used as database keys. Users can therefore be renamed at any time without having to
modify the allocated settings.
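
The effect of keying settings by an internal ID rather than by the login name can be sketched as follows. This is only an illustration: the class and field names are hypothetical and are not taken from DocuWare. The point is that a rename changes one mapping, while everything keyed by the ID stays valid.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class UserDirectory:
    """Toy user store (hypothetical): settings hang off an internal ID."""
    _next_id: count = field(default_factory=lambda: count(1))
    names: dict = field(default_factory=dict)      # user ID -> login name
    settings: dict = field(default_factory=dict)   # user ID -> settings dict

    def add_user(self, login_name: str) -> int:
        user_id = next(self._next_id)
        self.names[user_id] = login_name
        self.settings[user_id] = {}
        return user_id

    def rename(self, user_id: int, new_login_name: str) -> None:
        # Only the name mapping changes; the settings are not touched.
        self.names[user_id] = new_login_name


directory = UserDirectory()
uid = directory.add_user("jsmith")
directory.settings[uid]["language"] = "en"
directory.rename(uid, "jane.smith")
```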

Figure 7: Authentication


During a user login, the authentication server also checks the licenses for the various
DocuWare servers which are available to that particular user. Both "concurrent licenses" and
"named licenses" are supported.

4.1. Passwords
Passwords are usually encrypted or stored as hash values. The same applies to system
settings such as the login for the database server.

For hashing, DocuWare uses the "salted" hash procedure, whereby a random value ensures
that even two identical passwords do not generate the same hash value. This means that
stored passwords can neither be read nor reproduced.
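
The salted hash principle can be illustrated with a minimal sketch. This white paper does not name the algorithm DocuWare uses, so the choice of PBKDF2 with SHA-256, the 16-byte salt and the iteration count are assumptions made purely for illustration, as are the function names.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is drawn unless one is given."""
    if salt is None:
        salt = os.urandom(16)
    # ASSUMPTION: PBKDF2-HMAC-SHA256 stands in for the unnamed hash procedure.
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Two users who choose the same password still get different stored values.
salt_a, hash_a = hash_password("secret")
salt_b, hash_b = hash_password("secret")
```

Only the salt and the digest are stored. Verification repeats the calculation with the stored salt, so the password itself never needs to be kept.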

The login options are specified when a user is set up. User management is performed by the
Organizations Administrator.

4.2. Login to LAN/VPN


The following methods are supported for login:
• DocuWare login
Users identify themselves by their user name and password as stored in
DocuWare. Users need to log in only once, irrespective of the different DocuWare servers.
• Trusted Login (Single Sign-On)
The client identifies itself, without additional user input, via the login name of the
Windows operating system. Authentication Server checks the login by means of the Windows
user administration.
This method also permits cooperation with other single sign-on systems. The directory
services based on LDAP and Windows Active Directory are supported.
Login in DocuWare always takes place via the authentication server. The login procedure
also incorporates a verification of the licenses available to the user.

DocuWare uses a "ticket granting ticket" (TGT) procedure: the user or client identifies
itself to the authentication server, requests a service and is given a "ticket." With this
ticket it can then use the service of another server, for example of a content server. For
the purposes of identification the client needs "credentials" which, as mentioned above, it
receives either through user input (DocuWare login) or through the Windows user
administration (trusted login). The Authentication Server thus exerts central control over
the sessions within the system: it can impose the security features and it can react
dynamically in case of failure or overload of individual servers.
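
The ticket flow described above can be sketched in strongly simplified form. This is neither Kerberos nor DocuWare's implementation: the classes, the HMAC-signed tickets, the key values and the lifetimes are all hypothetical, and the plain-text password table exists only to keep the example short.

```python
import hashlib
import hmac
import json
import time


def _sign(key: bytes, payload: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


class AuthenticationServer:
    def __init__(self, users: dict, service_keys: dict):
        self._users = users                # login name -> password (toy only)
        self._service_keys = service_keys  # service name -> key shared with it
        self._tgt_key = b"auth-server-private-key"

    def login(self, name: str, password: str) -> dict:
        """Check the credentials and hand out a ticket granting ticket."""
        if self._users.get(name) != password:
            raise PermissionError("invalid credentials")
        payload = {"user": name, "expires": time.time() + 3600}
        return {"payload": payload, "mac": _sign(self._tgt_key, payload)}

    def request_ticket(self, tgt: dict, service: str) -> dict:
        """Exchange a valid TGT for a ticket that one service will accept."""
        if not hmac.compare_digest(tgt["mac"], _sign(self._tgt_key, tgt["payload"])):
            raise PermissionError("forged TGT")
        if tgt["payload"]["expires"] < time.time():
            raise PermissionError("expired TGT")
        payload = {"user": tgt["payload"]["user"], "service": service,
                   "expires": time.time() + 300}
        return {"payload": payload, "mac": _sign(self._service_keys[service], payload)}


class ContentServer:
    def __init__(self, key: bytes):
        self._key = key

    def accept(self, ticket: dict) -> bool:
        """Accept only unexpired, correctly signed tickets for this service."""
        payload = ticket["payload"]
        return (hmac.compare_digest(ticket["mac"], _sign(self._key, payload))
                and payload["service"] == "content"
                and payload["expires"] >= time.time())


content_key = b"content-server-key"
auth = AuthenticationServer({"alice": "pw"}, {"content": content_key})
content = ContentServer(content_key)
tgt = auth.login("alice", "pw")                 # credentials are used once
ticket = auth.request_ticket(tgt, "content")    # afterwards only tickets travel
```

The content server never sees the password. It only checks that the ticket was issued by the authentication server, which is what allows the login to remain in one central place.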

The communication between client and servers and between servers takes place securely.
The supported protocols are NTLM and Kerberos. Because Kerberos provides a higher level of
security, DocuWare servers optionally use it amongst themselves and also try to use it to
communicate with external systems. NTLM is used for compatibility reasons only in cases
where the partner system does not support Kerberos, e.g. older Windows versions.


Figure 8: Ticket granting ticket procedure

The authentication steps involved in the ticket-granting-ticket procedure run in the
background and cause no perceptible delay for the user.

4.3. Login via Internet


The login for remote users over the Web works in essentially the same way as described for
LAN/VPN users, except that there is no direct communication between the Web Client and
the DocuWare servers. A Web Client server is interposed, which is hosted by IIS (Internet
Information Services), and possibly also a proxy server and firewalls, although these do not
affect the sessions described here.

4.4. Authorization Concept


Employees in large organizations deal with complex processes and are subject to a variety of
rules and regulations. In order to carry out their tasks they need authorizations to use
particular functions, sets of data and documents. This goes hand in hand with certain
restrictions to make sure that only authorized personnel have the right to do certain things,
and to maintain transparency for everyone.

4.4.1. Roles

In addition to "user groups," the DocuWare system also works with the "role" concept. This
involves defining "roles" to which authorization profiles (collections of rights) are then
assigned. A "role" therefore is a particular set of authorizations, not users. It typically
corresponds to the rights that are necessary in order to fulfill particular tasks within a
process.

By assigning roles to individual users or user groups these are automatically awarded the
authorizations that were previously defined in the profiles.

A role comprises one or more profiles defining the available features plus one or more
profiles specifying the access rights to stored documents (see "feature profile" and "file
cabinet profile" in the glossary). One or more roles can be assigned to a user or a group.


Figure 9: Authorization concept (Bx = Authorization, Ux = User)

4.4.2. Profiles

A particular position or "role" in an organizational unit can entail quite different tasks
and hence require a number of different authorizations. This is why individual
authorizations are grouped into profiles. The role of the "chief buyer" for example might
require the profile for approving vacation requests as well as the profile for purchasing
complex IT systems, because both these tasks happen to fall within the competency of the
chief buyer's position, even though they are totally unrelated.

Generally speaking, there are two types of profile:
• file cabinet profiles, and
• function profiles
While file cabinet profiles are a collection of rights to a logical archive, function profiles map
the availability/unavailability of individual DocuWare functions.

4.4.3. Users and groups

Users are allocated roles according to their tasks within the organization. Typically, a user
will have a number of different roles, and in many cases, several users have overlapping
roles. Users can therefore be put into "groups."

A "group" is a set of users. Groups cannot contain subgroups. A user can be a member of
several groups. Users can also be allocated profiles and individual authorizations directly.
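
The relationship between users, groups, roles and profiles can be sketched as a small data model. The class names and the rule that a user's effective rights are the union of all assigned rights are assumptions made for illustration; this paper does not specify how DocuWare resolves overlapping authorizations. Individual authorizations assigned directly are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Profile:
    name: str
    kind: str            # "file_cabinet" or "function"
    rights: frozenset    # hypothetical right identifiers


@dataclass
class Role:
    name: str
    profiles: list


@dataclass
class Group:
    name: str            # groups hold users only, never subgroups
    roles: list = field(default_factory=list)


@dataclass
class User:
    name: str
    roles: list = field(default_factory=list)
    groups: list = field(default_factory=list)
    profiles: list = field(default_factory=list)   # profiles assigned directly

    def effective_rights(self) -> frozenset:
        """Union of rights from direct profiles, own roles and group roles."""
        roles = list(self.roles)
        for group in self.groups:
            roles.extend(group.roles)
        profiles = list(self.profiles)
        for role in roles:
            profiles.extend(role.profiles)
        rights = set()
        for profile in profiles:
            rights |= profile.rights
        return frozenset(rights)


# The "chief buyer" example: two unrelated profiles combined in one role.
approve_vacation = Profile("Approve vacation", "function",
                           frozenset({"vacation.approve"}))
buy_it = Profile("Purchase IT systems", "file_cabinet",
                 frozenset({"purchasing.read", "purchasing.store"}))
chief_buyer = Role("Chief buyer", [approve_vacation, buy_it])
purchasing = Group("Purchasing", roles=[chief_buyer])
user = User("U1", groups=[purchasing])
```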

Since DocuWare allows the exchange with external user administration systems that may
typically also work with the "group" concept, these settings make it very easy to assign
DocuWare rights to external users.


5. Content Server
Clients and other servers gain access to file cabinet and database information via Content
Server. Thus, Content Server is responsible for providing standard access, central control
and logging of the file cabinet utilization.

The various organizations that use a DocuWare system can use different Content servers
simultaneously. Content Servers are individually scalable, so that it is easy to distribute the
load within a DocuWare system optimally. All Content servers manage index and meta data
of the stored documents in one or more databases.

Moreover, the Content server manages a number of "logical file cabinets" to which
documents are allocated. It looks after all activities that access these archives whether they
involve storing documents or searching and retrieving them.

In order to facilitate the mapping to removable mass storage media, "logical disks" with
specific storage capacities are assigned to the archive. This makes it possible for documents
to be stored together physically so that they can then be swapped out, deleted or transported
more easily.

Figure 10: Logical file structure

These "disks" are located in a "storage location," which can be any file store that you may
choose. Different types of storage media are supported (see 5.4). Even within one file store it
is possible to have a combination of different media.

Each organization can have several archives. Each archive uses a database for managing
the index data and one or more locations for storing the documents and header files.

As mentioned already, you can build archives that span several sites by using a master-
satellite structure. The synchronization between master and satellite is handled by the
Workflow server. As far as the user is concerned, both master and satellite provide the same
functionality. You can add documents to either of these two archives and, if you have the
necessary authorization, also modify and delete documents. Each archive has a globally
unique ID which cannot be altered. This prevents any clashes between names even if the
systems are merged.


Achieving a high degree of security was an important design criterion for the developers of
the DocuWare server. The following are among the crucial security aspects of Content
Server¹:
• Users and administrators require no knowledge of the internal file structure, nor do they
need access rights to it.
• Documents and files can be stored in an encrypted format (only in conjunction with
enterprise server).
• Files are protected with a type of checksum (using a hash algorithm) making any
changes immediately visible.
• When multiple users access the same document at the same time, the DocuWare system
ensures consistency.
Additional security aspects are described in the following section which discusses the main
elements of the Content server and the file cabinets.
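
The checksum protection mentioned in the list above can be illustrated with a short sketch. SHA-256 and the function names are assumptions; this paper only states that a hash algorithm is used.

```python
import hashlib
import os
import tempfile


def file_digest(path: str) -> str:
    """SHA-256 of a file, read in chunks so large documents need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def is_unchanged(path: str, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at storage time."""
    return file_digest(path) == recorded_digest


with tempfile.TemporaryDirectory() as tmp:
    doc = os.path.join(tmp, "F1.tif")
    with open(doc, "wb") as fh:
        fh.write(b"scanned page")
    recorded = file_digest(doc)           # recorded when the file is stored
    intact = is_unchanged(doc, recorded)
    with open(doc, "ab") as fh:
        fh.write(b"!")                    # simulate a later modification
    tampering_detected = not is_unchanged(doc, recorded)
```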

5.1. File Cabinet


This section describes the way data and documents are stored in the DocuWare system.
Storage and access to the documents is managed by the Content Server; neither the
administrator nor the users need direct access to the documents, since they go via the
intermediary of the DocuWare software. The description is provided only to give you an
overview of the inner workings.

An archive ("file cabinet") is an organization-dependent unit characterized by what the user
specifies with regard to disk management and by the index files associated with the
documents.

Every organization has one or more logical "file cabinets" for storing documents.
The archive settings define:
• General characteristics, such as name, etc.
• Database to be used, and any additional database-related settings
• The file cabinet to be used and its subdivision into logical disks (with capacity limits)
• Access rights (and file cabinet profiles) for the archive or for individual fields
• User dialogs for file storage, searches and results list
• Web instance(s) to which the file cabinet is available (for access via Web Client)
When setting up a file cabinet you also need to specify which Content Server is going to be
used to access that file cabinet. Other than that, the archive settings define the principal
functionalities. These include availability of a full-text index, type and extent of the stamps
that are available for document processing as well as electronic signatures.

Optionally, an archive can be accessed via several Content servers. Allocation takes place at
user login and is controlled by the Authentication server. This allows on the one hand load
distribution across several Content servers and on the other a "changeover" if a Content
server should fail.
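
How an allocation at login can provide both load distribution and changeover is sketched below. The selection rule (the reachable server with the fewest active sessions) and all names are assumptions. This paper does not describe the actual allocation strategy of the Authentication server.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContentServerInfo:
    name: str
    available: bool
    active_sessions: int


def allocate(servers: List[ContentServerInfo]) -> ContentServerInfo:
    """Pick the reachable server with the fewest sessions; skip failed ones."""
    candidates = [s for s in servers if s.available]
    if not candidates:
        raise RuntimeError("no Content server available for this file cabinet")
    chosen = min(candidates, key=lambda s: s.active_sessions)
    chosen.active_sessions += 1
    return chosen


servers = [ContentServerInfo("cs1", True, 12),
           ContentServerInfo("cs2", True, 3),
           ContentServerInfo("cs3", False, 0)]
first = allocate(servers)        # least loaded reachable server
servers[1].available = False     # cs2 fails
second = allocate(servers)       # the next login is moved to another server
```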

¹ In view of the importance of the security aspect, there is a separate White Paper which
provides an in-depth description of this topic.


5.2. File Structure


Typically, documents that have been scanned in black and white are stored as a TIFF file for
each document page. Color scans are stored as JPEG or PNG files. DocuWare is also
capable of handling multipage TIFF files for import and export purposes. All other documents
that are read into DocuWare, such as PDF and Office files are stored in their original formats.
The file cabinets contain the documents themselves plus a header file and possibly
additional files for audio comments.

This means that for each document stored, DocuWare may need to manage a number of
files. A "document" as understood by DocuWare may consist of a combination of several
TIFF, Office, PDF and other files, for example in cases where DocuWare stores an e-mail
with multiple attachments as one document. In DocuWare such parts of documents are also
called "pages" (see 5.5 to 5.7). For each document that it stores, DocuWare creates a
separate document directory. The system manages documents by their header file (XML
format). Each document is assigned a unique sequential number, the so-called DOCID. This
is automatically incremented for each new document.

In order to achieve optimum flexibility and openness, the document store is mapped on to a
file directory from which external storage systems may be addressed. The range of options
available in these file directories is determined by the operating system. In view of the
intended open architecture and the independence with regard to the storage systems,
DocuWare models itself on the possibilities offered by these file systems:
• CD-ROM standards ISO 9660 and Joliet
• DVD standard
• Microsoft NTFS, FAT16, FAT32
• Linux file systems (ext2, Minix, NFS, etc.)
• Novell File System
These were taken as the framework conditions for the DocuWare file storage structure. Since
for reasons of compatibility and performance no more than 256 files should be stored in a
directory, you need to use several hierarchy levels.

By using four levels, DocuWare can manage more than 4 billion documents per file cabinet
(256^4 = 4,294,967,296). Below the file directory assigned by the administrator, the DocuWare
directory is addressed by its archive name, the disk numbers, three directory levels and the
document level.


If for example you allocate the directory D:\DOCS and the name SALE to the file cabinet, the
documents of the first disk will reside in the following subdirectory:

D:\DOCS\SALE.000001\000\000\000\00000001\00000001.XML

   SALE.000001     Disk 1
   \00000001\      Directory for document 1
   00000001.XML    Header for document 1
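
The path layout shown above can be reproduced with a short sketch. This paper gives only the example for document 1, so the mapping of larger DOCIDs to the three directory levels (here the three high-order base-256 digits) and the hexadecimal form of the 8-character document name are assumptions, chosen so that four levels of 256 entries cover exactly 256^4 documents. The function name is hypothetical.

```python
from pathlib import PureWindowsPath


def header_path(base_dir: str, cabinet: str, disk: int, docid: int) -> PureWindowsPath:
    """Build the header-file path for one document (illustrative mapping only)."""
    if not 0 < docid < 256 ** 4:
        raise ValueError("DOCID outside the range this sketch handles")
    # ASSUMPTION: the three directory levels are the three high-order
    # base-256 digits of the DOCID, each written as three decimal digits.
    levels = [f"{(docid >> shift) & 0xFF:03d}" for shift in (24, 16, 8)]
    # ASSUMPTION: document directory and header file share the DOCID written
    # as eight hexadecimal digits; the source does not state the number base.
    doc_name = f"{docid:08X}"
    disk_dir = f"{cabinet}.{disk:06d}"
    return PureWindowsPath(base_dir, disk_dir, *levels, doc_name, f"{doc_name}.XML")


first_header = header_path(r"D:\DOCS", "SALE", 1, 1)
```

For document 1 on disk 1 this yields the path from the example above. For any other DOCID the result depends entirely on the stated assumptions.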

Apart from the header file, the document directory will also contain the files associated with
the stored document, all beginning with "F" (= File) plus a sequential number. Sound
annotations (spoken text, etc.) are identified by the letter "A" (= Annotations), the number of
the associated F file and a sequential number.

A document that consists of several parts and contains speech annotations would therefore
be represented like this:

\00000001\ 00000001.XML
\ F1.pdf
\ F2.doc
\ F3.tif
\ A1_1.wav
\ A1_2.wav

5.3. The "Disk" Concept


The documents of a file cabinet are stored in so-called DocuWare disks. DocuWare disks are
directories in the file cabinet identified by a name that DocuWare has assigned them. The
subdivision of the file cabinet into logical disks is a means of organizing the storage media.

You can transfer these logical disks to another – physical – medium at any time you choose,
for example when they reach a certain size. This has the advantage that documents can be
swapped out to physical media either by pre-defined rules, or automatically. DocuWare
provides a number of convenient support functions which automate the necessary steps.

The concept of logical disks and the open file structure gives the administrator a high degree
of transparency and flexibility when working with stored files. Since the structure conforms to
common standards you may also use the tools provided by the operating system, though
these are less convenient.


5.4. Supported File Storage Media


Different storage media may be required depending on the document volume and the access
and storage requirements. In addition, security aspects play a very important role in the
design of storage systems. Thanks to its standard-based architecture, DocuWare supports a
wide spectrum of options: local hard disks, (virtual) network storage media and external
storage systems. The technological basis of these systems is irrelevant, since DocuWare is
capable of supporting any media, provided they conform to the conventions for Windows
filing systems. This means that advanced storage technologies such as RAID systems, NetApp
storage solutions, Network Attached Storage (NAS), other "shared disk" systems or Storage
Area Networks (SAN) can be used, as long as they can be integrated in the Windows file
system as virtual disks.

In addition, DocuWare offers direct support for certain jukeboxes and special storage
systems by providing software that integrates these systems as DocuWare file cabinets just
as it does with Windows file systems.

You can set specific options to determine whether files are written directly to the target
medium, which in the case of WORM for example ensures maximum security, or whether they
go via the intermediary of the virtual disk, because CD/DVDs cannot be burnt in
succession.

The following sections describe the different media and their application.

5.4.1. Hard disks, RAID

Each GB of mass storage can contain some 20,000 DIN A4 pages. This is the equivalent of
about 40 well filled paper folders. In addition, you have the option of combining several hard
disks in a so-called Disk Array. These arrays are the ideal solution for storage capacities of
up to 150 GB for an archiving system where magnetic storage technology does not present a
problem.

A RAID (Redundant Array of Independent Disks) provides increased security against data
loss in the event of a hard disk failure. Depending on the RAID level, it also allows removal of
the disk "on the fly."

5.4.2. Optical removable disks

An optical removable disk (CD, DVD, Blu-Ray, WORM) can store up to 50 Gigabytes of data
- the equivalent of 800,000 pages of text. Using such large drives without a jukebox makes
sense only for single workstations. Their advantage is that they can be expanded indefinitely,
simply by inserting more disks. DocuWare looks after the management and numbering of the
disks. As long as you leave the disk labeling to DocuWare, retrieval of documents is very
easy, even if you work with many different disks.

For a long time, optical removable disks were considered to be revision-proof in comparison
to magnetic storage media which is why they were the medium of choice, even if magnetic
disks would have been possible. However, DocuWare ensures that modifications are either
not possible or that they are immediately obvious.


5.4.3. Jukeboxes

Jukeboxes are "disk-changing robots" that handle optical media, typically containing one to
four drives. Currently, jukeboxes provide the largest storage volume with online access.
Small-volume solutions store 10 GB, high-end systems up to several thousand GB. These
systems are clearly useful for networks that handle huge amounts of data. Access speed is
dependent on the number of inbuilt drives. When frequent disk changes occur, access time
can be several seconds per image file.

Apart from disk-based jukeboxes you can now also have advanced tape systems (tape
libraries), such as WORM tapes which are a cost-effective way of providing large storage
capacities. You need to ascertain that the system can be integrated into the Windows file
system. There are quite a few now that have DocuWare certification.

Special access software will integrate jukeboxes transparently into the Windows file system
so that they can then be used by DocuWare. A list of storage systems that are supported
directly by DocuWare and some of which are certified is published on www.docuware.com.

5.4.4. Content Addressed Storage (CAS)

Until recently, whenever revision-proof archiving was a major concern, optical media were
used. Now, however, RAID-based solutions have become a perfectly good alternative,
especially with large volumes, as they can be made to behave in a similar manner to WORM
drives by using special software. These are closed systems that typically have the following
characteristics:
• Application and users have no knowledge about the physical location of a file within the
subsystem. Accidental or intentional modification of the data by users/administrators is
not possible.
• "Hashing" similar to the signature procedure is used to give the file a "fingerprint," which
also serves as its address.
• Identical copies are saved only once.
• The file is automatically given a time signature.
• Storage and access is possible from different systems on different platforms, i.e.
documents that were stored with DocuWare may be read by applications on other
platforms.
• It is possible to increase capacity on the fly. Data is automatically distributed, which also
implies that it is possible to migrate to other media within the same system.
• The system provides redundancy, error monitoring and, wherever possible,
autonomous error correction.
In order to utilize these functions the application needs to address a specific interface of the
CAS system. A list of the CAS that are directly supported by DocuWare can be found at
www.docuware.com.
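
The core idea of content addressing, with the fingerprint serving as the address and identical copies stored only once, can be sketched with an in-memory toy. It does not model any vendor's CAS interface. The class, the use of SHA-256 and the timestamp handling are assumptions.

```python
import hashlib
from datetime import datetime, timezone


class ContentAddressedStore:
    """In-memory toy: the SHA-256 fingerprint of the content is its address."""

    def __init__(self):
        self._blobs = {}       # address -> bytes
        self._stored_at = {}   # address -> time of first storage

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        if address not in self._blobs:          # identical copies are kept once
            self._blobs[address] = data
            self._stored_at[address] = datetime.now(timezone.utc)
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hash on read: changed bytes would no longer match their address.
        if hashlib.sha256(data).hexdigest() != address:
            raise IOError("stored content no longer matches its address")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentAddressedStore()
a1 = store.put(b"invoice 4711")
a2 = store.put(b"invoice 4711")   # same content, same address, stored once
a3 = store.put(b"invoice 4712")
```

Because the address is derived from the content, a caller cannot overwrite a stored file in place: changed content simply receives a different address.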

As already mentioned, DocuWare partially implements CAS functionality at application level,
regardless of storage system. CAS systems are therefore to be recommended in cases
where the requirements for capacity, performance and security are particularly high.

5.4.5. NetApp Storage

The NetApp storage solutions are based on one of NetApp's own operating systems and can
be integrated in various storage area networks (NAS, SAN, iSCSI). They are especially
intended to manage large volumes of data and for the long-term archiving of WORM
documents. The company provides special software for data management. This supports the
following tasks:
• Management of SANs
• Performance optimization
• Application integration (e.g. with VMware, SAP, Oracle, Windows, Exchange,
SharePoint)
• Data backup and restore
• Archiving
• Ensuring compliance with statutory retention periods
Together with DocuWare, NetApp Storage is only available for the storage of documents and
requires an enterprise license from DocuWare.

5.5. Header File


All documents managed by DocuWare have a header file containing not just meta and index
data, but also annotations, stamps, signatures, etc. Index data is written both to the database
and to the header file. This duplication ensures maximum security. This means that even in
case of a total failure of the database without a backup the documents and their index data
will still be available.

Header files are XML files. Using this standard file format gives customers the following
benefits:
• Less dependency on the manufacturer because the internal structures are open.
• Maximum transparency thanks to formats that can be both read and written.
• Simplified exchange with all standard-compatible systems, including future DocuWare
generations.
• Simplified exchange with capturing systems and scan service providers.
DocuWare uses this format for storing the metadata and any additions. The actual content is
stored separately (for performance reasons), except when exporting. DocuWare uses the
XML file not just for NCI but for all documents that are managed by the DocuWare system.
For each file that is part of a DocuWare document the XML file contains a separate section
which may contain metadata.


The information essentially is:
• Document description
Information relating to the whole document, such as signatures and encryption
• Document metadata
All descriptive data (index data) for the document which is required either from the user
or the system perspective, including DOCID, disk number, etc.
• Page Rendition Content Description
Page-specific information, such as text or speech annotations, levels, redlining, etc.
To allow the interchange between DocuWare systems at different sites an area can be
reserved for direct integration of data. The figure below illustrates the structure of the XML
header file.

Figure 11: Structure of the XML header (simplified)

5.6. Metadata
The metadata contain both the attributes allocated by the user (index data, field properties)
and the data that DocuWare requires for its management function (system properties), such
as the DOCID. This data is identical to the index data which the database maintains for every
file.

DocuWare ensures the integrity between the database and the header file. In the event that a
database is irretrievably lost (when no usable backup is available) the header files can be
used to regenerate the database information. However, since this procedure can be rather
time-intensive, it should not be used instead of a traditional data backup.
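
Regenerating index rows from header files can be sketched as follows. The XML layout used here is hypothetical, since the real header schema is not reproduced in this paper, and SQLite merely stands in for whichever database is in use. The table and column names are likewise invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# HYPOTHETICAL header layout; not the actual DocuWare header schema.
HEADER_XML = """
<Document>
  <Metadata>
    <Field name="DOCID">1</Field>
    <Field name="DISK">1</Field>
    <Field name="COMPANY">Example Ltd</Field>
  </Metadata>
</Document>
"""


def index_fields(header_xml: str) -> dict:
    """Extract the index fields of one header file as a name -> value dict."""
    root = ET.fromstring(header_xml)
    return {f.get("name"): (f.text or "") for f in root.iter("Field")}


def rebuild(conn: sqlite3.Connection, headers: list) -> int:
    """Recreate the index table from header files; returns the rows written."""
    conn.execute("CREATE TABLE IF NOT EXISTS cabinet "
                 "(DOCID INTEGER PRIMARY KEY, DISK INTEGER, COMPANY TEXT)")
    rows = []
    for header in headers:
        fields = index_fields(header)
        rows.append((int(fields["DOCID"]), int(fields["DISK"]), fields["COMPANY"]))
    conn.executemany(
        "INSERT OR REPLACE INTO cabinet (DOCID, DISK, COMPANY) VALUES (?, ?, ?)",
        rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
written = rebuild(conn, [HEADER_XML])
restored = conn.execute("SELECT DOCID, DISK, COMPANY FROM cabinet").fetchall()
```

A real recovery has to read every header file in every disk directory, which is why the procedure is time-intensive compared with restoring a backup.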

The storage properties contain information about the history and the logical archive of the
file. Application properties are information that is required for integration with other
applications, for example with SAP.


5.7. Document
A DocuWare document can consist of several files of different formats (TIFF, Word, PDF,
etc.), which can in turn consist of several pages.

For example:
I. A 3-page paper document that was scanned into DocuWare consists of three
document pages, each of which is a one-page file (b/w TIFF files generated by
DocuWare).

II. For one document, a b/w TIFF file generated by DocuWare, a 3-page Word file, and a
2-page PDF file are linked together. The document then consists of three files:

1. File: b/w TIFF file with page 1
2. File: Word file with pages 1, 2 and 3
3. File: PDF file with pages 1 and 2

Annotations (multiple layers of redlining, text and speech annotations, etc.) can be made
within a document in each file, but only on the first page within a file.

As in Adobe PDF, the annotations with their characteristics and any additional attributes such
as user information are stored and then reproduced by the Viewer at runtime. No additional
image files are therefore necessary, and the annotations can be traced and modified in a
flexible manner.


6. Databases
For its operation, DocuWare requires a relational database, which it uses both for storing and
for performing searches within the structured index data of the documents and for the full-text
index. In addition, DocuWare stores all essential system information (such as Authentication
server data) in this database.

During installation, DocuWare automatically sets up the integrated database unless the
administrator explicitly deselects this option, for example if the intention is to use
existing systems.

DocuWare supports the use of various database systems within one DocuWare system: the
administrator has the option of specifying a particular database to be used for each file
cabinet. It is also possible to switch to another database system at a later stage.

6.1. Database Structure


Searches in the documents stored in DocuWare are always performed via a database. For
this purpose, the index data is stored in its structured form (relationally) or in the form of a
full-text index.

The database not only manages the search criteria that are relevant for the user, but also the
system-internal information needed for storing and retrieving the documents in the file
cabinets.

The characteristic that uniquely defines a document is its DOCID - a number for a document
that may consist of various files and is unique within each file cabinet.

Of particular importance are the user-defined fields. These specify the keywords and
categories by which documents are stored and retrieved.

Thanks to separate keyword tables it is theoretically possible to have an unlimited number of
keywords for each document. Moreover, it is possible to create several keyword fields within
a file cabinet. The speed for searching in keyword fields is very high since the keyword
column in the table is indexed. As soon as the entry is found, the DOCID allows direct
access to the database entries of the associated documents.


Essentially, the database manages the following tables:


Table type:  System table
Table name:  DWSYS
Description: Describes all managed file cabinets by their name, ID, current storage media
             (Disk ID).

Table type:  Disk table
Table name:  <File cabinet name>_DISKS
Description: Describes the disks in the system, i.e. all disks of all file cabinets by their
             numbers and other capacities.

Table type:  File cabinet main table
Table name:  <File cabinet name>
Description: Describes the documents per file cabinet by mandatory system fields, such as:
             number of pages, disk number, storage date, version number,
             access log information, synchronization information (satellites only)
             and user-defined fields with field types
             - Text
             - Date/time
             - Numerical
             - Memo

Table type:  Keyword tables
Table name:  <File cabinet name> <Name of keyword field>
Description: For each keyword field in a file cabinet, a table is created which links the
             keyword to the DOCID.

Table type:  Locking table
Table name:  <File cabinet name>_LOCK
Description: Describes the documents of a file cabinet that are locked against modification
             – by date/time, user and computer on which the document is being edited, as
             well as information about checkin/checkout status.

6.2. Integrated Database


The MySQL database is the "integrated database" which comes as part of the standard
package. If you are using the integrated database, all the necessary parameters are set
automatically during installation. These settings provide the standard values for using any
other databases.

6.3. Direct Database Connection


The market-leading database systems (MS SQL, Oracle, MySQL) are directly connected to
the DocuWare system. For the user, this direct connection is no different from working with
other databases, except that access is not via the ODBC interface; instead, the database is
directly addressed with specific SQL commands. This results in a speed advantage.

6.4. Database Administration


Databases may reside on autonomous servers (outside the DocuWare server area).
DocuWare can work with several database connections simultaneously, and use different
servers and different databases. Whether or not several connections can be established to
one database depends on the particular database.


7. Web-Based Applications
The trend in IT applications is increasingly toward Web-based solutions. Installation and
maintenance on client computers thereby become unnecessary, and access to the application
is possible from anywhere and from all computers, irrespective of operating system. All that
is needed is an Internet connection.

DocuWare is also following this path. File cabinets are accessed through the Web Client.
From the user's perspective, documents are searched and shared as on the Windows Client;
technologically, however, it is a completely new development based on ASP.NET,
JavaScript, AJAX and Silverlight.

The administration of the DocuWare system is also becoming increasingly Web-based. In the
future, it should be possible to manage everything via an Internet connection.
Technologically, this will also be based on Silverlight.

7.1. Document access via Web Client

7.1.1. Web Client Server

File cabinet access via the Internet is based on Web Client Server, which is installed within
the DocuWare system as an additional server module. Web Client Server supplies the user
interface which is displayed in the browser window.
To access a file cabinet, the user connects to Web Client Server via the Internet using Web
Client. The latter forwards the request to Authentication Server to verify the user account and
the file cabinet access rights via Content Server. From the perspective of the Authentication
Server and Content Server, Web Client Server acts like a client.

Figure 12: File cabinet access via Web Client Server

7.1.2. Imaging Server

Imaging Server, another component for Web-based document access, converts archived
documents that are to be displayed in the Web Client Viewer to a graphics format. This
allows all main file formats to be displayed and printed in high quality without having to install
anything on the client computer. Imaging Server is also responsible for converting files to
PDF and for the text search in the Web Client Viewer.

Web Client Server communicates directly with Imaging Server. More than one Imaging
Server can be installed within a DocuWare system, making it possible to distribute the load.

7.1.3. Thumbnail Server

In Web Client, documents can be displayed in the Viewer and in the basket as thumbnails.
For better performance, the thumbnails are not recreated each time they are loaded, but
saved in a dedicated database and supplied from there when needed for display. Thumbnail
Server is responsible for saving and retrieving thumbnails and is connected to both Web
Client Server and the database.
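
The save-and-reuse behavior described above can be sketched as follows. This is only an illustration under stated assumptions: the class ThumbnailStore, the function fake_render and the dictionary standing in for the dedicated thumbnail database are hypothetical and do not reflect the Thumbnail Server interface.

```python
class ThumbnailStore:
    """Stand-in for the dedicated thumbnail database (here: a dict)."""

    def __init__(self, render):
        self._render = render    # hypothetical rendering function
        self._saved = {}         # (doc_id, page) -> thumbnail bytes
        self.render_calls = 0    # counts how often a thumbnail was created

    def get(self, doc_id, page):
        """Return a saved thumbnail, creating and saving it on first use."""
        key = (doc_id, page)
        if key not in self._saved:
            self.render_calls += 1
            self._saved[key] = self._render(doc_id, page)
        return self._saved[key]


def fake_render(doc_id, page):
    # Placeholder for real image rendering.
    return f"thumb-{doc_id}-{page}".encode()


store = ThumbnailStore(fake_render)
first = store.get(42, 1)     # created and saved
second = store.get(42, 1)    # supplied from the store, not recreated
```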

Figure 13: Web Client Server with Imaging Server and Thumbnail Server

7.1.4. Web instances

Any number of Web instances can be created for a Web Client Server. A unique URL is
assigned to each of these instances. The user connects to Web Client Server via this URL
and loads the corresponding instance in Web Client.

Which file cabinets and file cabinet dialogs are available and how the DocuWare system is
logged onto are defined separately for each instance.

7.1.5. Web Client

Web Client is the user interface for Web-based file cabinet access.
When the user calls up a URL for a Web instance in the browser, Web Client is displayed in
the browser window. All major features of the document management system can be run via
Web Client: opening documents, marking with annotations and stamps, editing index words,
storing and sending documents, etc.

DocuWare Web Client is based on ASP.NET and Ajax (Asynchronous JavaScript and XML).
These technologies allow Web Client to process searches very quickly, so users receive
immediate answers to their queries. Web Client is based on individual control elements
known as Web Parts.


Web Client does not require any installation on the client computer and is not dependent on
the operating system. Only features that cannot be implemented using a browser alone
require applications to be installed on the client computer (see following section).

7.1.6. ClickOnce applications

Sending archived documents via the local mail client, another feature of DocuWare Web
Client, is technically not possible using only a browser. A DocuWare application, a "Smart
Client", must be installed on the local client computer.
DocuWare uses the ClickOnce technology from Microsoft for this. The first time they send
mail, the user clicks to download the DocuWare application once, and this is automatically
installed on the local client computer. No administrative rights are required on Windows. This
application can be updated automatically.

A local application is also required for the browser-based client application of the DocuWare
SmartConnect add-on module. This is also installed on the client computer using the
ClickOnce process.

ClickOnce applications require a Windows operating system.

7.1.7. Silverlight Plug-In for Web baskets

In DocuWare, documents are processed, e.g. stapled, unstapled and pre-indexed, in so-
called baskets. For Web Client, these baskets are generally not located on the local
computer but on the network. These baskets, also known as Web baskets, are managed by
Content Server. For the Web Client user to be able to use these Web baskets, a Silverlight
browser plug-in must be installed locally. A Silverlight browser plug-in requires a Windows or
Mac operating system.

7.1.8. Integration of Web Client in other applications

There are many integration options for Web Client. Web Client can either be integrated as a
whole into other applications or only individual elements of it, such as the result list or the
Viewer. The integration works with Windows and Web programs via special URL calls.

A full overview of the integration options for DocuWare Web Client can be found in the
"Integrations" White Paper.


7.2. Web-Based Administration


The long-term goal is to make the administration of the DocuWare system as Web-based as
possible. This goal has already been achieved for some of the newer elements, such as
managing Web baskets and e-mail alerts, and for the administration of the DocuWare add-on
modules SmartConnect and CONNECT to MFP.

Technologically, this is based on Silverlight, i.e. the administrator requires a Silverlight
browser plug-in. This requires a Windows or Mac operating system.


8. Management Framework Process


One of the important advantages of a DMS is the possibility of automating routine activities
and supporting established processes. These can include system-related standard processes
as well as application and user-dependent ones.

The overall architecture is defined by DocuWare's Process Management Framework, which
defines the administrative and process handling operations. Three process categories can be
distinguished:
• Document Batch Process
  No user intervention is required here. These processes handle routine sequences, e.g.
  import, storage, export, migration and deletion of documents and data.

• Document workflow
  Automatic dispatch, including user interaction, of documents along pre-defined paths, is
  one of the most common workflow applications. Invoices, purchase requests, vacation
  applications – these are just some of the documents that need to be created, approved
  and posted in large organizations. All these processes can be controlled and tracked by
  means of the document workflow.

• Data exchange with third-party applications (Data Acquisition and Distribution, EAI)
  A document management system must be able to take data and documents from a
  variety of systems and may need to return them to such systems on request. These
  systems are therefore a form of "Enterprise Application Integration (EAI)", since they
  provide a general infrastructure for many applications and users. These processes are
  currently implemented by the LINK, AUTOINDEX and ACTIVE IMPORT modules.

8.1. Workflow Server


Workflow Server in DocuWare is a separate server module which controls (sub-)processes
that can be automated. It provides the various functionalities for automating steps and acts
as the central element for these tasks, including their administration.

Workflow Server is the central workflow engine for performing pre-defined workflows.
Workflows have the following characteristics:
• Triggering event
• Input data
• Various logically separate procedural steps
• Output data

The Workflow Server works to match this model. Events may be triggered by user actions,
they may be timed, or they can be triggered on reaching a particular condition (e.g. "disk
full").

Such an event then starts off a particular workflow, which – depending on instructions – may
first read in certain input data. Input can come via interaction or by reading a file from a
particular directory, or from data extracted from a database.

The process itself consists of several steps, each of which represents a transaction. If a step
cannot be completed successfully, the Workflow server issues a notification to a (log) file,
whereupon a reset to the last valid state takes place.


On successfully completing a step, the (intermediate) result is handed to the next procedural
step. The final output is sent to the user, to a directory, or to the DocuWare file cabinet.
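
The step-by-step processing described above can be sketched as follows. This is a simplified illustration, not the Workflow Server implementation: the function run_workflow, the example steps and the use of an in-memory snapshot as the "last valid state" are assumptions.

```python
import copy
import logging

log = logging.getLogger("workflow")


def run_workflow(steps, input_data):
    """Run steps in order; each step receives the previous step's result.

    Returns (state, completed). If a step fails, the error is logged and
    the state before that step (the last valid state) is returned.
    """
    state = input_data
    for number, step in enumerate(steps, start=1):
        snapshot = copy.deepcopy(state)    # last valid state
        try:
            state = step(state)            # one step = one transaction
        except Exception as exc:
            log.error("step %d failed: %s", number, exc)
            return snapshot, False
    return state, True


def read_invoice(data):
    # Hypothetical step: adds input data to the workflow state.
    return {**data, "amount": 120}


def approve(data):
    # Hypothetical step that fails above an assumed approval limit.
    if data["amount"] > 100:
        raise ValueError("approval limit exceeded")
    return {**data, "approved": True}


ok_state, ok = run_workflow([read_invoice], {"doc": 7})
bad_state, bad = run_workflow([read_invoice, approve], {"doc": 7})
```

In the second run the approval step fails, so the result of the first step is kept as the last valid state and the failure is written to the log.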

An intermediate result of a workflow task can trigger new events which in turn may initiate
new workflows. Several workflows can resolve the same tasks in parallel and for this purpose
share the same resources, such as directories, file cabinets, etc. – while the Workflow server
ensures the integrity of the data. The processing status is monitored and each workflow task
is visible.

More than one Workflow Server can be installed within a DocuWare system. A specific
Workflow Server is then allocated individual workflows. This means that the load can be
distributed among the Workflow Servers.

8.2. Pre-defined Batch Processes


DocuWare also uses the described functionality of the Workflow Server for system-oriented
standard workflows, for example for controlling the various processes that use the document
stack.

Pre-defined processes are implemented during the initial DocuWare installation, but also
during the (subsequent) installation of additional expansion modules. Users with the
necessary authorization can modify the pre-defined processes to suit their own needs.
Typically, this is a task that falls to the organization administrator.

Pre-defined workflows that are controlled by the Workflow Server exist for the following
tasks:
• Migration
• Exporting archives and sub-archives
• Generating and synchronizing satellite archives
• Creating independent CD/DVD file cabinets
• Adding index information from external data sources (AUTOINDEX)
• Index Restores
• Deleting documents that are defined via filters
• Generating and/or updating the full-text catalog
• Importing documents from spool files (COLD/READ)


9. Full-Text Index
9.1. Functional Principle
DocuWare provides a full-text index, which is available to users but not mandatory. The
full-text service uses the same database as the Content server, but creates its own tables.
When a full-text index is generated, the archive database and the documents are accessed
directly, i.e. without the intervention of the Content server.

The full-text search function is completely integrated in the client functionality, both for
Windows client and for Web client. This means that no special databases are required and
there is no need for users to familiarize themselves with different search clients. When
configuring file cabinets, users must simply decide whether or not to create a full-text index
for the documents. Full-text searching is carried out via the Content server.

The main benefits of this full-text architecture are:


• It works with all databases supported by DocuWare
• There are no special requirements for use on Web Client
• There are no restrictions for use on Web Client
• "Wildcards" (?, *) may be placed at the beginning of a search string

Since indexing large document volumes can require considerable computer resources,
full-text indexing in the DocuWare architecture is carried out as an autonomous workflow on
the Workflow Server. This workflow runs in the background, independently of other
transactions within DocuWare. In most cases, there is no need to have a full-text index
immediately after a document has been added. This means that indexing can be done at
times when the system is not busy, for example during the night.

A full-text index can be generated for each logical file cabinet. Which documents are included
is determined by their association to a particular file cabinet. Since a DocuWare system can
contain a great many archives, you may end up with a large number of full-text indexes too.

In view of the fact that the general fluctuations of documents within the archive are managed
by the Content server, communication between the latter and the full-text workflow is
necessary, even though both are independent of each other. This happens "indirectly" via the
full-text main table, whereby the Content server marks documents and files to be indexed –
and those that need to be deleted. The full-text workflow then makes any necessary
modifications and updates the status fields.
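
The indirect communication via the full-text main table can be sketched as follows. This is a minimal illustration using SQLite: the table name FT_MAIN, its columns and the status values TO_INDEX, TO_DELETE and INDEXED are assumptions, and a Python set stands in for the actual full-text index.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# FT_MAIN and its columns are hypothetical stand-ins for the full-text main table.
db.execute("CREATE TABLE FT_MAIN (DOCID INTEGER, FILE TEXT, STATUS TEXT, "
           "PRIMARY KEY (DOCID, FILE))")


def content_server_mark(docid, file, status):
    """Content server side: flag a file as TO_INDEX or TO_DELETE."""
    db.execute("INSERT OR REPLACE INTO FT_MAIN VALUES (?, ?, ?)",
               (docid, file, status))


def fulltext_workflow_run(index):
    """Workflow side: process all pending rows and update the status fields."""
    pending = db.execute(
        "SELECT DOCID, FILE, STATUS FROM FT_MAIN "
        "WHERE STATUS IN ('TO_INDEX', 'TO_DELETE') "
        "ORDER BY DOCID, FILE").fetchall()
    for docid, file, status in pending:
        if status == "TO_INDEX":
            index.add((docid, file))
            db.execute("UPDATE FT_MAIN SET STATUS = 'INDEXED' "
                       "WHERE DOCID = ? AND FILE = ?", (docid, file))
        else:
            index.discard((docid, file))
            db.execute("DELETE FROM FT_MAIN WHERE DOCID = ? AND FILE = ?",
                       (docid, file))
    return len(pending)


index = set()
content_server_mark(1, "a.pdf", "TO_INDEX")
content_server_mark(2, "b.pdf", "TO_INDEX")
processed_first = fulltext_workflow_run(index)     # e.g. during the night
content_server_mark(2, "b.pdf", "TO_DELETE")
processed_second = fulltext_workflow_run(index)
```

Neither side calls the other directly. The table acts as the task list, so the workflow can run whenever the system is not busy.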

Each occurrence of a search string also comes with an evaluation of the probable relevancy
of the term. The result list of a full-text search is sorted according to this relevancy (or
irrelevancy = noise).

To prevent the full-text index from being loaded with irrelevant words such as articles,
pronouns, etc., the full-text process contains a stop-word list which acts as an automatic
filter. The administrator can modify this stop-word list, for example by excluding certain terms
that occur frequently within a company but are of no interest for search purposes. The name
DocuWare, for example, is not a useful differentiator within the DocuWare company. It is also
possible to exclude files (for example image files) by specifying their suffix.
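
The two filters described above can be sketched as follows. The stop-word list, the excluded suffixes and the function names are illustrative assumptions, not the lists or interfaces shipped with DocuWare.

```python
import re

# Illustrative lists; in practice the administrator maintains them.
STOP_WORDS = {"the", "a", "an", "it", "docuware"}
EXCLUDED_SUFFIXES = {".tif", ".jpg", ".png"}


def should_index_file(filename):
    """Skip files whose suffix is on the exclusion list (e.g. image files)."""
    return not any(filename.lower().endswith(s) for s in EXCLUDED_SUFFIXES)


def index_terms(text):
    """Split text into lowercase words and drop all stop words."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if w not in STOP_WORDS]
```

With these lists, index_terms("The DocuWare invoice, an overview") keeps only "invoice" and "overview", and should_index_file("scan.TIF") rejects the file.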

In order to achieve a powerful search for partial strings and to be able to precede a search
term with a wildcard, a special algorithm – the so-called "Multi Suffix Tree" (MST) – is used.
This works with two special files that initially identify the correct entry in the dictionary table.
This then provides all other important information (relevancy, position, etc.).


The actual full-text index is implemented via the MST and the stringlist files which are stored
for each archive within the filing system. The individual words and substrings are stored as a
tree structure in the MST file. The stringlist file is a list of IDs which links all words and
substrings with entries in the dictionary table.
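
Why storing substrings makes leading wildcards cheap can be illustrated with a much simpler structure than the MST. The sketch below is not the MST algorithm or its file format: it keeps a sorted list of all word suffixes (an assumption chosen for brevity), where each suffix carries the ID of its word in the same way the stringlist links substrings to dictionary entries.

```python
import bisect


class SuffixIndex:
    """Sorted list of every suffix of every word, each with its word ID."""

    def __init__(self, words):
        self.words = list(words)
        self._entries = sorted(
            (word[i:], word_id)
            for word_id, word in enumerate(self.words)
            for i in range(len(word)))

    def containing(self, fragment):
        """Return all words that contain the fragment (search '*fragment*')."""
        # All suffixes starting with the fragment form one contiguous run
        # in the sorted list; bisect finds where that run begins.
        start = bisect.bisect_left(self._entries, (fragment, -1))
        hits = set()
        for suffix, word_id in self._entries[start:]:
            if not suffix.startswith(fragment):
                break
            hits.add(word_id)
        return sorted(self.words[i] for i in hits)


idx = SuffixIndex(["invoice", "voice", "document", "vacation"])
```

A search such as containing("voi") then becomes a prefix search over suffixes and finds both "invoice" and "voice" without scanning every word.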

9.2. Full-Text Tables and Files


These tables are needed for each archive that will contain full-text information. This section
describes the tables which are required for the full-text search as well as the index files that
DocuWare generates.
Table type:  Full-text main table
Description: Contains information about the last indexing process for each file in a
             document. This table is updated by the Content server and serves as a task
             list for the full-text workflow.

Table type:  Dictionary table
Description: This table stores an instance of each string that was extracted from a
             document. At the same time, a counting mechanism counts how often the
             word occurs in the file cabinet and in how many documents, and it
             evaluates its NOISE value. The NOISE value indicates the probable
             relevancy of the word.

Table type:  Index table
Description: The index table shows which string occurs in which document, how many
             times, and on what page(s). This allows a word to be associated with a
             document.

Table type:  Ranking info table
Description: DocuWare uses this table to sort the search results by relevancy. It also
             takes into consideration the above-mentioned NOISE value.

Table type:  MST file
Description: Tree structure of words and substrings

Table type:  Stringlist file
Description: Word list with entry points to the dictionary table


10. Distributed and Redundant Archives


Modern operating and network systems make it easy to use DocuWare file cabinets across
different sites. This applies both to the access of remote clients to DocuWare servers and to
the communication between the servers among themselves. With this in mind, DocuWare
has developed the "satellite archive" model.

Moreover, it is often desirable to export (sub-) archives, for example in order to deliver
information to mobile users outside the enterprise structure. This can be achieved by so-
called "autonomous archives."

Thanks to today's advanced security technologies such as VPNs, firewalls, etc., misuse can
largely be prevented. In this chapter, we restrict ourselves to a discussion of the functions
that DocuWare provides for distributed and redundant archives.

10.1. Satellite Archives


As mentioned under System Architecture (see 3.3) and Workflow Server (see 8.1), an
installation can span a number of different sites. Conversely, you may also decide to house a
satellite archive within a totally different DocuWare installation. In such cases there needs to
be regular synchronization between the sites, in order to keep both sides up to date.

Satellite archives have architectures with the following characteristics:


• There may be many satellite archives for one master.
• A satellite archive can in itself be the master for other satellite archives, but each one
  only ever has one master.

Regardless of the file cabinet type ("master" or "satellite"), the full DocuWare functionality
can be used at both sites, including copying any documents. If a document was modified on
both sides between two synchronizations, the rules set up in the pre-defined workflow are
applied. These specify exactly how to proceed with deleted, modified and newly created
documents on both sides. If modifications have been made to the document and/or the index
entry, the following rules may apply:
• Master overwrites satellite
• Satellite overwrites master
• Last modification overrides any others
• No action, but add to log file

In the last case, a manual intervention is then possible.
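
The four rules listed above can be sketched as follows. This is an illustration only: the names Rule, Version and resolve are hypothetical, modification times are plain integers, and the choice to prefer the master when both times are equal is an assumption that the text does not specify.

```python
from dataclasses import dataclass
from enum import Enum


class Rule(Enum):
    MASTER_WINS = "master overwrites satellite"
    SATELLITE_WINS = "satellite overwrites master"
    LAST_MODIFIED_WINS = "last modification overrides any others"
    LOG_ONLY = "no action, but add to log file"


@dataclass
class Version:
    content: str
    modified: int    # modification time, e.g. seconds since the epoch


def resolve(master, satellite, rule, log):
    """Return the (master, satellite) pair after applying the rule."""
    if rule is Rule.MASTER_WINS:
        return master, master
    if rule is Rule.SATELLITE_WINS:
        return satellite, satellite
    if rule is Rule.LAST_MODIFIED_WINS:
        # Assumption: the master is kept if both times are equal.
        winner = master if master.modified >= satellite.modified else satellite
        return winner, winner
    # LOG_ONLY: leave both sides unchanged for manual intervention.
    log.append("conflict left for manual intervention")
    return master, satellite


m = Version("edited at head office", modified=10)
s = Version("edited on notebook", modified=20)
conflict_log = []
```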

Synchronization can be time-driven or workflows can be started manually.

10.2. Mobile Users


Apart from implementing archives spanning several sites, satellite archives are intended
mainly for mobile users. As with the groupware clients of leading developers, these archives
provide convenient functionalities regardless of the current online and offline status.

This means that documents can not only be read offline but also edited. New documents can
be added to the archive, and certain tasks, such as releases, can be effected via workflow
control.


Synchronization with the master can be time-triggered or can be initiated manually by the
user. Since modifications usually occur in sub-areas of the archive only, the synchronization
areas can be specified by the powerful filter functions. The user-specific restriction of the
synchronization process to individual archives and sub-areas of archives is particularly
important for minimizing both storage requirements on mobile PCs and data transfer volumes
for regular synchronization.

Seen from a technical perspective, a mobile user is a single-user installation where a
complete DocuWare system, including Authentication server, Content server and possibly
Workflow server, is installed on one computer – typically a Notebook.

10.3. Autonomous File Cabinets


Autonomous archives make it possible to copy a (sub)archive to an external mass storage
medium so that its contents can be searched independently of the normal infrastructure on a
different system. Here, the system architecture does not correspond to the single-user
installation mentioned before, but to an export of one or more (sub-)file cabinets, enhanced
by additional computational features.

In order to work autonomously, these installations have their own local database. All
necessary components for working with the archive are stored with the data and documents
on one medium, e.g. a CD or DVD. The target system does not require any software to be
installed. However, you may install extra software if you wish to increase the speed.

Such an archive can be used in a flexible way on the most diverse computer systems, e.g.
Notebooks, without necessitating a connection to the rest of the IT infrastructure. The
capacity of the archive depends solely on the medium's capacity, minus the search software.

Typical applications:
• Transferring legacy data
• Creating backup copies of sub-archives
• Interaction with external partners, e.g. service providers or subcontractors
• Publishing and distributing catalogs, parts lists and drawings
• Providing norms and technical documentation, e.g. for development, quality assurance,
  purchasing and distribution

If no modifications are to be made or none are allowed, it makes sense to use archives for
pure search functions on a Notebook – without synchronization.


11. Integration
Archive systems are typically integrated in an existing IT environment. The challenge
therefore is not just to ensure consistency but also to optimize the interchange of data and
documents with other systems without having to invest in complex and highly redundant
administrative expenditure.

DocuWare solves this problem by working with several servers and providing the appropriate
interfaces as well as by adhering to common standards. User data that are maintained in
Active Directories or in LDAP directories can be transferred to DocuWare without any
problems. This of course includes synchronization of changes on the fly.

Moreover, any storage technology can be used, provided it can be mapped as a Windows
file directory. This is the case with all systems by leading manufacturers and means that
DocuWare archives can be set up with non-Microsoft system platforms (such as Linux,
Novell, Solaris).

Integrating third-party platforms is equally an option for database servers, mail systems, Web
servers and applications for which interfaces are available, for example SAP.

Figure 14: Integration capability

In view of the importance of the integration aspect, there is also a White Paper on this.

The following diagram gives an overview of how DocuWare can be set up to work with third-
party applications. This is also described in detail in the "Integrations" White Paper.


Figure 15: DocuWare architecture with interfaces for third-party applications


12. Scalability
DocuWare systems are highly scalable, starting from single workstations up to enterprise-
wide systems that can span several sites, accommodate thousands of users and are
distributed across several servers.

Figure 16: Scalability; DocuWare Client subsumes Windows Client and Web Client

DocuWare can be installed as a standalone system on a single computer, which
then houses the whole range of modules, such as Authentication Server, Content Server,
Workflow Server, a database server and the associated client. The architecture and
functionalities are essentially the same as in large-scale installations.

However, the most frequent type of installation is a multi-user system within a local network.
The performance of the described system architecture comes into its own when the system
is fully exploited, because functionality can then be distributed across several servers, each
configured to work optimally according to organizational, technical and performance criteria.

TCP/IP networks are required for this – which today provide wide area coverage. The
DocuWare servers require MS Windows platforms, although these can work with other
platforms – see the description under Integration.

12.1. Clustering and load distribution


All access to documents for storing or reproduction purposes occurs via the Content server.
A Content server can be responsible for several archives.

In the case of large-scale installations and intensive system utilization, the Content server
can therefore become a bottleneck. In such cases, the load must be distributed across
several Content servers. If a Content server fails, restarting the client causes the
Authentication server to allocate a new Content server (CTS).

Figure 17: Load distribution across several Content servers (CTS)

In addition, load distribution can be done by the platform variants of the system
manufacturers, e.g. the Microsoft cluster solution. Thanks to the modular structure and the N-
tier architecture, the options provided by that solution can be used optimally, since the
system can allocate resources according to requirements.

For details about the fail-safe operation of the DocuWare system see our White Paper on
Security.

12.2. Other performance measures


DocuWare clients use "caching" by default. This means that the requested documents are
temporarily saved in a local file cabinet, since users typically access the same documents
over a certain period.

The organization administrator can define appropriate capacities when setting up the client.
When the maximum capacity has been reached, part of the cache is emptied to make room
for new documents. Optionally, the cache may be emptied when the user session is closed.

In addition, you can specify that the cache should only ever contain current data, i.e. that
data over a certain age is automatically deleted.
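
The cache behavior described above can be sketched as follows. This is an illustration under assumptions: the class DocumentCache is hypothetical, capacity is measured in bytes, and the oldest entries are removed first when the limit is reached, since the text only states that part of the cache is emptied.

```python
from collections import OrderedDict


class DocumentCache:
    """Local document cache with a capacity limit and an optional age limit."""

    def __init__(self, max_bytes, max_age=None):
        self.max_bytes = max_bytes
        self.max_age = max_age         # seconds; None disables the age limit
        self._items = OrderedDict()    # doc_id -> (data, stored_at)

    def _size(self):
        return sum(len(data) for data, _ in self._items.values())

    def put(self, doc_id, data, now):
        self._items.pop(doc_id, None)
        self._items[doc_id] = (data, now)
        # Assumed policy: remove the oldest entries until the cache fits.
        while self._size() > self.max_bytes and len(self._items) > 1:
            self._items.popitem(last=False)

    def get(self, doc_id, now):
        entry = self._items.get(doc_id)
        if entry is None:
            return None
        data, stored_at = entry
        if self.max_age is not None and now - stored_at > self.max_age:
            del self._items[doc_id]    # data over the age limit is deleted
            return None
        return data

    def clear(self):
        """Optional emptying when the user session is closed."""
        self._items.clear()


cache = DocumentCache(max_bytes=10, max_age=60)
cache.put("a", b"12345", now=0)
cache.put("b", b"12345", now=1)
cache.put("c", b"123", now=2)    # exceeds 10 bytes, so "a" is removed
```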

Integration with other IT systems, redundant archives, installation of several instances of
server components, distribution to several hardware systems, etc., are all options for
matching performance and availability of the DocuWare system to the requirements. Hence,
the architecture provides a great deal of flexibility for setting up a configuration that is optimal
both from a technical and an economic point of view.


13. Glossary

Administrative Rights Administrative rights are the rights for modifying archive definitions and
definitions within an Organization.
File Cabinet A file cabinet in DocuWare is a logical unit for receiving, storing, searching
and retrieving documents. A file cabinet always comprises the actual
storage location where the documents are physically held, with their
associated database tables, index data and other descriptive or
complementary elements belonging to a document. Optionally, a file cabinet
may contain a full-text index which makes the documents accessible via full-
text information.
A range of storage media types are supported. "Logical disks" are allocated
to the file cabinets which are mapped to the physical storage media
according to certain rules. A file cabinet is a collection of indexed
documents. Precisely coordinated access and administrative rights can be
assigned to file cabinets.
File cabinet administrator User who has administrator privileges for a file cabinet. This right is not
transferable.
Owner User who can create and manage a file cabinet. File cabinet owners
manage the file cabinet structure and allocate the access rights to it. The
administration right is transferable, i.e. the owner may delegate the tasks.
File cabinet profile The archive profile is the set of all access rights to an archive. Among
others this includes the access rights to index fields or documents that may
also be dependent on certain index entries (field-dependent rights). A file
cabinet profile can also include administrative rights within a file cabinet. An
archive profile is defined within an archive.
User In the context of this White Paper, a user is always a DocuWare user. Users
can be combined into groups. Users obtain rights by means of individual
rights, profiles or roles.
COLD COLD is the only proprietary file format in DocuWare. It is an ANSI format
and reads in the text spool data with the DocuWare COLD/READ
instruction.
DocuWare Client DocuWare Client is a generic term for Windows Client and Web Client.
The Windows Client is installed on a Windows computer and runs there as a
native application. Together with the DocuWare servers, it constitutes a
working installation. A DocuWare system always requires at least one
Windows Client.
Using DocuWare Web Client, you can access DocuWare file cabinets via
the Internet. An installation on the client computer is not required. Web
Client Server must be installed in the DocuWare system.
DocuWare Servers DocuWare servers is a generic term and covers all server modules such as
Authentication Server, Content Server, Workflow Server, Imaging Server
and Web Client Server.
DocuWare System The DocuWare system comprises a full DocuWare installation with all
necessary and optional components. A DocuWare system is characterized
by shared hardware and system settings for one or more "organizations".
Occasionally the term "DocuWare" is used to refer to the DocuWare system.


Document A "document" is a term referring to all objects stored in the file cabinet which
from the user's perspective form a logical unit – i.e. a document. A
document may consist of any number of files. These may be scanned data
in TIFF or multi-TIFF format. However, files from output management
systems, Office or graphics applications or even binary files are also
handled.

A file can represent one or more page(s), but it may equally contain stamps,
signatures, annotations or other, similar information associated with the
document. Documents may also be files with content in different formats.
They may be an Office file together with an email file and several TIFF files.
A unique identification is provided by the DOCID.

Field-dependent rights Field-dependent rights define rights that depend on certain index field
entries.
Function profile A function profile contains the access rights to features of the DocuWare
client. These include the access rights to menu functions and stamps.
Function profiles are defined at organization level. A function profile can
also include administrative rights at organization level.
Group Independent of roles, users can be combined into groups to which roles can
be assigned. A group is therefore a collection of users. The only way to
assign rights to a group is via roles. Groups facilitate the administration of
large numbers of users.
Header The header is the XML format that DocuWare uses for storing the metadata
(index data) and any additions (annotations, stamps, etc.). The actual content
is stored separately (for performance reasons), except when exporting.

This information is assembled in the "XML header file." Each document
stored in DocuWare has a header file which is stored together with the
document ("content") in the file cabinet.

Index data See Header
JPEG Joint Photographic Experts Group. Specification for compressing color
images with a certain loss of quality. Loss of quality means that certain
image information is irretrievably lost. JPEG is used to compress images
with a large color space (high bit depth).
Menu function A menu function is a function within a DocuWare client. This includes
scanning and displaying or editing of documents.
Metadata See Header
Organization An organization in the sense it is used here refers to the management of
users and the file cabinets. No hardware administration is performed within
the organization. All system administration takes place at system level.
Organization Administrator As the name suggests, the organization administrator manages an
organization. A DocuWare system may contain one or more organizations.
The organization administrator manages in particular the rights and users
belonging to an organization. He/she does not have access rights to
archives and their administration.
PNG Acronym for the "Portable Network Graphics" format. The format, which was
developed and established as a standard by the World Wide Web
Consortium (W3C), is license-free and is expected to replace GIF and JPEG
image compression without serious quality impairment.
Profiles A profile is a collection of individual rights. Profiles are divided into file
cabinet profiles and function profiles. They can contain either administrative
rights or access rights to a file cabinet.

Rights Rights allow the execution of particular functionalities within the DocuWare
system. Individual rights can be allocated in the file cabinets and at
organization level.
Role Within enterprise organizations, users are assigned different roles according
to their place in the hierarchy (e.g. approval of vacation requests) and
their job description (e.g. purchaser). These roles can be mapped in
DocuWare in order to simplify installation and administration. This is
achieved by combining features and access rights into profiles which in turn
are allocated to roles.

The DocuWare system also makes use of the role concept: certain roles
with their associated profiles are predefined in order to handle
administrative tasks.
A role is a collection of profiles. Roles cannot contain individual rights.
Predefined roles facilitate the allocation of administrative rights.
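
The chain from rights to profiles to roles, and from roles to users and groups, can be sketched as a small data model. The Python below is a minimal illustration under stated assumptions: all class names, the `effective_rights` helper, and the example right, profile and role names are hypothetical and are not part of any DocuWare API. Individual rights allocated directly in a file cabinet or at organization level are not modeled.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Profile:
    """A collection of individual rights (hypothetical structure)."""
    name: str
    rights: FrozenSet[str]


@dataclass(frozen=True)
class Role:
    """A collection of profiles; a role holds no individual rights itself."""
    name: str
    profiles: FrozenSet[Profile]


@dataclass
class Group:
    """A collection of users; a group receives rights only via roles."""
    name: str
    roles: Set[Role] = field(default_factory=set)


@dataclass
class User:
    name: str
    roles: Set[Role] = field(default_factory=set)
    groups: List[Group] = field(default_factory=list)


def effective_rights(user: User) -> Set[str]:
    """Collect the rights a user holds through own roles and group roles."""
    roles = set(user.roles)
    for group in user.groups:
        roles |= group.roles
    return {right for role in roles
            for profile in role.profiles
            for right in profile.rights}


# Made-up example: one file cabinet profile and one function profile.
cabinet_profile = Profile("invoices-read", frozenset({"search", "display"}))
function_profile = Profile("scan-desk", frozenset({"scan"}))
clerk = Role("clerk", frozenset({cabinet_profile}))
scanner = Role("scanner", frozenset({function_profile}))
accounting = Group("accounting", {clerk})
alice = User("alice", roles={scanner}, groups=[accounting])
print(sorted(effective_rights(alice)))
```

Here the user obtains "scan" through a directly assigned role and "search" and "display" through the group's role, which mirrors the rule that groups receive rights only via roles.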
System See DocuWare system.
System administrator The system administrator manages the system, particularly as far as
hardware is concerned. This includes the administration of database
connections, administration of communication paths, and document storage
paths. The system administrator has no access rights to organizational
information. In particular he/she cannot interfere with user administration.
TIFF Tagged Image File Format: The most important format in DocuWare is black
and white (1 bit) TIFF, compressed according to CCITT Group 4. This
format has become the established standard for electronic archiving of
scanned documents. For the purposes of archiving, DocuWare generates a
file for every page of a document.

Predefined roles Predefined roles are supplied with the DocuWare system; they guarantee
that the system works immediately after it has been installed. Predefined
roles are: system administrator, organization administrator and file cabinet
owner.
Workflow A workflow is a predefined sequence of steps which DocuWare performs
automatically when a predefined event occurs.
Workflow Server The Workflow Server is the module that executes the workflows at runtime.
XML See Header
Access rights Access rights govern access to file cabinets or to menu functions within the
DocuWare client.
