File Date Author Commit
 dtdparse 2001-06-07 --none-- [r6] This commit was manufactured by cvs2svn to crea...
 expat 2001-06-07 --none-- [r6] This commit was manufactured by cvs2svn to crea...
 include 2001-06-07 --none-- [r6] This commit was manufactured by cvs2svn to crea...
 interface 2001-06-07 --none-- [r6] This commit was manufactured by cvs2svn to crea...
 sample 2001-06-07 --none-- [r6] This commit was manufactured by cvs2svn to crea...
 src 2001-06-07 --none-- [r6] This commit was manufactured by cvs2svn to crea...
 GPL 2001-06-07 dashohoxha [r4] no message
 LICENSE 2001-06-07 dashohoxha [r4] no message
 Makefile 2001-06-07 dashohoxha [r4] no message
 README 2001-06-07 --none-- [r6] This commit was manufactured by cvs2svn to crea...


            Implementation of DOM (Core) Interface in C++
            ---------------------------------------------


This DOM implementation is based on the XML parser "expat" written by
James Clark. "expat" is an event callback XML parser.
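
To illustrate what "event callback" means for building a DOM, here is a
minimal sketch of how a tree can be assembled from start-tag, end-tag and
text events by keeping a stack of open elements. It does not call expat and
is not the code in "src": SimpleNode, TreeBuilder and the three handler
names are hypothetical stand-ins chosen for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Stand-in tree node; not the Node class of this package.
struct SimpleNode {
    std::string name;   // element name, "#text", or "#document"
    std::string text;   // character data, used only by "#text" nodes
    std::vector<std::unique_ptr<SimpleNode>> children;
};

// Hypothetical builder: each method mirrors one kind of parser event.
class TreeBuilder {
public:
    TreeBuilder() : root_(new SimpleNode{"#document", "", {}}) {
        open_.push_back(root_.get());
    }

    // Called when the parser reports a start tag.
    void startElement(const std::string& name) {
        std::unique_ptr<SimpleNode> node(new SimpleNode{name, "", {}});
        SimpleNode* raw = node.get();
        open_.back()->children.push_back(std::move(node));
        open_.push_back(raw);   // the new element is now the innermost open one
    }

    // Called when the parser reports the matching end tag.
    void endElement(const std::string& name) {
        assert(open_.size() > 1 && open_.back()->name == name);
        open_.pop_back();
    }

    // Called for character data between tags.
    void characterData(const std::string& data) {
        std::unique_ptr<SimpleNode> node(new SimpleNode{"#text", data, {}});
        open_.back()->children.push_back(std::move(node));
    }

    const SimpleNode& document() const { return *root_; }

private:
    std::unique_ptr<SimpleNode> root_;
    std::vector<SimpleNode*> open_;   // stack of currently open elements
};
```

With a real event parser, the three methods would be invoked from the
callbacks that the parser registers for start tags, end tags and character
data.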


Contents
--------

The directory "interface" is the DOM interface, written in C++. It is
what may be called a "C++ Language Binding" of the DOM interface, and
it is a translation of the "Java Language Binding" from Java to C++. It can
be used by the user of DOM to see what functions are available, how they
must be called, what they return, etc.

The directory "include" contains the actual header files that are used
for the compilation of the C++DOM and that must be included by a program
that uses it. They are basically the same files as those in the directory
"interface", but they have additional things that are required for the 
implementation. There are also some additional files that are used for
representing the structure of the DTD, which are not mentioned in the DOM
specification and are specific to our implementation.

The directory "src" contains the implementation of the objects of the DOM
interface. It also contains some files related to the parsing of XML and DTD
files.

The directory "lib" contains "DOM.a" and "expat.a", two archive files that
contain the objects necessary for linking.

"sample" contains a sample program that shows how to use C++DOM. It reads
from the command line the names of an XML file and a DTD file, builds in
memory a DOM representation for this document, then prints it out by
traversing the structure.

"dtdparse" contains a parser for DTD that is built using "flex" and "yacc".
The files "lex.yy.c" and "y.tab.c" from this directory are copied to the
directory "src", where they are included in another file.


Recompilation of C++DOM
-----------------------

First, go to the directory "expat" and type "make". This creates an archive
file, "expat.a", that contains the object code necessary for linking and puts
it into the directory "lib".

Then, type "make" in the directory "DOM". This will create the object file 
for each class and will archive them into "DOM.a" in the directory "lib". 

If the DTD parser is also recompiled (there is no need to do this unless
you modify it), then the files "lex.yy.c" and "y.tab.c" must be copied
to "src" and <#include "lex.yy.c"> inside "y.tab.c" must be changed to
<#include "src/lex.yy.c">.


Using C++DOM
------------

Include "DOM.h" from the directory "include" and link with "DOM.a"
and "expat.a" from the directory "lib". See also how it is used in the 
sample file.


Extensions to DOM Specification wrt DTD
---------------------------------------

In order to represent the DTD with objects in memory, we have used some
additional objects that are not defined by the DOM specification. These are:

class ElementType : public Element;
class AttrType    : public Attr;
class ContentType : public Node;
class PCDATA      : public Node;

Also, the DocumentType object keeps a hash (namedNodeMap) of all ElementType-s
and a hash of all AttrType-s. An ElementType has a pointer to a ContentType
object (what an element of this type can contain) and a multiplicity (*, ?, +)
of itself. An AttrType contains a pointer to the ElementType to which it 
belongs, its type, its default value etc. A PCDATA object serves to 
represent "#PCDATA".

A ContentType object represents in memory something like this (in DTD):
  ( child , child , ... )*
(the comma may also be a '|' pipe, and '*' may also be '+', '?' or nothing).
It has a list of children, where each child can be another ContentType object,
an ElementType object, or "#PCDATA". It has a field that records whether the
children are separated by ',' or by '|', and another field that records the
multiplicity.
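
The structure described above can be sketched roughly as follows. This is an
illustration only: ContentModel, its fields, and the helpers element(),
pcdata(), group() and toDtd() are hypothetical names for this example. The
real ContentType derives from Node and may be organized differently.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a content model node.
struct ContentModel {
    enum Kind { Group, ElementRef, PCData };

    Kind kind = Group;
    std::string name;          // element name, used only when kind == ElementRef
    char separator = ',';      // ',' (sequence) or '|' (choice), Group only
    char multiplicity = '\0';  // '*', '+', '?', or '\0' for "exactly once"
    std::vector<std::shared_ptr<ContentModel>> children;  // Group only
};

typedef std::shared_ptr<ContentModel> ModelPtr;

// Reference to an element type by name, e.g. "para" or "para*".
ModelPtr element(const std::string& name, char mult = '\0') {
    ModelPtr m = std::make_shared<ContentModel>();
    m->kind = ContentModel::ElementRef;
    m->name = name;
    m->multiplicity = mult;
    return m;
}

// The "#PCDATA" leaf.
ModelPtr pcdata() {
    ModelPtr m = std::make_shared<ContentModel>();
    m->kind = ContentModel::PCData;
    return m;
}

// A parenthesized group with a separator and a multiplicity.
ModelPtr group(char sep, char mult, std::vector<ModelPtr> kids) {
    ModelPtr m = std::make_shared<ContentModel>();
    m->kind = ContentModel::Group;
    m->separator = sep;
    m->multiplicity = mult;
    m->children = kids;
    return m;
}

// Writes the model back in DTD syntax by walking the tree recursively.
std::string toDtd(const ContentModel& m) {
    std::string out;
    switch (m.kind) {
    case ContentModel::PCData:
        out = "#PCDATA";
        break;
    case ContentModel::ElementRef:
        out = m.name;
        break;
    case ContentModel::Group:
        out = "(";
        for (std::size_t i = 0; i < m.children.size(); ++i) {
            if (i > 0) out += m.separator;
            out += toDtd(*m.children[i]);
        }
        out += ")";
        break;
    }
    if (m.multiplicity != '\0') out += m.multiplicity;
    return out;
}
```

Keeping one separator per group is enough because DTD syntax does not allow
',' and '|' to be mixed inside the same pair of parentheses.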


What is not implemented
-----------------------

Not everything in the DOM specification has been implemented (because of
lack of time, because we didn't need it, because we didn't understand
it very well, or for some other reason). Entities, notations, and
entity references have been left out almost totally. Some things that are
mentioned in the DOM specification, like readonly nodes, some exceptions,
normalization etc., have not been implemented.

Although the DTD of a document is represented in memory, the implementation
does not yet do validation. We think that this can be done by pattern matching
of the DTD tree against the XML tree (by traversing them at the same time).
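
One possible shape for such a matcher is sketched below. It is not part of
this package and makes simplifying assumptions: Model, match(), matchOnce()
and isValidContent() are hypothetical names, only element children are
considered ("#PCDATA" and attributes are ignored), and the children of an
element are given as a plain list of names. For each content model node it
computes the set of positions in the child list that can be reached after
matching that node.

```cpp
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical content model node for this sketch.
struct Model {
    enum Kind { Sequence, Choice, Name };
    Kind kind;
    std::string name;    // used only when kind == Name
    char multiplicity;   // '*', '+', '?', or '\0' for "exactly once"
    std::vector<std::shared_ptr<Model>> children;
};

typedef std::vector<std::string> Names;
typedef std::set<std::size_t> Positions;

Positions match(const Model& m, const Names& kids, std::size_t pos);

// Positions reachable after exactly one occurrence of m, starting at pos.
Positions matchOnce(const Model& m, const Names& kids, std::size_t pos) {
    Positions out;
    switch (m.kind) {
    case Model::Name:
        if (pos < kids.size() && kids[pos] == m.name) out.insert(pos + 1);
        break;
    case Model::Choice:
        for (const auto& c : m.children) {
            Positions r = match(*c, kids, pos);
            out.insert(r.begin(), r.end());
        }
        break;
    case Model::Sequence: {
        Positions current;
        current.insert(pos);
        for (const auto& c : m.children) {
            Positions next;
            for (std::size_t p : current) {
                Positions r = match(*c, kids, p);
                next.insert(r.begin(), r.end());
            }
            current = next;
        }
        out = current;
        break;
    }
    }
    return out;
}

// Positions reachable after matching m with its multiplicity applied.
Positions match(const Model& m, const Names& kids, std::size_t pos) {
    Positions result;
    if (m.multiplicity == '*' || m.multiplicity == '?') result.insert(pos);
    const bool repeat = (m.multiplicity == '*' || m.multiplicity == '+');
    Positions frontier = matchOnce(m, kids, pos);
    while (!frontier.empty()) {
        Positions next;
        for (std::size_t p : frontier) {
            // Expand each position only once, so the loop always terminates.
            if (result.insert(p).second && repeat) {
                Positions more = matchOnce(m, kids, p);
                next.insert(more.begin(), more.end());
            }
        }
        frontier = next;
    }
    return result;
}

// The child list is valid if the whole list can be consumed by the model.
bool isValidContent(const Model& m, const Names& kids) {
    return match(m, kids, 0).count(kids.size()) > 0;
}
```

Tracking a set of positions rather than a single one lets the matcher handle
models such as (a*, a), where a greedy match of the first part would fail.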

Another thing that we think is necessary for validation (but is not implemented)
is that each DOM object should have a pointer to the type that it belongs to,
e.g. each Element should have a pointer to the ElementType that it belongs to,
each Attr to its AttrType etc. This would enable the implementor to check
easily whether the insertion of a new node violates the rules of the DTD,
making it easy to validate edits to an XML document.
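
As a rough illustration of this idea (not existing code), the sketch below
gives each element a pointer to its type and uses it to refuse an insertion.
ElementTypeInfo and TypedElement are hypothetical, heavily simplified
stand-ins for ElementType and Element: the type only records which child
names are allowed, so the check ignores order and multiplicity.

```cpp
#include <memory>
#include <set>
#include <string>
#include <vector>

// Simplified stand-in for ElementType: which child names may appear at all.
struct ElementTypeInfo {
    std::string name;
    std::set<std::string> allowedChildren;
};

// Stand-in for Element, extended with the pointer to its type.
struct TypedElement {
    const ElementTypeInfo* type;   // not owned; null means "no DTD information"
    std::vector<std::shared_ptr<TypedElement>> children;

    explicit TypedElement(const ElementTypeInfo* t) : type(t) {}

    // Appends the child, or returns false if the DTD does not allow it here.
    bool appendChild(const std::shared_ptr<TypedElement>& child) {
        if (!child) return false;
        if (type && child->type &&
            type->allowedChildren.count(child->type->name) == 0) {
            return false;
        }
        children.push_back(child);
        return true;
    }
};
```

A fuller check would consult the ContentType of the parent's ElementType
instead of a flat set of names.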
