A C++ Implementation of DOM Core
Status: Alpha
Brought to you by:
dashohoxha
File | Date | Author | Commit |
---|---|---|---|
dtdparse | 2001-06-07 | --none-- | [r6] This commit was manufactured by cvs2svn to crea... |
expat | 2001-06-07 | --none-- | [r6] This commit was manufactured by cvs2svn to crea... |
include | 2001-06-07 | --none-- | [r6] This commit was manufactured by cvs2svn to crea... |
interface | 2001-06-07 | --none-- | [r6] This commit was manufactured by cvs2svn to crea... |
sample | 2001-06-07 | --none-- | [r6] This commit was manufactured by cvs2svn to crea... |
src | 2001-06-07 | --none-- | [r6] This commit was manufactured by cvs2svn to crea... |
GPL | 2001-06-07 | dashohoxha | [r4] no message |
LICENSE | 2001-06-07 | dashohoxha | [r4] no message |
Makefile | 2001-06-07 | dashohoxha | [r4] no message |
README | 2001-06-07 | --none-- | [r6] This commit was manufactured by cvs2svn to crea... |
Implementation of DOM (Core) Interface in C++
---------------------------------------------

This DOM implementation is based on the XML parser "expat", written by
James Clark. "expat" is an event-callback XML parser.

Contents
--------

The directory "interface" contains the DOM interface, written in C++.
It is what may be called a "C++ Language Binding" of the DOM interface,
and it is a translation of the "Java Language Binding" from Java to C++.
Users of the DOM can consult it to see what functions are available, how
they must be called, what they return etc.

The directory "include" contains the actual header files that are used
to compile C++DOM and that must be included by any program that uses it.
They are basically the same files as those in the directory "interface",
but with additional declarations that the implementation requires. There
are also some additional files used for representing the structure of
the DTD; these are not mentioned in the DOM specification and are
specific to our implementation.

The directory "src" contains the implementation of the objects of the
DOM interface. It also contains some files related to the parsing of XML
and DTD files.

The directory "lib" contains "DOM.a" and "expat.a", two archive files
that contain the objects necessary for linking.

"sample" contains a sample program that shows how to use C++DOM. It
takes the names of an XML file and a DTD file from the command line,
builds a DOM representation of the document in memory, then prints it
out by traversing the structure.

"dtdparse" contains a parser for DTD that is built using "flex" and
"yacc". The files "lex.yy.c" and "y.tab.c" from this directory are
copied to the directory "src", where they are included in another file.

Recompilation of C++DOM
-----------------------

First, go to the directory "expat" and type "make". This creates an
archive file, "expat.a", that contains the object code necessary for
linking, and puts it into the directory "lib".
Then, type "make" in the directory "DOM". This creates the object file
for each class and archives them into "DOM.a" in the directory "lib".

If the DTD parser is also recompiled (there is no need to do this unless
you modify it), then the files "lex.yy.c" and "y.tab.c" must be copied
to "src", and <#include "lex.yy.c"> inside "y.tab.c" must be changed to
<#include "src/lex.yy.c">.

Using C++DOM
------------

Include "DOM.h" from the directory "include" and link with "DOM.a" and
"expat.a" from the directory "lib". See also how it is used in the
sample file.

Extensions to DOM Specification with respect to DTD
---------------------------------------------------

In order to represent the DTD with objects in memory, we have used some
additional objects that are not defined by the DOM specification. These
are:

    class ElementType : public Element;
    class AttrType    : public Attr;
    class ContentType : public Node;
    class PCDATA      : public Node;

Also, the DocumentType object keeps a hash (namedNodeMap) of all
ElementType-s and a hash of all AttrType-s.

An ElementType has a pointer to a ContentType object (what an element of
this type can contain) and a multiplicity (*, ?, +) of its own. An
AttrType contains a pointer to the ElementType to which it belongs, its
type, its default value etc. A PCDATA object serves to represent
"#PCDATA".

A ContentType object represents in memory something like this (in DTD):

    ( child1, child2, ... )*

(the comma may also be a '|' pipe, and '*' may also be '+', '?' or
nothing). It has a list of children, where each child can be another
ContentType object, an ElementType object, or "#PCDATA". It has a field
that records whether the children are separated by ',' or by '|', and
another field that records the multiplicity.

What is not implemented
-----------------------

Not everything in the DOM specification has been implemented (because of
lack of time, because we didn't need it, because we didn't understand it
very well, or for some other reason).
Entities, notations and entity references have been left out almost
entirely. Some things that are mentioned in the DOM specification, such
as read-only nodes, some exceptions, normalization etc., have not been
implemented.

Although the DTD of a document is represented in memory, the
implementation does not yet perform validation. We think that this can
be done by pattern matching of the DTD tree against the XML tree (by
traversing them at the same time).

Another thing that we think is necessary for validation (but is not
implemented) is that each DOM object should have a pointer to the type
that it belongs to, e.g. each Element should have a pointer to its
ElementType, each Attr to its AttrType etc. This would enable the
implementor to check easily whether the insertion of a new node violates
the rules of the DTD, which would make it easy to validate edits to an
XML document.
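
The validation idea above (matching the DTD tree against the XML tree)
can be illustrated with a small, self-contained sketch. It is not part
of C++DOM and does not use its classes: the struct "Model" is a
hypothetical stand-in for the ContentType / ElementType / PCDATA nodes,
the function names are invented for this example, and the children of an
element are assumed to be given simply as a list of names (with text
represented as "#PCDATA"). Under those assumptions, one possible way to
check a child list against a content model is:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a content-model node (NOT the C++DOM classes).
struct Model {
    enum Kind { NAME, SEQ, CHOICE };  // NAME: an element name or "#PCDATA"
    Kind kind;
    char mult;                        // '*', '?', '+', or ' ' for exactly one
    std::string name;                 // used when kind == NAME
    std::vector<Model> parts;         // used when kind is SEQ (',') or CHOICE ('|')
};

typedef std::vector<std::string> Names;
typedef std::set<std::size_t> Positions;

Positions matchFrom(const Model& m, const Names& kids, std::size_t from);

// Match exactly one occurrence of m (multiplicity ignored) starting at
// index `from`; return every index at which that occurrence may end.
Positions matchOnce(const Model& m, const Names& kids, std::size_t from) {
    Positions out;
    if (m.kind == Model::NAME) {
        if (from < kids.size() && kids[from] == m.name) out.insert(from + 1);
    } else if (m.kind == Model::CHOICE) {
        for (const Model& part : m.parts) {
            Positions p = matchFrom(part, kids, from);
            out.insert(p.begin(), p.end());
        }
    } else {  // SEQ: thread the set of reachable positions through each part
        out.insert(from);
        for (const Model& part : m.parts) {
            Positions next;
            for (std::size_t p : out) {
                Positions q = matchFrom(part, kids, p);
                next.insert(q.begin(), q.end());
            }
            out = next;
        }
    }
    return out;
}

// Match m with its multiplicity applied; return every reachable end index.
Positions matchFrom(const Model& m, const Names& kids, std::size_t from) {
    Positions result;
    if (m.mult == '?' || m.mult == '*') result.insert(from);  // zero occurrences
    const bool repeat = (m.mult == '*' || m.mult == '+');
    Positions frontier = matchOnce(m, kids, from);
    while (!frontier.empty()) {
        Positions next;
        for (std::size_t p : frontier) {
            // Expand only positions not seen before, so the loop terminates.
            if (result.insert(p).second && repeat) {
                Positions more = matchOnce(m, kids, p);
                next.insert(more.begin(), more.end());
            }
        }
        frontier = next;
    }
    return result;
}

// The child list is valid if some match consumes all of the children.
bool isValidContent(const Model& m, const Names& kids) {
    return matchFrom(m, kids, 0).count(kids.size()) > 0;
}

// Small helpers for building models by hand.
Model leaf(const std::string& n, char mult = ' ') {
    return Model{Model::NAME, mult, n, {}};
}
Model group(Model::Kind kind, char mult, std::vector<Model> parts) {
    return Model{kind, mult, std::string(), parts};
}
```

For example, the DTD content model (title, (para | list)*) could be
built as group(Model::SEQ, ' ', { leaf("title"), group(Model::CHOICE,
'*', { leaf("para"), leaf("list") }) }) and then passed to
isValidContent together with the names of an element's children. If each
Element held a pointer to its ElementType, as suggested above, the same
check could be run whenever a child node is inserted or removed.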