XML Introduction
In this tutorial you will learn about XML: its history, an introduction, its uses, and XML technology.
HISTORY
IBM developed the Generalized Markup Language (GML) in the late 1960s. SGML (Standard Generalized Markup Language) grew out of GML and was published as an ISO standard (ISO 8879) in 1986. SGML is a semantic and structural language for text documents, but it is very complicated. HTML was defined as an application of SGML. In 1996 the XML Working Group was formed under the W3C. The World Wide Web Consortium (W3C) is an international consortium where member organizations, a full-time staff, and the public work together to develop Web standards. The W3C was founded in 1994 by Tim Berners-Lee, who had invented the World Wide Web in 1989. In 1998 the W3C published XML 1.0.
INTRODUCTION
XML (Extensible Markup Language) is a simplified subset of SGML. XML is not a programming language. Rather, it is a set of rules that allow you to represent data in a structured manner. Since the rules are standard, XML documents can be generated and processed automatically. Its use can be gauged from its name:
Extensible: User-defined tags
Markup: A collection of tags
XML tags: Identify the content of data
XML is a markup language much like HTML, but it serves a different purpose and is not a replacement for HTML. An XML document is a text document that contains tagged data. Like HTML, XML uses tags and attributes (markup), but XML tags describe the data (e.g., XML TUTORIAL 1) rather than specify presentation formats as in HTML. The interpretation and use of the data are left to the application that reads the XML document. XML was designed to describe data and is a cross-platform, software- and hardware-independent tool for transmitting or exchanging information. It is an open-standards-based technology that is both human and machine readable. XML is best suited to documents that share a large amount of structural similarity.
XML evolved from SGML (Standard Generalized Markup Language). The first version of XML (version 1.0) was published by the W3C in 1998. Version 1.1 came out in early 2004. In future Web development it is most likely that XML will be used to describe the data, while HTML will be used to format and display the same data. The XML specification covers the syntax and grammar of XML documents as well as DTDs.
USES
XML is widely used for the following purposes
When XML data is transferred between systems, the data contained in an XML document is read by a piece of software called a parser. Most of the popular databases (Oracle, MS SQL Server, Sybase, DB2, etc.) provide their own mechanisms to store and retrieve data as XML. Some of them also provide parsers for working with XML documents programmatically. XML is a key technology for Web Services. .NET uses XML extensively as a data format for configuration files, metadata, RPC, and object serialization.
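As a minimal sketch of what a parser does with transferred XML data, the following Python snippet uses the standard library module xml.etree.ElementTree. The document string and its element names (order, item, sku) are invented for illustration and do not come from this tutorial.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML payload received from another system.
XML_DATA = """<order id="1001">
  <item sku="A1">Keyboard</item>
  <item sku="B2">Mouse</item>
</order>"""


def read_items(xml_text):
    """Parse the text and return (sku, description) pairs for each item."""
    root = ET.fromstring(xml_text)
    return [(item.get("sku"), item.text) for item in root.findall("item")]


items = read_items(XML_DATA)
```

The receiving program never inspects the raw text itself; it asks the parser for elements and attributes by name.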
XML Technology
Elements
Elements are the basic building blocks of XML. An element may contain:
Other elements
Character data
Character references
Entity references
Comments
These are collectively known as element content.
Ex: <student>Mason Hill</student>
Here the element has three parts: the start tag <student>, the content Mason Hill, and the end tag </student>.
Anatomy of tags
All elements must have a beginning and an ending tag. The opening tag of an element is written between the less-than sign (<) and the greater-than sign (>), for example <student>. The ending tag is written between a less-than sign followed by a forward slash (</) and the greater-than sign (>), for example </student>. Data between the opening and closing tags of an element is its content. For example,
<student>Nick Price</student>
Here Nick Price is the content of the element. Most browsers ignore whitespace between the tags when displaying the content, so
<student> Nick Price </student>
is displayed the same as
<student>Nick Price</student>
An XML parser, however, passes the whitespace on to the application unchanged.
Note: Unlike HTML, XML does not allow single (unclosed) tags such as <br>. An element without content must still be closed, either as <br></br> or in the short form <br/>.
Invalid tags
<.stock></.stock>
<product1></ product1>
<product^stock></product^stock>
Valid tags
<_stock></_stock>
<product1></product1>
<product-stock></product-stock>
A tag name may not begin with a period or contain a character such as ^, and no space is allowed between </ and the name.
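One way to try out tag names is to hand them to a parser and see whether it accepts them. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


SAMPLES = [
    "<.stock></.stock>",                # name starts with a period
    "<product^stock></product^stock>",  # name contains ^
    "<_stock></_stock>",
    "<product1></product1>",
    "<product-stock></product-stock>",
]

# Map each sample to the parser's verdict.
results = {sample: is_well_formed(sample) for sample in SAMPLES}
```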
In XML one cannot overlap tags. The opening and ending tags of child elements must be inside the parent element. Overlapping tags between siblings is not allowed, as shown in the following example.
<student><name>Jason<roll-number></name></roll-number></student>
The proper format is as follows
<student><name>Jason</name><roll-number></roll-number></student>
The root element is also called the document element. There is only one root element, and all other elements lie within it.
NOTE: A tag can be empty, i.e. contain no data, like the roll-number tag in the above example. Such tags are called EMPTY ELEMENTS.
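The nesting rules can be observed with a parser. This is a small sketch using Python's xml.etree.ElementTree; the two document strings are one reading of the student example, and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

PROPER = "<student><name>Jason</name><roll-number></roll-number></student>"
OVERLAPPING = "<student><name>Jason<roll-number></name></roll-number></student>"

root = ET.fromstring(PROPER)
root_tag = root.tag
child_tags = [child.tag for child in root]

# An element with no text and no children is an empty element.
empty_tags = [c.tag for c in root if c.text is None and len(c) == 0]

# Overlapping tags are a syntax error, so the parser refuses the document.
try:
    ET.fromstring(OVERLAPPING)
    overlap_accepted = True
except ET.ParseError:
    overlap_accepted = False
```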
Attributes
Attributes give information about elements. They can be specified only in the element start tag, and their values must always be enclosed in quotation marks, either double or single. This is unlike HTML, where attribute values may also be written without quotation marks.
Syntax: <tag attribute="value">description</tag>
Example: <problem size="huge" cause="unknown" solution="run away">
If elements are the nouns of XML, then attributes are its adjectives. An element can have zero, one, or more attributes. An attribute name can appear only once within an element.
Bad: <Test name="John" name="Doe" />
Good: <Test first="John" last="Doe" />
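The attribute rules can be checked with a short sketch in Python's xml.etree.ElementTree. The helper name parses is hypothetical; the Test element comes from the example above.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = ET.fromstring('<Test first="John" last="Doe" />')
attributes = dict(good.attrib)

# The XML grammar accepts single quotes as well as double quotes.
single_quoted = ET.fromstring("<Test first='John' />").get("first")

# A repeated attribute name or an unquoted value is rejected.
duplicate_ok = parses('<Test name="John" name="Doe" />')
unquoted_ok = parses("<Test name=John />")
```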
Entities
Entity references are placeholders for characters that are reserved in the language or that may be misinterpreted. For example, the less-than (<) and greater-than (>) symbols are reserved for delimiting tags. If the content of an element itself contains one of these symbols, the data would be misinterpreted. Entities are used to avoid this. The ampersand (&) symbol is reserved to indicate the start of an entity reference, and a semicolon (;) ends it. The predefined entities are as follows
&lt;    <  LESS THAN
&gt;    >  GREATER THAN
&amp;   &  AMPERSAND
&quot;  "  QUOTATION MARK
&apos;  '  APOSTROPHE
Character data (CDATA) sections contain raw data that is not parsed by the XML parser.
Syntax: <![CDATA[ raw data ]]>
Example:
<book ISIN="INB101235647">
<author>Kacey Price <![CDATA[ kacey has also authored Complete Reference series]]></author>
</book>
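The following sketch shows both mechanisms from the application side, using Python's standard library. The escape and unescape helpers come from xml.sax.saxutils; the note document and its element names (a, b) are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Writing: reserved characters are replaced by entity references.
raw = "if a < b & b > c"
escaped = escape(raw)
restored = unescape(escaped)

# Reading: the parser expands predefined entities and hands over the
# content of a CDATA section without interpreting it as markup.
doc = ET.fromstring(
    "<note><a>1 &lt; 2 &amp; 3 &gt; 2</a>"
    "<b><![CDATA[<raw> & unparsed]]></b></note>"
)
entity_text = doc.find("a").text
cdata_text = doc.find("b").text
```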
Comments
Comments are enclosed in <!-- and -->.
Syntax: <!-- Comments -->
Example: <!-- This is start of second child element -->
Processing instructions
Processing instructions are used to pass information to applications, which use this information to execute special tasks.
Syntax: <?target instructions?>
Example: <?xml version="1.0" encoding="ISO-8859-1"?>
NOTE: Here the version attribute specifies the version of XML being used, while encoding names the character encoding for parsers. An XML document can be displayed in IE 5.0 or above.
NEED
XML documents can contain many different types of markup, including elements, attributes, and entity references. Whatever the application, it is desirable that the XML document conforms to a set of rules governing the data structure it contains. DTDs and schemas are used for this purpose. For example,
<name>12233</name>
If the rules say that name may contain only letters and it contains numbers, as shown above, a validating parser rejects the document as invalid. Note that a DTD constrains the structure of a document (which elements and attributes may appear where) but cannot restrict text to particular characters; data-type rules of this kind require a schema language such as W3C XML Schema.
DTD
DTD stands for Document Type Definition. It describes which elements may appear in an XML document and what content and attributes those elements may have.
A valid XML document must include a reference to the DTD that validates it. When a DTD is absent, a validating parser cannot verify the data format but can still attempt to interpret the data.
TYPES OF DTD
Internal DTD: the DTD is embedded in the XML document.
External DTD: the DTD is in a separate file.
INTERNAL DTD
Internal DTDs are embedded in the XML document itself. They are convenient when the constraints apply to a single document. They are also used to test a sample document while designing a complex DTD. Modifications are also relatively simple, since the DTD and the markup are in the same document.
Syntax: <!DOCTYPE root_name [assignments]>
It begins with the DOCTYPE keyword (after the less-than sign < and the exclamation mark !) followed by the name of the root element. The root name is followed by an opening square bracket, which signifies the beginning of the declarations. After the closing square bracket, the last entry is a greater-than symbol (>). In the assignments section elements are declared as follows
<!ELEMENT child_name (child_name or data type)>
A detailed explanation of this is covered in the next tutorial.
E.g.
<?xml version='1.0' encoding='utf-8'?>
<!-- DTD for a AddressBook.xml -->
<!DOCTYPE AddressBook [
<!ELEMENT AddressBook (Address+)>
<!ELEMENT Address (Name, Street, City)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Street (#PCDATA)>
<!ELEMENT City (#PCDATA)>
]>
<AddressBook>
<Address>
<Name>Jeniffer</Name>
<Street>Wall Street</Street>
<City>New York</City>
</Address>
</AddressBook>
Here the order of the declarations is not important. Thus,
<?xml version='1.0' encoding='utf-8'?>
<!-- DTD for a AddressBook.xml -->
<!DOCTYPE AddressBook [
<!ELEMENT AddressBook (Address+)>
<!ELEMENT Address (Name, Street, City)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Street (#PCDATA)>
<!ELEMENT City (#PCDATA)>
]>
is the same as
<?xml version='1.0' encoding='utf-8'?>
<!-- DTD for a AddressBook.xml -->
<!DOCTYPE AddressBook [
<!ELEMENT AddressBook (Address+)>
<!ELEMENT City (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Address (Name, Street, City)>
<!ELEMENT Street (#PCDATA)>
]>
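Whether the DTD is actually enforced depends on the parser. As a rough illustration, Python's xml.etree.ElementTree is built on a non-validating parser: it reads a document that carries an internal DTD and returns the data, but it does not check the content against the declarations. The function name addresses and the variable names are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

ADDRESS_BOOK = b"""<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE AddressBook [
<!ELEMENT AddressBook (Address+)>
<!ELEMENT Address (Name, Street, City)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Street (#PCDATA)>
<!ELEMENT City (#PCDATA)>
]>
<AddressBook>
  <Address>
    <Name>Jeniffer</Name>
    <Street>Wall Street</Street>
    <City>New York</City>
  </Address>
</AddressBook>"""


def addresses(xml_bytes):
    """Return one dict per Address element, keyed by child tag name."""
    root = ET.fromstring(xml_bytes)
    return [
        {field.tag: field.text for field in address}
        for address in root.findall("Address")
    ]


entries = addresses(ADDRESS_BOOK)

# Dropping City violates the DTD, yet a non-validating parser still reads it.
missing_city = ADDRESS_BOOK.replace(b"    <City>New York</City>\n", b"")
entries_without_city = addresses(missing_city)
```

A validating parser would be needed to reject the second document; the standard library module used here only checks well-formedness.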
EXTERNAL DTD
The DTD is present in a separate file, and a reference to its location is placed in the document. External DTDs are easy to apply to multiple documents. If a modification is needed in the future, it can be made in just one file, which avoids the onerous task of changing every document. External DTDs are of two types:
1) Public: These are standardized DTDs that give a publicly available set of rules for writing XML documents.
2) Non-public: These are created by private organizations.
Public DTD
Syntax: <!DOCTYPE root_name PUBLIC "-//W3C//DTD Specification V2.0//EN" "/XML/2005/01/xml v20.dtd">
Here:
. The keyword PUBLIC specifies that it is a publicly available DTD.
. The -/+ (minus/plus) sign indicates that the DTD isn't / is a recognized standard.
. // separates the categories of information.
. W3C is the owner of the DTD.
. DTD Specification V2.0 is the label for the DTD.
. EN is the ISO 639-1 code for the language of the DTD (English).
. /XML/2005/01/xml v20.dtd is the URL where the DTD is stored.
Non-public DTD
Syntax: <!DOCTYPE root_name SYSTEM "path">
In the DOCTYPE declaration, the keyword SYSTEM after the root_name indicates a non-public DTD. It is followed by the location where the DTD is stored, which can be a URL or a file path.
Example
The DTD for AddressBook.xml is contained in the file AddressBook.dtd. AddressBook.xml contains only XML data, with a reference to the DTD file.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE AddressBook SYSTEM "file:///c:/XML/AddressBook.dtd">
<AddressBook>
<Address>
<Name>Jeniffer</Name>
<Street>Wall Street</Street>
<City>New York</City>
</Address>
</AddressBook>
In this tutorial you will learn about elements in a DTD: child elements (nested elements), declaring elements with character data only, with mixed content, with any content, and with no content, and element order indicators and qualifiers.
Elements in DTD
ELEMENTS
Every element used in a valid XML document must be declared in the document's DTD.
SYNTAX: <!ELEMENT element_name content_specification>
element_name: Specifies the name of the XML tag
content_specification: Specifies the contents of the element, which can be one of the following five types
I) Standard Content
II) Only Character Data
III) Mixed Content
IV) Any Type of Content
V) No Content
Elements that contain only character data are declared as follows.
SYNTAX: <!ELEMENT element_name (#PCDATA)>
Example: <!ELEMENT Street (#PCDATA)>
Element Street contains parsed character data.
Note that #PCDATA is the only keyword for character data in an element declaration. The XML 1.0 specification defines no #CDATA content type, so a declaration such as <!ELEMENT City (#CDATA)> is not valid DTD syntax. CDATA appears in a DTD only as an attribute type, and in a document as a CDATA section. Whitespace is not stripped by #PCDATA either: for <City> London </City> the parser reports the text together with its leading and trailing spaces, and trimming is left to the application.
childN)
Example:
<bank>
This is account <account>423578</account>
This is account <account>123456</account>
</bank>
Element order indicators
TYPE    CONTEXT
|       Either one child element or another can occur
()      Groups related elements together
,       Element must follow another element
Qualifiers
TYPE    DESCRIPTION
?       Elements appear once or not at all
*       Elements appear zero or more times
+       Elements appear one or more times
EXAMPLES: The pipe symbol (|) specifies choice, so the occurrence of either child element is considered valid by the parser. The following declaration specifies that name must contain either first_name or last_name.
<!ELEMENT name (first_name | last_name)>
Either of the following is therefore valid:
<name><first_name>...</first_name></name>
<name><last_name>...</last_name></name>
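Order indicators and qualifiers behave much like a regular expression over the sequence of child element names. The sketch below is a toy illustration of that idea in Python, not a DTD validator: it handles only content models built from element names, parentheses, the comma, the pipe, and the ? * + qualifiers, and ignores #PCDATA, EMPTY, and ANY. The function names model_to_regex and children_match are invented for the example.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(content_model):
    """Translate a content model made of element names into a regex."""
    compact = re.sub(r"\s+", "", content_model)
    # Each element name becomes a group matching that name plus a space.
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # In a regex, sequence is plain concatenation, so the commas disappear.
    return re.compile(pattern.replace(",", ""))


def children_match(element, content_model):
    """Check the element's direct children against the content model."""
    child_names = "".join(child.tag + " " for child in element)
    return model_to_regex(content_model).fullmatch(child_names) is not None


only_first = ET.fromstring("<name><first_name>A</first_name></name>")
both = ET.fromstring(
    "<name><first_name>A</first_name><last_name>B</last_name></name>"
)
choice_ok = children_match(only_first, "(first_name | last_name)")
choice_rejected = not children_match(both, "(first_name | last_name)")
```

With the choice model, an element holding both first_name and last_name fails the check, while (first_name, last_name) would accept it.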
XML Advantages
Author : Exforsys Inc. Published on: 5th Jul 2007
There are many advantages to using XML for information exchange. The Extensible Markup Language is written in plain text with human-readable tags rather than in a binary format that only computers can read. XML is readable even by people who have had no formal introduction to it.
It is as easy as HTML. XML works well with languages such as Java, and it can be combined with any application capable of processing XML, irrespective of the platform it runs on.
XML is an extremely portable language to the extent that it can be used on large networks with multiple platforms like the internet, and it can be used on handhelds or palmtops or PDAs. XML is an extendable language, meaning that you can create your own tags, or use the tags which have already been created.
From a programmer's point of view, there are many parsers and APIs available, for C and many other languages. If your data is very rich, using XML to capture it makes a lot of sense, mainly because it is in plain text and in a language that humans can read. XML also gives you the freedom to define your own tags to fit your application's needs. XML can be stored in databases in XML format and in human-readable format. Another advantage of XML is that it can be used to share data and application models across wide networks like the Internet.
You can also have the freedom to develop at your own pace and moreover develop tools that will be helpful for your programming needs without a lot of investment of time or money. Here by defining your own tags you are widening your horizons. You can make the tags work for you and develop anything the way you want it, compared to vendor declared tags where you will have to fit your programming needs to suit the tags, which is a big limitation to creativity in programming.
the file. You can still retain the list format if you need to and also have the table format simultaneously. Searching the data is easier in an XML document because any search engine can parse the data using the tags and locate what is required. It offers a freeway for navigating through data. XML data is structured and tree shaped, depending on the way it has been formatted. Even complex relationships in the tree structure, such as the parent-child relationships in a directory, are easy to follow because the format makes them clear. XML is easily legible to a first-timer because it is written in simple plain text and in a human-readable language.
Many companies are now depending on the web services which can provide them solutions for a centralized environment for their data which needs to be safe and secure. Many applications come with a legacy and heavy price tag to pay. XML is the simplest solution a user might have imagined in solving all these complex issues.
XML Disadvantages
Author : Exforsys Inc. Published on: 8th Jul 2007
The Extensible Markup Language is the way to go for developing future web applications, and it almost defines the future of web development. There are no doubts about its performance in this arena. However, XML also has some drawbacks that need to be looked at and improved upon, and these drawbacks are the reason it faces some resistance from users. One of the biggest is a lack of adequate applications for processing it.
XML is a verbose language, and how verbose a document is depends on who is writing it. This verbosity may pose problems for other users. XML is not specific to any platform, and this neutrality may be a disadvantage in a few circumstances. Not all XML standards are fully implemented or widely recognized yet. Users have reported problems with parsers, and there are problems with XML over HTTP which are still being resolved.
External entities pose a perennial problem and are a major disadvantage of XML. The best way to avoid external entity problems with XML DTDs is not to use them at all. If you must use them, do not rely on them on the producer side, and do not attempt to retrieve them on the client side. If you are writing the specification for an XML vocabulary, do not define a DTD for it, and make sure that programs run their XML parsers with external entity resolution disabled. Otherwise the external entity problem will invariably crop up, triggering a series of problems which cannot be solved by the XML environment alone. When layering specifications it can be considered against the rules to ban certain document types, although SOAP does exactly that.
If your job is to implement a Web application based on XML, you may need to configure the parser not to perform DTD-based validation and not to resolve external entities. This can head off future problems, so the precaution is worthwhile. Publishing documents on the web calls for the same precaution: do not include a document type declaration.
Such a document may not be valid in the sense that XML defines, but some people believe that document validation in XML is overrated. DTDs are not very powerful when it comes to validation, and a vocabulary has its own rules that a DTD often cannot express. There is also the problem of other programs not trusting the XML DTD. The doctype in HTML is quite different from the doctype in XML, so you may not be able to use the XML doctype as an indicator that tells programs what type of document they are dealing with.
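How external entity resolution is switched off depends on the parser. As one hedged example, Python's standard xml.sax module exposes the features feature_external_ges and feature_external_pes, which can be set to False explicitly. The class TextCollector and the function parse_without_external_entities are names invented for this sketch.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)


class TextCollector(ContentHandler):
    """Collect all character data seen during the parse."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(xml_bytes):
    """Parse with external general and parameter entities switched off."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


text = parse_without_external_entities(b"<a>hello <b>world</b></a>")
```

Setting the features explicitly documents the intent in code, even on Python versions where external general entities are already off by default.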
If an application can handle multiple XML vocabularies and knows how to dispatch each document to the right handler by checking the namespace of the root element, you can consider yourself lucky. If the vocabulary is not identified by a namespace, you can look for it in the MIME type. In some cases the vocabulary is identified neither by a namespace nor by a specific MIME type; such a language is a bad example and will create a lot of problems, because you will have to rely on the root element name. The XML specification allows three kinds of processing. The first does not perform DTD-based validation and does not retrieve external entities. The second does not validate but retrieves external entities so that the entity references in the infoset can be expanded. The third performs DTD-based validation and retrieves external entities so that the infoset and the entity references can be expanded. The point of having several profiles is that the application can choose the right one.
Character entities are considered unsafe for web applications. This is a disadvantage because it causes problems with input and with editors. On the World Wide Web there may be other options available when there is such a problem. The situation need not be so unfortunate, because an input method can solve the problem on the editor side. If the XHTML entities had been predefined, there would not have been so many of these problems. But that is going back in time, and it cannot be changed. As discussed earlier, the flexibility of XML can sometimes turn out to be its biggest disadvantage.
UDDI: UDDI, or Universal Description, Discovery and Integration, is a dynamic protocol used along with the Extensible Markup Language to find other web services on the Internet. UDDI's functionality is very similar to CORBA's, and it also acts like a Domain Name Server for services used by various business applications. UDDI depends on SOAP in that it sends its requests as SOAP messages. It is still not accepted as a standard protocol for XML because of this dependency on SOAP, which itself is under scrutiny and undergoing changes.
XLANG: XLANG is an extension of WSDL, the Web Services Description Language. WSDL is an XML-based service that helps web services communicate. XLANG is also used to undo some complex operations. In fact, the main use of this protocol is to undo operations which are very important from the commercial aspect of these applications.
XAML: Transaction Authority Markup Language acts as a compensatory language for XLANG. XAML is also used to undo operations, but it is not restricted to two-phase applications such as a buying and paying transaction or a selling and receiving transaction. It leaves other options open for a two-way transaction to be undone.
XKMS: XML Key Management Specification is mainly used to create digital certificates or signatures with XML applications. XKMS is divided into two services, X-KISS and X-KRSS. The XKMS protocol depends largely on XML, WSDL, and SOAP.
XML standards
Since the beginning, XML, or the Extensible Markup Language, has been growing constantly: different standards are being set and different technologies have been evolving. For XML users it may be extremely difficult to keep up with the ever-changing landscape and new entries. The word standard almost has to be redefined when it comes to XML, because there are so many standards in use already and more are being added. However, there are some core standards in XML which can be considered a dictionary of fixed terms. These terms form the basis of what is expressed in the Extensible Markup Language.
Canonical XML or the C14n
Canonical XML defines a standard physical representation of an XML document. Documents that are logically equivalent but differ in incidental syntax are written out in one canonical form, without changing their meaning or introducing syntax errors.
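As a small sketch of the idea, Python 3.8 and later ship a C14N 2.0 implementation as xml.etree.ElementTree.canonicalize. The two item documents below are invented; they differ only in attribute order, quoting, spacing inside the tag, and the empty-element form.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same logical document.
first = """<item  b="2" a='1'></item>"""
second = '<item a="1" b="2"/>'

canonical_first = ET.canonicalize(first)
canonical_second = ET.canonicalize(second)

# After canonicalization the two serializations can be compared as text.
same_document = canonical_first == canonical_second
```

This is why canonical forms matter for digital signatures and comparisons: a byte-for-byte check becomes meaningful once incidental differences are removed.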
XML Catalogs
XML processors use an XML catalog to find out how to resolve a URL. A catalog can also substitute one resource for another. Catalog processing takes place as part of XML parsing.
XML information set or the Infoset
The XML Information Set describes an XML document as a series of objects (information items), each with a set of specialized properties. These items together provide the information contained in the XML document.
XML Name spaces
XML namespaces enable users to give universally unique names to the elements and attributes in an XML document. For example, names like head and body can be used for document markup even though they are otherwise used to describe the anatomy of a human body.
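A short sketch of how a namespace-aware parser keeps two body elements apart, using Python's xml.etree.ElementTree. The namespace URIs under example.org and the prefixes h and a are placeholders chosen for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<report xmlns:h="http://example.org/html"
        xmlns:a="http://example.org/anatomy">
  <h:body>Page content</h:body>
  <a:body>Torso</a:body>
</report>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefix to its URI, written as {uri}localname.
tags = [child.tag for child in root]

# A prefix map lets lookups use short prefixes again.
ns = {"h": "http://example.org/html", "a": "http://example.org/anatomy"}
html_body = root.find("h:body", ns).text
anatomy_body = root.find("a:body", ns).text
```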
RELAX NG
RELAX NG is a schema language that can be used to describe, define, and constrain an XML vocabulary. It is a grammar-based schema language. A schema is something that limits and defines the terms of a language.
Schematron
Schematron is also a schema language, but it is rule based rather than grammar based. Instead of describing a grammar, it states rules that an XML document must satisfy.
DSDL or Document Schema Definition Languages
Document Schema Definition Languages or DSDL provides a framework for the validation and core processing of Extensible Markup Language documents. It consists of individual, well-defined specifications. The specifications in the DSDL framework can be used separately or together for XML validation.
Uniform Resource Identifiers (URI) and Internationalized Resource Identifiers (IRI)
Uniform Resource Identifiers are used to identify resources such as HTTP, XML, and multimedia resources. Internationalized Resource Identifiers extend URIs so that identifiers for such resources can contain characters from international character sets.
W3C XML Schema
W3C XML schema is one of the schema languages to define and limit the XML language. It also forms the foundation for a few standards in XML message or data binding.
XML Inclusions or XInclude
XML Inclusions or XInclude provides a way to include or merge XML documents, along with some added features. One large document can be assembled from smaller ones.
XML Linking Language or XLink
XML Linking Language or XLink is a framework for creating links in an XML document. It can be used to create the simple links which are essential for XML documents.
XML Base
XML Base is the tool that associates XML elements with base URIs (Uniform Resource Identifiers) and IRIs (Internationalized Resource Identifiers). It provides a way for XML documents to state the base against which relative URIs and IRIs are resolved.
XML ID
XML ID provides an environment for expressing the unique identifiers and attributes which are used to identify the elements of the XML document.
XML or Extensible Markup Language
The XML or the Extensible Markup Language is a derivation of the SGML or the Standard Generalized Markup Language. While the SGML was a very rigid format the XML is a much more relaxed environment to work with.
XML Path Language or the XPath
XPath is considered to be the most successful of all the XML technologies today. It provides a syntax and a data model for identifying different parts of an XML document.
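To give a feel for path expressions, the sketch below uses the limited XPath subset supported by Python's xml.etree.ElementTree (a full XPath engine offers much more). The address book mirrors the earlier example; the second entry (Mason, High Street, London) is made up for illustration.

```python
import xml.etree.ElementTree as ET

ADDRESS_BOOK = """<AddressBook>
  <Address><Name>Jeniffer</Name><Street>Wall Street</Street><City>New York</City></Address>
  <Address><Name>Mason</Name><Street>High Street</Street><City>London</City></Address>
</AddressBook>"""

root = ET.fromstring(ADDRESS_BOOK)

# Child steps: every Name that sits inside an Address.
all_names = [n.text for n in root.findall("./Address/Name")]

# Predicate: only addresses whose City child has the given text.
new_yorkers = [n.text for n in root.findall("./Address[City='New York']/Name")]

# Descendant search: every Street anywhere below the root.
streets = [s.text for s in root.findall(".//Street")]
```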
XPointer Framework
The XPointer Framework addresses fragments of an XML document and their locations. It is comparable to the way URLs use a hash to point to a particular part of an HTML document.
Apart from the standards for XML documents there are some XML processing standards like Cascading Style Sheets, the Document Object Model, Remote Events for XML, Simple API for XML, State Chart XML, SOAP, SQL with XML extensions, XML Binding Language, XForms, the XML Processing Model, and Extensible Stylesheet Language Transformations (XSLT).
Some of the key XML vocabularies are Atom Syndication Format, Darwin Information Typing Architecture (DITA), DocBook, Mathematical Markup Language (MathML), Open Document Format for Office Applications (OpenDocument), Resource Description Framework (RDF), Synchronized Multimedia Integration Language (SMIL), Scalable Vector Graphics (SVG), Voice Extensible Markup Language (VoiceXML), XML Bookmark Exchange Language (XBEL), XHTML, XQuery 1.0: An XML Query Language, Extensible Stylesheet Language Formatting Objects (XSL-FO), and XUpdate.
Several organizations have been involved in creating standards for XML users. The World Wide Web Consortium, commonly referred to as the W3C, usually issues recommendations rather than standards. Another is the International Organization for Standardization, which probably leads the others and is the most active of all. The Organization for the Advancement of Structured Information Standards (OASIS) publishes its own standards, which are approved and recommended by the OASIS team. The last is the Internet Engineering Task Force, an organization which thrives on public opinion gathered from reviews over the Internet. It collects Internet Drafts and RFCs (Requests for Comments); almost anyone with a computer and Internet access can submit an Internet Draft or RFC and voice an opinion. The XML community has gained tremendous mileage from its activities, and in spite of its varying standards and shortcomings XML has remained a huge success.
XML Parsing
Author : Exforsys Inc. Published on: 14th Jul 2007 | Last Updated on: 15th Jul 2007
Because XML is a widely accepted language, parsing XML documents efficiently is critical. For web programming it is crucial that XML data be parsed efficiently, especially where applications are required to handle huge volumes of data. Improper parsing can increase memory usage and processing time, which directly reduces scalability. There are many XML parsers available, and choosing the right one for your situation can be challenging. Three XML parsing techniques are extremely popular for Java, and this article helps you choose the right method based on the application and its requirements. An Extensible Markup Language parser takes a raw serialized string as input and performs a series of operations on it. First, the XML data is checked for syntax errors and for well-formedness: the parser makes sure that start tags have matching end tags and that no elements overlap.
Many parsers also validate the document against a Document Type Definition (DTD), or sometimes an XML Schema, to verify that its structure and content are as specified. Finally, the output of parsing is access to the XML document's content through the parser's API. The three parsing techniques popularly used with Java are the Document Object Model (DOM), a mature standard provided by the W3C; the Simple API for XML (SAX), one of the first widely adopted XML APIs for Java, which has become a de facto standard; and the Streaming API for XML (StAX), a newer parsing model that is very efficient and has a promising future. Each of these techniques has its advantages and disadvantages.
an expensive process. The Document Object Model tree can consume a lot of memory. Interoperability is the biggest advantage DOM offers, but DOM is not well suited to object binding, which is its main drawback. Many applications are well suited to DOM parsing. If the application needs immediate random access to the XML document, DOM parsing is appropriate. For example, an Extensible Stylesheet Language (XSL) processor needs to navigate through an entire file repeatedly while it processes templates. DOM also supports updating and modifying data dynamically, which is extremely convenient for applications, such as XML editors, that modify data frequently.
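The article discusses DOM in the context of Java; as a language-neutral illustration of the same model, here is a minimal sketch using Python's xml.dom.minidom. The book document is invented (the author name is borrowed from the earlier CDATA example).

```python
from xml.dom import minidom

# The whole document is loaded into an in-memory tree.
dom = minidom.parseString(
    "<book><title>Old title</title><author>Kacey Price</author></book>"
)

# Random access: jump straight to any node, in any order.
author = dom.getElementsByTagName("author")[0].firstChild.data

# In-place modification, the kind of edit an XML editor performs.
title_node = dom.getElementsByTagName("title")[0]
title_node.firstChild.data = "New title"

updated = dom.documentElement.toxml()
```

The convenience comes at the cost noted above: the entire tree lives in memory for as long as the document is in use.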
when it comes to the DOM's element supports, and you also have to keep track of the parser's position in the document hierarchy. The application logic gets tougher as the document gets bigger and more complicated. Even when only part of the document is needed, a SAX parser still has to parse the whole document, as a DOM parser does. One of the biggest problems with SAX is that it lacks built-in support for document navigation like that provided by XPath. In addition, its one-pass parsing limits random access. These limitations also affect namespace handling. Such shortcomings make SAX a poor choice for manipulating or modifying an XML document. Applications that can read a document's content in one single pass can derive huge benefits from SAX parsing. Many business-to-business portals and applications use XML to encapsulate data in a format that can be received and retrieved with a simple process. This is the scenario where SAX may win hands down over DOM, purely because of its efficiency, which results in high throughput. The modern SAX 2.0 also has a built-in filtering mechanism which makes it very easy to output a subset of the document. SAX parsing is also considered very useful for validating against DTDs and XML schemas.
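The one-pass, event-driven style can be sketched with Python's xml.sax module, used here as a stand-in for the Java SAX API. The handler class DepthTracker and the orders document are invented for the example; note that the handler has to track its own position in the hierarchy.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class DepthTracker(ContentHandler):
    """Count elements by name and track nesting depth in one pass."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.counts = {}

    def startElement(self, name, attrs):
        # The parser keeps no tree, so the handler records its own position.
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        self.counts[name] = self.counts.get(name, 0) + 1

    def endElement(self, name):
        self.depth -= 1


handler = DepthTracker()
xml.sax.parseString(
    b"<orders><order><item/><item/></order><order><item/></order></orders>",
    handler,
)
```

Nothing is kept in memory beyond the counters, which is why this style suits large documents that are read once from start to end.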
XML Processing
Author : Exforsys Inc. Published on: 16th Jul 2007
XML document processing is described by a large set of specifications, and the list keeps growing. Many applications depend on these specifications to work with XML, the Extensible Markup Language. The specifications list the requirements for the XML processing model as well as for the XML language itself. They work mostly at the conceptual level and describe interactions in terms of the language.
XML documents are treated as information sets (info sets), and the specifications describe processes that construct new information sets, inspect them, modify them, or extract information from pre-existing ones. The processing model has to be described in terms of the info set, and the concrete object models that applications work with are not themselves the info set. Applications use DOM object models, the SAX event stream, or other representations of the info set.
Uses of TRaX
XML transformation is covered by the TRaX API (Transformation API for XML), which extends the original work of JAXP to provide a vendor-neutral, standard Java API for specifying and carrying out XML transformations. TRaX is more than just an API for one engine: its main use is as a general-purpose interface for transforming XML documents. TRaX does not compete with the Document Object Model (DOM), JDOM (the Java document object model), or SAX; it is an API that represents XML transformation methods and bridges these models. It works with SAX events and with templates from XSLT, and it relies heavily on SAX2 and the DOM and on their parsers. TRaX provides essentially the same functionality as the XSLT engines, but the underlying implementation can be changed by changing properties.
In some code, the XSLT style sheet is reprocessed for every successive transformation. A common scenario is that the same transformation is applied repeatedly to different sources, possibly in different threads. A better approach is to process the style sheet only once and keep the processed copy for the subsequent transformations, which saves a lot of time because the processing need not be repeated. The TRaX Templates interface makes this possible. A Templates instance is the run-time representation of the processed transformation instructions, while a Transformer created from it carries out an actual transformation. To increase throughput and performance, Templates instances can be saved and reused, and they are thread safe. The interface has a plural name because an XSLT style sheet contains a collection of one or more template elements: each transformation rule is defined by a template element within the style sheet, so the simplest available name was chosen to represent that collection of templates.
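Processing a style sheet once and reusing it can be sketched with the `javax.xml.transform` classes in the JDK. The style sheet, the `greeting` element, and the class name `TemplatesSketch` below are assumptions invented for the illustration; only the `Templates` and `Transformer` interfaces come from the API itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TemplatesSketch {

    // Hypothetical style sheet: turns <greeting to="..."/> into plain text.
    static final String XSLT =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/greeting'>Hello, <xsl:value-of select='@to'/>!</xsl:template>"
          + "</xsl:stylesheet>";

    /** Processes the style sheet once; the result can be shared between threads. */
    static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSLT)));
        } catch (Exception e) {
            throw new IllegalStateException("style sheet could not be processed", e);
        }
    }

    /** Applies the already-processed style sheet to one source document. */
    static String apply(Templates templates, String xml) {
        try {
            // A Transformer is a per-use object and should not be shared across threads.
            Transformer transformer = templates.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        Templates shared = compile();               // done once
        System.out.println(apply(shared, "<greeting to='World'/>"));
        System.out.println(apply(shared, "<greeting to='TRaX'/>"));
    }
}
```

The design point is the split between the two objects: the `Templates` instance holds the processed instructions and is created once, while each call obtains its own short-lived `Transformer`.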
The Document Object Model (DOM) represents all the data in an XML document as a tree structure. This tree can be easily manipulated from Java, and the DOM is designed to be simple for other programs to use. You can use this to modify data in the tree and to extract data from it when needed. However, the DOM parses the whole document and builds the full tree, rather than handling only the parts you are interested in as you can with SAX. So if you do not need the entire document, building the whole tree wastes time, effort, and memory. When you have a large XML document and need only a small portion of it, it makes sense to use SAX. Parsing XML data with DOM involves two major tasks: converting the XML data into a DOM tree, and then looking through the tree for the data that is useful to you. When processing XML with Java you can specify which parser to use; if no parser is specified, the Apache Xerces parser is used.
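The two DOM tasks described above, building the tree and then looking through it, might look like the following with the JDK's JAXP classes. The `title` element and the class name `DomReadSketch` are hypothetical, and the parser is whatever default the platform's `DocumentBuilderFactory` selects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomReadSketch {

    /** Returns the text of every (hypothetical) title element in the document. */
    static List<String> titles(String xml) {
        try {
            // Task 1: convert the XML text into a DOM tree.
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // Task 2: look through the tree for the data that is useful.
            NodeList nodes = doc.getElementsByTagName("title");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><book><title>A</title></book>"
                   + "<book><title>B</title></book></catalog>";
        System.out.println(titles(xml));
    }
}
```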
Parsing in SAX
Like DOM parsing, SAX parsing involves two major tasks. One is to create a content handler, and the other is to invoke the parser and direct it to that content handler. Some instructions have to be followed along the way, such as telling the system which parser to use. You have to create an instance of the parser and then create a content handler that will respond to it. The content handler should respond to the start and the end of the document, to the start and the end of each element, and to characters and ignorable white space. Finally, the parser has to be invoked with the content handler designated. If this last step is not done, none of the SAX processing happens.
The start-element event is triggered by the start tag of an element. If the element name is missing from the tag, no start-element event is produced and therefore the document itself is not recognized. If there are errors while parsing, this is the first place to check. The end-element event is triggered by the end tag of an element; in this example it subtracts two from the indentation and then prints a message. The characters event is used to print the first word of the tag body, and it does not change the indentation.
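A content handler along the lines described above can be sketched as follows. It is an illustration only: the class name `SaxOutlineSketch`, the message texts, and the choice to collect output in a `StringBuilder` rather than print it directly are assumptions, and the indentation step of two is taken from the description.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxOutlineSketch extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private int indentation = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        pad();
        out.append("Start tag: ").append(qName).append('\n');
        indentation += 2;                       // children are indented further
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        indentation -= 2;                       // subtract two, then report
        pad();
        out.append("End tag: ").append(qName).append('\n');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Record only the first word of the tag body; indentation is unchanged.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            pad();
            out.append(text.split("\\s+")[0]).append('\n');
        }
    }

    private void pad() {
        for (int i = 0; i < indentation; i++) {
            out.append(' ');
        }
    }

    /** Creates the parser, designates the handler, and invokes the parse. */
    static String outline(String xml) {
        try {
            SaxOutlineSketch handler = new SaxOutlineSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline("<a><b>hello world</b></a>"));
    }
}
```

A parser is free to deliver one run of text in several `characters` calls, so a production handler would buffer text until the end tag; this sketch ignores that for brevity.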
XML-RPC, or XML Remote Procedure Call, is a specification and a set of implementations that allow programs running on different platforms or operating systems to make remote procedure calls over the Internet. The protocol uses HTTP as the transport and XML for encoding. XML-RPC allows complex data structures to be transmitted, processed, and returned, and it is very simple to operate over the Internet. XML-RPC implementations are available for many languages, such as C, C++, Java, LISP, and PHP, to name just a few.
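To make the "XML for encoding" part concrete, the sketch below builds the body of an XML-RPC request for a call with integer parameters. It only constructs the payload, which a client would send as the body of an HTTP POST; the method name `sample.add` and the class name `XmlRpcCallSketch` are hypothetical, and only the `int` value type is handled.

```java
final class XmlRpcCallSketch {

    /**
     * Builds an XML-RPC methodCall document for a method that takes
     * integer parameters. No escaping is done, on the assumption that the
     * method name uses only characters that need none in XML.
     */
    static String methodCall(String methodName, int... args) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>");
        xml.append("<methodCall>");
        xml.append("<methodName>").append(methodName).append("</methodName>");
        xml.append("<params>");
        for (int arg : args) {
            // Each parameter is wrapped as <param><value><type>...</type></value></param>.
            xml.append("<param><value><int>").append(arg).append("</int></value></param>");
        }
        xml.append("</params>");
        xml.append("</methodCall>");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Hypothetical remote method that adds two integers.
        System.out.println(methodCall("sample.add", 5, 7));
    }
}
```

The server's reply is an XML document as well, wrapped in a `methodResponse` element, so both directions of the call can be handled with an ordinary XML parser.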
Function Libraries
XML-RPC provides several function libraries, which are divided mainly into C libraries and C++ libraries. You can use these libraries individually or together.
C libraries
libxmlrpc
The functions of libxmlrpc can be discussed here in brief. The library's header file declares the interface, and the documentation explains how to link to it, among other information. Generally a library function either succeeds or fails. The distinction between the two can be a little hazy, because when a library function fails it changes nothing: the state is indistinguishable from the state before the call. The function does, however, return a description of how it failed.
The libxmlrpc Client
The libxmlrpc client uses a set of global constants. Because of this, the running program has to call a library function to set them up. These global constants are not thread safe, so you must make sure not to make this call while other threads are running in the program. There are, however, functions within the client to interrupt or debug the program if it has been used with threads. The main use of this facility is when you have to run an interim part of the program rather than the main program itself.
The libxmlrpc Server
An XML-RPC server is basically a machine or system that receives remote procedure calls and sends responses to them. It does this with methods, which are stored in the method registry. The drivers for the protocol use this method registry to execute an XML-RPC call and to send the response. There are two kinds of methods, called type 1 and type 2. Type 1 is usually the default, but type 2 is used more with newer code because it has more advanced functionality.
libxmlrpc Server Abyss
Abyss is a general-purpose HTTP server program that is used as a web server. It is very similar to Apache.
XML-RPC is implemented over HTTP, so with a handler attached to the Abyss server, libxmlrpc server Abyss can accept a connection and execute an XML-RPC call. You can also write your own Abyss request handler, which takes the XML document, treats it as an XML-RPC call, and returns the response as XML.
libxmlrpc Server CGI
CGI, or the Common Gateway Interface, is a standard interface used by web servers. It is the protocol by which a web server calls a user program to perform an HTTP request. For example, a simple web server executes a GET request by sending the contents of the file named in the request, but the web server can instead be configured to run a CGI program that produces the response. We already know that an XML-RPC server can be implemented over HTTP, so all that is needed for an XML-RPC server is a web server configured to run a CGI program that knows how to execute an XML-RPC call.
libxmlrpc Server Pstream++
This is the only library here that differs from the C libraries. Its function is to send the information in packets and to stream them in order. The program handles each server connection individually: after it finishes streaming on one connection, it exits and then restarts. It depends on a Transmission Control Protocol (TCP) connection from the client. To handle a series of server connections, you should configure the server to accept TCP connections. A packet stream is far easier to handle than HTTP. HTTP itself is simple, but the packet stream is simpler still, and it is probably the simplest way to communicate XML-RPC messages. A packet stream is a two-way communication method that consists of information packets traveling in both directions. It is a stream of bytes in which each individual packet may be of a different size. Each packet belongs to a particular socket stream, that is, a TCP connection. Each XML-RPC message amounts to one packet in an XML-RPC packet stream, and all of these individual packets together make up the stream.
XML Security
Author : Exforsys Inc. Published on: 21st Jul 2007
Documents can now be secured using XML itself. When data is released to the web, it becomes available to everyone, everywhere. How do you secure and safeguard something that is so widely spread? XML security addresses this problem. It secures documents in two ways: one is XML Signature and the other is XML Encryption.
XML Encryption
On the World Wide Web, security is taken care of by Secure Sockets Layer (SSL) and Transport Layer Security (TLS). These protocols make sure that communication between two endpoints is safe and secure, for example email communications. But they protect data only while it travels between those two endpoints. XML Encryption fills the gaps that SSL and TLS cannot cover. XML security is capable of providing end-to-end security as well as selective security, in which only chosen parts of a document are protected.
An overview of Signatures
XML signatures may be applied to arbitrary digital content (data objects). The data objects are digested, and the resulting value is placed, together with other information, in an element that is then cryptographically signed. The Signature element represents the signature in a structured format. Validation involves two steps: one is validation of the signature and the other is validation of every reference in it. The algorithms used to calculate the signature value are identified in the signature itself, and the KeyInfo element usually carries the information required to validate it. The specification covers three areas: core generation, core validation, and core signature syntax.
Core generation is divided into two steps, reference generation and signature generation. In reference generation, for every data object being signed, the transforms determined by the application are applied to the data object. The digest value is then calculated over the resulting data object, and a Reference element is constructed that identifies the data object and includes the transforms, the digest algorithm, and the digest value. In signature generation, a SignedInfo element is created from the signature method, the canonicalization method, and the references. Using the algorithms named in SignedInfo, the signature value is calculated over the canonicalized SignedInfo, and then the Signature element is constructed, which includes SignedInfo, any objects, KeyInfo, and the signature value.
Core validation is also divided into two steps, reference validation and signature validation. Sometimes a signature is valid but an application still fails to validate it. This may be caused by parts of the specification that the application does not implement, or by an unwillingness to accept specific algorithms or even particular Uniform Resource Identifiers (URIs). In reference validation, the SignedInfo element is canonicalized using the canonicalization method it specifies. Then, for each reference, the data object is obtained and digested using the digest method given in the reference, and the resulting digest value is compared to the digest value stored in the SignedInfo reference. If there is any mismatch, validation fails. In signature validation, the keying information is obtained either from KeyInfo or from an external source, the canonical form of SignedInfo is obtained using the canonicalization method, and the result is used to confirm the signature value over the SignedInfo element.
Core signature syntax describes the features of the core signature. These features are mandatory for any implementation.
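The generation and validation steps can be sketched with the `javax.xml.crypto.dsig` API that ships with the JDK. This is a minimal illustration under several assumptions: an enveloped signature over the whole document, a throwaway RSA key pair generated on the spot, SHA-256 digests, and the class name `XmlSignatureSketch`. Real applications would load keys from a key store and decide which keys they trust.

```java
import java.io.StringReader;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Collections;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.Transform;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.keyinfo.KeyInfo;
import javax.xml.crypto.dsig.keyinfo.KeyInfoFactory;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.TransformParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XmlSignatureSketch {
    // Algorithm URI for RSA with SHA-256, written out as a plain string.
    static final String RSA_SHA256 = "http://www.w3.org/2001/04/xmldsig-more#rsa-sha256";

    /** Signs the document with a fresh key pair, then validates the signature. */
    static boolean signAndValidate(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);            // required by the signature API
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");

            // Reference generation: URI "" means the whole document; the
            // enveloped transform excludes the Signature element itself.
            Reference ref = fac.newReference("",
                    fac.newDigestMethod(DigestMethod.SHA256, null),
                    Collections.singletonList(
                            fac.newTransform(Transform.ENVELOPED, (TransformParameterSpec) null)),
                    null, null);

            // Signature generation: SignedInfo is built from the canonicalization
            // method, the signature method, and the references.
            SignedInfo signedInfo = fac.newSignedInfo(
                    fac.newCanonicalizationMethod(CanonicalizationMethod.INCLUSIVE,
                            (C14NMethodParameterSpec) null),
                    fac.newSignatureMethod(RSA_SHA256, null),
                    Collections.singletonList(ref));

            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair keys = generator.generateKeyPair();
            KeyInfoFactory kif = fac.getKeyInfoFactory();
            KeyInfo keyInfo = kif.newKeyInfo(
                    Collections.singletonList(kif.newKeyValue(keys.getPublic())));

            fac.newXMLSignature(signedInfo, keyInfo)
                    .sign(new DOMSignContext(keys.getPrivate(), doc.getDocumentElement()));

            // Core validation: checks every reference digest and the signature value.
            Node signatureNode =
                    doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature").item(0);
            DOMValidateContext context = new DOMValidateContext(keys.getPublic(), signatureNode);
            return fac.unmarshalXMLSignature(context).validate(context);
        } catch (Exception e) {
            throw new IllegalStateException("signing or validation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(signAndValidate("<order><item>book</item></order>"));
    }
}
```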
Planning) and CRM (Customer Relationship Management) functionalities by using XML with the RDBMS. Some of the XML features of SQL Server are OPENXML, HTTP access, OLE DB and ADO access, XML modes, XML views, and SELECT statement options. Data can be accessed over HTTP in three ways: through SQL statements entered in the URL (Uniform Resource Locator), through templates, and through HTML POST integration. Data is retrieved using the XML modes, of which there are three: RAW, AUTO, and EXPLICIT. RAW mode converts each row of the result into an XML element. AUTO mode returns the query results as a nested XML tree. EXPLICIT mode lets the query itself define the shape of the XML tree and specifies the way such queries must be written. An XML data schema provides an XML view of the database by using annotations defined by SQL Server. These annotations appear within the XML schema to define a two-way mapping between the XML data and the tables and columns of the database. The names stay the same from the XML data schema to the database tables and columns. The annotations are also used to define hierarchical relationships between data.
XPath queries
XPath works with the XML view technology so that the data that is being retrieved can be in the form of an XML document.
OPENXML
OPENXML provides a way to access XML data and present it in relational form. It lets the database work with XML data from within SQL by transferring the data to tables. Statements such as SELECT, UPDATE, DELETE, and INSERT can be used along with OPENXML.
backward-compatibility for XML in SQL Server. Since the integration of the XML data type, data can be generated directly in XML format. Microsoft SQL Server and SQLXML together provide efficient XML data management techniques. The XML-centric approach can process large volumes of XML data using the annotations defined through the XML view and XSD. The discussion can be divided into two parts, data modeling and data usage.
Data Modeling:
Whether a user should choose XML view technology or XML storage depends on many different factors. For highly structured data with a known schema, the relational data model works best for storage. When the data is unstructured, semi-structured, or flexible in shape, it can be modeled to fit the need, and SQL Server provides many tools for modeling data. When a model is needed that is platform independent and easily transferable, XML is the best choice. When data is stored as XML, the engine checks that the data is well formed and supports fine-grained queries and updates on it.
these is the storage option, which depends on whether the data is large or small. Query capability is the second factor that determines the storage option; the choice depends on the nature of the queries and the type of data. Indexing also plays a pivotal role, because processing is faster when the XML data is indexed. Data modification capabilities are essential for certain types of data, and the language support for data modification should be adequate as well. Last but not least is schema support: the schema may be able to map the XML document, but whether the XML document conforms to a schema also matters. These factors are weighed individually, and the data decides which option suits best. XML view technology can be used along with XPath to query the SQL tables, and updates can also be applied to the tables themselves. XML view technology is useful when you want XML-centric programming through an XML view, when there is a schema for the XML document and the data does not have to stay organized as a document, when you need to query the data using XPath, and when bulk data has to be distributed into tables for immediate access through the XML view.