
A High Performance Parallel XML Parser On Multicore by Integrating Parallel Parsing and Schema Validation

This document proposes a novel algorithm for parallel XML parsing and schema validation on multi-core processors. It first divides an XML document into approximately equal chunks that are parsed speculatively in parallel. Exceptions encountered during speculative parsing are handled separately. The partial results are then integrated to form the full DOM tree. In addition, the paper presents a method to parallelize schema validation and integrate it with parallel parsing for improved performance on multi-core systems.

Uploaded by

tau_neutrino
Copyright
© Attribution Non-Commercial (BY-NC)

A High-Performance Parallel XML Parser on Multi-core by Integrating Parallel Parsing and Schema Validation

Yu Wu, Qi Zhang, Zhiqiang Yu, Jianhui Li

Intel China Software Center, MMD/SSG

Abstract


Nowadays, XML plays a crucial role in web services, databases, and document representation and processing. However, XML processing is widely regarded as a major performance bottleneck, especially for the very large XML data found in some scientific datasets. At the same time, multi-core processors are becoming mainstream on both desktop and server machines. This is a great opportunity to parallelize XML processing and take full advantage of multiple cores. In this paper, we first present a novel, high-performance parallel XML parsing algorithm. We then develop a parallel schema validation algorithm that can be integrated with parallel XML parsing. The resulting parallel validating parser exposes more parallelism on multi-core platforms, and our experiments show a clear overall performance advantage.

Keywords: Parallel XML Parsing; Parallel Schema Validation; Task Execution Model; Multi-core

1. Introduction
Extensible Markup Language (XML) has become much more than a data format for information exchange; it plays a crucial role in web services, databases, and document processing. For example, the Metadata Catalog Service (MCS) [1] runs on top of a web service that stores and retrieves descriptive information on millions of data items in XML format. The eBay Web services specification has a few thousand elements and hundreds of complex type definitions, and communicating with eBay via the SOAP protocol requires processing large XML documents [2]. [3] and [4] show that most implementations of Web services do not scale well as the size of the XML document to be processed increases. Moreover, most XML-based applications require schema or DTD validation during or after XML parsing. Schema validation not only checks a document's compliance with a schema but also determines type information for every node of the document. This is critical for database systems and the XQuery language, which are sensitive to data types. Adding the significant extra overhead of schema validation to an already slow XML parser makes matters worse [5].

On the other hand, multi-core and many-core processors are on their way to becoming the mainstream computing platform. This is a great opportunity to improve the overall performance of XML parsing and schema validation on multi-core systems through parallelization. Parallelizing the XML parser and schema validation not only improves the overall performance of a validating parser but also demonstrates the power of multi-core computing.

Several papers have addressed the development of a parallel XML parser. Wei Lu first presents a pre-scanning based parallel parsing model, which consists of an initial pre-scanning phase that determines the structure of the XML document, followed by a full, parallel parser [6]. The results of the pre-scanning phase are used to partition the XML document for data-parallel processing. This is a feasible approach to parallelizing XML parsing, but its performance gains are limited when running with more than four parsing threads. Pre-scanning has to scan the whole XML document once at the beginning as a sequential execution path, which greatly reduces performance in a multi-threaded execution environment. Michael R. Head also explores new techniques for parallelizing parsers for very large XML documents [7]. That work does not focus on developing a parallel XML parsing algorithm. Instead, it exposes parallelism by dividing the XML parsing process into several phases, such as XML data loading, XML parsing, and result generation, and then scheduling working threads to execute the phases as a pipeline. This approach has some performance issues: load balancing is difficult in such a pipeline model, and the overhead of thread communication and synchronization may cancel out the benefit of parallel parsing.

In this paper, we present a novel parallel XML parsing algorithm that shows high performance and good scalability. We further explore how to parallelize schema validation and integrate it with parallel XML parsing to take full advantage of multiple cores. To our knowledge, no published paper has made such an attempt so far. The performance evaluation shows the benefit of combining parallel XML parsing and schema validation.

The rest of the paper is organized as follows. Section 2 gives a typical processing model of an XML-based application. Sections 3 and 4 focus on the parallel XML parsing and parallel schema validation algorithms. Section 5 introduces the task execution model and discusses how to map tasks to working threads to achieve the best performance. The last section gives the performance evaluation results and compares performance with an existing parallel XML parser model.

2. Typical XML Processing Model


A typical processing model in an XML-based application consists of four phases:
1. XML character scanning and well-formedness checking.
2. Validation.
3. Result generation, such as SAX events or a DOM tree.
4. Application, i.e. the application code that visits the parsing result.
The first phase is syntactic: it scans the XML document, recognizes each meaningful token, and determines whether the document is well-formed. This work is usually done with automata. The second phase is validation, which determines whether the document structure is a valid instance of a given schema. The third phase generates the parsing result, either by building a DOM tree or by raising SAX events through callback functions registered by the application. In most XML parsers this phase is integrated into the first phase. The last phase is application specific. In general, an XML parser provides a common interface (such as DOM or SAX) through which the application visits the parsed data.

This model gives the whole picture of how an XML document is processed in a typical XML-based application. As mentioned in [5], the first three phases are the main bottleneck in such applications. In this paper, we address this problem by parallelizing all three phases, which improves the overall performance of the processing model. The solution comprises two parallel algorithms: parallel XML parsing and parallel schema validation.

3. Parallel XML Parsing

The parallel XML parsing algorithm scans an XML document in parallel, checks its well-formedness, and generates the parsing result. In general, the parsing result is either a DOM tree representing the XML document or a sequence of SAX events delivered through callbacks. We choose the DOM tree as the output in this paper. Generating a sequence of SAX events in parallel is not difficult either: it only requires an efficient binary representation that holds the SAX events generated by the threads so that they can be replayed in sequence. A more detailed discussion of a parallel SAX parser is out of scope. Our implementation of the parallel XML parsing algorithm is based on Lexer2, an Intel high-performance SAX-style parser.

The basic idea of the algorithm is to divide the XML document into chunks so that each thread works on a different chunk independently and generates its own partial DOM trees. Once all chunks are parsed, the results are merged. This requires each thread to begin parsing at an arbitrary point in the XML document, which is problematic because most chunks begin in the middle of some string whose context and grammatical role are unknown. For example, the starting character of a chunk may be part of an element name, an attribute, or a text value. Without this information, the parser does not know how to start parsing the chunk. The pre-scanning based algorithm presented in [6] addresses this issue by extracting the skeleton of the XML document in a pre-scanning pass before parallel parsing, so that the context of the first element in each chunk can be retrieved from the skeleton. But pre-scanning is costly and easily becomes the bottleneck of the parallel parsing algorithm.

We parallelize XML parsing while avoiding pre-scanning: the XML document is divided into approximately equal-sized chunks that always start at a tag boundary, and the chunks are parsed in parallel as if they were ordinary documents. Result merging and all dependence issues, such as namespace determination, are handled in a post-processing stage after chunk parsing finishes. Because the algorithm avoids pre-scanning the XML document in advance, it can improve performance significantly. Figure 1 gives an architectural overview of the parallel XML parsing algorithm, which has three stages: subtask generation, speculative parsing, and partial DOM tree validation and merging. We introduce them in detail below.

[Figure 1 diagram: Input XML → Subtask Generation → Chunks → Speculative Parsing (one per chunk, in parallel) → DOM Tree Validation & Merge → Result DOM]
Figure 1. The architecture of parallel parsing

3.1 Subtask Generation
Subtask generation is performed by a single thread at the beginning to create the parallel tasks. Because this step is a sequential execution path, its algorithm should be as simple as possible. It therefore just divides the XML document into approximately equal-sized chunks, according to the number of working threads, to ensure load balance. The only requirement is that each chunk must start with the left angle bracket <, which indicates the start of a new element. It is possible, however, that the starting character of a chunk actually lies inside a CDATA section or a comment, which is not what we expect. This special case is handled during speculative parsing.

3.2 Speculative Parsing
3.2.1 Basic algorithm
Speculative parsing works on one chunk of XML data and generates partial DOM trees. Because each chunk always begins with the left angle bracket <, it can be treated as a new XML document, although it may not be well-formed. Speculative parsing can therefore parse a chunk in the same way a single-threaded parser does. The only difference is that it must hold back some ill-formedness exceptions, because they may not be real errors in the context of the whole XML document. Consider the simple XML document in figure 2, which is split into two chunks. The second chunk begins with an end element, which violates the XML specification when the chunk is viewed in isolation, yet the whole document is still well-formed. Speculative parsing therefore catches the following three exceptions and keeps them in separate stacks for further processing, because they all depend on information from earlier chunks.
1) Standalone end element: an end element with no matching start element in the current chunk, such as the element book in the second chunk of figure 2.
2) Standalone start element: a start element without a corresponding end element in the current chunk. The standalone start elements in the first chunk of figure 2 are catalog, book, and title.
3) Standalone prefix: a prefix without an associated namespace definition in the current chunk, such as the prefix bw in the second chunk of figure 2.
Apart from these three exceptions, speculative parsing still reports any other ill-formedness exception directly. In addition, while building partial DOM trees it must create a new empty DOM tree as the current tree whenever it meets a standalone end element, because the following XML fragment may belong to a different branch of the whole DOM tree.

Chunk 1:
    <bw:catalog xmlns:bw="http://bookworld.com/catalog">
      <bw:book bw:isbn="0-596-00292-0">
        <bw:title>XML IN NUTSHELL
Chunk 2:
        </bw:title>
        <bw:author>Elliott Harold</bw:author>
        <bw:author>Westcott Means</bw:author>
      </bw:book>
    </bw:catalog>
Figure 2. A sample XML document split into two chunks
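As a rough illustration of subtask generation (Section 3.1) and of how speculative parsing (Section 3.2.1) separates standalone start and end elements, the following Python sketch may help. It is not the Lexer2-based implementation described in the paper. The names `split_into_chunks`, `scan_chunk`, and the `TAG` pattern are hypothetical, the tag scanner is a simplified regular expression rather than a real XML parser, and it ignores CDATA sections, comments, attribute values containing `>`, and standalone prefixes.

```python
import re

# Simplified tag pattern (an assumption for this sketch, not a full XML lexer).
# Groups: "/" for an end tag, the element name, "/" for an empty-element tag.
TAG = re.compile(r"<(/?)([A-Za-z_][\w:.-]*)[^<>]*?(/?)>")


def split_into_chunks(xml, n_chunks):
    """Cut xml into at most n_chunks pieces of similar size, each starting at '<'."""
    target = max(1, len(xml) // n_chunks)
    cuts = [0]
    while len(cuts) < n_chunks:
        # Move forward from the ideal cut point to the next left angle bracket.
        pos = xml.find("<", cuts[-1] + target)
        if pos == -1:
            break
        cuts.append(pos)
    cuts.append(len(xml))
    return [xml[a:b] for a, b in zip(cuts, cuts[1:])]


def scan_chunk(chunk):
    """Return (standalone_end, standalone_start) element names for one chunk."""
    open_stack = []      # start tags seen in this chunk and not yet closed
    standalone_end = []  # end tags with no matching start tag in this chunk
    for closing, name, self_closing in TAG.findall(chunk):
        if self_closing:
            continue                      # <a/> opens and closes at once
        if not closing:
            open_stack.append(name)
        elif open_stack:
            if open_stack[-1] != name:    # a real error inside the chunk
                raise ValueError("mismatched end tag </%s>" % name)
            open_stack.pop()
        else:
            standalone_end.append(name)   # must be matched by an earlier chunk
    return standalone_end, open_stack


# The two chunks of figure 2, written out by hand.
chunk1 = ('<bw:catalog xmlns:bw="http://bookworld.com/catalog">'
          '<bw:book bw:isbn="0-596-00292-0"><bw:title>XML IN NUTSHELL')
chunk2 = ('</bw:title><bw:author>Elliott Harold</bw:author>'
          '<bw:author>Westcott Means</bw:author></bw:book></bw:catalog>')
print(scan_chunk(chunk1))  # ([], ['bw:catalog', 'bw:book', 'bw:title'])
print(scan_chunk(chunk2))  # (['bw:title', 'bw:book', 'bw:catalog'], [])
```

Whatever is left on `open_stack` at the end of a chunk plays the role of the standalone start element stack, and `standalone_end` plays the role of the standalone end element stack. Both are only resolved once the results of earlier chunks are known.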

3.2.2 Special Case
Speculative parsing works well in the common case, but two special issues remain. One is how to handle a chunk that starts inside a CDATA section or a comment; the other is default namespace processing. Regarding the first issue, a chunk that starts inside a CDATA section or comment cannot be parsed correctly. We address this by checking the type of the last element in each chunk. If it is a CDATA section or comment whose end delimiter does not appear in the same chunk, then the next chunk must be invalid, and the thread parsing the current chunk continues on to parse the next chunk.

The other special case is default namespace processing. According to the XML specification, if an element defines a valid default namespace as the current default namespace, all of its child elements with an empty prefix are associated with that default namespace; otherwise, they are associated with the pre-defined default namespace defined by the XML specification. In speculative parsing, the problem is that we cannot determine the default namespace of an element with an empty prefix, because we do not know whether a default namespace is defined in an earlier chunk. The key is to determine the namespace scope of each element from its relative depth in the current chunk. The detailed algorithm is as follows. Speculative parsing maintains a depth counter equal to the number of standalone end elements in the chunk. Whenever the counter increases by one, a new default namespace scope begins, and the parser creates a new virtual default namespace with an empty URI and prefix as the current default namespace. This virtual default namespace is treated as a standalone prefix and pushed onto the stack to be resolved in the next stage.

3.3 Partial DOM Tree Validation and Merging
The main tasks of this stage are checking well-formedness across chunks, resolving standalone prefixes, and merging the partial DOM trees into a complete DOM tree. The partial DOM trees are processed one by one in document order, which makes this stage a sequential execution path. To keep it from becoming another bottleneck in the parallel parser, the algorithms for these tasks must be highly efficient. We present a general stack comparing and merging algorithm to meet this requirement. The next three sections explain the algorithm in detail.

3.3.1 Check Well-formedness Across Chunks
Parallel XML parsing checks the well-formedness of an XML document in two steps. First, speculative parsing checks well-formedness inside each chunk during parsing. Then this step checks it across chunks with the stack comparing and merging algorithm. Figure 3 describes this algorithm, where N is the number of chunks.

Create a new global start element stack A
FOR each chunk from 1 TO N
    IF current chunk has been parsed THEN
        Get the standalone end element stack B
        IF B is not empty THEN
            FOR each element in B
                IF current end element matches the top element in A THEN
                    pop one element from A
                ELSE
                    report ill-form error and break
            END FOR
        Get the standalone start element stack C
        IF C is not empty THEN
            push each element in C into the stack A
END FOR
Figure 3. Well-form checking algorithm across chunks

This algorithm is simple but efficient. It maintains a single global standalone start element stack and matches it against the standalone end element stack of each chunk in turn. After the last chunk has been processed, the global standalone start element stack and the standalone end element stacks of all chunks should be empty, which indicates that every standalone start element has been matched by a standalone end element. In that case the whole XML document is well-formed; otherwise an ill-formedness error is reported.

3.3.2 Resolve Standalone Prefix
In a well-formed XML document, every prefix must be bound to a namespace, which consists of a pair of prefix and URI. Speculative parsing cannot bind a standalone prefix to a namespace, because the namespace may be defined in an earlier chunk. These unbound standalone prefixes are resolved in this step. The well-formedness checking algorithm of figure 3 guarantees that all ancestors of an element with a standalone prefix are kept in the global standalone start element stack. The resolving algorithm therefore simply looks up the ancestors in that stack, from the direct parent to the root element of the XML document, to find the first matching namespace definition. If one is found, all elements with the standalone prefix are associated with the matched namespace; otherwise an unrecognized prefix error is reported. The default namespace is an exception: if there is no matching default namespace definition for an empty standalone prefix, no error is reported, because the XML specification specifies a pre-defined default namespace and a document need not define its own.

3.3.3 Merge Partial DOM Trees
This step links the partial well-formed DOM trees together into a complete DOM tree representing the original XML document. The key to merging a partial DOM tree into the trunk is finding the right parent for the root node of the partial DOM tree. We therefore introduce a depth that indicates the relative depth of an element in its own chunk; it equals the number of standalone end elements that appear before the element in the current chunk. In figure 4, N stands for the total number of chunks.

Take the DOM tree of first chunk as the trunk tree
FOR each chunk from 2 TO N
    IF current chunk is well-formed THEN
        FOR each DOM tree in current chunk
            Count the depth D of root element
            Get the Dth element E from the global start element stack A
            Set element E as the parent of root node in current DOM tree
        END FOR
END FOR
Figure 4. Partial DOM Tree Merging algorithm
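The stack comparing and merging steps of figures 3 and 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: elements are represented only by their names, `check_across_chunks` and `parent_for_root` are hypothetical helper names, and reading "the Dth element" as the entry D positions below the top of stack A (counting from zero) is our interpretation of figure 4.

```python
def check_across_chunks(chunk_results):
    """Cross-chunk well-formedness check in the spirit of figure 3.

    chunk_results holds one (standalone_end, standalone_start) pair of
    name lists per chunk, in document order. Returns True when every
    standalone start element is matched by a standalone end element.
    """
    global_stack = []                        # stack A
    for index, (ends, starts) in enumerate(chunk_results, start=1):
        for name in ends:                    # stack B of this chunk
            if not global_stack or global_stack[-1] != name:
                raise ValueError(
                    "ill-formed: unexpected </%s> in chunk %d" % (name, index))
            global_stack.pop()
        global_stack.extend(starts)          # stack C of this chunk
    if global_stack:
        raise ValueError("ill-formed: unclosed elements %s" % global_stack)
    return True


def parent_for_root(global_stack, depth):
    """Pick the parent of a partial DOM tree root (figure 4).

    global_stack is stack A as it was before the chunk was processed.
    depth is the number of standalone end elements that precede the
    root in its chunk. Assumption: depth 0 selects the top of the stack.
    """
    return global_stack[-(depth + 1)]


# Stacks for the two chunks of figure 2.
results = [
    ([], ["bw:catalog", "bw:book", "bw:title"]),
    (["bw:title", "bw:book", "bw:catalog"], []),
]
print(check_across_chunks(results))          # True

# The first <bw:author> tree in chunk 2 follows one standalone end
# element (</bw:title>), so its depth is 1 and its parent is bw:book.
print(parent_for_root(["bw:catalog", "bw:book", "bw:title"], 1))  # bw:book
```

Both functions touch each standalone element only once, which matches the intent of keeping this sequential stage cheap.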

After this step, each partial DOM tree is available for processing by other components. For example, schema validation can be performed on the partial DOM trees, which exposes more parallelism when schema validation is required during XML parsing. Section 4 focuses on the parallel schema validation algorithm, while Section 5 shows how to build a unified execution model that integrates parallel schema validation with parallel parsing.

4. Parallel Schema Validation

Improving the overall performance of XML parsing and schema validation is a hot topic. Schema validation can introduce even more overhead than pure XML parsing. We develop a parallel solution that improves schema validation performance on multi-core processors. As in parallel parsing, the basic idea of parallel schema validation is to divide the whole validation task into a set of subtasks, where each subtask validates one chunk of the document. Thanks to parallel parsing, each chunk has already been parsed into several partial DOM trees, so schema validation can work on them directly in parallel. When a validation subtask completes, it outputs a partial validated element state list and a validation summary. These partial outputs are then merged to produce the final result. The parallel validation algorithm therefore has two phases: partial validation based on partial DOM trees, and residual validation. The following two sections describe the algorithm in detail.

4.1 Partial Validation Based on Partial DOM Trees
Partial validation runs schema validation on the parsed data of each chunk in parallel. Each run can be regarded as a subtask of the whole validation process that starts from a specific element in the XML document, namely the root element of a partial DOM tree in the chunk. Usually, a finite state machine describes the semantics of the current node's type, and validation advances the machine state using the names of child elements or text value strings as input. The problem is how to determine the type of the root element of each DOM tree. As described before, a partial DOM tree must have been merged into the trunk before it is validated, so the root element type can be determined by searching its ancestors. After validating all partial DOM trees in the current chunk, partial validation generates two outputs: the validation summary, which holds a snapshot of the document fragment after validation, and the partial validated information, which includes context information for all standalone start elements in the current chunk. Residual validation uses them to generate the final validation result. The partial validation algorithm is shown in figure 5.

FOR each DOM tree in current chunk
    Get root element node in current DOM tree
    Determine root element type and validate its attributes
    Validate all of children based on the root element type
    Push standalone start element into partial element stack
    Push root element and succeeding standalone end element into validation summary
END FOR
Figure 5. Partial validation algorithm

After partial validation is done, the validation summary and the partial element stack are available.

4.2 Residual Validation
Residual validation merges the partial validation results chunk by chunk in document order. In effect, it resumes the validation of the standalone elements in a chunk, which was suspended because their child nodes were not available in that chunk. The only difference from the traditional validation process is that residual validation uses the information kept in the validation summary rather than retrieving it from the original XML document or DOM tree. It also updates the partial element stack after processing each chunk. Figure 6 describes the algorithm.

Set partial element stack of first chunk as global element stack
FOR two continuous chunks I, II from 1 to N
    IF the two chunks have been partial validated
        Set top element of global element stack as context node
        FOR each element in validation summary of chunk II
            IF it is a start element node THEN
                validate current node with context node
            IF it is a standalone end element THEN
                check if the context node is in final state
                IF global element stack is empty THEN
                    break
                ELSE
                    pop one element from global element stack and set top element as current context node
        ENDFOR
        Merge partial element stack II into global element stack
ENDFOR
Figure 6. Residual validation algorithm
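To make the finite state machine view of validation more concrete, the Python sketch below validates the children of a single element against a toy content model. It then shows how validation of a standalone start element can be suspended in one chunk and resumed later from the saved state, which is the idea behind residual validation. Everything here is an assumption made for illustration: the content model `BOOK_MODEL` (one title followed by one or more authors), the state names, and the helper functions are hypothetical and are not taken from the paper or from any schema library.

```python
# Hypothetical content model for a "book" element:
# exactly one title, followed by one or more authors.
BOOK_MODEL = {
    "start": "s0",
    "final": {"s2"},
    "moves": {
        ("s0", "title"): "s1",
        ("s1", "author"): "s2",
        ("s2", "author"): "s2",
    },
}


def advance(model, state, child_name):
    """Advance the machine by one child element name."""
    nxt = model["moves"].get((state, child_name))
    if nxt is None:
        raise ValueError(
            "unexpected child <%s> in state %s" % (child_name, state))
    return nxt


def validate_children(model, child_names, state=None):
    """Feed child names to the machine, optionally resuming from a saved state."""
    state = model["start"] if state is None else state
    for name in child_names:
        state = advance(model, state, name)
    return state


def is_final(model, state):
    """True if the element may legally end in this state."""
    return state in model["final"]


# Partial validation of chunk 1: <book> is a standalone start element and
# only its <title> child is visible, so its state is saved, not judged.
saved = validate_children(BOOK_MODEL, ["title"])
print(saved, is_final(BOOK_MODEL, saved))    # s1 False

# Residual validation: the children found in the next chunk resume the
# saved state, and the final-state check runs at the standalone end element.
done = validate_children(BOOK_MODEL, ["author", "author"], state=saved)
print(done, is_final(BOOK_MODEL, done))      # s2 True
```

In this reading, the saved state corresponds to the context information kept for a standalone start element, and the list of child names fed in during the second call corresponds to what a validation summary would supply.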

5. Task Execution Model


The task execution model explores how to integrate the two parallel algorithms and map parallel tasks onto the available execution units of a multi-core processor to achieve the best performance.

5.1 Integrating Parallel Parsing and Schema Validation
Figure 1 shows the parallel execution model of pure parallel XML parsing. It can be conveniently extended with a pipeline model to integrate parallel schema validation. The integrated execution model is shown in figure 7.
[Figure 7 diagram. Parallel Parsing: Input XML → Subtask Generation → Chunks → Speculative Parsing (one per chunk) → DOM Tree Validation & Merge. Parallel Validation: Partial Validation (one per chunk) → Residual Validation → Result. Legend: document chunk; validation summary; partial validated element state list.]

Figure 7. Execution model of parallel validated parser

The pipeline model shown in figure 7 adds two new schema validation phases: partial validation and residual validation. All five phases are executed in order as a pipeline. In addition, speculative parsing and partial validation can be further parallelized by applying data parallelism. Combining the pipeline and data-parallel models exposes more parallelism and offers more potential performance gain on multi-core processors, but it also makes the task mapping strategy challenging.

5.2 Task Mapping Strategy
Task mapping focuses on efficient allocation and scheduling of execution resources for all tasks. A good task mapping strategy should ensure load balance and maximize CPU utilization and pipeline execution efficiency to achieve the best performance. In general this is a very difficult research topic, but some constraints in our model reduce the complexity. The pipeline model in figure 7 has five phases: subtask generation, speculative parsing, DOM tree validation and merging, partial validation, and residual validation. The first phase can be removed from the pipeline because it executes only once at the beginning. Moreover, the expected execution times of the third phase and the last phase are very short, so it is reasonable to merge them into the speculative parsing phase and the partial validation phase respectively. We call the merged phases the parsing phase and the validation phase for simplicity. The problem is thus simplified to allocating threads and CPU resources for a two-phase pipeline. We present two task mapping strategies.

5.2.1 Static Mapping Strategy
In this strategy, each working thread always executes fixed phases of the pipeline. This mapping algorithm is relatively simple and easy to implement. Since the pipeline contains only two phases, there are two basic static mapping schemes. The first allocates one fixed phase to each thread; the second allocates both phases to each thread, so that each thread executes all tasks in the pipeline in order. Both schemes have advantages and disadvantages. In the first scheme, each working thread always executes one fixed kind of task, which is cache friendly at run time, but load balance cannot be ensured. For example, the threads assigned to the validation phase cannot start work until parallel parsing has generated at least one partial DOM tree. The second scheme has good load balance characteristics, but its weakness is that the thread execution context has to switch between the parsing phase and the validation phase from time to time, which lowers locality and raises the cache miss rate.

5.2.2 Dynamic Mapping Strategy
To overcome the drawbacks of the static mapping strategy, we present a dynamic mapping strategy. It allocates tasks to available working threads dynamically at run time, which allows execution resources to be scheduled more flexibly. Analysis of the pipeline execution model yields two facts that are useful for the algorithm design:
1) Parallel schema validation depends on parallel parsing, so parsing tasks should have higher execution priority than validation tasks.
2) The time spent parsing a chunk is directly proportional to the size of the chunk, whereas the execution time of schema validation depends on both the chunk size and the schema complexity and is not easy to estimate.
According to these facts, all working threads have to do parsing work at the beginning, but as more and more partial DOM trees are generated, some threads should dynamically pick up validation tasks according to a specific strategy. The ideal execution model satisfies the following conditions:
1) Load is balanced among the working threads.
2) Once a partial DOM tree has been generated, it is validated as soon as possible.
3) A thread switches between parsing and schema validation at most once, to improve locality.
4) No working thread stalls during task execution.
In the pipeline model with dynamic mapping, load balance is achieved easily, because each working thread can always pick up a task to execute until all tasks are finished. The optimal dynamic mapping strategy can then be described as satisfying condition 4 subject to conditions 2 and 3. As more and more parsing threads switch to validation, a validation thread may have to stall and wait for a new validation task, which decreases performance considerably. Determining when a parsing thread should switch to validation work is the key to this algorithm.

Let us state the problem more precisely. The relationship between parsing and schema validation is a typical producer and consumer model. The producer (parallel XML parsing) continuously produces partial DOM trees and puts them into a queue; the consumer (parallel schema validation) consumes the generated partial DOM trees from the queue one by one. To avoid stalls, the algorithm should ensure that the queue is never empty. In other words, the consuming speed must not exceed the producing speed. Assume the producer takes S1 seconds to produce one task and the consumer takes S2 seconds to consume one task. Then the following conditions must hold for a producing thread to become a consuming thread:

$$\frac{N_1 - 1}{S_1} > \frac{N_2 + 1}{S_2} \qquad (1)$$

$$R > 0 \qquad (2)$$

where N1 is the current number of producing threads, N2 is the current number of consuming threads, and R is the number of tasks in the queue. To apply this algorithm, the parsing time (S1) and validation time (S2) for one chunk need to be known in advance. Parsing time can be estimated directly from the chunk size, but validation time is not easy to evaluate. In practice, the validation time for one chunk can be estimated roughly by averaging the validation time of the first one or two chunks.

The whole dynamic allocation algorithm is as follows, where a parsing thread is one that always executes parsing tasks and a validation thread is one that always executes validation tasks.
1) All working threads are allocated parsing tasks at the beginning.
2) The first thread that finishes a parsing task switches to being a validation thread and takes responsibility for estimating the execution time of validation tasks from the running history.
3) While the number of parsing threads is larger than 1, each parsing thread evaluates, whenever it completes a parsing task, whether it should switch to being a validation thread according to formulas (1) and (2).
4) If only one parsing thread is left, it completes all remaining parsing tasks and then switches to being a validation thread.
5) All threads retire or exit when all parsing and validation tasks are completed.
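The switching decision in step 3 can be written as a small predicate. The Python sketch below is only an illustration of how we read formulas (1) and (2): a parsing thread may switch when the production rate after the switch, (N1 - 1)/S1, still exceeds the consumption rate after the switch, (N2 + 1)/S2, and the queue holds at least one task (R > 0). The function names and the timing numbers are hypothetical.

```python
def estimate_validation_time(history):
    """Rough S2 estimate: the mean of the validation times observed so far."""
    if not history:
        raise ValueError("no validation task has completed yet")
    return sum(history) / len(history)


def should_switch_to_validation(n_parsing, n_validating,
                                s_parse, s_validate, queued):
    """Decide whether one parsing thread may become a validation thread.

    n_parsing    -- N1, current number of parsing (producing) threads
    n_validating -- N2, current number of validation (consuming) threads
    s_parse      -- S1, seconds to parse one chunk
    s_validate   -- S2, seconds to validate one chunk
    queued       -- R, partial DOM trees waiting in the queue
    """
    if n_parsing <= 1:
        # Step 4: the last parsing thread finishes all parsing tasks first.
        return False
    production_rate = (n_parsing - 1) / s_parse          # after the switch
    consumption_rate = (n_validating + 1) / s_validate   # after the switch
    return production_rate > consumption_rate and queued > 0


# Made-up timings: validating a chunk takes about three times as long
# as parsing it.
s2 = estimate_validation_time([2.9, 3.1])
print(should_switch_to_validation(7, 1, 1.0, s2, queued=2))  # True
print(should_switch_to_validation(2, 6, 1.0, s2, queued=2))  # False
print(should_switch_to_validation(7, 1, 1.0, s2, queued=0))  # False
```

With these numbers, eight threads would keep moving from parsing to validation until the remaining parsers could no longer keep the queue filled, at which point the predicate starts returning False.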

Figure 8. Speedup of pure parallel parser on XML documents of different sizes

From Figure 8, we can see that the bigger the XML document, the higher the speedup the parallel parser can achieve, because a bigger XML document can be split into more subtasks to be parsed in parallel, which maximizes the utilization of the processors. The algorithm has very good speedup for large XML files: it is even nearly 3 times faster than sequential parsing with two threads, and the speedup for large files is nearly 4.5 with eight threads. However, the overhead of thread communication and lock contention becomes significant when the total sequential parsing time is very short, as it is for small XML files. That is why there is no performance gain for small XML files of less than 64K. The parallel parser can filter out these cases at run time by setting a threshold on the minimum XML document size. This still makes sense because the parallel parser is designed for speeding up the parsing of large XML documents.

Besides, we made a performance comparison against an alternative parallel parser prototype presented in paper [6]. Figure 9 gives the performance comparison data for the two parallel parsers on large XML documents. The result shows that the performance of the speculative parsing based parser presented in this paper is much better than that of the alternative one, especially with eight threads. The bottleneck of the alternative parallel parsing algorithm is the pre-scanning process, which is costly because it is a sequential execution path, even though it is much faster than actual XML parsing.
[Figure 9 chart: Parallel Parsing Algorithm Comparison. Speedup at 2, 4 and 8 threads for the pre-scanning based and speculative-parsing based parsers.]

6. Performance Evaluation
This section gives the performance evaluation results. We first measure the performance of the pure parallel parser and compare it with an existing parallel parser to show that our parser has higher performance and better scalability. Then we show the performance improvement of the schema validated parser when the parallel schema validation algorithm is integrated. These initial performance measurements were taken on a dual Intel Xeon 5300 processor machine (2.66GHz, quad core, shared 8M L2 cache) with 4G RAM. The underlying operating system for these tests was Redhat Linux EL4 (kernel 2.6.9). We had exclusive access to the machine during the tests to minimize external system effects on our results. Every test was run ten times to get the average time, and the first measurement was discarded, so as to measure performance with the XML file data already cached rather than being read from disk.

6.1 Pure Parallel Parser Performance
The pure parallel parser only parses the XML file in parallel and generates the DOM tree. It is the basic model of the parallel parser. We care about the speedup, which measures how well the parallel algorithm scales and is important for evaluating the efficiency of parallel algorithms. It is calculated by dividing the sequential time by the parallel time. In our experiments, the sequential time refers to the time needed by our basic SAX parser Lexer2 to parse the whole XML document and generate the DOM tree. The parallel time is measured as the time our parallel XML parser spends building the same DOM tree. Figure 8 shows how the parallel algorithm scales with the number of threads when parsing XML documents of different sizes.
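The measurement procedure described above (repeated runs, first run discarded, speedup as sequential time over parallel time) can be sketched as follows. This is an illustrative harness only, not the one used for the reported results; the function names are hypothetical and the callable passed to `time_runs` stands in for whichever parser is being measured.

```python
import time


def mean_discarding_first(samples):
    """Average the samples after dropping the first (cold-cache) one."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to discard the first")
    warm = samples[1:]
    return sum(warm) / len(warm)


def time_runs(run, repeats=10):
    """Call `run` `repeats` times and return the wall-clock time of each call."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return samples


def speedup(sequential_time, parallel_time):
    """Speedup as defined in the text: sequential time divided by parallel time."""
    return sequential_time / parallel_time
```

For example, `speedup(mean_discarding_first(time_runs(parse_sequential)), mean_discarding_first(time_runs(parse_parallel)))` would give the speedup for one document, where `parse_sequential` and `parse_parallel` are placeholders for the two parsers under test.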
[Figure 8 chart: speedup (0.00 to 9.00) at 1, 2, 4 and 8 threads for Large File (>1M), Middle File (64K-1M) and Small File (<64K), alongside linear speedup.]

Figure 9. Performance comparison between pre-scanning based algorithm and speculative parsing based algorithm

In contrast, the speculative parsing based parallel XML parsing algorithm avoids the pre-scanning process, which decreases the sequential execution time as much as possible. An initial evaluation shows that the average overhead of pre-scanning is hundreds of times larger than the overhead introduced by the sequential execution time of our parallel algorithm. That is the main reason why our algorithm performs better than the pre-scanning based parallel parsing algorithm.
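The effect of a sequential step on achievable speedup can be illustrated with Amdahl's law, a standard model that the paper does not name explicitly. The sketch below is an illustration only: the serial fractions in the usage note are hypothetical values chosen to show the trend, not measurements from this work.

```python
def amdahl_speedup(serial_fraction, threads):
    """Upper bound on speedup under Amdahl's law.

    serial_fraction -- share of the total work that must run sequentially
                       (for example a pre-scanning pass), between 0 and 1
    threads         -- number of threads sharing the parallel part
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)
```

Comparing `amdahl_speedup(0.10, 8)` with `amdahl_speedup(0.01, 8)` shows how a larger sequential share lowers the bound at eight threads, which matches the direction of the argument above.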
[Figure 10 chart: speedup at 2, 4 and 8 threads for Pure Parallel Parsing, Parallel Validated DOM Parser and Standalone Parallel Validation.]

Figure 10. Speedup of three parallel parsers

6.2 Parallel Validated Parser Performance
The parallel validated parser integrates parallel parsing and parallel schema validation, and it greatly improves the overall performance of the schema validated parser. A schema validated parser has two common usage scenarios. The first is standalone parallel validation, which only validates the XML document and gives the validation result against a given schema. The other validates the XML document and generates the DOM tree at the same time, which is called the parallel validated DOM parser. Figure 10 gives the performance data for pure parallel parsing, standalone parallel validation and the parallel validated DOM parser respectively on large XML files. In this test, parallel schema validation uses the dynamic allocation algorithm, which has better performance than the static allocation algorithm. From Figure 10, we can see that the speedup of standalone parallel validation is better than that of the pure parallel parser because the former generates more parallel tasks, which increases the parallelism. The parallel validated DOM parser shows the best speedup because it generates the DOM tree inherently, whereas the single-threaded validated DOM parser has to do extra work to generate the DOM tree.

time during parallel parsing by applying the speculative parsing algorithm. The key to the algorithm is parsing a chunk of XML data speculatively and caching ill-formed exceptions to be checked at the partial DOM tree validation stage. This solution exposes more parallelism and decreases the dependence between working threads. Moreover, we have developed a novel parallel schema validation algorithm which can validate partial DOM trees in parallel. This algorithm can be integrated with parallel parsing to build a parallel validated parser. We can see a large overall performance gain from parallelizing XML parsing as well as schema validation. It offers a parallel solution to the performance problem caused by the low efficiency of processing XML documents in a typical XML based application.

References
[1] G. Singh, S. Bharathi, A. Chervenak, E. Deelman, C. Kesselman, M. Manohar, S. Patil, and L. Pearlman. A Metadata Catalog Service for Data Intensive Applications. In SC '03: Proceedings of the 2003 ACM/IEEE conference on Supercomputing, page 33.
[2] eBay. eBay Developers Program. http://developer.ebay.com/developercenter/soap/.
[3] M. R. Head, M. Govindaraju, A. Slominski, P. Liu, N. Abu-Ghazaleh, R. van Engelen, K. Chiu, and M. J. Lewis. A Benchmark Suite for SOAP-based Communication in Grid Web Services. In SC '05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing, page 19, Washington, DC, USA, 2005. IEEE Computer Society.
[4] M. R. Head, M. Govindaraju, R. van Engelen, and W. Zhang. Benchmarking XML Processors for Applications in Grid Web Services. In SC '06: Proceedings of the 2006 ACM/IEEE conference on Supercomputing, page 121, New York, NY, USA, 2006. ACM Press.
[5] M. Nicola and J. John. XML Parsing: a Threat to Database Performance. In International Conference on Information and Knowledge Management, 2003, pp. 175-178.

7. Conclusion
In this paper, we have described a new parallel XML parsing algorithm which has a great performance advantage and scales well for up to eight cores. The main performance gain comes from aggressively eliminating the sequential execution

[6] W. Lu, K. Chiu, and Y. Pan. A parallel approach to XML parsing. In The 7th IEEE/ACM International Conference on Grid Computing, Barcelona, September 2006.
[7] Michael R. Head and Madhusudhan Govindaraju. Approaching a Parallelized XML Parser Optimized for Multi-Core Processor
