Barve 2014

2014 International Conference on Parallel, Distributed and Grid Computing
Parallel Syntax Analysis on Multi-Core Machines
Amit Barve, Asst. Professor, CSE, VIIT, Pune (India)
Brijendra Kumar Joshi, Professor, MCTE Mhow (India)
[email protected]
[email protected]
ABSTRACT - A multi-core machine has more than one execution unit (core) per CPU on a single motherboard. With the advent of multi-core machines, parallelization has become an essential part of recent compiler research. Parallel parsing is one of the areas that still needs significant work to utilize the inherent power of multi-core architectures. This paper presents an algorithm that performs parallel syntax analysis of C programs on a multi-core architecture. A speed-up of up to 6.31 on 7 CPUs was achieved for syntax analysis of the C files of GCC 4.8.3.

Keywords: Parallel Syntax Analysis, Flex, Bison, Processor Affinity, Multi-Core Architecture.

978-1-4799-7683-6/14/$31.00©2014 IEEE 209

I. INTRODUCTION

A compiler is a program that reads a source program in one language and translates it into an equivalent program in another language. The process of compilation is divided into various phases. The very first phase is known as lexical analysis or scanning, and the program which performs this task is called a lexical analyzer, scanner or lexer. The lexical analyzer takes a stream of characters as input and groups them into meaningful sequences called lexemes. For each lexeme, the lexical analyzer generates a token, which is consumed by the subsequent phase, i.e. syntax analysis. The syntax analysis phase, also known as parsing, takes a stream of tokens as input and creates a tree-like intermediate representation known as a syntax tree. Syntax trees depict the grammatical structure of the token stream. For example, if an expression is written as

a = b + c * 20     (1)

then the lexical analyzer output for the given expression would be

id1 = id2 + id3 * const     (2)

Here “a” is the lexeme that is mapped to the token id1, where id is a symbol used to represent an identifier and 1 is an index into the symbol table entry for a. Similarly, b and c are mapped to the tokens id2 and id3 respectively. The lexemes =, + and * are mapped to the tokens =, + and * respectively, since they are abstract symbols for the assignment, addition and multiplication operators. The lexeme 20 is mapped to the token const. The syntax analyzer takes these tokens as input and produces the parse tree shown in Figure 1.

Figure 1: Parse tree for expression a = b + c * 20

The detailed description of all the phases of a compiler can be found in popular texts [1][2][3][4].

II. CLASSICAL WAY OF PARSING

The main aim of the syntax analysis phase is to take the tokens produced by the lexical analyzer and use some parsing algorithm to verify that the stream of tokens represents a legal string in the language. Parsing algorithms are mainly classified into two categories, top-down parsing and bottom-up parsing. These refer to the order in which the nodes of the parse tree are constructed. In the top-down approach the construction of the tree starts from the root and proceeds towards the leaves, while in the bottom-up approach the parse tree begins with the leaves and proceeds towards the root. Some popular top-down parsing algorithms are recursive descent parsing (including predictive parsing) and non-recursive, table-driven predictive parsing. Bottom-up parsing includes algorithms such as the Simple LR (SLR) parser, the Canonical LR (CLR) parser and Look-Ahead LR (LALR) parsing.

In LR parsing, the parser reads the input from left to right and produces a rightmost derivation in reverse. The term LR(k) parser is also used, where k refers to the number of unconsumed lookahead input symbols that are used in making parsing decisions. Depending on how the parsing table is generated, an LR parser can be called SLR, LALR,
or CLR parser. LALR parsers have more language recognition power than SLR parsers, and canonical LR parsers have more recognition power than LALR parsers.

Some researchers further explored these parsing algorithms. Cohen and Roth [5] described an approach for determining the time taken to parse sentences accepted by a deterministic parser. They considered a recursive descent parser and a bottom-up parser (SLR parser) for the analysis. Gerardy [6] describes an experimental comparison of parsing methods; he explored recursive descent parsing, SLR and operator precedence parsing methods. T. Anderson et al. proposed an efficient LR(1) parser [7].

III. SYNTAX ANALYZER GENERATOR

A syntax analyzer generator, also known as a parser generator, takes a grammar specified by the programmer and generates a syntax analyzer that recognizes valid “sentences” in that grammar. Johnson [8] is considered to be the first one to take up the challenge of generating syntax analyzers from specifications in the form of a grammar, and the tool he developed (YACC) is by far the most important in compiler development. Nowadays Bison [9] is commonly used as a parser generator. This tool is open source and is freely available under Linux and its variants. By default it generates a parser for LALR(1) grammars.

IV. PARALLEL PARSING

Parallel parsing has been attempted by many in the past. Lincoln [10] first proposed the concept of parallel object code for FORTRAN and COBOL job cards in an environment that consisted of IBM 704 uniprocessors and CDC 6500 of ILLIAC IV. The parallel processing was achieved by assigning completely different user jobs to different processors. Zosel [11] focused on recognizing FORTRAN DO-loops that can be collapsed into vector instructions for CDC 7600 machines. Mickunas and Schell [12] were the first to identify the areas of a compilation process where parallel processing is inherent. They proposed to split lexical analysis into scanning and screening, and they also developed a parallel parsing method based on LR parsing. Hickey and Katcoff [13] analyzed parsing algorithms to derive upper bounds on speedup, whereas Cohen and Kolodner [14] estimated the speedup in parallel parsing.

Object-oriented parallel parsing was proposed by Yonezawa and Ohsawa [15]. Khanna et al. [16] proposed partitioning the grammar to make it suitable for parallel compilation. Chandwani et al. [17] developed a parallel algorithm for CKY parsing of context-free grammars. The efforts cited in references [12]-[17] to develop parallel parsing algorithms are of theoretical importance only; their practical implementations have not been seen so far for real programming languages on multi-core machines. Barve and Joshi [18][19][20] developed some algorithms for parallel lexical analysis on multi-core machines. Their approach is to divide the source code into a number of blocks and perform lexical analysis on the individual blocks. This approach worked well for parallel lexical analysis. In this paper the concept is extended to parallel syntax analysis.

V. PRACTICAL PARALLEL SYNTAX ANALYSIS ALGORITHM

To perform syntax analysis in parallel for a large software package consisting of multiple files, we first need to select the folder which contains all the required files. Since our aim is only syntax analysis, individual files can be analyzed syntactically independently of one another. After selecting the folder, the files present in it are analyzed in parallel: each file is selected and scheduled on a specific processor for syntax analysis. The steps for parallel syntax analysis are given in Figure 2.

Algorithm: Parallel Syntax Analysis
Input: C File, Processor Number
1. Select the source folder.
2. Scan the source folder. While scanning, write the following information in a file, say file.txt:
   (a) Path of individual files
   (b) Size of the files.
3. Open file.txt in read-only mode.
4. For each line written in file.txt do the following in parallel:
   (a) Select the file from the folder.
   (b) Perform syntax analysis on the selected file by assigning processor affinity.
Figure 2: Parallel Syntax Analysis

The algorithm of Figure 2 was implemented in C for parallel syntax analysis of GCC 4.8.3. The C code is given in Fig. 3(a) and Fig. 3(b); for lack of space, only the main functions of the code are given. The time taken in parsing individual files is calculated separately.
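Steps 1-2 of Figure 2 (building file.txt) are not part of the code shown in Fig. 3. The following is a minimal sketch of those steps, assuming a POSIX system: it walks the selected folder with nftw() and writes one line per .c file. The line layout "path size" and the names write_file_list(), record_c_file() and has_c_suffix() are assumptions for illustration; the paper does not specify the exact format of file.txt.

```c
#define _XOPEN_SOURCE 700   /* exposes nftw() and FTW_PHYS */
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* nftw() callbacks receive no user pointer, so the output stream and the
   counter live at file scope in this sketch. */
static FILE *listing;
static long files_listed;

/* Hypothetical helper: nonzero if name ends in ".c". */
static int has_c_suffix(const char *name)
{
    size_t n = strlen(name);
    return n >= 2 && strcmp(name + n - 2, ".c") == 0;
}

/* Hypothetical nftw() callback: record "path size" for every regular .c file. */
static int record_c_file(const char *path, const struct stat *sb,
                         int typeflag, struct FTW *ftwbuf)
{
    (void)ftwbuf;
    if (typeflag == FTW_F && S_ISREG(sb->st_mode) && has_c_suffix(path)) {
        fprintf(listing, "%s %lld\n", path, (long long)sb->st_size);
        files_listed++;
    }
    return 0;   /* 0 tells nftw() to keep walking */
}

/* Hypothetical entry point for steps 1-2: scan src_dir (recursively, without
   following symbolic links) and write the listing to list_path.
   Returns the number of C files listed, or -1 on error. */
long write_file_list(const char *src_dir, const char *list_path)
{
    assert(src_dir != NULL && list_path != NULL);
    listing = fopen(list_path, "w");
    if (listing == NULL)
        return -1;
    files_listed = 0;
    int rc = nftw(src_dir, record_c_file, 32, FTW_PHYS);
    if (fclose(listing) != 0 || rc != 0)
        return -1;
    return files_listed;
}
```

For example, write_file_list("gcc-4.8.3", "file.txt") would list every C file under that folder. Because path and size are separated by a space, this sketch does not handle paths that contain spaces. Sorting the listing numerically on its second field (for example with sort -k2,2n) is one way to obtain the ascending and descending size orders used in the experiments.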
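As printed in Fig. 3(b), Extract() runs each command through system(), which does not return until the command has finished; unless the command string sends ./scan to the background (the paper does not show this), the loop of Fig. 3(a) analyzes one file at a time. The following is a minimal sketch of step 4 of Figure 2 that keeps several analyzers running at once. It assumes Linux, a file.txt whose lines hold a path followed by a size, and an analyzer executable (such as ./scan) that takes the file path as its only argument. The names dispatch_files(), run_pinned(), reap_one() and MAX_CPUS are hypothetical. Instead of building a taskset command string, each forked child pins itself with the Linux sched_setaffinity() call and then execs the analyzer; when every CPU is busy, the parent waits for any child and reuses the CPU it freed.

```c
#define _GNU_SOURCE   /* exposes cpu_set_t, CPU_SET() and sched_setaffinity() */
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CPUS 64   /* hypothetical upper bound on CPUs managed by this sketch */

/* Hypothetical helper (child side): pin the calling process to `cpu`,
   then replace it with `analyzer path`. Never returns. */
static void run_pinned(int cpu, const char *analyzer, const char *path)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof set, &set) != 0)   /* pid 0 = this process */
        _exit(126);
    execl(analyzer, analyzer, path, (char *)NULL);
    _exit(127);                                         /* reached only if exec failed */
}

/* Hypothetical helper: wait for one child; if it is ours, free its CPU slot
   and count a failure unless it exited with status 0. -1 if wait() fails. */
static int reap_one(pid_t owner[], int ncpus, int *running, int *failures)
{
    int status;
    pid_t done = wait(&status);
    if (done < 0)
        return -1;
    for (int i = 0; i < ncpus; i++) {
        if (owner[i] == done) {
            owner[i] = 0;
            (*running)--;
            if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                (*failures)++;
        }
    }
    return 0;
}

/* Hypothetical dispatcher: runs `analyzer path` for every "path size" line of
   list_path, with at most one child per CPU in [first_cpu, first_cpu + ncpus).
   Returns the number of analyzer runs that failed, or -1 if the list cannot
   be opened or wait() fails (children still running are then left alone). */
int dispatch_files(const char *list_path, const char *analyzer,
                   int first_cpu, int ncpus)
{
    assert(ncpus > 0 && ncpus <= MAX_CPUS);
    pid_t owner[MAX_CPUS] = {0};   /* owner[i] != 0: child running on CPU first_cpu + i */
    int running = 0, failures = 0;
    char path[4096];
    long long size;                /* read but unused; ordering is decided when building the list */

    FILE *fp = fopen(list_path, "r");
    if (fp == NULL)
        return -1;

    while (fscanf(fp, "%4095s %lld", path, &size) == 2) {
        while (running == ncpus) {                    /* all CPUs busy: wait for one */
            if (reap_one(owner, ncpus, &running, &failures) < 0) {
                fclose(fp);
                return -1;
            }
        }
        int slot = 0;
        while (owner[slot] != 0)                      /* an idle slot exists since running < ncpus */
            slot++;
        pid_t pid = fork();
        if (pid < 0) {                                /* could not start this file */
            failures++;
            continue;
        }
        if (pid == 0)
            run_pinned(first_cpu + slot, analyzer, path);
        owner[slot] = pid;
        running++;
    }
    fclose(fp);

    while (running > 0) {                             /* wait for the last analyzers */
        if (reap_one(owner, ncpus, &running, &failures) < 0)
            return -1;
    }
    return failures;
}
```

With CPU 0 reserved for init on the 8-core machine described in the experiments, a call such as dispatch_files("file.txt", "./scan", 1, 7) would use CPUs 1-7. Reusing whichever CPU finishes first is a design choice of this sketch, not of the paper: GCC 4.8.3 file sizes range from 0 bytes to 6.4 MB, and a fixed rotation over CPUs, as in Fig. 3(a), can leave other CPUs idle while one CPU works through a large file.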
1.  int main(int argc, char *argv[])
2.  {
3.      FILE *fp;
4.      fp = fopen("file.txt", "r");
5.      assert(fp);
6.      starterline = malloc(10000);
7.      while (fscanf(fp, "%9999s", starterline) == 1)
8.      {
9.          Extract(starterline, cpuno);
10.         if (cpuno == cpumax)
11.         {
12.             cpuno = 2;
13.             cpuno--;
14.         }
15.         cpuno++;
16.     }
17.     fclose(fp);
18.     return 0;
19. }
Figure 3(a): C implementation of Parallel Syntax Analysis

1.  int Extract(char *StartLineNo, int cpuno)
2.  {
3.      char *run = "taskset -c";
4.      char *cpuno1, text1[16];
5.      snprintf(text1, sizeof text1, "%d", cpuno);
6.      cpuno1 = text1;
7.      char *scan = "./scan";
8.      char *p, text[10000];
9.      char *s = " ";
10.     snprintf(text, sizeof text, "%s", StartLineNo);
11.     p = text;
12.     char *c = malloc(strlen(run) + strlen(p) + 3 * strlen(s) + strlen(cpuno1) + strlen(scan) + 1);
13.     strcpy(c, run);
14.     strcat(c, s);
15.     strcat(c, cpuno1);
16.     strcat(c, s);
17.     strcat(c, scan);
18.     strcat(c, s);
19.     strcat(c, p);
20.     system(c);
21.     free(c); return 0;
22. }
Figure 3(b): C implementation of Parallel Syntax Analysis

Fig. 3(a) shows the body of the main function of the program. Line numbers 3-5 open a file named file.txt, which contains information on all the C files present in GCC 4.8.3. Line numbers 7-16 form a while loop in which the Extract function is called with an entry (a file path) read from file.txt and a CPU number as arguments.

In Fig. 3(b), the Extract function prepares a command string consisting of the processor affinity command (line number 3), the CPU number (line numbers 4-6), the name of the syntax analyzer, ./scan (line number 7), and the file name (line numbers 8-11), where ./scan is the ANSI C syntax analyzer generated using Bison. The string is assembled in line numbers 12-19 and executed through the system() library call in line number 20.

VI. BINDING PROCESSES ON MULTI-CORE MACHINES

In Linux, any process can be bound to any processor through the sched_setaffinity() function [21][22]. The taskset command can be used to launch a program and bind it to a specific processor. These two features can be used to schedule any program or process on any of the available processors. Line number 3 of the Extract function in Figure 3(b) sets the processor affinity using the taskset command.

VII. SPEED-UP

The main goal of parallel methodologies in programming is that parallel programs execute faster than sequential ones. The ratio of the sequential execution time to the parallel execution time is called the speedup, which can be expressed as:

$$\text{Speedup} = \frac{\text{Sequential execution time}}{\text{Parallel execution time}} \qquad (3)$$

The operations performed by a parallel algorithm can be put into three categories [23]:
(a) Operations that must be performed sequentially.
(b) Operations that can be performed in parallel.
(c) Operations requiring communication among processors.

In this paper, sequential operations refer to sequential syntax analysis of all files on a single processor, whereas parallel operations refer to sequential syntax analysis of files distributed among the available processors. Since the syntax analyses of different files are independent of one another, the communication overhead is almost negligible. Some communication overhead does exist between the master processor distributing the tasks and the processors executing them, but only when tasks are distributed and when they finish. The communication overhead is therefore assumed to be zero compared to the actual syntax analysis.

VIII. EXPERIMENTAL RESULTS

The experiment based on the above algorithm was carried out on Ubuntu 12.04 LTS on a Wipro Netpower server with dual Intel Xeon E5606 quad-core CPUs (8 cores in total), 8 MB on-chip cache, 24 GB RAM and a processor speed of 2.13 GHz. Appreciable test results require a large software package, so we used GCC 4.8.3 and considered only its C files. GCC 4.8.3 contains a total of 20,642 C files. The minimum file size is 0 bytes (umips-lwp-1.c) and the maximum is 6.4 MB (bid_binarydecimal.c). To get accurate
results, the init process (pid 1) was bound to CPU 0 using sched_setaffinity(), and the remaining CPUs were used exclusively for parallel syntax analysis.

For the experiment we used the ANSI C grammar specification to generate both the lexical and the syntax analyzer. The lexical analyzer was generated using flex [24] and the syntax analyzer using Bison. Tables 1-3 show the average time taken for syntax analysis of all C files of GCC. In these tables, the time taken by 1 CPU is the sequential analysis time. The speedup is calculated using equation (3). For example, in Table 1, for 2 CPUs, the speedup is:

$$\text{Speedup} = \frac{\text{Time taken in sequential syntax analysis by 1 CPU}}{\text{Time taken in parallel syntax analysis by 2 CPUs}} = \frac{31.03}{16.28} \approx 1.90 \qquad (4)$$

Similarly, the speedup was computed for larger numbers of processors.

The C files of GCC 4.8.3 were distributed to the processors in ascending, descending and random order of file size. Fig. 4 compares the speedup of all three file distribution techniques. These observations show that a significant amount of time can be saved by using this approach on a large software package.

TABLE 1. SYNTAX ANALYSIS OF GCC WITH DISTRIBUTION OF FILES IN ASCENDING ORDER OF SIZE
No. of CPUs    Time taken (seconds)    Speedup
1              31.03                   1
2              16.28                   1.90
3              10.9                    2.84
4              8.57                    3.62
5              6.76                    4.59
6              5.66                    5.48
7              4.98                    6.23

TABLE 2. SYNTAX ANALYSIS OF GCC WITH DISTRIBUTION OF FILES IN DESCENDING ORDER OF SIZE
No. of CPUs    Time taken (seconds)    Speedup
1              32.5                    1
2              16.72                   1.94
3              10.91                   2.97
4              8.36                    3.88
5              7.28                    4.46
6              5.86                    5.54
7              5.15                    6.31

TABLE 3. SYNTAX ANALYSIS OF GCC WITH DISTRIBUTION OF FILES IN RANDOM ORDER OF SIZE
No. of CPUs    Time taken (seconds)    Speedup
1              32.14                   1
2              16.91                   1.90
3              11.51                   2.79
4              8.79                    3.65
5              7.21                    4.45
6              6.09                    5.27
7              5.44                    5.90

Fig. 4. Speedup in syntax analysis of GCC 4.8.3

IX. CONCLUSION

Parallel syntax analysis of multiple C source files was presented in this paper. It was assumed that the lexical analysis of each file scheduled on a processor for syntax analysis was done on the same processor. The best speedup obtained for 7 CPUs was 6.31 (descending order of file size), which is quite reasonable. The speedup is expected to increase further with more processors, though the gain per added processor would shrink as the time devoted to distributing files increases.

References:

[1]. Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman; "Principles of Compiler Design"; Addison Wesley Publication Company, USA, 1985.
[2]. Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman; "Compilers: Principles, Techniques and Tools"; Addison Wesley Publication Company, USA, 1986.
[3]. Jean Paul Tremblay, Paul G. Sorenson; "The Theory and Practice of Compiler Writing"; McGraw-Hill Book Company, USA, 1985.
[4]. David Gries; "Compiler Construction for Digital Computers"; John Wiley & Sons Inc., USA, 1971.
[5]. J. Cohen, M. S. Roth; "Analysis of Deterministic Parsing Algorithms"; Communications of the ACM, Vol. 21, No. 6, pp. 448-458, June 1978.
[6]. R. Gerardy; "Experimental Comparison of Some Parsing Methods"; ACM SIGPLAN Notices, Vol. 22, Issue 8, pp. 79-88, August 1987.
[7]. T. Anderson, J. Eve, J. J. Horning; "Efficient LR(1) Parsers"; Acta Informatica, Vol. 2, Issue 1, pp. 12-39, 1973.
[8]. S. C. Johnson; "YACC: Yet Another Compiler Compiler"; Computing Science Technical Report No. 32, Bell Laboratories, Murray Hill, New Jersey, 1975.
[9]. www.gnu.org/s/bison
[10]. N. Lincoln; "Parallel Compiling Techniques for Compilers"; ACM SIGPLAN Notices, 10 (1970), pp. 18-31, 1970.
[11]. M. Zosel; "A Parallel Approach to Compilation"; Conf. Rec. ACM Symposium on Principles of Programming Languages, Boston, MA, pp. 59-70, October 1973.
[12]. M. D. Mickunas, R. M. Schell; "Parallel Compilation in a Multiprocessor Environment"; Proceedings of the Annual Conference of the ACM, Washington, D.C., USA, pp. 241-246, 1978.
[13]. Timothy Hickey, Joel Katcoff; "Upper Bounds for Speedup in Parallel Parsing"; Journal of the ACM (JACM), Vol. 29, No. 2, pp. 408-428, 1982.
[14]. J. Cohen, Stuart Kolodner; "Estimating the Speedup in Parallel Parsing"; IEEE Transactions on Software Engineering, January 1985.
[15]. Akinori Yonezawa, Ichiro Ohsawa; "Object-Oriented Parallel Parsing for Context-Free Grammars"; Proceedings of the 12th Conference on Computational Linguistics, Vol. 2, Budapest, Hungary, pp. 773-778, 1988.
[16]. Sanjay Khanna, Arif Ghafoor, Amrit Goel; "A Parallel Compilation Technique Based on Grammar Partitioning"; Proceedings of the ACM Annual Conference on Cooperation, Washington, D.C., USA, pp. 385-391, 1990.
[17]. M. Chandwani, M. Puranik, N. S. Chaudhari; "On CKY-Parsing of Context Free Grammars in Parallel"; Proceedings of the IEEE Region 10 Conference, TENCON 92, Melbourne, Australia, pp. 141-145, 1992.
[18]. Amit Barve, Brijendra Kumar Joshi; "A Parallel Lexical Analyzer for Multi-core Machine"; Proceedings of CONSEG-2012, CSI 6th International Conference on Software Engineering, pp. 319-323, 5-7 September 2012, Indore, India.
[19]. Amit Barve, Brijendra Kumar Joshi; "Parallel Lexical Analysis on Multi-core Machines Using Divide and Conquer"; NUiCONE-2012 Nirma University International Conference on Engineering, pp. 1-5, 6-8 December 2012, Ahmedabad, India.
[20]. Amit Barve, Brijendra Kumar Joshi; "Parallel Lexical Analysis of Multiple Files on Multi-core Machines"; International Journal of Computer Applications, Vol. 96, No. 8, June 2014.
[21]. http://www.linuxjournal.com/article/6799?page=0,1
[22]. http://www.cyberciti.biz/tips/setting-processor-affinity-certain-task-or-process.html (last accessed on 05-Aug-2014)
[23]. Michael J. Quinn; "Parallel Programming in C with MPI and OpenMP"; pp. 159-160, Tata McGraw-Hill Publication, New Delhi, 2003.
[24]. http://flex.sourceforge.net/