LEXICAL ANALYSIS IN COMPILER DESIGN
SUBMITTED TO: Tanvi Mehta
SUBMITTED BY: DHRITI 2021BCA055

INTRODUCTION TO LEXICAL ANALYSIS
Lexical analysis is the first phase of the compiler; the component that performs it is also known as a scanner. It converts the high-level input program into a sequence of tokens.

INTRODUCTION TO COMPILER DESIGN
Compiler design is a crucial aspect of computer science and software engineering that focuses on the development of software tools called compilers. A compiler is a program that translates source code written in a high-level programming language into equivalent machine code, or into another form of code that can be executed by a computer's hardware. Compiler design requires a deep understanding of programming languages, computer architecture, algorithms, and data structures. It also involves balancing theoretical concepts with practical considerations to produce efficient and reliable compilers.

OVERVIEW OF COMPILATION PROCESS
Our program passes through the following phases before being transformed into an executable form: preprocessor, compiler, assembler, and linker, in that order.

DETAILED EXPLANATION OF EACH PHASE
Preprocessor: The source code is written in a text editor and the source file is given the extension ".c". This source code is first passed to the preprocessor, which expands it (handling directives such as #include and #define).
Compiler: The code expanded by the preprocessor is passed to the compiler, which converts it into assembly code. In other words, the C compiler converts the pre-processed code into assembly code.
Assembler: The assembly code is converted into object code by the assembler. The object file generated by the assembler has the same name as the source file; its extension is '.obj' in DOS and '.o' in UNIX.
Linker: Most programs written in C use library functions. These library functions are pre-compiled, and the linker combines their object code with the object code of our program to produce the final executable.

PURPOSE OF LEXICAL ANALYSIS
If the lexical analyzer were implemented as a separate pass of the compiler, it would need an intermediate file for its output, from which the parser would then take its input. Running the lexical analyzer as a subroutine of the parser eliminates the need for this intermediate file. The lexical analyzer also interacts with the symbol table while passing tokens to the parser: whenever a token is discovered, the lexical analyzer returns a representation of that token to the parser.

TOKENIZATION
In lexical analysis, tokenization is the process of reading the input character stream and grouping characters into lexemes, sequences of characters that form a meaningful unit, producing a token for each lexeme. A token consists of a token name (its category, such as keyword, identifier, operator, or literal) and, optionally, an attribute value, such as a pointer into the symbol table for an identifier. The stream of tokens produced by this phase is what the parser consumes, so tokenization hides low-level details such as whitespace and comments from the rest of the compiler.

TOKEN CATEGORIES
This phase recognizes three types of tokens: Terminal Symbols (TRM), i.e. keywords and operators; Literals (LIT); and Identifiers (IDN).

Example 1: int a = 10; // input source code
Tokens: int (keyword), a (identifier), = (operator), 10 (constant), ; (punctuation, semicolon)
Answer: total number of tokens = 5

Example 2:
int main() {
    // printf() sends the string inside the quotation marks to
    // the standard output (the display)
    printf("Welcome to GeeksforGeeks!");
    return 0;
}
Tokens: 'int', 'main', '(', ')', '{', 'printf', '(', '"Welcome to GeeksforGeeks!"', ')', ';', 'return', '0', ';', '}'
Answer: total number of tokens = 14

LEXICAL ERRORS
1. Exceeding the length of identifiers or the range of numeric constants.
Example:
#include <iostream>
using namespace std;
int main() {
    int a = 2147483647 + 1;
    return 0;
}
This is a lexical error since a signed integer lies between −2,147,483,648 and 2,147,483,647.

2. Appearance of illegal characters.
#include <iostream>
using namespace std;
int main() {
    printf("Geeksforgeeks");$
    return 0;
}
This is a lexical error since an illegal character '$' appears at the end of the statement.

3. Unmatched comment.
#include <iostream>
using namespace std;
int main() {
    /* comment
    cout << "GFG!";
    return 0;
}
This is a lexical error since the closing "*/" of the comment is not present while the opening is.

ADVANTAGES OF LEXICAL ANALYSIS
1. Tokenization: lexical analysis breaks down the input source code into tokens, the smallest meaningful units of the programming language.
2. Error detection: lexical analysis includes mechanisms for detecting and reporting lexical errors, such as illegal characters or tokens that do not conform to the language's rules.
3. Language independence: lexical analysis can be designed to support multiple programming languages.

DISADVANTAGES OF LEXICAL ANALYSIS
1. Complexity: implementing a robust lexical analyzer can be complex, especially for languages with intricate lexical rules or irregular syntax.
2. Performance overhead: lexical analysis adds a processing step to the compilation process, which can introduce some performance overhead.
3. Memory consumption: a lexical analyzer typically needs to maintain data structures such as symbol tables and token buffers, which can consume memory.
4. Error recovery: while lexical analysis can detect lexical errors such as invalid characters or tokens, its error-recovery mechanisms may be limited.

APPLICATIONS OF LEXICAL ANALYSIS
1. Compiler design: in compiler design, lexical analysis is the first phase of the compilation process.
2. Interpreter design: similar to compilers, interpreters also use lexical analysis to break down the source code into tokens before executing it.
3. Text processing: lexical analysis is used in various text-processing applications, such as text editors, search engines, and lexical analyzers for natural language.
4. Compiler front-end tools: lexical analysis tools such as Lex and Flex are widely used to generate lexical analyzers automatically from lexical specifications.

REFERENCES
- https://www.geeksforgeeks.org/lexicalerror
- https://www.javatpoint.com/lexical-error
- https://www.geeksforgeeks.org/error-detection-recovery-compiler/
- https://www.geeksforgeeks.org/token-patterns-and-lexems/
- https://www.tutorialspoint.com/compiler_design/compiler_design_lexical_analysis.htm
- https://www.guru99.com/c-tokens-keywords-identifier.html