How Programming Languages Work
Thet Khine
Approaches to Language Implementation
Compiler
Interpreter
Virtual Machine
Transpiler
JIT/AOT Compiler
Compiler
Transforms a high-level programming language into a low-level
intermediate representation or a native executable.
Mainly used for statically typed programming languages such
as C/C++.
Pros
Execution speed is typically the best among the implementation approaches.
Cons
Less suitable for implementing dynamic languages.
Produces platform-dependent code.
Interpreter
Mainly used for dynamic programming languages such as JavaScript,
Python, PHP, Ruby, and Lua.
Modern implementations of dynamic languages do not execute
program statements directly.
Each statement is transformed by a front-end compilation step into
a series of bytecode instructions, which are then executed by a
bytecode interpreter or virtual machine.
Typically slower than compiled native code.
Virtual Machine
A hybrid approach to language implementation.
C# and Java follow this approach.
Instead of generating platform-dependent native code, their compilers
generate bytecode for a virtual machine.
Bytecode can run on a virtual machine implemented in software or
hardware.
A virtual-machine-based approach makes it easier to build a
platform-independent language.
Transpiler
Also known as a source-to-source compiler.
Used to support modern JavaScript versions such as ES6 and ES7.
ES6 or ES7 JavaScript is compiled down to ES5, which
browsers can execute directly.
Example: the Babel transpiler.
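The source-to-source idea can be sketched in a few lines. This is not how Babel works internally; it is a toy Python illustration that assumes a single hypothetical rewrite rule (`name ** 2` becomes `name * name`) and uses the standard `ast` module to parse, transform, and print source code again. The names `SquareToProduct` and `transpile` are made up for this sketch.

```python
import ast


class SquareToProduct(ast.NodeTransformer):
    """Toy rule: rewrite `name ** 2` as `name * name`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        is_square = (
            isinstance(node.op, ast.Pow)
            and isinstance(node.left, ast.Name)
            and isinstance(node.right, ast.Constant)
            and node.right.value == 2
        )
        if not is_square:
            return node
        copy = ast.Name(id=node.left.id, ctx=ast.Load())
        return ast.BinOp(left=node.left, op=ast.Mult(), right=copy)


def transpile(source):
    """Source text in, source text out: the defining trait of a transpiler."""
    tree = ast.parse(source)
    tree = SquareToProduct().visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


print(transpile("y = x ** 2"))  # y = x * x
```

The output is still ordinary source code in the same language family, just restricted to constructs the target environment is assumed to understand.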
JIT/AOT Compiler
Just-in-time (JIT) compilation compiles bytecode into native machine
code while the program is running.
Native code can run faster than interpreted bytecode.
However, compiling bytecode to native code at run time has its own
performance cost.
To avoid that cost, all bytecode can be compiled to native code before
the program runs; this is called ahead-of-time (AOT) compilation.
Compilation Process
[Figure: compilation process diagram. Image source: via Google]
Compilation Process (Front End)
The front end is concerned with checking the grammar of the program.
Static type analysis checks the program against the language's type rules.
If the program is syntactically correct, a parse tree can be generated.
Based on the parse tree, the back end generates the target
code.
Target code means bytecode or native machine code.
Compilation Process (Back End)
The back end is concerned with producing native machine code or bytecode.
For compiled languages, bytecode or native code is produced when the
program is compiled, e.g. Java, C#, and C.
For compiled languages that target machine code, the compiler typically
does not emit machine code directly; it generates assembly first, which is
then assembled into machine code. Most C compilers take this
approach.
For interpreted languages, the program is compiled into bytecode
when it is loaded, and the interpreter can cache that bytecode.
Lexical Analysis, Syntax Analysis, Semantic Analysis
Lexical analysis splits the program's character stream into a sequence
of tokens, e.g. IDENTIFIER and OPERATOR, for use by syntax
analysis.
Syntax analysis is done by the parser, which checks the grammar (syntax) of
the program.
Syntax analysis produces a parse tree, which is later pruned to produce
an AST (Abstract Syntax Tree).
Semantic analysis checks that the program conforms to type rules
and other language rules that the syntax analyzer cannot detect.
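Lexical analysis can be sketched with regular expressions. The token names IDENTIFIER and OPERATOR come from the slide; NUMBER, SEMICOLON, and SKIP, as well as the `tokenize` function itself, are assumptions added for this illustration, not part of any particular compiler.

```python
import re

# (token name, regular expression) pairs; order matters when patterns overlap.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"[=+\-*/]"),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Split a character stream into (token type, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace carries no meaning here
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("a = b + c * d;"):
    print(token)
```

The parser then works on this token sequence instead of on raw characters.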
Abstract Syntax Tree
a = b + c * d;

    =
   / \
  a   +
     / \
    b   *
       / \
      c   d
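The same tree shape can be observed with a real parser. As a stand-in for the slide's language, this sketch uses Python's own `ast` module, which happens to accept the same statement (without the trailing semicolon); node names such as `Assign`, `Add`, and `Mult` are Python's, not the slide's.

```python
import ast

tree = ast.parse("a = b + c * d")
assign = tree.body[0]   # the '=' node at the root of the statement
add = assign.value      # right-hand side: b + (c * d)
mul = add.right         # '*' binds tighter than '+', so it sits lower in the tree

shape = (
    type(assign).__name__, assign.targets[0].id,
    type(add.op).__name__, add.left.id,
    type(mul.op).__name__, mul.left.id, mul.right.id,
)
print(shape)  # ('Assign', 'a', 'Add', 'b', 'Mult', 'c', 'd')
```

Operator precedence is no longer a matter of syntax at this point; it is encoded in which node is the child of which.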
Code Generator
The code generator uses the AST to generate bytecode or native
machine code for the program.
This intermediate code can then be optimized to produce better
code.
To generate stack-based machine code, the code generator traverses the
AST in left, right, root order (post-order).
The traversal order can differ depending on the type of target code.
Generate Byte Code
a = b + c * d;

AST:
    =
   / \
  a   +
     / \
    b   *
       / \
      c   d

Generated bytecode:
Push b
Push c
Push d
Mult
Add
Store a
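The post-order (left, right, root) traversal can be sketched as a short recursive function. The tuple-based AST encoding and the names `gen_expr` and `gen_assign` are assumptions made for this example; the instruction names follow the slide.

```python
# AST for `a = b + c * d` as nested tuples: (operator, left, right).
# Leaves are plain variable names.
AST = ("=", "a", ("+", "b", ("*", "c", "d")))

OPCODES = {"+": "Add", "*": "Mult"}


def gen_expr(node, code):
    """Emit stack code for an expression: left subtree, right subtree, then root."""
    if isinstance(node, str):
        code.append(f"Push {node}")
        return
    op, left, right = node
    gen_expr(left, code)
    gen_expr(right, code)
    code.append(OPCODES[op])


def gen_assign(node):
    """Emit code for `target = expression`: evaluate the expression, then store."""
    op, target, value = node
    if op != "=":
        raise ValueError("expected an assignment node")
    code = []
    gen_expr(value, code)
    code.append(f"Store {target}")
    return code


for instruction in gen_assign(AST):
    print(instruction)
```

Because operands are emitted before their operator, each operator finds its inputs already on the stack when it executes.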
Types of Virtual Machine
Stack based Virtual Machine
Used by the JVM and the CLR (Common Language Runtime).
Uses an operand stack to evaluate the operations and statements of all bytecode.
Easy to implement with a switch-based interpreter.
Register based Virtual Machine
Uses virtual registers, rather than an operand stack, to hold operands.
Employed by the Chrome V8 engine and the Lua interpreter.
Maps more easily to machine code, because physical CPUs
also use registers.
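To contrast the two styles, here is a minimal sketch of a register-based machine running `a = b + c * d`. The three-address instruction format `(op, dest, src1, src2)`, the temporary register `t0`, and the function `run_register_vm` are all hypothetical; real register VMs such as Lua's use compact binary encodings.

```python
def run_register_vm(program, registers):
    """Execute three-address instructions of the form (op, dest, src1, src2)."""
    for op, dest, src1, src2 in program:
        if op == "MUL":
            registers[dest] = registers[src1] * registers[src2]
        elif op == "ADD":
            registers[dest] = registers[src1] + registers[src2]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return registers


program = [
    ("MUL", "t0", "c", "d"),  # t0 = c * d
    ("ADD", "a", "b", "t0"),  # a = b + t0
]
result = run_register_vm(program, {"b": 2, "c": 3, "d": 5})
print(result["a"])  # 17
```

The register version needs two instructions where a stack machine needs six, but each instruction has to name its operands explicitly.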
Virtual Machine
A virtual machine uses abstract machine code (bytecode in the case of
Java, MSIL for the CLR).
This abstract machine code cannot be executed directly by the
hardware, so emulation on top of the hardware is needed.
This emulation is mostly done by software known as a virtual
machine, but it can also be done by a hardware-based virtual machine.
Virtual machines make a programming language platform
independent.
Virtual Machine
Abstract Byte Code
VM for Windows    VM for Linux
Windows OS        Linux OS
Stack Based VM
a = b + c * d; (b = 2, c = 3, d = 5)

Instruction   Operand stack afterwards (top at right)
Push b        2
Push c        2 3
Push d        2 3 5
Mult          2 15
Add           17
Store a       (empty; a = 17)
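The trace above can be reproduced with a small stack machine. This is a sketch under the assumption that instructions are plain strings such as "Push b"; the function name `run_stack_vm` is made up for the example.

```python
def run_stack_vm(code, variables):
    """Run stack code and return (instruction, stack snapshot) pairs."""
    stack = []
    trace = []
    for instruction in code:
        op, _, arg = instruction.partition(" ")
        if op == "Push":
            stack.append(variables[arg])
        elif op == "Mult":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == "Add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "Store":
            variables[arg] = stack.pop()
        else:
            raise ValueError(f"unknown instruction {instruction!r}")
        trace.append((instruction, list(stack)))
    return trace


code = ["Push b", "Push c", "Push d", "Mult", "Add", "Store a"]
variables = {"b": 2, "c": 3, "d": 5}
trace = run_stack_vm(code, variables)
for instruction, stack in trace:
    print(f"{instruction:<8} {stack}")
print("a =", variables["a"])  # a = 17
```

None of the arithmetic instructions name their operands; they always work on whatever is on top of the stack.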
JVM: Runtime Data Areas
Besides OO concepts, JVM also supports multi-threading. Threads are directly
supported by the JVM.
=> Two kinds of runtime data areas:
1. shared between all threads
2. private to a single thread
Shared                    Thread 1               Thread 2
Garbage-collected heap    pc                     pc
Method area               Java stack             Java stack
                          Native method stack    Native method stack
JVM Interpreter
The core of a JVM interpreter is basically this:
do {
    byte opcode = fetch an opcode;
    switch (opcode) {
        case opCode1:
            fetch operands for opCode1;
            execute action for opCode1;
            break;
        case opCode2:
            fetch operands for opCode2;
            execute action for opCode2;
            break;
        case ...
    }
} while (more to do);
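The fetch, decode, execute loop can be made concrete with a handful of real JVM opcodes. The numeric values for `bipush`, `iadd`, `imul`, and `ireturn` are taken from the JVM instruction set; everything else is a simplification for illustration: only these four opcodes are handled, the `bipush` operand is treated as unsigned, there are no stack frames, and an if/elif chain stands in for the switch statement.

```python
# Opcode values as defined by the JVM instruction set.
BIPUSH, IADD, IMUL, IRETURN = 0x10, 0x60, 0x68, 0xAC


def interpret(bytecode):
    """Fetch one opcode at a time, fetch its operands, execute, repeat."""
    stack = []
    pc = 0  # program counter: index of the next byte to fetch
    while True:
        opcode = bytecode[pc]
        pc += 1
        if opcode == BIPUSH:
            operand = bytecode[pc]  # simplification: the real JVM sign-extends this byte
            pc += 1
            stack.append(operand)
        elif opcode == IADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == IMUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == IRETURN:
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode 0x{opcode:02x}")


# Encodes 2 + 3 * 5 in post-order.
program = bytes([BIPUSH, 2, BIPUSH, 3, BIPUSH, 5, IMUL, IADD, IRETURN])
print(interpret(program))  # 17
```

Only `bipush` carries an operand in the instruction stream; the arithmetic opcodes take their inputs from the stack, which is why they are a single byte each.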
Java Stacks
JVM is a stack based machine.
JVM instructions
• implicitly take arguments from the stack top
• put their result on the top of the stack
The stack is used to
• pass arguments to methods
• return a result from a method
• store intermediate results while evaluating expressions
• store local variables
Stack Frames
The role/purpose of each of the areas in a stack frame:
pointer to constant pool
Used implicitly when executing JVM instructions that contain entries into the
constant pool (more about this later).
args + local vars
Space where the arguments and local variables of a method are stored. This
includes a space for the receiver (this) at position/offset 0.
operand stack
Stack for storing intermediate results during the
execution of the method.
• Initially it is empty.
• The maximum depth is known at compile time.
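The reason the maximum depth can be known at compile time is that each instruction changes the stack height by a fixed amount. A minimal sketch, assuming straight-line code (no branches) and the simple Push/Mult/Add/Store instruction set used earlier in these slides; `STACK_EFFECT` and `max_stack_depth` are names invented for the example.

```python
# Net change in stack height caused by each instruction.
STACK_EFFECT = {"Push": +1, "Mult": -1, "Add": -1, "Store": -1}


def max_stack_depth(code):
    """Simulate only the stack height, not the values, and record the peak."""
    depth = 0
    deepest = 0
    for instruction in code:
        op = instruction.split()[0]
        depth += STACK_EFFECT[op]
        if depth < 0:
            raise ValueError(f"stack underflow at {instruction!r}")
        deepest = max(deepest, depth)
    return deepest


code = ["Push b", "Push c", "Push d", "Mult", "Add", "Store a"]
print(max_stack_depth(code))  # 3
```

A compiler can run this kind of analysis once and record the result, so the virtual machine can allocate an operand stack of exactly that size for the frame.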
Viewing Byte Code
Java
Use javap with the -v flag.
Python
Use dis (the disassembler module in the Python standard library).
JavaScript
Use Node.js with the node --print-bytecode flag.
C/C++ (Clang)
Use clang with the -S flag to generate assembly code.
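For the Python case, a short example of the `dis` module in use. The function `compute` is a made-up example; the exact instruction names in the listing differ between CPython versions, so no particular output is shown here.

```python
import dis


def compute(b, c, d):
    a = b + c * d
    return a


# Human-readable listing of the bytecode CPython compiled for this function.
dis.dis(compute)

# The same information as objects, convenient for programmatic inspection.
opnames = [instruction.opname for instruction in dis.get_instructions(compute)]
print(opnames)
```

The listing shows the same load, multiply, add, store pattern as the stack machine example, because CPython's bytecode interpreter is also stack based.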
Thank you all.