Course 12
1
Formal Languages, Automata and
Compilers
Compilation
2
Source code
Lexical analyzer
Syntactic analyzer
Semantic analyzer Characters Lexemes Syntactic
tree
Code generator Decorated
syntactic tree
Assembler Interpreter
Processor code
Intermediary code
Recap
3
Lexical analysis
Validates tokens
Syntactic analysis
Validates the syntactic tree
Semantic analysis
Detects all other errors
Last step that does analysis
Types and Declarations (1)
4
Verification and translation
Verification concerns type compatibility
Translation determines the necessary memory space for a name
Type expressions.
A base type
A type name
Formed by applying the record type constructor to the field names and their types
Formed with the function constructor. s→t denotes a function from type s to type t
Types and Declarations (2)
5
Type expressions
If s and t are type expressions, then s x t is atype expression
May contain variables whose values are type expressions
Type expressions can be descriend by means of a directed acyclic graph
Internal nodes are constructors
Leaves are elementary type expressions (types, type names and variables)
Types and Declarations (3)
6
Example
int[2][3]
Is interpreted as a vector of vectors in int
The equivalent type expression is array(2, array(3, integer))
Type Equivalence (1)
7
Determined using matching rules
If two type expressions are equivalent, return type, else return error
There may be ambiguities when types are renamed, and the new names are further used in type expressions
Type Equivalence (2)
8
In the case of directed acyclic graph representation, two types are structurally qeuivalent if and only if
They are the same base type
Are obtained by using the same constructor over equivalent types
One is a renaming of the other
Storage Layout for Local Names (1)
9
Given the type of a name, one can determine the amount of memory necessary for the storage of that name
At compile time, this is used to assign each name a relative memory address (stored in the symbol table)
For types which have variable width (ex. string) or the width of which cannot be determined until runtime, a fixed amount of storage is reserved starting from a given pointer
Storage Layout for Local Names (2)
10
Storage Layout for Local Names (3)
11
Code Generation
12
Transforms semantic analysis into executable code
Transformation is particular to each system and architecture
Intermediate code generation – detaches the analysis and interpretation from the generation of machine code
Code Generation
13
Parse tree High level intermediate code
Target code Low level intermediate
code
Intermediate Code
14
Decorated parse trees
Three address code
x = y op z
Intermediate code
High level
Close to the source language (parse trees)
Appropriate for high level tasks (e.g. type checking)
Low level
Memory allocation and registry management
Instruction selection
Parse trees
15
Directed acyclic graphs
a + a * (b - c) + (b - c) * d
Directed Acyclic Graphs
16
The grammar given below can build syntax trees or DAGs
The Leaf and Node functions will create new nodes if equal nodes do not exist
If the node already exists, it will be returned instead of a new
one
Directed Acyclic Graphs
17
Directed Acyclic Graphs – Array Representation
18
The nodes of the tree are represented in an array
Each row represents a node
The first cell represents the operator
Each node has an associated value (pointer or constant)
Directed Acyclic Graphs – Array Representation
19
Nodes are referred using the row index
Node or expression indices are called value number
In actual practice memory addresses are used
Value numbers can be used to build directed acyclic graphs
The Value Number Method for Constructing DAGs
20
Input: operator label and operands
Output: Value number for the node with the signature <op, l, r>
Algorithm
Search in array for the node M with the label op and left-right descendants l and r;
If it exists, return the value number of M
Else, create a new node N, with the label op and descendants l and r; return N
Three Address Code
21
Represented as a sequence made up of at most three elements
Operator
Left operand
Right operand
Ex: a + b * c
t1 = b * c
t2 = a + t1
Can be used for decomposing arithmetic operations, control statements, etc.
Used for code generation and optimization
Three Address Code
22
A linear description of a parse tree or a DAG
For the DAG given below, the three address code translation is
given to the right
Addresses and Instructions
23
Three address code is based on addresses and instructions
An address might be:
A name (will be replaced with an address in memory at generation time)
A constant (type conversions may be necessary)
Temporary variables generated by the compiler (generally used for optimizations)
Instructions for
Three Address Code
24
Assignment instructions of the form c = a op b
Assignment instructions of the form a = op b
Copy instructions: a = b
Jumps: goto L; the instruction at label L will be executed next
Conditional jumps: if a goto L or if false a goto L
Conditional jumps: if a relop b goto L; if a is in relop relation to b, execute instruction at address L, else execute the
instruction immediately after the conditional jump
Instructions for
Three Address Code
25
Function calls:
param x1
param x2
…
param xn
call p, n
Note that the function is stored in the function table and p is a reference to it
Indexed copy: a = b[i] ora[i] = b
a[] = b changes the value of the address which is i positions after the address of a
Address and pointer assignments: a=&b, a=*b, *a=b
Example
26
Let the instruction be
do i = i + 1; while (a[i]<v);
Bibliography
27