Intermediate Code Generation for Compiler
Consider a function f with four parameters, called as f(a1, a2, a3, a4).
The three-address code for this procedure call is:
param a1
param a2
param a3
param a4
call f, n
Here, call takes two operands, f and n: f is the name of the procedure and n is the number of parameters being passed, so n = 4 for this call.
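The pattern above (one param instruction per argument, then a single call carrying the argument count) can be sketched in a few lines of Python. This is only an illustration: the helper name gen_call and the choice to represent instructions as plain strings are assumptions, not part of any particular compiler.

```python
def gen_call(name, args):
    """Return the three-address instructions for the call name(args...)."""
    # One param instruction per argument, in left-to-right order.
    code = [f"param {arg}" for arg in args]
    # The count says how many of the preceding params belong to this call.
    code.append(f"call {name}, {len(args)}")
    return code


call_code = gen_call("f", ["a1", "a2", "a3", "a4"])
print("\n".join(call_code))
```

For f(a1, a2, a3, a4) the count operand comes out as 4, matching the n in call f, n.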
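Now, let's take an example program to understand a function definition and a function call.
void swap(int a, int b); // prototype, so that main can call swap
int main(void)
{
    int x = 1, y = 2;
    swap(x, y); // function call: x and y are the actual arguments
    return 0;
}
void swap(int a, int b) // function definition: a and b are the formal parameters
{
    // set of statements
}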
In the above program, the main function contains the function call swap(x, y), where x and y are the actual arguments. The program also contains the function definition swap(int a, int b), where a and b are the formal parameters.
In three-address code, a function call is unraveled into the evaluation of the parameters in preparation for the call, followed by the call itself. Let's understand this with another example:
n= f(a[i])
Here, a is an array of integers and f is a function that takes an integer and returns an integer. The element a[i] is passed to f, and the returned value is stored in n, which is also an integer variable.
a → array of integers
f → function from integer to integer
The three-address code for the above statement can be written as:
t1= i*4
t2=a[t1]
param t2
t3= call f,1
n=t3
t1= i*4
This instruction computes the byte offset of element i within array a. The index i is multiplied by 4, which assumes that each integer occupies 4 bytes.
t2=a[t1]
This instruction fetches the element of array a located at offset t1 and stores it in t2. Together, the first two instructions compute the value of the expression a[i].
param t2
The value in t2 is passed as the parameter of the call f(a[i]).
t3= call f,1
This instruction is the function call. The operand 1 is the number of parameters in the call; it varies from call to call, but here it is 1. The called function returns a value, which is stored in t3.
n=t3
The returned value will be assigned to variable n.
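To see what each temporary holds, the five instructions can be run through a toy interpreter. The sketch below is a minimal illustration under several assumptions: the tuple encoding of the instructions, the name run_tac, the sample array contents, and the sample function f (which simply adds one) are all hypothetical, and integers are assumed to be 4 bytes wide so that a byte offset can be mapped back to a list index.

```python
def run_tac(code, env, arrays, functions, width=4):
    """Execute a list of three-address tuples and return the variable environment."""
    params = []  # values queued by param instructions
    for instr in code:
        op = instr[0]
        if op == "mul":        # ("mul", dst, src, const): dst = src * const
            _, dst, src, const = instr
            env[dst] = env[src] * const
        elif op == "index":    # ("index", dst, array, offset): dst = array[offset]
            _, dst, array, offset = instr
            # The offset is in bytes, so divide by the element width.
            env[dst] = arrays[array][env[offset] // width]
        elif op == "param":    # ("param", src)
            params.append(env[instr[1]])
        elif op == "call":     # ("call", dst, function, n)
            _, dst, name, n = instr
            start = len(params) - n
            args = params[start:]   # the last n queued parameters
            del params[start:]
            env[dst] = functions[name](*args)
        elif op == "copy":     # ("copy", dst, src): dst = src
            _, dst, src = instr
            env[dst] = env[src]
        else:
            raise ValueError(f"unknown instruction {instr!r}")
    return env


# n = f(a[i]) with i = 2 and a hypothetical f that adds one to its argument.
code = [
    ("mul", "t1", "i", 4),
    ("index", "t2", "a", "t1"),
    ("param", "t2"),
    ("call", "t3", "f", 1),
    ("copy", "n", "t3"),
]
result = run_tac(code, {"i": 2}, {"a": [10, 20, 30, 40]}, {"f": lambda v: v + 1})
print(result)
```

With i = 2, t1 holds the byte offset 8 rather than the index, t2 holds the element a[2], and n ends up with the value returned by f.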
Let's look at productions for a function definition and a function call. The nonterminals D, F, S, E, and A generate function definitions, formal parameters, statements, expressions, and actual parameters, respectively.
D → define T id ( F ) { S }
F → 𝜖 | T id, F
S → return E ;
E → id ( A )
A → 𝜖 | E , A
In D → define T id ( F ) { S }, the nonterminal D stands for a function definition and T for a type. A function definition specifies the return type of the function (T), the function name (id), its parameters, and the code to be executed. ( F ) represents the formal parameters of the function and { S } is the code to be executed.
Now let’s see what can be a formal parameter.
F → 𝜖 | T id, F
Here the parameter list can be empty (𝜖) or a type (T) followed by a name (id). The F at the end generates the rest of the sequence of formal parameters. For example, add(int x, int y, int w, ...).
S → return E;
Here S is the code (the set of statements to be executed), which returns the value of an expression (E).
E → id ( A )
An expression can be a function call with actual parameters. id represents the name of the function and ( A ) represents the actual parameters, which are generated by the nonterminal A.
A → 𝜖 | E , A
The actual parameter list can be empty (𝜖) or an expression E followed by a further sequence of actual parameters A. For example, a call to add can pass multiple parameters: add(x, y, z, ...).
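The productions for E and A map naturally onto a small recursive-descent translator that emits param and call instructions while it parses. The sketch below is only an illustration and makes two simplifying assumptions that are not in the grammar as written: an expression may also be a plain identifier, and the last actual parameter is not followed by a comma. The class name CallTranslator and the helpers tokenize and translate are hypothetical.

```python
import re


def tokenize(text):
    """Split the input into identifiers, parentheses, and commas."""
    return re.findall(r"[A-Za-z_]\w*|[(),]", text)


class CallTranslator:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.code = []   # emitted three-address instructions
        self.temps = 0   # number of temporaries created so far

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_e(self):
        """E -> id ( A ), or a plain id (assumed extension). Returns the result name."""
        name = self.peek()
        if name is None or not (name[0].isalpha() or name[0] == "_"):
            raise SyntaxError(f"expected identifier, found {name!r}")
        self.pos += 1
        if self.peek() != "(":
            return name
        self.expect("(")
        args = self.parse_a()
        self.expect(")")
        # All arguments are evaluated first, then passed, then the call is made.
        for arg in args:
            self.code.append(f"param {arg}")
        self.temps += 1
        result = f"t{self.temps}"
        self.code.append(f"{result} = call {name}, {len(args)}")
        return result

    def parse_a(self):
        """A -> epsilon | E , A (without the trailing comma, by assumption)."""
        args = []
        if self.peek() == ")":
            return args
        args.append(self.parse_e())
        while self.peek() == ",":
            self.pos += 1
            args.append(self.parse_e())
        return args


def translate(text):
    translator = CallTranslator(tokenize(text))
    translator.parse_e()
    if translator.peek() is not None:
        raise SyntaxError(f"unexpected token {translator.peek()!r}")
    return translator.code


print("\n".join(translate("add(x, mul(y, z))")))
```

Because each nested call is fully translated before the outer call emits its param instructions, the parameters of add appear together immediately before call add, 2.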
Intermediate representation
Intermediate code can be represented at two levels:
1. High Level intermediate code
High-level intermediate code stays close to the source language. Representations such as syntax trees preserve source constructs like loops, array accesses, and function calls, which makes them well suited to machine-independent tasks such as static type checking and source-level optimisation.
2. Low Level intermediate code
Low-level intermediate code stays close to the target machine. It is generated by the compiler rather than written by hand, and it breaks source constructs into simple steps such as address calculations, loads, and stores. This makes it well suited to machine-dependent tasks such as register allocation and instruction selection, at the cost of being harder for a person to read.
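The difference between the two levels can be illustrated by lowering one into the other. In the sketch below, a nested-tuple syntax tree stands in for the high-level form and a list of three-address strings for the low-level form. The tuple layout, the function name lower, and the 4-byte element width are assumptions made for illustration.

```python
from itertools import count


def lower(tree, width=4):
    """Lower a nested-tuple syntax tree to a list of three-address instructions."""
    code = []
    temps = count(1)

    def new_temp():
        return f"t{next(temps)}"

    def visit(node):
        kind = node[0]
        if kind == "var":            # ("var", name)
            return node[1]
        if kind == "index":          # ("index", array, index_expr)
            _, array, index = node
            idx = visit(index)
            offset = new_temp()
            code.append(f"{offset} = {idx} * {width}")   # explicit byte offset
            value = new_temp()
            code.append(f"{value} = {array}[{offset}]")
            return value
        if kind == "call":           # ("call", function, [arg_expr, ...])
            _, name, arg_nodes = node
            args = [visit(arg) for arg in arg_nodes]
            for arg in args:
                code.append(f"param {arg}")
            result = new_temp()
            code.append(f"{result} = call {name}, {len(args)}")
            return result
        if kind == "assign":         # ("assign", target, expr)
            _, target, expr = node
            code.append(f"{target} = {visit(expr)}")
            return target
        raise ValueError(f"unknown node kind {kind!r}")

    visit(tree)
    return code


# High-level form of n = f(a[i]): the array access is still a single node.
tree = ("assign", "n", ("call", "f", [("index", "a", ("var", "i"))]))
low_level = lower(tree)
print("\n".join(low_level))
```

The tree keeps a[i] as one array-access node, while the lowered code exposes the offset calculation as a separate instruction.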
Need for Intermediate Code Generation in Compiler design
If a source code can be translated directly into its target machine code, why do we need to translate it into an intermediate code that is then translated to its target code? Let's look at why we require an intermediate code.
- Suppose a compiler translates a source language directly to its target machine language without generating intermediate code. In that case, a full native compiler is required for each new machine.
- Intermediate code keeps the analysis portion the same for all target machines, which eliminates the need for a new full compiler for each unique machine.
- Only the synthesis part of the compiler is altered depending on the target machine.
- Code optimization techniques can be applied to the intermediate code, which makes it easier to improve code performance in a machine-independent way.
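The points above can be sketched with one shared intermediate form and two interchangeable synthesis parts. Both targets below are made up for illustration: the mnemonics (PUSH, POP, LOAD, STORE, ADD, MUL), the register name R1, and the function names are hypothetical and do not describe any real instruction set.

```python
# Hypothetical mnemonics shared by both illustrative targets.
MNEMONICS = {"+": "ADD", "*": "MUL"}


def to_stack_machine(tac):
    """Translate (dst, left, op, right) tuples for an imaginary stack machine."""
    out = []
    for dst, left, op, right in tac:
        out += [f"PUSH {left}", f"PUSH {right}", MNEMONICS[op], f"POP {dst}"]
    return out


def to_register_machine(tac):
    """Translate the same tuples for an imaginary one-register machine."""
    out = []
    for dst, left, op, right in tac:
        out += [
            f"LOAD R1, {left}",
            f"{MNEMONICS[op]} R1, {right}",
            f"STORE {dst}, R1",
        ]
    return out


# The analysis part produces this once; each back end consumes it unchanged.
shared_tac = [("t1", "i", "*", "4"), ("t2", "t1", "+", "base")]
stack_code = to_stack_machine(shared_tac)
register_code = to_register_machine(shared_tac)
print("\n".join(stack_code))
print("\n".join(register_code))
```

Supporting a further machine in this sketch means writing one more translation function; the code that produced shared_tac stays as it is.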
Advantages of Intermediate Code Generation
- Portability: Intermediate code is platform-independent, making it easier to generate machine code for multiple architectures without rewriting the entire compiler.
- Optimization: Intermediate code provides an abstract, simpler representation, allowing the compiler to perform optimizations before generating machine-specific code.
- Simplified Compilation Process: It divides the compilation into manageable phases, separating concerns between front-end syntax processing and back-end code generation.
- Reusability: The intermediate code can be reused for different target machines, simplifying compiler design and maintenance.
- Error Detection: Semantic checks such as type checking are typically performed while the intermediate code is generated, so many errors are reported before machine code generation begins.
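As an illustration of the optimization point, the sketch below folds constant expressions in straight-line three-address code and propagates the folded values into later instructions. It is a minimal example, not a full optimizer: it assumes no branches, instructions encoded as (dst, left, op, right) tuples with integer constants written as Python ints, and the function name fold_constants is hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(code):
    """Fold constant operations in straight-line (dst, left, op, right) code."""
    known = {}   # variables currently known to hold a constant
    out = []
    for dst, left, op, right in code:
        # Replace operands whose constant value is already known.
        left = known.get(left, left)
        right = known.get(right, right)
        if isinstance(left, int) and isinstance(right, int):
            known[dst] = OPS[op](left, right)
            out.append(f"{dst} = {known[dst]}")
        else:
            known.pop(dst, None)   # dst no longer holds a known constant
            out.append(f"{dst} = {left} {op} {right}")
    return out


folded = fold_constants([
    ("t1", 4, "*", 2),
    ("t2", "i", "*", "t1"),
    ("t3", "t1", "+", 1),
])
print("\n".join(folded))
```

Nothing in this pass depends on the target machine, so the same code could run ahead of any back end.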
Disadvantages of Intermediate Code Generation
- Increased Compilation Time: Generating and processing intermediate code adds extra steps, slightly lengthening the overall compilation time.
- Memory Overhead: Storing and manipulating the intermediate code requires additional memory resources during the compilation process.
- Loss of Source Code Context: Some high-level details may be abstracted away in the intermediate code, making certain optimizations or error handling more challenging.
- Complexity: Designing an effective intermediate representation adds complexity to the compiler's architecture.
Frequently Asked Questions
Why do we need intermediate code generation?
Intermediate code generation supports portability and optimisation. It links high-level programming languages and machine code, so the same front end can be reused and the code can be optimised before it is translated for different computer architectures.
What role does intermediate code play?
Intermediate code is the link between high-level programming languages and machine code. It aids portability and optimisation by providing a common, machine-independent representation that is easy to analyse, optimise, and translate.
In which phase is the intermediate code generated?
Intermediate code is produced during compilation, after the front end has parsed and analysed the source code but before the back end translates the intermediate code into machine code.
What type of intermediate code is used in a compiler?
The type of intermediate code used in compilers varies, but some commonly used types include Three-Address Code (TAC), Static Single Assignment (SSA) form, and virtual machine code (bytecode). Each type has advantages and disadvantages regarding efficiency, simplicity, and portability.
What is the difference between source code and intermediate code?
Source code is the high-level, human-readable programming code written by developers, whereas intermediate code is an abstract, machine-independent representation generated during compilation. Intermediate code serves as a bridge between source code and machine code, facilitating optimization and portability.
Conclusion
Intermediate Code Generation in Compiler Design enables a more efficient, flexible, and portable compilation process. By providing an abstract, machine-independent layer, it allows easier optimization, supports multiple target architectures, and simplifies the translation from source code to machine code. Despite added complexity, its advantages significantly enhance the compiler's overall performance and adaptability.