Table of contents
1. Introduction
2. Intermediate Code Generation
3. Intermediate Representation
4. Postfix Notation
5. Three-Address Code
6. Syntax Tree
7. Frequently Asked Questions
  7.1. What is intermediate language or code in compiler design?
  7.2. What is the disadvantage of the intermediate language?
  7.3. What do you understand by just-in-time compilation in intermediate language?
  7.4. What is the purpose of intermediate representation?
  7.5. What are some other names and examples of intermediate language?
8. Conclusion
Last Updated: Mar 27, 2024

The Intermediate language

Author Abhay Trivedi


Introduction

If we can translate source code directly into target machine code, why translate it first into intermediate code that is then translated into the target code?

[Figure: Phases of Compilation]

The diagram above shows where intermediate code is generated among the phases of compilation.

Let's look at the reasons why we need intermediate code.


Intermediate Code Generation

Intermediate code sits between the high-level source language and the machine language. The compiler first translates the source code into intermediate code and then translates the intermediate code into machine code.

[Figure: Intermediate code generation]

  • If a compiler translates the source language directly to the target machine language without generating intermediate code, then every new machine requires a full native compiler.
  • Intermediate code eliminates the need for a new full compiler for every machine, because the analysis portion (the front end) stays the same for all targets.
  • Only the second part of the compiler, synthesis (the back end), changes according to the target machine.
  • Code optimization techniques can be applied to the intermediate code to improve the performance of the generated code.


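  • If the intermediate code is executed by an interpreter instead of being compiled further, no actual back-end needs to be written for each new machine; only the interpreter has to be available there.
  • The intermediate form may be more compact than machine code, which saves space both in distribution and on the machine that executes the programs.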
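The split between a shared analysis part and a machine-specific synthesis part can be sketched in a few lines of Python. This is only an illustration: the IR tuple format, the function names, and both target "machines" (a register machine and a stack machine) are hypothetical and not taken from any real compiler.

```python
# Sketch: one front end, two back ends. All names and both instruction
# sets below are made up for illustration.

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def front_end(statement):
    """Analysis: turn 'x = y + z' into machine-independent IR tuples."""
    result, _, arg1, op, arg2 = statement.split()
    return [(op, result, arg1, arg2)]

def register_back_end(ir):
    """Synthesis for a hypothetical two-register machine."""
    lines = []
    for op, result, arg1, arg2 in ir:
        lines += [f"LOAD R1, {arg1}", f"LOAD R2, {arg2}",
                  f"{OPCODES[op]} R1, R2", f"STORE {result}, R1"]
    return lines

def stack_back_end(ir):
    """Synthesis for a hypothetical stack machine."""
    lines = []
    for op, result, arg1, arg2 in ir:
        lines += [f"PUSH {arg1}", f"PUSH {arg2}", OPCODES[op], f"POP {result}"]
    return lines

ir = front_end("x = y + z")          # produced once
print(register_back_end(ir))         # reused by back end 1
print(stack_back_end(ir))            # reused by back end 2
```

With m source languages and n target machines, this arrangement needs m front ends plus n back ends instead of m × n full compilers.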


Intermediate Representation

We can represent intermediate code in two ways:

  • High-Level Intermediate Representation: High-level intermediate code is very close to the source language itself. It can be generated quickly from the source code, and source-level transformations that improve performance are easy to apply to it. It is less suitable for target machine optimization.
  • Low-Level Intermediate Representation: Low-level intermediate code is close to the target machine, which makes it suitable for register allocation, memory allocation, instruction selection, and similar tasks. It is well suited to machine-dependent optimizations.
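To make the difference concrete, the sketch below lowers one high-level statement, a[i] = b + c, into low-level steps that expose the address arithmetic. The helper name `lower_array_store`, the textual notation (`&a`, `*t3`), and the 4-byte element size are assumptions chosen for this illustration.

```python
def lower_array_store(array, index, left, op, right, elem_size=4):
    """Lower `array[index] = left op right` to low-level three-address steps.

    elem_size=4 is an assumption (for example, 32-bit integers).
    """
    return [
        f"t1 = {left} {op} {right}",     # compute the value to store
        f"t2 = {index} * {elem_size}",   # byte offset of the element
        f"t3 = &{array} + t2",           # address of the element
        "*t3 = t1",                      # store through the address
    ]

high_level = "a[i] = b + c"              # close to the source language
low_level = lower_array_store("a", "i", "b", "+", "c")
print(high_level)
print("\n".join(low_level))
```

The high-level form keeps the array access as a single unit, which is convenient for source-level transformations. The low-level form makes offsets and addresses explicit, which is what register allocation and instruction selection work on.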


Intermediate code can be language-specific (e.g., bytecode for Java) or language-independent (e.g., three-address code).

Postfix Notation

The standard (infix) way of writing the sum of a and b is with the operator in the middle: a + b

In postfix notation, we put the operator at the right end of the expression: ab +. In general, if p1 and p2 are any postfix expressions and + is any binary operator, the result of applying + to the values denoted by p1 and p2 is written in postfix notation as p1 p2 +. Postfix notation needs no parentheses, because the position and arity (number of arguments) of each operator allow only one way to decode a postfix expression. In postfix notation, the operator always comes after its operands.

For example, the postfix representation of the expression (A/(B-C)*D+E) is ABC-/D*E+.
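The conversion above can be reproduced with a small stack-based converter. The sketch below uses the classic shunting-yard approach, which the article does not name. It assumes single-character operands and the four left-associative operators +, -, *, and /. The function names and the sample values used for evaluation are illustrative.

```python
import operator

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
APPLY = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}

def to_postfix(infix):
    """Convert an infix string with single-character operands to postfix."""
    output, stack = [], []
    for ch in infix.replace(" ", ""):
        if ch.isalnum():
            output.append(ch)                 # operands go straight to output
        elif ch == "(":
            stack.append(ch)
        elif ch == ")":
            while stack[-1] != "(":           # flush operators back to "("
                output.append(stack.pop())
            stack.pop()                       # discard the "("
        else:                                 # left-associative binary operator
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[ch]):
                output.append(stack.pop())
            stack.append(ch)
    while stack:
        output.append(stack.pop())
    return "".join(output)

def eval_postfix(postfix, values):
    """Evaluate a postfix string; no parentheses or precedence rules needed."""
    stack = []
    for ch in postfix:
        if ch.isalnum():
            stack.append(values[ch])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(APPLY[ch](left, right))
    return stack.pop()

postfix = to_postfix("(A/(B-C)*D+E)")
print(postfix)  # ABC-/D*E+
print(eval_postfix(postfix, {"A": 8, "B": 5, "C": 3, "D": 2, "E": 1}))  # 9.0
```

The evaluator only ever looks at the top of the stack, which shows why a postfix expression can be decoded in exactly one way.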


Three-Address Code

A statement involving at most three references (two for the operands and one for the result) is known as a three-address statement. A sequence of three-address statements is known as three-address code. Three-address statements have the general form a = b op c, where a, b, and c each denote an address (a memory location). A statement may contain fewer than three references, but it is still called a three-address statement.

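For example, the three-address code for the expression a * -b + b / c is:

tmp1 = -b
tmp2 = a * tmp1
tmp3 = c
tmp4 = b / tmp3
tmp5 = tmp2 + tmp4

Here tmp1, tmp2, tmp3, tmp4, and tmp5 are temporary variables generated by the compiler. The statements tmp1 = -b and tmp3 = c contain fewer than three references. The copy tmp3 = c is redundant, since b / c could be computed directly.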
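A generator for such code can be sketched as a post-order walk over the expression. The example below borrows Python's standard `ast` module as a ready-made parser, which is a convenience for this sketch and not how a compiler for another language would parse. The function name `three_address_code` is illustrative. Unlike the listing above, it does not emit the copy of c into a temporary, so it needs four temporaries instead of five.

```python
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def three_address_code(expression):
    """Return a list of three-address statements for an arithmetic expression."""
    code = []

    def new_temp():
        # Each emitted statement defines exactly one temporary.
        return f"tmp{len(code) + 1}"

    def visit(node):
        if isinstance(node, ast.Name):          # plain variable: no code needed
            return node.id
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            operand = visit(node.operand)
            temp = new_temp()
            code.append(f"{temp} = -{operand}")
            return temp
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = visit(node.left)             # operands first (post-order)
            right = visit(node.right)
            temp = new_temp()
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    visit(ast.parse(expression, mode="eval").body)
    return code

for line in three_address_code("a * -b + b / c"):
    print(line)
# tmp1 = -b
# tmp2 = a * tmp1
# tmp3 = b / c
# tmp4 = tmp2 + tmp3
```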

Syntax Tree

A syntax tree is a condensed form of a parse tree. The operator and keyword nodes of the parse tree are moved up to their parents, and a chain of single productions is replaced by a single link. The interior nodes are operators and the leaves are operands. When forming a syntax tree by hand, it helps to parenthesize the expression fully, which makes it easy to recognize which operation is applied first.

Example 1: (a*b)+(d*e)

[Figure: syntax tree for (a*b)+(d*e)]

Example 2: (a+b)*(c*(d+e))

[Figure: syntax tree for (a+b)*(c*(d+e))]
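The two example trees can also be built and printed as text. The sketch below again uses Python's standard `ast` module as a stand-in parser and stores each interior node as a tuple (operator, left, right). The tuple layout and the names `syntax_tree` and `render` are choices made for this illustration.

```python
import ast

SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def syntax_tree(expression):
    """Build a syntax tree: leaves are operand names, interior nodes are tuples."""
    def build(node):
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp) and type(node.op) in SYMBOLS:
            return (SYMBOLS[type(node.op)], build(node.left), build(node.right))
        raise ValueError("unsupported expression")
    return build(ast.parse(expression, mode="eval").body)

def render(tree, depth=0):
    """Return the tree as indented lines, operator first, then its operands."""
    indent = "  " * depth
    if isinstance(tree, str):
        return [indent + tree]
    op, left, right = tree
    return [indent + op] + render(left, depth + 1) + render(right, depth + 1)

print("\n".join(render(syntax_tree("(a*b)+(d*e)"))))
# +
#   *
#     a
#     b
#   *
#     d
#     e
```

The parentheses of the input do not appear in the tree, because the grouping is carried by the nesting of the nodes.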

Frequently Asked Questions

What is intermediate language or code in compiler design?

Many compilers use a medium-level language as a stepping-stone between the high-level source language and low-level machine code. Such stepping-stone languages are called intermediate languages or intermediate code.

What is the disadvantage of the intermediate language?

The main disadvantage is speed when the intermediate code is interpreted. In most cases, interpreting the intermediate form is a lot slower than directly executing translated machine code. Nevertheless, the approach has seen some success, e.g., with Java.

What do you understand by just-in-time compilation in intermediate language?

We can eliminate some of the speed penalty by translating the intermediate code to machine code immediately before or during the program's execution. This hybrid form is called just-in-time compilation, and it is often used for executing the intermediate code for Java.
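The difference between interpreting intermediate code and translating it before execution can be illustrated with a loose analogy in Python. In the sketch below, `interpret` decodes every IR tuple on every run, while `translate` converts the IR once into a Python function that can then be called repeatedly. Generating a Python function only stands in for generating machine code; a real just-in-time compiler emits native instructions. The IR format and all names are assumptions of this sketch.

```python
import operator

APPLY = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}

# Hypothetical intermediate code for a * b + c as (result, op, arg1, arg2).
PROGRAM = [("t1", "*", "a", "b"), ("t2", "+", "t1", "c")]

def interpret(program, env):
    """Decode and execute each IR statement; the decoding cost recurs every run."""
    env = dict(env)
    for result, op, arg1, arg2 in program:
        env[result] = APPLY[op](env[arg1], env[arg2])
    return env[program[-1][0]]

def translate(program, params):
    """Translate the IR once into a callable (stand-in for machine code)."""
    body = "\n".join(f"    {result} = {arg1} {op} {arg2}"
                     for result, op, arg1, arg2 in program)
    source = (f"def compiled({', '.join(params)}):\n"
              f"{body}\n"
              f"    return {program[-1][0]}\n")
    namespace = {}
    exec(source, namespace)       # the one-time translation step
    return namespace["compiled"]

compiled = translate(PROGRAM, ["a", "b", "c"])
print(interpret(PROGRAM, {"a": 2, "b": 3, "c": 4}))  # 10
print(compiled(2, 3, 4))                             # 10
```

Both paths give the same result. The translated version pays the translation cost once and avoids re-decoding the intermediate code on later calls.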

What is the purpose of intermediate representation?

An intermediate representation (IR) is the data structure or code that a compiler or virtual machine uses internally to represent source code. An IR is designed to support further processing, such as optimization and translation.

What are some other names and examples of intermediate language?

An intermediate language (IL) is generated from programming source code but cannot be executed directly by the CPU. Because it is platform-independent, it is also called bytecode, p-code, pseudocode, or pseudo-language. Examples of intermediate languages are Java bytecode, the Common Intermediate Language (CIL) of .NET, and LLVM IR. High-level languages such as PL/I, BASIC, and Fortran are source languages, not intermediate languages.

Conclusion

This article introduced the intermediate language in compiler design. We discussed why intermediate code is needed, its benefits, and its common representations: postfix notation, three-address code, and the syntax tree.


Happy Learning!
