Table of contents
1. Introduction
2. Lexeme Overview
3. Types of Lexemes
4. Calculating Lexemes in a Program
5. Frequently Asked Questions
5.1. Why are lexemes important in compiler design?
5.2. How can lexemes be identified?
5.3. Why are lexemes used?
5.4. What is a lexer?
6. Conclusion
Last Updated: Mar 27, 2024

Lexeme in Compiler Design

Author Lakshay Chauhan

Introduction

In compiler design, one of the steps in the compilation process is identifying lexemes, that is, grouping the characters of the source code into meaningful units. In this article, we'll discuss the concept of lexemes and the different types of lexemes found in programming languages, such as keywords, identifiers, literals, and operators, along with examples of each type.

By the end of this post, you'll have a better understanding of lexemes in compiler design. We also recommend reading about the Lexical Analyzer and Lexical Analysis in Compiler Design.

Lexeme Overview

A lexeme is a sequence of characters in a program that is grouped together as a single unit. When a compiler or interpreter reads the source code of a program, it breaks the code down into these smaller units, which helps it analyze and process the program efficiently.

Each lexeme is assigned a token type, which is typically represented as a numeric code that identifies the kind of lexeme. The resulting tokens are passed to the parser, which uses them to construct the parse tree with one of several parsing methods.

For example, the C++ statement ‘a = b / 2;’ is broken into the following lexemes:

  • Identifier: ‘a’
  • Operator: ‘=’
  • Identifier: ‘b’
  • Operator: ‘/’
  • Literal: ‘2’
  • Punctuation: ‘;’

Types of Lexemes

  1. Operators: These are symbols that perform mathematical or logical operations, such as addition (+), subtraction (-), multiplication (*), division (/), assignment (=), etc.
     
  2. Identifiers: These are the names of variables, methods, and functions. For example, in the C++ statement ‘int a = 5;’, ‘a’ is an identifier.
     
  3. Punctuation: These are symbols that separate and group different parts of the program. For example, semicolon (;), comma (,), braces ({}), etc.
     
  4. Keywords: These are reserved words that identify programming constructs such as loops, conditionals, and functions. For example, ‘if’, ‘else’, ‘while’, etc. Keywords cannot be used as identifiers.
     
  5. Literals: These are fixed values written directly in the source code. For example, in the C++ statement ‘int a = 5;’, ‘5’ is a literal. Literals can be of different data types, such as integers (5), floating-point numbers (3.14), and strings ("Ninja").

 


Calculating Lexemes in a Program

In practice, a lexer identifies the lexemes in a program automatically, but counting them by hand is a good way to understand lexemes better. Let's take an example:

Consider the following C++ program:

#include <iostream>
using namespace std;

int main() {
  int x = 10;
  cout << "The value of x is: " << x << endl;
  return 0;
}

The above program contains 29 lexemes in total, including the three on the ‘#include’ line (strictly speaking, that line is handled by the preprocessor).

To count the lexemes in this program, we break it down line by line and identify the lexemes on each line:

#include <iostream>   // 3 lexemes: "#", "include", and "<iostream>"
using namespace std;  // 4 lexemes: "using", "namespace", "std", and ";"

int main() {          // 5 lexemes: "int", "main", "(", ")", and "{"
  int x = 10;          // 5 lexemes: "int", "x", "=", "10", and ";"
  cout << "The value of x is: " << x << endl;   // 8 lexemes: "cout", "<<", the string literal "The value of x is: ", "<<", "x", "<<", "endl", and ";"
  return 0;           // 3 lexemes: "return", "0", and ";"
}                     // 1 lexeme: "}"

Frequently Asked Questions

Why are lexemes important in compiler design?

Lexemes let the compiler understand and analyze the source code. Breaking the source code into these smaller units means that later phases, such as parsing, work with meaningful words like ‘int’ or ‘x’ instead of individual characters, which makes understanding and analyzing the program easier and more efficient.

How can lexemes be identified?

There are five types of lexemes that can appear in a program, and we identify each lexeme by matching it to one of these types:

  1. Operators: +, -, /, *, and %.
  2. Identifiers: in the C++ statement ‘int a = 5;’, ‘a’ is an identifier.
  3. Punctuation: semicolon (;), comma (,), braces ({}), etc.
  4. Keywords: ‘if’, ‘else’, ‘while’, etc.
  5. Literals: 5, 3.14, 89, "Ninja", etc.

Why are lexemes used?

Lexemes help the compiler analyze and process the program efficiently. Each lexeme is assigned a token type, which is typically represented as a numeric code that identifies the kind of lexeme. These tokens are passed to the parser, which uses them to construct the parse tree with one of several parsing methods.

What is a lexer?

A lexer (or lexical analyzer) is the component that performs the first phase of compilation and produces lexemes. It reads the source code character by character, groups the characters into lexemes, and passes them to the parser as tokens so that the parser can construct the parse tree.

Conclusion

A lexeme is a sequence of characters in a program that is grouped together as a single unit. When a compiler or interpreter reads the source code of a program, it breaks the code down into these smaller units. In this article, we discussed what lexemes are, the types of lexemes and how to identify them, and an example of counting the lexemes in a C++ program.

To learn more about compiler design, take a look at the “Compiler Design” library. To understand how lexemes are produced, read the article on lexers (or lexical analyzers).
