Code360 powered by Coding Ninjas X Code360 powered by Coding Ninjas X
Table of contents
Gate Questions on Lexical Analysis
Frequently Asked Questions
Which mathematical model is used in the lexical analyzer?
What is a token?
What elements are included in non-tokens components?
What is a Lexeme?
Last Updated: Mar 27, 2024

Gate Questions on Lexical Analysis

Author Tanay Kumar
1 upvote
Leveraging ChatGPT - GenAI as a Microsoft Data Expert
Prerita Agarwal
Data Specialist @
23 Jul, 2024 @ 01:30 PM


Lexical analysis is the first phase for the compiler, also known as a scanner. It takes the code from the modified language preprocessors written in sentences and breaks these sentences into tokens by removing comments, extra white spaces, etc., in the source code.

Refer to Lexical Analysis in Compiler Design for more detail on the topic.

This article will discuss previous years' gate questions on lexical analysis with proper solutions and explanations.

Gate Questions on Lexical Analysis

Let us now discuss the previously asked gate questions on Lexical Analysis.

1. The number of tokens in the given C statement is ______.

printf("pt = %d, &pt = %x", pt, &pt);                           [GATE 2000]


Solution: 10.

In a C program, the essential component recognized by the compiler is the "token". A token is a source-program content that the compiler does not break down into component elements. We have six C tokensidentifiersconstantskeywordsoperatorsstring literals, and other separators. We have ten tokens within the above printf statement. Underneath are tokens in the above code:-

"pt = %d, &pt = %x"


2. In any compiler, we can recognize keywords of a language during ____. [GATE CS 2011]

  1. Parsing of the program.
  2. The code generation.
  3. Lexical analysis of the program.
  4. Dataflow analysis.


Solution: (C).

Lexical analysis is the method of changing over a grouping of characters into an arrangement of tokens. A token can be a keyword.


3. Consider the following statements: 

(1) The output of the lexical analyzer is groups of characters. 

(2) Total tokens count in the  printf("pt=%d, &pt=%x", pt, &pt); are 11. 

(3) Symbol table can be implemented with an array and hash table but not a tree. 

Which of the given statement(s) is/are correct? [GATE CS 2011]


  1. Only (1).
  2. Only (2) and (3).
  3. Only (1), (2), and (3).
  4. None of the above.


Solution: (D).

The correct statement is as follows 

(1) The output of the lexical analyzer is tokens.

(2) Total tokens count in printf("pt=%d, &pt=%x", pt, &pt); are 10.

(3) We can implement a symbol table with an array, hash table, tree, and linked lists. 

So, option (D) is correct.


4. Which one of the below statements is FALSE? [GATE CSE 2018]

  1. We can use Context-free grammar for specifying both lexical and syntax rules.
  2. We do type checking before parsing.
  3. We can translate high-level language programs to different Intermediate Representations.
  4. Function arguments can be passed with the program stack.


Solution: (B).

Type checking is done at the semantic analysis phase, and parsing is done at the syntax analysis phase. And we know the syntax analysis phase comes before semantic analysis. So Option (B) is False.


5. The lexical Analysis for any modern programming language such as Java needs the power of which one of the below machine models in a sufficient and necessary sense?

  1. Finite state Automata
  2. Non-deterministic automata
  3. Deterministic pushdown automata
  4. Turing machine


Solution: (A)

In the lexical analysis, finite automata are used to produce tokens in the form of keywords, identifiers, and constants from the input program. Pattern recognition is used to search keywords with string-matching algorithms.


6. The lexical analyzer uses the given patterns for recognizing three tokens, T1, T2, and T3, over the alphabets a,b,c. T1: a?(b∣c)*a & T2: b?(a∣c)*b & T3: c?(b∣a)*c. Note: 'x?' means 1 or 0 occurrences of the symbol x. Also, the analyzer outputs the token matching the longest possible prefix. If the analyzer processes the string "bbaacabc", which of the below is the sequence of tokens it outputs? [GATE CSE 2018]

  1. T1T2T3
  2. T1T1T3
  3. T2T1T3
  4. T3T3


Solution: (D).

We can rewrite the tokens as: T1 : (b+c)* a + a(b+c)* a and T2 : (a+c)* b + b(a+c)* b and T3 : (b+a)* c + c(b+a)* c.

The given String is "bbaacabc" the longest matching prefix is "bbaac" (That may be generated by T3). The remaining part  "abc" can again be generated by T3. Hence, the answer is T3T3.


7. Match the description of various parts of a classic optimizing compiler in List - I, with their names in List - II:

Question Image 

  1. (a) - (iii), (b) - (iv), (c) - (ii), (d) - (i).
  2. (a) - (iv), (b) - (iii), (C) - (ii), (d) - (i).
  3. (a) - (ii), (b) - (iv), (C) - (i), (d) - (iii).
  4. (a) - (ii), (b) - (iv), (c) - (iii), (d) - (i).


Solution: (A).

  • The parser is a part of the compiler and is responsible for syntax recognition. 
  • The lexical analyzer uses a scanner (or tokenization). 
  • In Semantic Analysis, consistency and definition of syntax are checked. 
  • An optimizer is used to improve the IR program. 

Hence, option (A) is correct.


8. The output of any lexical analyzer is?

  1. A parse tree.
  2. Machine code.
  3. Intermediate code.
  4. A stream of tokens.


Solution: (D)

The Lexical Analysis produces a stream of tokens as output, consisting of identifiers, keywords, separators, operators, and literals.


9. Consider the given statements related to the compiler construction: 
(1.) Lexical Analysis is specified by context-free grammar and implemented with pushdown Automata
(2.) Syntax Analysis is defined by regular expressions and implemented with a finite-state machine. 
Which of the above statement(s) is/are correct?

  1. Only (1)
  2. Only (2)
  3. Both (1) and (2)
  4. Neither (1) nor (2)


Solution: (D). 
Option D is correct because:

Statement 1 is false because the lexical analysis phase is specified by regular expression and implemented by FA (Finite automata)

Statement 2 is false because syntax analysis is specified by context-free grammar and implemented by PDA (Pushdown Automata)


10. Which statement(s) regarding the linker software is/are true? (1) A function of the linker is to combine various object modules into a single load module. (2) A function of the linker is to replace absolute references in an object module with symbolic references to locations in other modules.

  1. Only (1)
  2. Only (2)
  3. Both (1) and (2)
  4. Neither (1) nor (2)


Solution: (A)

The linker is a computer program that combines two or more records produced by the compiler into a single executable form. Choice (1) is correct, but choice (2) doesn't take after the linker.


11. From the provided data: a b b a a b b a a b which one of the below is not a word in the dictionary created by LZ-coding (given the initial terms are a, b)?

  1. b a
  2. b b
  3. a b
  4. b a a b


Solution: B and D

Input is "a b b a a b b a a b". According to L-Z coding: The dictionary contains: a | b | ba | ab| baa (is already present) "baab" and "bb" are not there. So, options B and D are correct.

12. Debugger is a program that

  1. Allows to modify and examine the contents of registers.
  2. Does not allow execution of a part of a program
  3. Allows to set breakpoints, execute a part of the program, and display contents of the register.
  4. All of the above


Solution : (C).

A debugger is a program used by programmers to debug and test a target program. Debuggers can use instruction-set simulators rather than running a program directly on the processor to achieve higher control over its execution. It also allows to set breakpoints, execute a segment of the program and display the contents of the register. So, option (C) is correct.

This article has covered the previous year's gate questions on Lexical Analysis. Also, check out the Gate Syllabus for CSE.

Get the tech career you deserve, faster!
Connect with our expert counsellors to understand how to hack your way to success
User rating 4.7/5
1:1 doubt support
95% placement record
Akash Pal
Senior Software Engineer
326% Hike After Job Bootcamp
Himanshu Gusain
Programmer Analyst
32 LPA After Job Bootcamp
After Job

Frequently Asked Questions

Which mathematical model is used in the lexical analyzer?

We use the deterministic finite automata mathematical model in a lexical analyzer.

What is a token?

A lexical token is the sequence of characters that may be treated as a unit for the grammar of the programming languages.

What elements are included in non-tokens components?

Macro, extra white spaces, preprocessor directives, and comments are included in non-tokens elements.

What is a Lexeme?

It is a sequence of characters matched with a pattern to form the corresponding sequence of input characters, which comprises a single token or a token called a lexeme.


In this article, we have extensively discussed previous years' gate questions on lexical analysis.

Recommended Reading:

Do check out The Interview guide for Product Based Companies as well as some of the Popular Interview Problems from Top companies like Amazon, Adobe, Google, Uber, Microsoft, etc. on Coding Ninjas Studio.

Also check out some of the Guided Paths on topics such as Data Structure and Algorithms, Competitive Programming, Operating Systems, Computer Networks, DBMS, System Design, etc. as well as some Contests, Test Series, Interview Bundles, and some Interview Experiences curated by top Industry Experts only on Coding Ninjas Studio.


Live masterclass