Do you think IIT Guwahati certified course can help you in your career?
No
Introduction
The parser is a compiler phase that accepts a token string as input and turns it into the matching intermediate representation using existing grammar. A parser is also known as a syntax analyzer. There are two types of parsing techniques in compiler design: Top-down parsing and Bottom-up parsing.
Top-Down Parsing- It is a parsing strategy that starts at the top of the parse tree and works its way down by applying grammatical rules.
Bottom-up Parsing- It is a parsing strategy that begins with the lowest level of the parse tree and works its way up using grammatical rules.
This article will learn a type of bottom-up parsing, i.e., LR parsing.
LR Parser is a type of bottom-up parsing. It is used to parse context-free grammar, primarily used by compilers. The LR parser reads the input from left to right and produces the rightmost derivation.
The LR parsing can be further divided into four types-
The LR parser is denoted as LR(k), where the letter “L” refers to the left to right scanning of the input, the letter “R” refers to the rightmost derivation in reverse, and “k” refers to the number of unconsumed “look ahead” input symbols that are used in making parser decisions. The letter k is usually one and is frequently omitted. A context-free grammar is called LR(k) if the LR(k) parser exists.
LR Algorithm
The LR (Left-to-Right) algorithm, commonly referred to in the context of parsing, is a technique used to analyze a sequence of input symbols to determine its grammatical structure with respect to a given formal grammar. It is widely used in the construction of compilers and interpreters.
Initialization: Start with an empty stack and place a special end-of-input marker ($) at the bottom. Place the start symbol of the grammar on top of the stack.
Read Input: Read the input string from left to right, symbol by symbol.
Shift and Reduce Operations: Shift, move the current input symbol to the stack and advance to the next input symbol. Reduce, if the stack matches the right-hand side of a production rule, replace it with the left-hand side non-terminal.
Parser Actions: Use a parsing table to decide whether to shift, reduce, accept, or error. Accept, if the stack contains the start symbol and the input is fully consumed, the string is successfully parsed. Error, if no valid action exists for the current stack and input symbol, report a syntax error.
Termination: The algorithm terminates when the input is successfully parsed (accept state) or a syntax error is encountered.
Next, we will discuss the block diagram of LR parsing.
Block Diagram
Below described is the block diagram of the LR parser.
The input buffer takes one symbol at a time from left to right a1, a2,..., am,$.
The stack contains the combination of the state symbol and the current input symbol.
The parsing table is a two-dimensional array containing the Action and GoTo tables.
The parsing table is a two-dimensional array consisting of two tables- the ACTION table and the GOTO table.
Let’s see them one by one.
ACTION Table
The Action table specifies the grammar rules for implementing the input stream’s current state and terminal. There are four cases used in the Action table. These are-
Shift-This operation involves moving the current symbol and the current state n from the input buffer onto the stack. So, it becomes the new present state.
Reduce-
The number m is written to the output stream.
The state is deleted from the stack, according to the symbol m mentioned on the left-hand side of rule m.
The symbol m mentioned on the left-hand side of rule m indicates that a new state is looked up in the goto table and pushed onto the stack as the new current state.
Accept-After repeating the shift and reduce operations, if the stack contains the starting symbol of the input string and the input buffer is empty, i.e., includes the $ symbol, the input string is said to be accepted.
Error-If the parser cannot perform the shift or the reduce operation, also the string is not accepted, then it is said to be in the error state.
This function is denoted by (Sm, ai), where Sm represents the four cases defined above and ai the current input symbol.
GOTO Table
The GOTO Table in the parsing table tells the next state where the parsing should proceed.
This function is denoted by (Sm, ai), where Sm represents the four cases defined above and ai the current input symbol.
Next, we will discuss the steps of LR parsing.
LR Parser Steps
Initially put 0 in the stack and the input string with the $ symbol in the input buffer. Then follow the following steps to process the LR parsing-
If the parsing table consists of the ACTION[Sm, ai] function equal to shift s, execute the shift operation. Push the current input symbol ai and the next state S = GOTO[Sm, ai] to the stack. The symbol ai+1 will become the new current input symbol.
If the parsing table consists of the ACTION[Sm, ai] function equals to reduce A → B, then execute the reduce operation, entering the configuration (S0S1……Sm-rS, aiai+1…..an$) where r is the length of the right side of the production and S = GOTO[Sm-r, A]. The parser will pop ‘2r’ symbols from the stack to expose Sm-r, then push both ‘A’ (left side of production) and ‘S’ (entry for ACTION[Sm-r, A]) onto the stack. Note: The current input symbol is not changed in the reduce operation.
If the parsing table consists of the ACTION[Sm, ai] function equal to accept, then the parsing is said to be completed, and the input string is accepted.
If the parsing table consists of the ACTION[Sm, ai] function equals to error means the parsing has encountered an error. So, the error routine will be called.
Now, we will discuss an example to understand the LR parsing.
Example
Question: Consider the following grammar
E → E + T
E → T
T → T x F
T → F
F → (E)
F → id
The ACTION and GOTO functions of the LR parsing table is given as follows:
STATE
ACTION
GOTO
id
+
x
(
)
$
E
T
F
0
s5
s4
1
2
3
1
s6
A
2
r2
s7
r2
r2
3
r4
r4
r4
r4
4
s5
s4
8
2
3
5
r6
r6
r6
r6
6
s5
s4
9
3
7
s5
s4
10
8
s6
s11
9
r1
s7
r1
r1
10
r3
r3
r3
r3
11
r5
r5
r5
r5
Using the ACTION and GOTO function table, parse the input string “idxid+id”.
Solution: First, we will label the given grammar.
(1) E → E + T
(2) E → T
(3) T → T x F
(4) T → F
(5) F → (E)
(6) F → id
Second, we will take a stack and input buffer and put 0 (initial state) and the input string in them respectively.
Stack
Symbol
Input Buffer
Parsing Action
(1)
0
idxid+id$
At the initial state 0, we have s5 for id in the ACTION function of the parsing table. So, we will push id to the symbol column and 5 to the stack.
Stack
Symbol
Input Buffer
Parsing Action
(1)
0
idxid+id$
Shift
(2)
0 5
id
xid+id$
At state 5, we have r6 operation. So, we will reduce id by F → id and pop 5 out of the stack. We will also push 3 to the stack since for state 0 (now the symbol at the top of the stack), the GOTO function says to go to state 3 for F.
Stack
Symbol
Input Buffer
Parsing Action
(1)
0
idxid+id$
Shift
(2)
0 5
id
xid+id$
Reduce F → id
(3)
0 3
F
xid+id$
At state 3, we have the r4 operation for the symbol. So, we will reduce F by T → F and pop 3 out of the stack. We will also push 2 to the stack since for state 0 (now the symbol at the top of the stack), the GOTO function says to go to state 2 for T.
Stack
Symbol
Input Buffer
Parsing Action
(1)
0
idxid+id$
Shift
(2)
0 5
id
xid+id$
Reduce F → id
(3)
0 3
F
xid+id$
Reduce T → F
(4)
0 2
T
xid+id$
At state 2, we have the s7 operation for symbol x. So, we will push x to the symbol column and 7 to the stack.
Stack
Symbol
Input Buffer
Parsing Action
(1)
0
idxid+id$
Shift
(2)
0 5
id
xid+id$
Reduce F → id
(3)
0 3
F
xid+id$
Reduce T → F
(4)
0 2
T
xid+id$
Shift
(5)
0 2 7
Tx
id+id$
At state 7, we have s5 operation for symbol id and s4 operation for symbol (. Since the current input symbol is id, we will go with the s5 operation. So, we will push id to the symbol column and 5 to the stack.
Stack
Symbol
Input Buffer
Parsing Action
(1)
0
idxid+id$
Shift
(2)
0 5
id
xid+id$
Reduce F → id
(3)
0 3
F
xid+id$
Reduce T → F
(4)
0 2
T
xid+id$
Shift
(5)
0 2 7
T x
id+id$
Shift
(6)
0 2 7 5
T x id
+id$
At state 5, we have an r6 operation for the symbol. So, we will reduce id by F → id and pop 5 out of the stack. We will also push 10 to the stack since for state 7 (now the symbol at the top of the stack), the GOTO function says to go to state 10 for F.
Stack
Symbol
Input Buffer
Parsing Action
(1)
0
idxid+id$
Shift
(2)
0 5
id
xid+id$
Reduce F → id
(3)
0 3
F
xid+id$
Reduce T → F
(4)
0 2
T
xid+id$
Shift
(5)
0 2 7
T x
id+id$
Shift
(6)
0 2 7 5
T x id
+id$
Reduce F → id
(7)
0 2 7 10
T x F
+id$
At state 10, we have r3 operation. So, we will reduce T x F by T → T x F and pop all the states till T i.e. till 2 from the stack. We will also push 2 to the stack since for state 0 (now the symbol at the top of the stack), the GOTO function says to go to state 3 for F.
Stack
Symbol
Input Buffer
Parsing Action
(1)
0
idxid+id$
Shift
(2)
0 5
id
xid+id$
Reduce F → id
(3)
0 3
F
xid+id$
Reduce T → F
(4)
0 2
T
xid+id$
Shift
(5)
0 2 7
T x
id+id$
Shift
(6)
0 2 7 5
T x id
+id$
Reduce F → id
(7)
0 2 7 10
T x F
+id$
Reduce T → T x F
(8)
0 2
T
+id$
At state 2, we have s7 operation for the x symbol and r2 operation for +, ) and $ symbol. Since the current input symbol is +, we will go with r2. So, we will reduce T by E → T and pop 2 from the stack. We will also push 1 to the stack since for state 0 (now the symbol at the top of the stack), the GOTO function says to go to state 1 for E.
Stack
Symbol
Input Buffer
Parsing Action
(1)
0
idxid+id$
Shift
(2)
0 5
id
xid+id$
Reduce F → id
(3)
0 3
F
xid+id$
Reduce T → F
(4)
0 2
T
xid+id$
Shift
(5)
0 2 7
T x
id+id$
Shift
(6)
0 2 7 5
T x id
+id$
Reduce F → id
(7)
0 2 7 10
T x F
+id$
Reduce T → T x F
(8)
0 2
T
+id$
Reduce E → T
(9)
0 1
E
+id$
At state 1, we have an s6 operation for the + symbol. So, we will push + to the symbol column and 6 to the stack.
Stack
Symbol
Input Buffer
Parsing Action
(1)
0
idxid+id$
Shift
(2)
0 5
id
xid+id$
Reduce F → id
(3)
0 3
F
xid+id$
Reduce T → F
(4)
0 2
T
xid+id$
Shift
(5)
0 2 7
T x
id+id$
Shift
(6)
0 2 7 5
T x id
+id$
Reduce F → id
(7)
0 2 7 10
T x F
+id$
Reduce T → T x F
(8)
0 2
T
+id$
Reduce E → T
(9)
0 1
E
+id$
Shift
(10)
0 1 6
E +
id$
At state 6, we have an s5 operation for the id symbol and an s4 operation for ( symbol. Since the current input symbol is id, we will go with the s5 operation. So, we will push id to the symbol column and 5 to the stack.
Stack
Symbol
Input Buffer
Parsing Action
(1)
0
idxid+id$
Shift
(2)
0 5
id
xid+id$
Reduce F → id
(3)
0 3
F
xid+id$
Reduce T → F
(4)
0 2
T
xid+id$
Shift
(5)
0 2 7
T x
id+id$
Shift
(6)
0 2 7 5
T x id
+id$
Reduce F → id
(7)
0 2 7 10
T x F
+id$
Reduce T → T x F
(8)
0 2
T
+id$
Reduce E → T
(9)
0 1
E
+id$
Shift
(10)
0 1 6
E +
id$
Shift
(11)
0 1 6 5
E + id
$
At state 5, we have r6 operation. So, we will reduce id by F → id and pop 5 from the stack. We will also push 3 to the stack since for state 6 (now the symbol at the top of the stack), the GOTO function says to go to state 3 for F.
Stack
Symbol
Input Buffer
Parsing Action
(1)
0
idxid+id$
Shift
(2)
0 5
id
xid+id$
Reduce F → id
(3)
0 3
F
xid+id$
Reduce T → F
(4)
0 2
T
xid+id$
Shift
(5)
0 2 7
T x
id+id$
Shift
(6)
0 2 7 5
T x id
+id$
Reduce F → id
(7)
0 2 7 10
T x F
+id$
Reduce T → T x F
(8)
0 2
T
+id$
Reduce E → T
(9)
0 1
E
+id$
Shift
(10)
0 1 6
E +
id$
Shift
(11)
0 1 6 5
E + id
$
Reduce F → id
(12)
0 1 6 3
E + F
$
At state 3, we have an r4 operation. So, we will reduce F by T → F and pop 3 from the stack. We will also push 9 to the stack since for state 6 (now the symbol at the top of the stack), the GOTO function says to go to state 9 for T.
Stack
Symbol
Input Buffer
Parsing Action
(1)
0
idxid+id$
Shift
(2)
0 5
id
xid+id$
Reduce F → id
(3)
0 3
F
xid+id$
Reduce T → F
(4)
0 2
T
xid+id$
Shift
(5)
0 2 7
T x
id+id$
Shift
(6)
0 2 7 5
T x id
+id$
Reduce F → id
(7)
0 2 7 10
T x F
+id$
Reduce T → T x F
(8)
0 2
T
+id$
Reduce E → T
(9)
0 1
E
+id$
Shift
(10)
0 1 6
E +
id$
Shift
(11)
0 1 6 5
E + id
$
Reduce F → id
(12)
0 1 6 3
E + F
$
Reduce T → F
(13)
0 1 6 9
E + T
$
At state 9, we have s7 operation for x and r1 operation for +, ) and $. Since $ is the current input symbol, we will go with r1. So, we will reduce E + T by E → E + T and pop all the states till E, i.e., till 6 from the stack. And we will push 1 into the stack.
Stack
Symbol
Input Buffer
Parsing Action
(1)
0
idxid+id$
Shift
(2)
0 5
id
xid+id$
Reduce F → id
(3)
0 3
F
xid+id$
Reduce T → F
(4)
0 2
T
xid+id$
Shift
(5)
0 2 7
T x
id+id$
Shift
(6)
0 2 7 5
T x id
+id$
Reduce F → id
(7)
0 2 7 10
T x F
+id$
Reduce T → T x F
(8)
0 2
T
+id$
Reduce E → T
(9)
0 1
E
+id$
Shift
(10)
0 1 6
E +
id$
Shift
(11)
0 1 6 5
E + id
$
Reduce F → id
(12)
0 1 6 3
E + F
$
Reduce T → F
(13)
0 1 6 9
E + T
$
Reduce E → E + T
(14)
0 1
E
$
At state 1, we have s6 operation for + and A i.e. accept for $. Since the current input symbol is $, the input string will be accepted, and the parsing will be completed.
Stack
Symbol
Input Buffer
Parsing Action
(1)
0
idxid+id$
Shift
(2)
0 5
id
xid+id$
Reduce F → id
(3)
0 3
F
xid+id$
Reduce T → F
(4)
0 2
T
xid+id$
Shift
(5)
0 2 7
T x
id+id$
Shift
(6)
0 2 7 5
T x id
+id$
Reduce F → id
(7)
0 2 7 10
T x F
+id$
Reduce T → T x F
(8)
0 2
T
+id$
Reduce E → T
(9)
0 1
E
+id$
Shift
(10)
0 1 6
E +
id$
Shift
(11)
0 1 6 5
E + id
$
Reduce F → id
(12)
0 1 6 3
E + F
$
Reduce T → F
(13)
0 1 6 9
E + T
$
Reduce E → E + T
(14)
0 1
E
$
Accept
Augment Grammar
Augmented grammar involves modifying a given grammar to facilitate certain parsing techniques, especially in the context of LR parsing. By augmenting a grammar, we add an extra production rule and a new start symbol, which helps in managing the parsing process more efficiently.
The primary goal of augmenting a grammar is to introduce a unique start state and handle the end-of-input marker explicitly. This modification ensures that the parser can recognize the completion of the input string and correctly handle the acceptance of the input.
Steps to Augment a Grammar
Introduce a New Start Symbol: Create a new start symbol (S') that is not part of the original grammar.
Add a New Production Rule: Add a production rule from the new start symbol to the original start symbol of the grammar (S' -> S).
Update the Grammar: Ensure that the new production rule and start symbol are incorporated into the existing set of rules.
Example
Given a simple grammar:
S -> aSb | ε
1. Introduce a New Start Symbol: Let S' be the new start symbol.
2. Add a New Production Rule: Add the production S' -> S.
3. Update the Grammar: The augmented grammar becomes:
S' -> S
S -> aSb | ε
Parsing with the Augmented Grammar
Using the augmented grammar, an LR parser can effectively determine the end of parsing when the input is completely reduced to the new start symbol S', ensuring accurate and efficient parsing of the input string.
Frequently Asked Questions
Define parser.
The parser is a compiler phase that accepts a token string as input and turns it into the matching intermediate representation using existing grammar. A parser is also known as a syntax analyzer.
What is LR parsing?
LR Parsing is a type of bottom-up parsing. It is used to parse context-free grammar, mostly used by compilers. The LR parser reads the input from left to right and produces the rightmost derivation.
What are the types of LR parsing?
There are four types of LR parsing - LR(0), SLR(1), LALR(1) and CLR(1).
What is the difference between LR(0) and SLR?
LR(0) parsing uses only the state and the current input symbol, while SLR (Simple LR) parsing considers the follow sets of the non-terminals for making reduction decisions, thus handling more grammars and reducing conflicts.
What is LR(0) parsing technique?
LR(0) parsing technique involves constructing a DFA from LR(0) items, then parsing the input string by shifting symbols and reducing based on the constructed automaton, without considering lookahead symbols.
What is a CLR parser?
A CLR (Canonical LR) parser uses LR(1) items, which include lookahead symbols, providing more context and reducing conflicts compared to LR(0) and SLR parsers, making it more powerful and capable of handling a wider range of grammars.
Conclusion
In this article, we have studied the LR parser. LR parsers, including LR(0), SLR, and CLR variants, are powerful tools in compiler design, providing efficient and accurate syntactic analysis for a wide range of programming languages.