Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1. Introduction to Lexical Analyzer
1.1. Tokens
1.2. Pattern
1.3. Regular Expressions
2. Representation of a Token
3. Description of Tokens
4. Regular Expressions
4.1. Regular Expressions Rules: The letter
5. Language
5.1. Operations on Languages
6. Frequently Asked Questions
6.1. What is regular language?
6.2. What is the precedence order of concatenation?
6.3. The regular expression for any number of 0’s followed by any number of 1’s
6.4. The RE, which gives none or many instances of a or b is?
6.5. The regular expression denotes a language comprising all possible strings of even length over the alphabet (a, b).
7. Conclusion
Last Updated: Mar 27, 2024

Regular Expressions

Author Prachi Singh

Introduction to Lexical Analyzer

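The lexical analyzer reads the source code and groups its characters into tokens. Tokens are produced on demand: each time the parser asks for the next token, the lexical analyzer scans just enough input to return one. The tokens are therefore not generated all at once and held in a queue.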

 

Example:

Given the source code

if(a==0) a=b;

the lexical analyzer splits it into the following tokens:

if   (   a   ==   0   )   a   =   b   ;

Hence, the function of the lexical analyzer is to convert the stream of input characters into a stream of tokens. Along the way it discards whitespace, so the parser never has to deal with it.
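The splitting shown in the example can be sketched with Python's re module. This is a minimal illustration, not a production lexer: the token class names (KEYWORD, ID, NUMBER, OP, PUNCT, SKIP) and the tokenize function are assumptions made up for this sketch. It is written as a generator so that tokens are produced only when the caller asks for the next one.

```python
import re

# Hypothetical token classes for illustration; a real language defines its own.
# Order matters: the keyword rule is listed before the general identifier rule.
TOKEN_SPEC = [
    ("KEYWORD", r"if\b"),
    ("ID",      r"[A-Za-z][A-Za-z0-9]*"),
    ("NUMBER",  r"\d+"),
    ("OP",      r"==|="),
    ("PUNCT",   r"[();]"),
    ("SKIP",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Yield (token_name, lexeme) pairs one at a time, dropping whitespace."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            yield m.lastgroup, m.group()
        pos = m.end()

lexemes = [lexeme for _, lexeme in tokenize("if(a==0) a=b;")]
print(lexemes)
# ['if', '(', 'a', '==', '0', ')', 'a', '=', 'b', ';']
```

Note that the space between ")" and "a" matches the SKIP rule and never reaches the caller.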


Tokens

A token is a pair consisting of a token name and an associated attribute value. The token name is an abstract symbol representing a kind of lexical unit, and it is the input symbol that the parser processes.

Pattern

A pattern describes the form that the lexemes of a token may take.

Regular Expressions

Regular expressions are the notation used to specify token patterns.


Representation of a Token

Tokens, with an informal description of each pattern and some sample lexemes, can be represented as follows:

TOKEN     INFORMAL DESCRIPTION                       SAMPLE LEXEMES
id        A letter followed by letters and digits    pi, area
number    Any numeric constant                       3.1459, 0
for       The characters f, o, r                     for
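The informal descriptions in the table can be turned into concrete patterns. The sketch below is one possible reading: the exact patterns (for instance, allowing an optional fractional part in a number) and the classify helper are assumptions for illustration. The keyword is listed before id because the lexeme for also fits the id pattern.

```python
import re

# Patterns are tried in order, so the keyword comes before the general id rule.
PATTERNS = [
    ("for",    re.compile(r"for")),
    ("id",     re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("number", re.compile(r"[0-9]+(\.[0-9]+)?")),
]

def classify(lexeme):
    """Return the name of the first token whose pattern matches the whole lexeme."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(lexeme):
            return name
    return None  # the lexeme fits none of the patterns

for lexeme in ["pi", "area", "3.1459", "0", "for"]:
    print(lexeme, "->", classify(lexeme))
```

A string such as forty is classified as id, since the for pattern must match the whole lexeme.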

 


Description of Tokens

Regular expressions are used to describe the tokens of a source language.

A regular expression is built from the following forms:

a      An ordinary character that stands for itself
ε      The empty (null) string
R|S    Either R or S (alternation)
RS     R followed by S (concatenation)

 

Regular Expressions

Regular expressions describe the tokens of a source language. Larger regular expressions are built by combining simpler ones. The language denoted by a regular expression is called a regular language.

Regular Expressions Rules: The letter

Regular expressions over an alphabet Σ and the languages they denote:

Regular Expression    Language Denoted
ε                     {ε}
a (for a ∈ Σ)         {a}
R|S                   L(R) ∪ L(S)
(R)(S)                L(R)L(S)
R*                    (L(R))*
(R)?                  L(R) ∪ {ε}, i.e. (R)|ε
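These rules amount to a recursive definition of L(R): the language of a compound expression is computed from the languages of its parts. The sketch below follows that idea for the finite cases. The node classes (Eps, Sym, Alt, Cat, Opt) and the language function are hypothetical names chosen for this example. Closure is left out because it denotes an infinite set in general.

```python
from dataclasses import dataclass

# Hypothetical AST node types for regular expressions (names are illustrative).
@dataclass(frozen=True)
class Eps:              # the empty string
    pass

@dataclass(frozen=True)
class Sym:              # a single character of the alphabet
    char: str

@dataclass(frozen=True)
class Alt:              # R|S
    left: object
    right: object

@dataclass(frozen=True)
class Cat:              # (R)(S)
    left: object
    right: object

@dataclass(frozen=True)
class Opt:              # (R)?
    inner: object

def language(r):
    """Return L(r) as a Python set of strings."""
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.char}
    if isinstance(r, Alt):
        return language(r.left) | language(r.right)
    if isinstance(r, Cat):
        return {x + y for x in language(r.left) for y in language(r.right)}
    if isinstance(r, Opt):
        return language(r.inner) | {""}
    raise TypeError(f"unknown node: {r!r}")

# (a|b)(c)? denotes {a, b, ac, bc}
expr = Cat(Alt(Sym("a"), Sym("b")), Opt(Sym("c")))
print(sorted(language(expr)))
# ['a', 'ac', 'b', 'bc']
```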

 

Language

A language is a set of strings over an alphabet. The set of strings denoted by a regular expression is called a regular language.

L(R) = the language defined by R

  1. L(xyz) = {xyz}
  2. L(coding | ninjas) = {coding, ninjas}

Operations on Languages

The various operations that can be performed on regular languages are as follows:

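1. Concatenation

RS = { ab | a ∈ R and b ∈ S }

2. Union

R ∪ S = { a | a ∈ R or a ∈ S }

3. Exponentiation

L^0 = {ε}, and L^i = L^(i-1)L for i > 0

4. Kleene Closure

L* = zero or more strings of L concatenated together, i.e. L^0 ∪ L^1 ∪ L^2 ∪ ...

5. Positive Closure

L+ = one or more strings of L concatenated together, i.e. L^1 ∪ L^2 ∪ L^3 ∪ ...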
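For small finite languages these operations can be tried out directly with Python sets. The following is a rough sketch with made-up helper names (concat, power, kleene_upto, positive_upto). It takes L^0 to be the set containing only the empty string. Since both closures are infinite in general, it only builds them up to a chosen exponent n.

```python
def concat(r, s):
    """RS: every string of r followed by every string of s."""
    return {a + b for a in r for b in s}

def power(lang, i):
    """L^i: L concatenated with itself i times; L^0 holds only the empty string."""
    result = {""}
    for _ in range(i):
        result = concat(result, lang)
    return result

def kleene_upto(lang, n):
    """Finite approximation of L*: the union of L^0 through L^n."""
    result = set()
    for i in range(n + 1):
        result |= power(lang, i)
    return result

def positive_upto(lang, n):
    """Finite approximation of L+: the union of L^1 through L^n."""
    result = set()
    for i in range(1, n + 1):
        result |= power(lang, i)
    return result

R = {"a", "ab"}
S = {"c"}
print(sorted(concat(R, S)))               # ['abc', 'ac']
print(sorted(R | S))                      # ['a', 'ab', 'c']
print(power(R, 0))                        # {''}
print(sorted(kleene_upto({"a"}, 3)))      # ['', 'a', 'aa', 'aaa']
print(sorted(positive_upto({"a"}, 3)))    # ['a', 'aa', 'aaa']
```

The only difference between the two closures is whether the exponent 0, and with it the empty string, is included.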

 

Frequently Asked Questions

What is regular language?

A regular language is a language that can be described by a regular expression.

What is the precedence order of concatenation?

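Concatenation has the second-highest precedence. The closure operator (*) binds tightest, concatenation comes next, and alternation (|) has the lowest precedence.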

The regular expression for any number of 0’s followed by any number of 1’s

0*1*

The RE, which gives none or many instances of a or b is?

(a+b)*, where + denotes alternation. In the | notation used earlier, this is (a|b)*.

The regular expression denotes a language comprising all possible strings of even length over the alphabet (a, b).

(aa+ab+ba+bb)*
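The three answers above can be checked by brute force over all short strings. The sketch below uses Python's re syntax, where alternation is written | rather than +, so the even-length expression is written (aa|ab|ba|bb)*, the closure of the four two-letter strings. The helper strings_upto and the length bound of 6 are choices made for this example.

```python
import re
from itertools import product

def strings_upto(alphabet, n):
    """Yield every string over the alphabet of length 0 through n."""
    for length in range(n + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

zeros_then_ones = re.compile(r"0*1*")
any_ab = re.compile(r"(a|b)*")
even_ab = re.compile(r"(aa|ab|ba|bb)*")

for s in strings_upto("01", 6):
    # 0's followed by 1's means a 1 is never followed by a 0.
    assert bool(zeros_then_ones.fullmatch(s)) == ("10" not in s)

for s in strings_upto("ab", 6):
    assert any_ab.fullmatch(s)  # every string over {a, b} is accepted
    assert bool(even_ab.fullmatch(s)) == (len(s) % 2 == 0)

print("all checks passed")
```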

Conclusion

Congratulations on finishing the blog! After reading it, you should have a grasp of regular expressions and the role they play in lexical analysis.
