Table of contents
1. Introduction
2. What are Tokens in C?
3. Uses of Tokens in C
4. How many Tokens in C?
5. Types of Tokens
5.1. Identifiers
5.2. Keywords
5.3. Constants
5.3.1. Types of Constants
5.4. Strings
5.5. Operators
5.6. Special Symbols
6. Classification of Tokens
7. Rules for naming identifiers
8. Tokens and Expressions
9. Tokenization Process
10. Examples to Implement C Tokens
10.1. Example 1: Basic Arithmetic
10.2. Example 2: Conditional Statement
11. Frequently Asked Questions
11.1. Give examples of valid and invalid identifier names.
11.2. What is the use of the break keyword?
11.3. What is the difference between a character constant and a string constant?
12. Conclusion
Last Updated: Aug 25, 2024

Tokens in C

Introduction

In C programming, tokens are the basic building blocks of source code, categorized into several types that form the syntax and structure of programs. Understanding tokens is essential for comprehending how C compilers interpret and process code. Each token represents a meaningful unit of code that contributes to the functionality and behavior of C programs.

What are Tokens in C?

In C programming, tokens are the smallest units of a program that convey meaning to the compiler. They are categorized into several types:

  • Keywords: Reserved words that have predefined meanings in the C language (e.g., int, if, while).
  • Identifiers: Names given to entities such as variables, functions, and arrays by the programmer.
  • Constants: Fixed values that do not change during the execution of a program (e.g., integer constants, character constants).
  • String Literals: Sequences of characters enclosed in double quotes.
  • Operators: Symbols that perform operations on operands (e.g., arithmetic operators like +, relational operators like >, logical operators like &&).
  • Punctuators: Symbols that have special meanings in C, such as parentheses (), braces {}, commas ,, semicolons ;, etc.

Each token type serves a specific purpose in defining the syntax and semantics of C programs, enabling the compiler to parse and interpret code correctly.

Uses of Tokens in C

Tokens in C programming are fundamental for several purposes:

  • Syntax Definition: Tokens form the basic elements that define the syntax rules of the C language.
  • Compilation: They are processed by the compiler to generate executable code.
  • Debugging: Tokens aid in identifying and resolving syntax errors and logical issues within programs.
  • Interpretation: They convey the logic and structure of programs to both programmers and compilers.
  • Portability: Tokens ensure that C programs can be understood and executed consistently across different platforms and environments.

Understanding and correctly using tokens is essential for writing efficient, error-free, and portable C programs.

How many Tokens in C?

This article groups C tokens into six categories:

Token Type       | Examples
Keywords         | int, if, while, return
Identifiers      | main, x, sum, MAX_SIZE
Constants        | 42, 3.14, 'A'
String Literals  | "C programming", "123", "Hello"
Operators        | +, -, *, /, >, <, &&
Punctuators      | (), {}, ;, ,, .

These tokens collectively form the vocabulary of the C programming language, enabling programmers to write structured and functional code efficiently.

Types of Tokens

Identifiers

Identifiers are the names a programmer gives to variables, functions, arrays, and other entities. They consist of letters, digits, and underscores. An identifier must not be spelled exactly like a keyword, because keywords are reserved. Since C is case-sensitive, a name that differs from a keyword only in case (for example, While) is technically allowed, although it is confusing. Once an identifier has been declared, it can be used later in the program.

When declaring identifiers, the following rules apply:

  • It can contain only letters, digits, and underscores. No other special characters are allowed.
  • It can only begin with a letter or an underscore.
  • It cannot be the same as any keyword.
  • White space is not allowed.
  • It may be longer than 31 characters, but the compiler is only required to treat a limited number of leading characters as significant: 31 for internal names in C89, raised in C99 to 63 for internal names and 31 for external names.

Keywords

Keywords are predefined, reserved words, each with a special meaning to the compiler. Because they are reserved, they cannot be used as variable names; doing so would try to give the keyword a new meaning, which the language does not allow. The C preprocessor, which runs before compilation, can substitute text for macro names through directives such as #define, but redefining a keyword this way should be avoided.

The original ANSI C standard (C89/C90) defines 32 keywords; later standards such as C99 and C11 add more. The 32 are as follows:

  • auto: It declares a local variable with automatic storage duration; its memory is allocated and freed automatically.
  • break, continue: break brings program control out of the nearest enclosing loop or switch statement. continue passes control to the next iteration of the nearest enclosing do, for, or while statement.
  • switch, case, default: They are used when only one block of statements is to be executed out of many blocks.
  • char: It is short for character, a data type that holds a single character.
  • const: It specifies that a variable's value is constant and cannot be modified once initialized.
  • double, float: They are floating-point data types; double typically offers about twice the precision of float.
  • if, else: They are used for decision control structures.
  • enum: It is used to declare an enumeration.
  • extern: It is used to declare a variable or function with linkage outside of the file it is declared in.
  • for, while, do: They introduce the three loop statements.
  • goto: It is used to transfer the control of the program to a specified label.
  • int: It is short for integer, a data type that stores integer values.
  • register: It hints to the compiler that the variable should be kept in a CPU register for faster access; the compiler is free to ignore the hint.
  • return, void: return terminates the function and optionally returns a value. void indicates that a function returns no value or takes no parameters.
  • short, signed, unsigned, long: They are type modifiers that alter the meaning of a base data type to yield a new type.
  • sizeof: It yields the size, in bytes, of a type or an object.
  • static: It gives a local variable static storage duration, so its value persists between function calls until the end of the program. At file scope, it limits the visibility of a name to the current file.
  • struct: It is used for declaring a structure.
  • typedef: It is used to give an existing type a new name (an alias).
  • union: It declares a collection of members that share the same memory location.
  • volatile: It is used to create volatile objects, which can be modified by hardware in ways the compiler cannot see.

Constants

Constants are fixed values written directly in the source code, such as 42 or 'A'. They are also known as literals. A related idea is the const qualifier, which makes a variable behave like a normal variable whose value cannot be changed once it has been initialized.

Types of Constants

  1. Integer Constants: E.g. 1241
  2. Real/Floating-point constants: E.g. 21.023
  3. Octal and Hexadecimal Constants: E.g. Octal: 013 (decimal 11), Hexadecimal: 0x13 (decimal 19). A leading 0 marks an octal constant and a leading 0x marks a hexadecimal constant.
  4. Character Constants: E.g. 'x'
  5. String Constants: E.g. "hello"

Syntax

const dataType variableName = value;

Strings

A string is an array of characters that ends with a null ('\0') character. The null character indicates the end of the string. String literals are always enclosed in double quotes, whereas a character constant is enclosed in single quotes.

E.g., char str[] = "hello"; when the array is declared without a size like this, the compiler sizes it from the initializer: five characters plus the terminating null character, six bytes in total.

Operators

Operators are tokens that trigger an action when applied to variables and other objects. The data items an operator acts on are called operands. Operators fall into three groups according to how many operands they take:

  • Unary Operator 
    • These operators require only one operand. The increment (++) and decrement (--) operators are examples.
  • Binary Operator 
    • These operators require two operands. They are commonly grouped into five types: arithmetic, relational, logical, assignment, and bitwise operators.
  • Ternary Operator 
    • This operator requires three operands. The conditional operator (?:) is the only ternary operator in C.

Special Symbols

Some special symbols are also tokens in C. Each has a specific meaning in the syntax, although a few (such as *) play more than one role depending on context. Some of the special symbols are as follows:

  • Braces {}: The opening and closing curly braces mark the start and end of a block of code, such as a function body or a group of statements.
  • Brackets []: They are used to declare arrays and to reference array elements.
  • Parentheses (): They are used for function calls and parameter lists, and to group expressions.
  • Semicolon ;: It terminates a statement or declaration and is also called the statement terminator. Expression statements, declarations, and jump statements such as return end with a semicolon; a block enclosed in braces does not need one.
  • Asterisk *: It is used to declare pointer variables and to dereference them, and is also the multiplication operator.
  • Assignment Operator =: It assigns the value on its right to the variable on its left. It should not be confused with ==, which tests for equality.
  • Pre-processor #: It introduces a preprocessor directive such as #include or #define. The preprocessor transforms the source text automatically before actual compilation begins.

Classification of Tokens

Tokens in C programming are classified into several categories based on their roles and syntax within the language:

  • Keywords: Reserved words with predefined meanings (e.g., int, if, while).
  • Identifiers: User-defined names for variables, functions, and other entities.
  • Constants: Fixed values that do not change during program execution (e.g., 42, 3.14, 'A').
  • String Literals: Sequences of characters enclosed in double quotes (e.g., "C programming").
  • Operators: Symbols that perform operations on operands (e.g., +, -, *, /).
  • Punctuators: Special symbols with specific meanings in C syntax (e.g., (), {}, ,, ;).

Understanding these categories helps in interpreting and writing C programs effectively.

Rules for naming identifiers

In C programming, identifiers (names for variables, functions, etc.) must adhere to specific rules:

  • Must begin with a letter (a-z, A-Z) or an underscore (_).
  • Subsequent characters can be letters, digits (0-9), or underscores.
  • Cannot be a keyword or a predefined identifier.
  • Case-sensitive (MyVar is different from myvar).

These rules ensure clarity and consistency in naming conventions across C programs.

Tokens and Expressions

In C, tokens combine to form expressions that evaluate to a single value. For example:

int a = 10;
int b = 5;
int sum = a + b;

Here, int, a, b, sum, 10, 5, =, +, and ; are all tokens. The expression a + b combines the tokens a, +, and b to compute the sum, and the = operator stores the result in sum.

Tokenization Process

Tokenization in C involves breaking down source code into tokens. For example:

#include <stdio.h>

int main() {
    int num = 42;
    printf("The number is: %d\n", num);
    return 0;
}

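In this code snippet:

  • Preprocessor directive: #include <stdio.h> (handled by the preprocessor before the tokens below are compiled)
  • Keywords: int, return
  • Identifiers: main, num, printf
  • Constants: 42, 0
  • String Literal: "The number is: %d\n"
  • Operators: =
  • Punctuators: (, ), {, }, ,, ;

The compiler scans the source text and groups its characters into tokens; white space and line breaks only separate tokens. The resulting token sequence is then parsed and compiled.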

Examples to Implement C Tokens

Example 1: Basic Arithmetic

#include <stdio.h>

int main() {
    int a = 10;
    int b = 5;
    int sum = a + b;
    printf("Sum of %d and %d is %d\n", a, b, sum);
    return 0;
}

 

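Tokens (after the preprocessor has handled #include <stdio.h>): keywords int, return; identifiers main, a, b, sum, printf; constants 10, 5, 0; string literal "Sum of %d and %d is %d\n"; operators =, +; punctuators (, ), {, }, ,, ;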

Example 2: Conditional Statement

#include <stdio.h>

int main() {
    int num = 42;
    
    if (num > 0) {
        printf("%d is positive\n", num);
    } else {
        printf("%d is non-positive\n", num);
    }
    
    return 0;
}

 

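Tokens (after the preprocessor has handled #include <stdio.h>): keywords int, if, else, return; identifiers main, num, printf; constants 42, 0; string literals "%d is positive\n", "%d is non-positive\n"; operators =, >; punctuators (, ), {, }, ,, ;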

These examples illustrate how tokens are used in C programming to perform basic arithmetic operations and conditional statements, demonstrating their essential role in defining program logic and behavior.

Frequently Asked Questions

Give examples of valid and invalid identifier names.

Valid identifier names include count and frequency_2. Invalid names include 7count and 2frequency, because an identifier cannot begin with a digit.

What is the use of the break keyword?

break is a keyword in the C programming language that brings program control out of the nearest enclosing loop or switch statement. It can only be used inside loops or switch statements.

What is the difference between a character constant and a string constant?

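Character constants are written by enclosing a single character in single quotes (for example, 'A') and can be stored in variables of type char. String constants are written by enclosing zero or more characters in double quotes (for example, "A") and are stored in char arrays or accessed through char pointers, since C has no built-in String type. A string constant also carries a terminating null character, so "A" occupies two bytes while a char holding 'A' occupies one.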

Conclusion

In this blog, we have covered the Tokens in C. Tokens form the fundamental elements of C programming, defining its syntax and structure. Each token plays a crucial role in expressing program logic and functionality, from keywords and identifiers to constants, operators, and punctuators. Understanding and correctly utilizing tokens is essential for writing efficient, error-free C programs.

C offers several ways to organize data, such as arrays for storing elements in contiguous memory and strings for representing sequences of characters.

To write well-structured code, knowledge of functions, structures, and the different data types is very important.
