Table of contents
1.
Problem Statement
1.1.
Input
1.2.
Output
1.3.
Explanation
2.
Method 1: Using HashSet
2.1.
Algorithm
2.2.
Program
2.3.
Input
2.4.
Output
2.5.
Complexity Analysis
3.
Method 2: Trie with Depth-First Search
3.1.
Structure of Trie Node
3.2.
Functional Requirements
3.2.1.
Algorithm
3.2.2.
Algorithm
3.3.
Program
3.4.
Input
3.5.
Output
3.6.
Complexity Analysis
4.
Key Takeaways
Last Updated: Mar 27, 2024

Longest word in dictionary

Author Pranav Gautam
0 upvote
Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

Problem Statement

Given a list of words ‘WORDLIST’, find the longest word such that all the possible prefixes of that word are present in ‘WORDLIST’.

If more than one word meets the criteria, return the word with the smallest lexicographical order. You can return an empty string if no such word is possible for the input.

Input

Enter number of words: 6
w   wo   wor   wow   worm   work

Output

Longest word in dictionary: work 

Explanation

Both “worm” and “work” are the longest words present in ‘WORDLIST’ that have all possible prefixes in the list. As “work” is lexicographically smaller than “worm”, “work” is returned as an answer.

 

Method 1: Using HashSet

Store all the words in a HashSet. HashSet offers O(1) lookup time. Once all the words are stored in HashSet, we can perform a linear traversal through the ‘WORDLIST’. 

For each word, all the possible prefixes can be generated. If all the prefixes of the current word are present in the ‘WORDLIST’ and it is the longest yet discovered, we can set it as the answer. If an answer with equal length is already present, check the lexicographical order between the answer and the current word.

An optimized approach HashSetHashSete to sort the ‘WORDLIST’ first by length and lexicographic order of words.

Algorithm

  • Sort the words according to their length and lexicographical order.
  • Create a ‘HASH_SET’ to store the 'WORDLIST' for faster lookup.
  • For ‘CURRENT_STRING’ from beginning to end of the ‘WORDLIST’, do:
    • Insert ‘CURRENT_STRING’  into ‘HASH_SET’.
  • For ‘CURRENT_STRING’ from beginning to end of the ‘WORDLIST’, do:
    • Initialize  ‘CURRENT_PREFIX’ as an empty string.
    • Initialize the ‘ALL_PREFIXES_PRESENT’ boolean with true.
    • For ‘CURRENT_CHAR’ from beginning to end of the ‘CURRENT_STRING’, do:
      • Add ‘CURRENT_CHAR’ to ‘CURRENT_PREFIX’.
      • If ‘CURRENT_PREFIX’ is not present in ‘HASH_SET’, then:
        • Set ‘ALL_PREFIXES_PRESENT’ = false.
        • Break the loop.
    • If ‘ALL_PREFIXES_PRESENT’ is true, then:
      • Return ‘ALL_PREFIXES_PRESENT’.
  • Return an empty string.

Program

#include <iostream>
#include <vector>
#include <algorithm>
#include <unordered_set>
using namespace std;

bool comparator(string &a, string &b)
{

    // If both strings have equal length, return the one which comes first in lexicographical order.
    if (a.length() == b.length())
        return a < b;

    // If string lengths are not equal return the longest.
    return a.length() > b.length();
}

string longestWord(vector<string> &wordlist)
{

    // Sorting the words according to their length and lexicographical order.
    sort(wordlist.begin(), wordlist.end(), comparator);

    // Creating a hashset to store the 'WORDLIST' for faster lookup.
    unordered_set<string> hashSet;

    // Inserting the words from 'WORDLIST' into hashSet.
    for (string currentString : wordlist)
        hashSet.insert(currentString);

    // Considering each word from the wordlist to check all possible prefixes in the hashSet.
    for (string currentString : wordlist)
    {

        // Creating temporary storage to push letters from 'CURRENT_STRING' one by one.
        string currentPrefix = "";

        // Boolean variable to represent if there is any prefix which is not present in the 'WORDLIST'.
        bool allPrefixesPresent = true;

        // Taking characters sequentially to add them for creating prefixes.
        for (char currentChar : currentString)
        {
            currentPrefix.push_back(currentChar);

            // If current prefix  is not present in the 'WORDLIST', the 'CURRENT_STRING' can not be considered as answer.
            if (hashSet.find(currentPrefix) == hashSet.end())
            {
                allPrefixesPresent = false;
                break;
            }
        }

        // Checking if all the prefixes present for the 'CURRENT_STRING'.
        // If yes, this is our final answer.
        if (allPrefixesPresent)
            return currentString;
    }

    // Returning an empty string in case no word meets the criteria.
    return "";
}

int main()
{
    int numWords;
    cout << "Enter number of words: ";
    cin >> numWords;
    vector<string> wordlist(numWords);
    for (int i = 0; i < numWords; i++)
    {
        cin >> wordlist[i];
    }
    cout << "Longest word in dictionary: ";
    cout << longestWord(wordlist);
}
You can also try this code with Online C++ Compiler
Run Code

Input

Enter number of words: 6
w   wo   wor   wow   worm   work

Output

Longest word in dictionary: work 

Complexity Analysis

Time Complexity: Checking each prefix requires a linear-time operation. For a word with the length ‘W’, the maximum length of a prefix is ‘W’. The time complexity to check the maximum length prefix in the set is O(W). For the word ‘W’ in the ‘WORDLIST’, all the possible prefixes are checked. So, the time complexity to check all prefixes of a word is O(W2). To check prefixes of all the words, the time complexity is O(∑Wi2), where Wi represents each word from the ‘WORDLIST’.
 

Space Complexity: To create all prefixes of a word, extra space is required. So, the space complexity is O(∑Wi2), where Wi represents each word from the ‘WORDLIST’.

 

Method 2: Trie with Depth-First Search

A prefix can store all the possible prefixes in a prefix tree. Consider the word “work”. The prefix tree for “work” will look like the figure given below.

Let’s insert other words from ‘WORDLIST’ in the prefix tree. The resulting tree is given below.

Prefix trees are stored using a trie. Additional information like the end of the word and index of the word is stored in a trie.

 

Structure of Trie Node

Structure the trie node according to the data required from each node of the trie. Every trie node must contain two data members: ‘INDEX’ and ‘NEXT’.

  • ‘INDEX’ is an integer. It represents the index of the word(created from root to the current trie) in the ‘WORDLIST’. For a node that is not an end of any word, ‘INDEX’ is set to -1.
  • ‘NEXT’ is a hashmap. It stores the address of a trie node corresponding to a character.

 

Insert all the words in the trie structure. A depth-first search can look for the longest prefix possible. 

Functional Requirements

To use a DFS Algorithm on a trie structure, we require INSERT() and DFS(). Let’s describe these functions and their algorithms.

  • INSERT(): INSERT() function inserts the words with their indices in the trie. For an INSERT() function, two arguments are required: word and index. Start with the ‘ROOT ’ of the trie as ‘CURRENT_NODE’. Iterate through all the letters of the word. If the current character’s ‘NEXT’ pointer is not pointing to a trie node, create one. Otherwise, move the pointer to the ‘CURRENT_NODE’ to the ‘NEXT’ pointer. At the end of the word, the ‘CURRENT_NODE’ should have the ‘INDEX’of the word inserted.
     

Algorithm

  • Set ‘CURRENT_NODE’ equal to ‘ROOT’.
  • Iterate the ‘LETTER’ variable from beginning to end of the word and perform the following operations:
    • If CURRENT_NODE -> NEXT[LETTER] is pointing to NULL, then:
      • Create a node at CURRENT_NODE -> NEXT[LETTER].
    • Set ‘CURRENT_NODE’ equal to CURRENT_NODE -> NEXT[LETTER].
  • Set CURRENT_NODE->INDEX equal to ‘INDEX’ (‘INDEX’ represents the index of the current word).

 

  • DFS():  The DFS() function does a depth-first search on the trie. For every current node(‘CURRENT_NODE’ ) in our method, we store its next possible characters(CURRENT_NODE.NEXT) in a stack.

 

Every node has an index(CURRENT_NODE.INDEX), representing the index of the word containing characters from ‘ROOT’ to ‘CURRENT_NODE’. If the word is longest and lexicographically smallest discovered so far, consider it an answer(‘ANSWER’) for the path traveled so far. Consider the top node(‘TOP_NODE’) of the stack as the new current node(CURRENT_NODE.NEXT) and repeat the process.
 

Algorithm

  • Initialize the ‘ANSWER’ variable as an empty string.
  • Initialize a stack ‘CURRENT_STACK’ with root(‘ROOT’) of the trie(to store all the next nodes to the current node).
  • While ‘CURRENT_STACK’ is not empty, do:
    • Set ‘CURRENT_NODE’ as the top element of the ‘CURRENT_STACK’ (to do dfs on the ‘CURRENT_NODE’ ).

 

Program

#include <iostream>
#include <vector>
#include <stack>
#include <unordered_map>
using namespace std;

// Creating a Node class to store Nodes of a trie.
class Node
{

  public:

  // 'NEXT' stores the address of a node corresponding to a character.
  unordered_map<char, Node *> next;

  // 'INDEX' represents the index of the word.
  int index;

  // Constructor to create a node with index as -1.
  Node()
  {
    index = -1;
  }
};

// Creating a Trie class to create a trie data structure.
class Trie
{

  // 'ROOT' refers to the root node of the trie.
  Node *root;

  // 'WORDLIST' is stored as data member for lookup in dfs function.
  vector<string> wordlist;

  /* 'INSERT' function to store all the words 
      and their indices in the trie data structure. */
  void insert(string word, int index)
  {

    // Set 'CURRENT_NODE' equal to 'ROOT'.
    Node *currentNode = root;

    /* Iterate the 'LETTER' variable from beginning to end of the word
       to insert all the letters of word in the trie. */
    for (char letter : word)
    {

      // If character node is not present in the trie, create one.
      if (currentNode->next[letter] == NULL)
      {
        currentNode->next[letter] = new Node();
      }

      // Move 'CURRENT_NODE' to next letter.
      currentNode = currentNode->next[letter];
    }

    // Set index of word at the last letter node.
    currentNode->index = index;
  }

  public:
  // Constructor for a trie data structure to fill it with given input.
  Trie(vector<string> wordlist)
  {

    // Set 'ROOT' pointer pointing to this trie.
    this->root = new Node();

    // Store 'WORDLIST' in trie for lookup in dfs.
    this->wordlist = wordlist;

    // 'LENGTH' represents number of words in the 'WORDLIST'.
    int length = wordlist.size();

    // Iterate through 'WORDLIST'.
    for (int index = 0; index < length; index++)
    {

      // Insert current word and its index in trie.
      insert(wordlist[index], index);
    }
  }

  string dfs()
  {

    // Stack to store all the next nodes to the current node.
    stack<Node *> currentStack;

    // Initializing root of this trie in the stack.
    currentStack.push(root);

    /* Initialize the 'ANSWER' variable as an empty string. 
       This represents the final answer to be returned. */
    string answer = "";

    // 'CURRENT_WORD' represents the current word in the 'WORDLIST' dfs state refers to.
    string currentWord;
    while (!currentStack.empty())
    {

      // 'CURRENT_NODE' represents the node that is being explored for dfs.
      Node *currentNode = currentStack.top();

      // As 'CURRENT_NODE' is being explored, remove it from the stack.
      currentStack.pop();

      /* CURRENT_NODE index is -1 means this word is not present in
         'WORDLIST'. So, it should not be explored for dfs.
        For  CURRENT_NODE = 'ROOT', 
         we need to explore it's next nodes.*/
      if (currentNode->index != -1 || currentNode == root)
      {

        // If 'CURRENT_NODE' is root then it should not be considered for answer.
        if (currentNode != root)
          currentWord = wordlist[currentNode->index];

        /* If current word is bigger than the answer discovered so far,
           it can be the new answer. If both the lengths are equal,
           we should set current word as answer 
           if it is lexicographically smaller. */
        if (currentWord.length() > answer.length())
          answer = currentWord;
        else if (currentWord.length() == answer.length() && currentWord < answer)
        {
          answer = currentWord;
        }

        // Exploring next nodes of 'CURRENT_NODE'.
        for (auto nextNode : currentNode->next)
        {
          currentStack.push(nextNode.second);
        }
      }
    }
    return answer;
  }
};
int main()
{

  int numWords;
  cout << "Enter number of words: ";
  cin >> numWords;

  // 'WORDLIST' vector to store all the words as input.
  vector<string> wordlist(numWords);

  cout << "Enter words:\n";
  for (int i = 0; i < numWords; i++)
  {
    cin >> wordlist[i];
  }

  // Creating a new trie with given 'WORDLIST'.
  Trie *currentTrie = new Trie(wordlist);

  cout << "Longest word in dictionary: ";
  cout << currentTrie->dfs();
}
You can also try this code with Online C++ Compiler
Run Code

Input

Enter number of words: 6
w   wo   wor   wow   worm   work

Output

Longest word in dictionary: work 

Complexity Analysis

Time Complexity: In the worst case, we may need to traverse through the whole trie data structure. The trie data structure contains all the letters of al the words. So, the time complexity of this method is O(⅀Wi), where Wrepresents the length of the ith word in ‘WORDLIST’.

Space Complexity: An additional space is required to store all the  words in the trie data structure. So, the space complexity is O(⅀Wi), where Wrepresents the length of the ith word in ‘WORDLIST’.

 

Key Takeaways

Congratulations! You have successfully learned to find the longest word in dictionary efficiently. Learning something new is never easy. Practicing what you’ve learned is even more difficult. But, a good coder never gives up. So, head over to our practice platform Coding Ninjas Studio to practice top problems and many more. Coding Ninjas Studio also offers interesting interview experiences and blogs like this. So keep learning and Happy Coding!

Live masterclass