Introduction
The built-in MySQL full-text parser uses white space to determine where words begin and end. This is a limitation for ideographic languages such as Chinese, Japanese, and Korean, which do not use word delimiters.
MySQL provides the ngram full-text parser to address this issue. Since MySQL version 5.7.6, the ngram full-text parser has been included as a built-in server plugin, which means that MySQL loads it automatically when the database server starts. Both the InnoDB and MyISAM storage engines support the ngram full-text parser.
An ngram is defined as a contiguous sequence of several characters from a text sequence. The primary function of the ngram full-text parser is to tokenize a text sequence into a contiguous sequence of n characters.
The following example shows how the ngram full-text parser tokenizes the text "mysql" for different values of n:
n = 1: 'm', 'y', 's', 'q', 'l'
n = 2: 'my', 'ys', 'sq', 'ql'
n = 3: 'mys', 'ysq', 'sql'
n = 4: 'mysq', 'ysql'
n = 5: 'mysql'
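The tokenization above can be reproduced with a short Python sketch. The function name ngrams is hypothetical and only models the sliding-window idea; it is not MySQL's implementation.

```python
def ngrams(text: str, n: int) -> list[str]:
    """Return every contiguous run of n characters in text, left to right."""
    # When n is larger than the text, the range is empty and no token is produced.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


for n in range(1, 6):
    print(f"n = {n}:", ngrams("mysql", n))
```

A text of length L yields L - n + 1 tokens, which is why larger values of n produce fewer tokens.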
Configuring ngram Token Size
By default, the ngram token size is 2. To change the token size, use the ngram_token_size configuration option. Valid values range from 1 to 10.
A smaller token size produces a smaller full-text search index and faster searches. Typically, ngram_token_size is set to the size of the largest token that you want to search for.
Because ngram_token_size is a read-only variable, you can set its value only at server startup, in one of two ways:
The first way is on the command line when starting the server:
mysqld --ngram_token_size=1
The second way is in the configuration file:
[mysqld]
ngram_token_size=1
How to create FULLTEXT Index using the ngram Parser
When using CREATE TABLE, ALTER TABLE, or CREATE INDEX, specify "WITH PARSER ngram" to create a FULLTEXT index that uses the ngram parser.
For example, the following statement creates a new posts table with a FULLTEXT index on the title and body columns that uses the ngram full-text parser.
USE testdb;
CREATE TABLE posts (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(255),
body TEXT,
FULLTEXT ( title, body ) WITH PARSER NGRAM
) ENGINE=INNODB CHARACTER SET UTF8mb4;
The following INSERT statement inserts two new rows into the posts table:
SET NAMES utf8mb4;
INSERT INTO posts(title, body)
VALUES('MySQL全文搜索', 'MySQL提供了具有许多好的功能的内置全文搜索'),
('MySQL教程', '学习MySQL快速,简单和有趣');
Note that the SET NAMES statement sets the character set that the client and server use to send and receive data; this example uses utf8mb4 to match the character set of the posts table.
To see how the ngram parser tokenizes the text, use the following statements:
SET GLOBAL innodb_ft_aux_table="testdb/posts";
SELECT
*
FROM
information_schema.innodb_ft_index_cache
ORDER BY doc_id , position;
ngram Parser Space Handling
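The ngram parser eliminates spaces when parsing. For example, with a token size of 2:
“ab cd” is parsed to “ab”, “cd”
“a bc” is parsed to “bc” (the single character “a” is shorter than the token size, so it produces no token)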
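The space handling above can be modeled in a few lines of Python. This is a simplified sketch rather than the parser's actual code: the helper name ngram_tokens is hypothetical, and it assumes that spaces only separate chunks and that each chunk is tokenized on its own.

```python
def ngram_tokens(text: str, n: int = 2) -> list[str]:
    """Split on whitespace, then emit every n-character window of each chunk."""
    tokens: list[str] = []
    for chunk in text.split():
        # A chunk shorter than n yields an empty range, so it adds no token.
        tokens.extend(chunk[i:i + n] for i in range(len(chunk) - n + 1))
    return tokens


print(ngram_tokens("ab cd"))  # ['ab', 'cd']
print(ngram_tokens("a bc"))   # ['bc']
```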
ngram Parser Phrase Search
Phrase searches are converted to ngram phrase searches. For instance, the search phrase "abc" is converted to "ab bc", which matches documents containing "abc" as well as documents containing "ab bc".
Likewise, the search phrase "abc def" is converted to "ab bc de ef", which matches documents containing "abc def" as well as documents containing "ab bc de ef". A document containing the string "abcdef" is not returned.
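The conversion described above can be sketched in Python. The names ngram_tokens, to_ngram_phrase, and phrase_matches are hypothetical, and the matcher assumes a simplified model in which a phrase matches when its ngram tokens appear consecutively in the document's token list. MySQL's real matching logic is more involved.

```python
def ngram_tokens(text: str, n: int = 2) -> list[str]:
    """Split on whitespace, then emit every n-character window of each chunk."""
    tokens: list[str] = []
    for chunk in text.split():
        tokens.extend(chunk[i:i + n] for i in range(len(chunk) - n + 1))
    return tokens


def to_ngram_phrase(phrase: str, n: int = 2) -> str:
    """Rewrite a search phrase as a space-separated ngram phrase."""
    return " ".join(ngram_tokens(phrase, n))


def phrase_matches(phrase: str, document: str, n: int = 2) -> bool:
    """Simplified model: the phrase's tokens must appear consecutively."""
    wanted = ngram_tokens(phrase, n)
    have = ngram_tokens(document, n)
    if not wanted:
        return False
    return any(
        have[i:i + len(wanted)] == wanted
        for i in range(len(have) - len(wanted) + 1)
    )


print(to_ngram_phrase("abc def"))
for doc in ("abc def", "ab bc de ef", "abcdef"):
    print(repr(doc), phrase_matches("abc def", doc))
```

Under this model, "abcdef" fails to match because its token list contains "cd" between "bc" and "de", which breaks the consecutive run.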
The following example searches the posts table for the phrase 搜索:
SELECT
id, title, body
FROM
posts
WHERE
MATCH (title , body) AGAINST ('搜索' );
Executing the above query returns the first post, the only row whose title and body contain 搜索.
ngram Parser Natural Language Search
In a natural language mode search, the search term is converted to a union of ngram terms. Assuming the token size is 2 (bigram), the search term "mysql" is converted to "my ys sq ql".
SELECT
*
FROM
posts
WHERE
MATCH (title, body) AGAINST ('简单和有趣' IN NATURAL LANGUAGE MODE);
Executing the above query returns the second post, whose body contains 简单和有趣.
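The union behavior of a natural language mode search can be illustrated with a small Python sketch. The helper names are hypothetical, and the model assumes that a document qualifies as soon as it shares at least one ngram with the search term. It ignores relevance ranking, stopwords, and case folding.

```python
def ngram_tokens(text: str, n: int = 2) -> list[str]:
    """Split on whitespace, then emit every n-character window of each chunk."""
    tokens: list[str] = []
    for chunk in text.split():
        tokens.extend(chunk[i:i + n] for i in range(len(chunk) - n + 1))
    return tokens


def matches_natural_language(term: str, document: str, n: int = 2) -> bool:
    """Simplified model: match when the term and document share any ngram."""
    return not set(ngram_tokens(term, n)).isdisjoint(ngram_tokens(document, n))


print(ngram_tokens("mysql"))
for doc in ("ab", "abc", "xyz"):
    print(repr(doc), matches_natural_language("abc", doc))
```

With a token size of 2, the term "abc" becomes the union of "ab" and "bc", so a document containing only "ab" still matches, unlike in a phrase search.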
ngram Parser Wildcard Search
An ngram FULLTEXT index contains only ngrams and holds no information about where terms begin. As a result, a wildcard search may return unexpected results.
The following rules apply to FULLTEXT wildcard searches that use an ngram index:
If the prefix term in a wildcard search is shorter than the ngram token size, the query returns all indexed rows that contain ngram tokens starting with the prefix term. For example, with a token size of 2, a search on "a*" returns all rows that have a token starting with "a". The following query runs a wildcard search against the posts table:
SELECT
id,title,body
FROM
posts
WHERE
MATCH (title , body) AGAINST ('my*' IN BOOLEAN MODE);
Executing the above query returns both posts, because each one contains "MySQL" and therefore the ngram "my".
If the prefix term in a wildcard search is longer than the ngram token size, MySQL converts the prefix term to an ngram phrase and ignores the wildcard operator. For example, with a token size of 2, a search on "abc*" is converted to "ab bc".
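The two wildcard rules can be summarized in a short Python sketch. The function convert_wildcard is hypothetical and only models the decision described above. The rules as stated do not cover a prefix whose length equals the token size, so treating that case as a phrase is an assumption made here.

```python
def convert_wildcard(term: str, n: int = 2) -> tuple[str, str]:
    """Classify a wildcard term as a prefix search or an ngram phrase search."""
    prefix = term.rstrip("*")
    if len(prefix) < n:
        # Shorter than the token size: keep it as a prefix search on tokens.
        return ("prefix", prefix)
    # Otherwise the wildcard is dropped and the prefix becomes an ngram phrase.
    # Assumption: a prefix exactly as long as the token size also lands here.
    grams = [prefix[i:i + n] for i in range(len(prefix) - n + 1)]
    return ("phrase", " ".join(grams))


print(convert_wildcard("a*"))
print(convert_wildcard("abc*"))
```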
FAQs
What is an ngram? In a full-text index, an ngram is a contiguous sequence of n characters from a text.
What is the default token size for an ngram, and how do you change it? The default token size is 2. To change it, set the ngram_token_size configuration option at server startup. Valid values range from 1 to 10.
How does the ngram parser handle spaces? The ngram parser eliminates spaces when parsing. For example, with a token size of 2, “ab cd” is parsed to “ab”, “cd”, and “a bc” is parsed to “bc”.
Key Takeaways
In this blog, we learned about the MySQL ngram full-text parser: how to configure its token size, how to create a FULLTEXT index that uses it, and how it handles spaces, phrase searches, natural language mode searches, and wildcard searches.