Introduction
RediSearch is a Redis module that adds secondary indexing, full-text search, and a query language to the database. These features enable multi-field queries, aggregation, exact phrase matching, and numeric filtering for text queries.
Chinese documents can be added to RediSearch starting with version 0.99.0.
Let's look at how Chinese documents are supported in RediSearch.
How are Chinese Documents Supported in RediSearch?
Chinese support enables Chinese documents to be added and tokenized using segmentation rather than conventional whitespace and punctuation tokenization.
Because of the way tokens are extracted, indexing a Chinese document differs from indexing a document in most other languages. In most languages, tokens can be told apart by separator characters and whitespace, but this is not the case with Chinese, where words are written without spaces between them. Instead, the tokenizer scans the input text and checks each character or sequence of characters against a dictionary of predefined terms, choosing the most likely match based on the surrounding terms and characters.
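To make the dictionary lookup concrete, here is a minimal sketch of forward maximum matching, one of the simplest dictionary-based segmentation strategies. It is an illustration only and not the algorithm RediSearch actually runs: the function name `segment`, the `max_len` parameter, and the four-word toy dictionary are all assumptions made for this example, and a real tokenizer also weighs the surrounding context.

```python
def segment(text, dictionary, max_len=4):
    """Split text into tokens using forward maximum matching.

    Simplified illustration: at each position, take the longest
    dictionary entry that starts there, falling back to a single
    character when nothing in the dictionary matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        longest = min(max_len, len(text) - i)
        # Try the longest candidate first and shrink until we get a hit.
        for size in range(longest, 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary; real dictionaries hold many thousands of entries.
TOY_DICTIONARY = {"我们", "喜欢", "数据库", "数据"}

tokens = segment("我们喜欢数据库", TOY_DICTIONARY)
print(tokens)
```

Note that the sentence contains no spaces at all, so splitting on whitespace would return it as a single token. The dictionary is what lets the sketch recover the three words, and preferring the longest match is why it picks "数据库" (database) rather than stopping at "数据" (data).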
For this, RediSearch makes use of the Friso Chinese tokenization library. This is mostly transparent to the user, and in most cases no additional configuration is required.
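In practice, the main thing the user does differently is to pass a language argument when adding and searching documents. The sketch below only builds the command arguments and does not connect to a server. It assumes the older `FT.ADD` style of adding documents with a `LANGUAGE chinese` argument; the helper names, index name, field name, and document ID are hypothetical, and newer RediSearch releases may expect a different way of adding documents, so check the documentation for your version.

```python
def create_index_args(index, field):
    # A plain TEXT field; nothing Chinese-specific is declared here.
    return ("FT.CREATE", index, "SCHEMA", field, "TEXT")


def add_chinese_doc_args(index, doc_id, field, text):
    # LANGUAGE chinese asks for segmentation-based tokenization
    # (assumed FT.ADD syntax from older RediSearch versions).
    return ("FT.ADD", index, doc_id, "1.0",
            "LANGUAGE", "chinese", "FIELDS", field, text)


def search_chinese_args(index, query):
    # The query is tokenized with the same language setting.
    return ("FT.SEARCH", index, query, "LANGUAGE", "chinese")


# Hypothetical names used only for illustration.
commands = [
    create_index_args("idx", "txt"),
    add_chinese_doc_args("idx", "docCn", "txt", "我们喜欢数据库"),
    search_chinese_args("idx", "数据库"),
]

for args in commands:
    print(" ".join(args))
```

With a client such as redis-py, each tuple could be sent with `client.execute_command(*args)`, assuming a Redis server with the RediSearch module loaded.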




