16 Commits

Author SHA1 Message Date
Kishore Nallan
57ac561743 Handle special characters in locale tokenization. 2022-08-18 10:47:30 +05:30
Kishore Nallan
6729b72b1a Normalize Thai text via NFKC. 2022-08-04 20:32:54 +05:30
Kishore Nallan
07d838e385 Make symbols for indexing and segmentation configurable. 2021-08-26 10:27:18 +05:30
Kishore Nallan
e695ba65c8 Add a few locale tokenization tests. 2021-06-09 20:20:01 +05:30
Kishore Nallan
8726e27718 Support Chinese locale. 2021-06-06 22:03:02 +05:30
Kishore Nallan
56d3a26cc5 Improve prefix searching on ko locale. 2021-05-31 19:47:12 +05:30
Kishore Nallan
8d6742fc6d Normalize ASCII tokens intermixed with non-English text. 2021-05-28 14:04:10 +05:30
Kishore Nallan
b3b47f5651 Refactor highlighting + tokenizer to simplify logic. 2021-04-18 20:37:58 +05:30
Kishore Nallan
1d1712f391 Refactor tokenizer to use index, skip and separate logic. 2021-04-16 17:55:52 +05:30
kishorenc
3a92685967 Integrate with Kakasi. 2021-04-05 12:25:50 +05:30
kishorenc
dd72e2a78c Introduce field level locale. 2021-04-02 21:28:49 +05:30
kishorenc
c2eec85277 Fix highlighting of strings with special characters. 2021-03-20 12:58:30 +05:30
kishorenc
f501b137b7 Tokenize on special characters. 2021-03-16 11:39:53 +05:30
kishorenc
a912a250ff Fix bad Unicode characters in highlight snippet. 2020-12-28 19:19:59 +05:30
kishorenc
9533b73609 Fixed a few highlighting/splitting edge cases. 2020-11-17 20:10:34 +05:30
kishorenc
6997e35f72 Combine various token operations in a single flow. Splitting, normalizing etc. are now done in a single loop. 2020-11-17 20:10:34 +05:30