kishorenc
d2a825799b
Make default sorting field optional.
2021-02-21 19:55:31 +05:30
Jason Bosco
0ad8c48115
Range operator for numerical filtering.
2021-01-29 13:57:52 -08:00
Jason Bosco
a08fb7738f
Exhaustive token searching with filter_ids
2021-01-22 20:06:18 -08:00
kishorenc
9c2782e93d
Use int64_t for default sorting field references.
2020-09-26 15:13:01 +05:30
kishorenc
aa9e4a226e
Filtering on string field should be verbatim by default.
...
Allow earlier "CONTAINS" behavior via "~" operator.
2020-09-06 16:25:39 +05:30
kishorenc
0c186481a9
Allow int64 to be used as a default sorting field.
2020-08-12 16:06:27 +05:30
Kishore Nallan
c3298ba6d8
Address -Wall and -Wextra warnings.
2018-01-25 20:08:13 +05:30
Kishore Nallan
78b9ee52ec
Make match score computation predictable and consistent across multiple indexes.
2017-11-12 22:31:29 +05:30
Kishore Nallan
e24e0fae5d
Node score should be an int32_t.
2017-09-21 19:40:41 +05:30
Kishore Nallan
a2f475d7fc
Enable ART to index and search on floating point numbers.
2017-08-09 18:17:26 -04:00
Kishore Nallan
b7bc974b8e
Expose token ranking field properly via the API.
2017-05-27 14:02:32 +05:30
Kishore Nallan
4776b41dc1
Facet implementation.
2017-03-13 21:09:27 +05:30
Kishore Nallan
b880cfd531
Refactor forarray - split into individual classes.
2017-02-04 16:27:07 +05:30
Kishore Nallan
da68fb17e8
Support LESS_THAN and GREATER_THAN.
2017-01-22 05:40:10 +05:30
Kishore Nallan
0fcdb6b479
Support signed ints in art int search.
2017-01-12 21:20:52 +05:30
Kishore Nallan
0b88e669f6
Make ART fuzzy_search take min_cost and max_cost instead of only max_cost.
2016-12-28 18:16:43 +05:30
Kishore Nallan
12276b651f
Base work for supporting multiple indexable fields.
2016-12-22 22:26:33 +05:30
Kishore Nallan
9b0c347334
ART - integer range search.
2016-12-11 13:47:43 +05:30
Kishore Nallan
9cc3e7e5ea
Fixed a bug in pagination.
2016-11-27 21:30:13 +05:30
Kishore Nallan
e1526319f7
Building up support for prefix based searching and for ranking token suggestions by either frequency or max_score.
2016-11-27 14:56:15 +05:30
Kishore Nallan
aab5912110
Fuzzy search tests.
2016-11-07 19:36:28 +05:30
Kishore Nallan
ef105dcbd9
Reduce memory footprint.
2016-10-04 21:31:55 +05:30
Kishore Nallan
c96a9d9b35
Adopt Damerau–Levenshtein distance, instead of plain Levenshtein.
...
Specifically, we use the optimal string alignment distance. It treats transposition as a cost of 1, rather than 2.
https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance#Optimal_string_alignment_distance
2016-09-10 16:19:04 +05:30
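A minimal sketch of the optimal string alignment distance named in the commit above, for illustration only. The function name `osa_distance` and the full-matrix layout are assumptions, not the repository's code. It shows how an adjacent transposition is charged a cost of 1 instead of the 2 that plain Levenshtein would assign.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Optimal string alignment (restricted Damerau-Levenshtein) distance.
// Hypothetical helper written for illustration; not taken from the repository.
std::size_t osa_distance(const std::string& a, const std::string& b) {
    const std::size_t n = a.size();
    const std::size_t m = b.size();

    // d[i][j] = distance between the first i chars of a and the first j chars of b.
    std::vector<std::vector<std::size_t>> d(n + 1, std::vector<std::size_t>(m + 1, 0));
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = i;  // delete i chars
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = j;  // insert j chars

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1,          // deletion
                                d[i][j - 1] + 1,          // insertion
                                d[i - 1][j - 1] + cost}); // substitution or match
            // Adjacent transposition costs 1 (plain Levenshtein would need 2 edits).
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);
            }
        }
    }
    return d[n][m];
}
```

Because no substring may be edited more than once, this variant can differ from the unrestricted Damerau-Levenshtein distance: "ca" to "abc" costs 3 here, whereas the unrestricted distance is 2.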
Kishore Nallan
1a53d5692e
Fuzzy match rewrite - still need to work on matching perf.
2016-09-04 12:22:16 +05:30
Kishore Nallan
334ce264a5
For now, prevent prefix matches from being considered whole matches.
2016-09-02 10:34:34 +05:30
Kishore Nallan
1c09ec38a8
Removed redundant token_count field from the leaf.
2016-08-24 09:27:32 +05:30
Kishore Nallan
30cd057201
Split-up fuzzy lookup into separate stages.
...
1. Collect all the nodes where cost exceeds threshold.
2. Sort these nodes based on a score.
3. Perform top-k iteration to locate high scoring leaves.
This ensures that low-scoring leaves don't end up trumping leaves with higher scores (as was noticed).
2016-06-10 23:20:44 +05:30
Kishore Nallan
4face51091
Calculation of hits for a token had a bug.
...
Should use search rather than prefix lookup for finding the hits so far for the exact token.
2016-06-09 17:31:32 +05:30
Kishore Nallan
32cd67c9d1
The ART will store the frequency count in addition to the score.
...
In certain cases, the ability to identify the most similar tokens based on the popularity of the token is useful.
2016-06-08 22:18:52 +05:30
Kishore Nallan
bb0e7aefb9
Rename score to max_score for internal node and leaf structs.
2016-06-08 11:26:52 +05:30
Kishore Nallan
47df6201b1
Append offset related fields to the art leaf during insertion.
2016-02-28 21:13:54 +05:30
Kishore Nallan
0ba5c4874f
Parameterized the number of fuzzy matches that are returned for words with typos.
2016-02-21 19:51:57 +05:30
Kishore Nallan
b88241d9e9
Bug fix: word suggestions were not showing up sorted on their document scores.
...
Somehow, std::max() on uint16_t does not seem to work. Using a MAX macro.
2016-02-21 19:21:20 +05:30
Kishore Nallan
8ff75e481d
Replace callbacks with a result vector.
...
Document IDs for the given search token will be populated into this result vector.
2016-02-20 23:14:17 +05:30
Kishore Nallan
cb3b0e1a6e
Using a proper document struct when representing leaf values.
...
Removed experimental submodules. Only using `for` now (compressed array).
2016-01-31 11:20:07 +05:30
Kishore Nallan
ee77fb4d22
Add 2 more external dependencies via git submodule.
2016-01-24 14:35:40 +05:30
Kishore Nallan
a662e43959
Top-K matches for a given substring seem to work.
2015-12-31 07:22:35 +05:30
Kishore Nallan
2dfc31a519
Sorting on popularity metric - WIP. Still has bugs.
2015-12-29 20:55:50 +05:30
Kishore Nallan
5246a1683d
Adding a max_score field to intermediate nodes that denotes the maximum score of leaf nodes.
...
This is useful for pruning search space when we want to identify top-K matches for a given prefix.
2015-12-14 08:23:28 +05:30
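The pruning idea in the commit above can be sketched as a best-first traversal. This is a simplified illustration under stated assumptions, not the repository's ART code: the `Node` struct, `leaf`, `add_child`, and `top_k` are hypothetical names, and a generic tree stands in for the radix tree. Each internal node caches the maximum score of the leaves beneath it, so subtrees that cannot contain a top-K leaf are never expanded.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type: a node with no children is treated as a leaf.
struct Node {
    int32_t max_score = 0;  // leaf: its own score; internal: max over leaves below
    std::string key;        // set only on leaves
    std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> leaf(std::string key, int32_t score) {
    auto n = std::make_unique<Node>();
    n->key = std::move(key);
    n->max_score = score;
    return n;
}

// Only the direct parent's max_score is updated, so build the tree bottom-up.
void add_child(Node& parent, std::unique_ptr<Node> child) {
    parent.max_score = std::max(parent.max_score, child->max_score);
    parent.children.push_back(std::move(child));
}

// Returns up to k leaves in descending score order.
std::vector<const Node*> top_k(const Node& root, std::size_t k) {
    auto lower = [](const Node* a, const Node* b) { return a->max_score < b->max_score; };
    std::priority_queue<const Node*, std::vector<const Node*>, decltype(lower)> pq(lower);
    pq.push(&root);

    std::vector<const Node*> out;
    while (!pq.empty() && out.size() < k) {
        const Node* n = pq.top();
        pq.pop();
        if (n->children.empty()) {
            // A popped leaf scores at least as high as anything still queued.
            out.push_back(n);
            continue;
        }
        for (const auto& c : n->children) pq.push(c.get());
    }
    return out;
}
```

In a real prefix search the traversal would start from the node reached by the prefix rather than from the root; the loop stops as soon as K leaves have been collected, leaving lower-scoring subtrees untouched.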
Kishore Nallan
50a125f7ea
Fixed a major bug with NODE256 iteration for prefix "twili".
2015-11-29 16:36:36 +05:30
Kishore Nallan
e4a2be3ac3
Rewriting fuzzy look-up using incremental levenshtein matrix. WIP.
2015-11-28 22:41:26 +05:30
Kishore Nallan
64f53b6420
Initial commit. Fuzzy prefix match works.
2015-11-10 19:44:44 +05:30