475 Commits

Author SHA1 Message Date
Kishore Nallan
32cd67c9d1 The ART will store the frequency count in addition to the score.
In certain cases, the ability to identify the most similar tokens based on the popularity of the token is useful.
2016-06-08 22:18:52 +05:30
Kishore Nallan
bb0e7aefb9 Rename score to max_score for internal node and leaf structs. 2016-06-08 11:26:52 +05:30
Kishore Nallan
04d02919b2 Fix memory corruption during unsorted append. 2016-06-04 19:05:47 +05:30
Kishore Nallan
c029e620d9 Clean up the match scoring logic.
Added more comments to illustrate what's happening.
2016-05-31 19:03:40 +05:30
Kishore Nallan
0f756efe74 Fix sorting - should be in ascending order. 2016-05-30 20:13:44 +05:30
Kishore Nallan
3dde71e72e Unsorted append to forarray. 2016-05-30 20:11:26 +05:30
Kishore Nallan
383212be46 Fix bugs in top-K implementation. 2016-05-15 09:01:05 +05:30
Kishore Nallan
884a83f53c Use lower bound search to implement indexOf() 2016-05-15 09:00:42 +05:30
Kishore Nallan
c667ed5d10 Fix static linking with libfor. 2016-04-25 21:51:02 +05:30
Kishore Nallan
f0f57f2e2d Saving state. 2016-03-23 07:38:43 +05:30
Kishore Nallan
566c4ce666 Intersection of documents across the search tokens. 2016-02-29 19:47:05 +05:30
Kishore Nallan
47df6201b1 Append offset related fields to the art leaf during insertion. 2016-02-28 21:13:54 +05:30
Kishore Nallan
0ba5c4874f Parameterized the number of fuzzy matches that are returned for words with typos. 2016-02-21 19:51:57 +05:30
Kishore Nallan
b88241d9e9 Bug fix: word suggestions were not showing up sorted on their document scores.
Somehow, std::max() on uint16_t does not seem to work. Using a MAX macro.
2016-02-21 19:21:20 +05:30
Kishore Nallan
8ff75e481d Replace callbacks with a result vector.
Document IDs for the given search token will be populated into this result vector.
2016-02-20 23:14:17 +05:30
Kishore Nallan
1ffe38b5c8 Grow the forarray properly depending on the data stored. 2016-02-20 23:12:55 +05:30
Kishore Nallan
cb3b0e1a6e Using a proper document struct when representing leaf values.
Removed experimental submodules. Only using `for` now (compressed array).
2016-01-31 11:20:07 +05:30
Kishore Nallan
ee77fb4d22 Add 2 more external dependencies via git submodule. 2016-01-24 14:35:40 +05:30
Kishore Nallan
c095c166f0 Adding external dependencies. 2016-01-17 19:11:05 +05:30
Kishore Nallan
a662e43959 Top-K matches for a given substring seem to work. 2015-12-31 07:22:35 +05:30
Kishore Nallan
2dfc31a519 Sorting on popularity metric - WIP. Still has bugs. 2015-12-29 20:55:50 +05:30
Kishore Nallan
5246a1683d Adding a max_score field to intermediate nodes that denotes the maximum score of leaf nodes.
This is useful for pruning search space when we want to identify top-K matches for a given prefix.
2015-12-14 08:23:28 +05:30
Kishore Nallan
50a125f7ea Fixed a major bug with NODE256 iteration for prefix "twili". 2015-11-29 16:36:36 +05:30
Kishore Nallan
e4a2be3ac3 Rewriting fuzzy look-up using incremental Levenshtein matrix. WIP. 2015-11-28 22:41:26 +05:30
Kishore Nallan
64f53b6420 Initial commit. Fuzzy prefix match works. 2015-11-10 19:44:44 +05:30