Kishore Nallan
12276b651f
Base work for supporting multiple indexable fields.
2016-12-22 22:26:33 +05:30
Kishore Nallan
9b0c347334
ART - integer range search.
2016-12-11 13:47:43 +05:30
Kishore Nallan
9cc3e7e5ea
Fixed a bug in pagination.
2016-11-27 21:30:13 +05:30
Kishore Nallan
e1526319f7
Building up support for prefix based searching and for ranking token suggestions by either frequency or max_score.
2016-11-27 14:56:15 +05:30
Kishore Nallan
db22d01b84
Added an ART search token cache.
Caches previous searches so that we don't repeatedly call ART search as we iterate through the corrections.
2016-11-26 17:57:05 +05:30
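The token cache described in db22d01b84 amounts to memoizing the search call. Below is a minimal sketch of that idea, not the repository's code: `TokenSearchCache`, its `lookup` method, and the `(token, cost)` key are hypothetical names chosen for illustration, and the wrapped search function stands in for the real ART search.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical memoizing wrapper around a token search function.
// The real ART search signature is not shown in the log; this one is assumed.
class TokenSearchCache {
public:
    using Result = std::vector<std::string>;
    using SearchFn = std::function<Result(const std::string&, int)>;

    explicit TokenSearchCache(SearchFn search) : search_(std::move(search)) {}

    // Returns the cached result for (token, cost), running the search only on a miss.
    const Result& lookup(const std::string& token, int cost) {
        // '\x1f' (unit separator) keeps the token and the cost apart in the key.
        const std::string key = token + '\x1f' + std::to_string(cost);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            it = cache_.emplace(key, search_(token, cost)).first;
        }
        return it->second;
    }

private:
    SearchFn search_;
    std::unordered_map<std::string, Result> cache_;
};
```

Keying on both the token and the cost matters because the same token searched at a higher cost can return a different set of matches.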
Kishore Nallan
4e10fadeb7
Settle for partial matches when the whole query produces no results.
2016-11-26 17:13:16 +05:30
Kishore Nallan
396e10be5d
Refactor collection's search method to be more judicious in using higher costs.
Earlier, even if one token produced no result, ALL tokens were searched with a higher cost. This change ensures that we first retry only the token that did not produce results with a larger cost before doing the same for other tokens.
2016-11-24 21:39:20 +05:30
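One way to read the change in 396e10be5d is that each token escalates its own cost independently, so a token that already matches is never re-searched at a higher cost. The sketch below illustrates that reading only; `pick_costs` and the `has_hits` callback are hypothetical and do not reflect the actual collection search method.

```cpp
#include <functional>
#include <string>
#include <vector>

// For each token, returns the lowest cost in [0, max_cost] at which the
// (assumed) has_hits callback reports results, or -1 if no cost produces any.
std::vector<int> pick_costs(
        const std::vector<std::string>& tokens, int max_cost,
        const std::function<bool(const std::string&, int)>& has_hits) {
    std::vector<int> costs(tokens.size(), -1);
    for (size_t i = 0; i < tokens.size(); i++) {
        // Only this token is retried with a larger cost; the others keep theirs.
        for (int cost = 0; cost <= max_cost; cost++) {
            if (has_hits(tokens[i], cost)) {
                costs[i] = cost;
                break;
            }
        }
    }
    return costs;
}
```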
Kishore Nallan
5736888935
Tests for collection.
2016-11-13 21:59:32 +05:30
Kishore Nallan
18a4528540
Forarray tests.
2016-11-13 09:53:30 +05:30
Kishore Nallan
aab5912110
Fuzzy search tests.
2016-11-07 19:36:28 +05:30
Kishore Nallan
ee68da6f53
Build RocksDB and H2O also as part of the build process.
2016-10-21 09:18:13 +05:30
Kishore Nallan
c8eba7cf11
Adopt sequence ID as generated document ID, instead of using UUID.
2016-10-08 21:17:33 +05:30
Kishore Nallan
596430c036
Remove entry from rocksdb and art when required.
2016-10-05 21:24:40 +05:30
Kishore Nallan
ef105dcbd9
Reduce memory footprint.
2016-10-04 21:31:55 +05:30
Kishore Nallan
d8eee0d04a
Util for logging exec time.
2016-10-02 19:11:59 +05:30
Kishore Nallan
9d5a120dab
Replace unordered_map with sparsepp hashmap. Much faster!
2016-09-27 22:03:41 +05:30
Kishore Nallan
080eceea79
Remove bit packing - use proper struct.
2016-09-27 20:53:38 +05:30
Kishore Nallan
5cd8b72d0b
Fixed a bug in top-K sorting.
2016-09-25 13:10:34 +05:30
Kishore Nallan
e777afc97f
API for removing a document from index.
2016-09-24 18:08:57 +05:30
Kishore Nallan
e7c6c6d3cb
Fixed multi word queries.
2016-09-12 14:25:07 +05:30
Kishore Nallan
c96a9d9b35
Adopt Damerau–Levenshtein distance, instead of plain Levenshtein.
Specifically, we use the optimal string alignment distance. It treats transposition as a cost of 1, rather than 2.
https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance#Optimal_string_alignment_distance
2016-09-10 16:19:04 +05:30
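For reference, the optimal string alignment distance mentioned in c96a9d9b35 can be computed with the standard dynamic-programming table. This is a textbook sketch rather than the project's implementation (which works against the ART); `osa_distance` is a name chosen here for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Optimal string alignment (restricted Damerau-Levenshtein) distance.
// Illustrative helper only; not taken from the repository.
int osa_distance(const std::string& a, const std::string& b) {
    const size_t n = a.size(), m = b.size();
    std::vector<std::vector<int>> d(n + 1, std::vector<int>(m + 1, 0));
    for (size_t i = 0; i <= n; i++) d[i][0] = static_cast<int>(i);
    for (size_t j = 0; j <= m; j++) d[0][j] = static_cast<int>(j);

    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            const int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1,          // deletion
                                d[i][j - 1] + 1,          // insertion
                                d[i - 1][j - 1] + cost}); // substitution
            // Adjacent transposition costs 1 instead of two substitutions.
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);
            }
        }
    }
    return d[n][m];
}
```

With plain Levenshtein, "ca" and "ac" are 2 edits apart; here they are 1 apart, which is the behavior the commit message describes.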
Kishore Nallan
1a53d5692e
Fuzzy match rewrite - still need to work on matching perf.
2016-09-04 12:22:16 +05:30
Kishore Nallan
334ce264a5
For now, stop treating prefix matches as whole matches.
2016-09-02 10:34:34 +05:30
Kishore Nallan
44da808f16
RocksDB based persistence.
2016-08-28 22:04:58 +05:30
Kishore Nallan
4d2ba27cab
Release memory of value stored when art node is destroyed.
2016-08-27 19:49:52 +05:30
Kishore Nallan
94db15b715
Fixed various issues flagged by Valgrind.
2016-08-27 13:44:53 +05:30
Kishore Nallan
1c09ec38a8
Removed redundant token_count field from the leaf.
2016-08-24 09:27:32 +05:30
Kishore Nallan
9b6547f050
Refactor `index` to be called as `collection`.
2016-08-23 20:32:37 +05:30
Kishore Nallan
ae34ae3195
Add JSON dep.
2016-08-23 20:31:11 +05:30
Kishore Nallan
ba33da1d51
Lots of code clean up.
* Move stuff out of main to classes
* Standardize naming conventions.
2016-08-07 14:55:26 -07:00
Kishore Nallan
30cd057201
Split-up fuzzy lookup into separate stages.
1. Collect all the nodes where cost exceeds threshold.
2. Sort these nodes based on a score.
3. Perform top-k iteration to locate high scoring leaves.
This ensures that low-scoring leaves don't end up trumping leaves with a higher score (as was noticed).
2016-06-10 23:20:44 +05:30
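Stages 2 and 3 of 30cd057201 can be sketched as a sort followed by a bounded min-heap scan. This is a simplified illustration under stated assumptions: the `Node` struct is a hypothetical stand-in for an ART subtree collected in stage 1, with its leaves flattened into a vector of scores, and `top_k_leaves` is not a function from the repository.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical stand-in for a node collected in stage 1.
struct Node {
    uint16_t max_score;                 // highest leaf score beneath this node
    std::vector<uint16_t> leaf_scores;  // scores of the leaves beneath it
};

// Returns the k highest leaf scores across all nodes, in descending order.
std::vector<uint16_t> top_k_leaves(std::vector<Node> nodes, size_t k) {
    if (k == 0) return {};

    // Stage 2: visit the most promising nodes first.
    std::sort(nodes.begin(), nodes.end(),
              [](const Node& a, const Node& b) { return a.max_score > b.max_score; });

    // Stage 3: keep the k best leaves seen so far in a min-heap.
    std::priority_queue<uint16_t, std::vector<uint16_t>, std::greater<uint16_t>> heap;
    for (const Node& node : nodes) {
        // No remaining node can beat the current k-th best, so stop early.
        if (heap.size() == k && node.max_score <= heap.top()) break;
        for (uint16_t score : node.leaf_scores) {
            if (heap.size() < k) {
                heap.push(score);
            } else if (score > heap.top()) {
                heap.pop();
                heap.push(score);
            }
        }
    }

    std::vector<uint16_t> out;
    while (!heap.empty()) {
        out.push_back(heap.top());
        heap.pop();
    }
    std::reverse(out.begin(), out.end());
    return out;
}
```

Sorting by `max_score` first is what allows the early exit: once the heap is full and the next node's best possible score cannot improve it, the remaining nodes can be skipped.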
Kishore Nallan
4face51091
Calculation of hits for a token had a bug.
Should use search rather than prefix lookup for finding the hits so far for the exact token.
2016-06-09 17:31:32 +05:30
Kishore Nallan
734640cd2a
Fix size calculation for unsorted append.
2016-06-09 17:28:50 +05:30
Kishore Nallan
32cd67c9d1
The ART will store the frequency count in addition to the score.
In certain cases, the ability to identify the most similar tokens based on the popularity of the token is useful.
2016-06-08 22:18:52 +05:30
Kishore Nallan
bb0e7aefb9
Rename `score` to `max_score` for internal node and leaf structs.
2016-06-08 11:26:52 +05:30
Kishore Nallan
04d02919b2
Fix memory corruption during unsorted append.
2016-06-04 19:05:47 +05:30
Kishore Nallan
c029e620d9
Clean up the match scoring logic.
Added more comments to illustrate what's happening.
2016-05-31 19:03:40 +05:30
Kishore Nallan
0f756efe74
Fix sorting - should be in ascending order.
2016-05-30 20:13:44 +05:30
Kishore Nallan
3dde71e72e
Unsorted append to forarray.
2016-05-30 20:11:26 +05:30
Kishore Nallan
383212be46
Fix bugs in top-K implementation.
2016-05-15 09:01:05 +05:30
Kishore Nallan
884a83f53c
Use lower bound search to implement indexOf()
2016-05-15 09:00:42 +05:30
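The lower-bound approach from 884a83f53c can be illustrated on a plain sorted vector. This is a sketch only: the actual change operates on the compressed forarray, whereas `index_of` below is a hypothetical helper over `std::vector`, and returning `size()` for a missing value is an assumed convention.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Returns the index of value in an ascending-sorted vector,
// or sorted.size() if the value is absent (assumed convention).
size_t index_of(const std::vector<uint32_t>& sorted, uint32_t value) {
    // lower_bound finds the first element that is not less than value.
    auto it = std::lower_bound(sorted.begin(), sorted.end(), value);
    if (it != sorted.end() && *it == value) {
        return static_cast<size_t>(it - sorted.begin());
    }
    return sorted.size();
}
```

The equality check after the binary search is essential, since a lower bound also exists for values that are not in the array.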
Kishore Nallan
c667ed5d10
Fix static linking with libfor.
2016-04-25 21:51:02 +05:30
Kishore Nallan
f0f57f2e2d
Saving state.
2016-03-23 07:38:43 +05:30
Kishore Nallan
566c4ce666
Intersection of documents across the search tokens.
2016-02-29 19:47:05 +05:30
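Intersecting the document IDs matched by each search token, as in 566c4ce666, can be sketched with `std::set_intersection` when every list is sorted in ascending order. `intersect_all` is a hypothetical helper for illustration, and the sorted-vector representation is an assumption, not the repository's data structure.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Returns the document IDs present in every list.
// Each inner list is assumed to be sorted in ascending order.
std::vector<uint32_t> intersect_all(const std::vector<std::vector<uint32_t>>& lists) {
    if (lists.empty()) return {};
    std::vector<uint32_t> result = lists[0];
    // Stop early once the running intersection is empty.
    for (size_t i = 1; i < lists.size() && !result.empty(); i++) {
        std::vector<uint32_t> next;
        std::set_intersection(result.begin(), result.end(),
                              lists[i].begin(), lists[i].end(),
                              std::back_inserter(next));
        result.swap(next);
    }
    return result;
}
```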
Kishore Nallan
47df6201b1
Append offset related fields to the art leaf during insertion.
2016-02-28 21:13:54 +05:30
Kishore Nallan
0ba5c4874f
Parameterized the number of fuzzy matches that are returned for words with typos.
2016-02-21 19:51:57 +05:30
Kishore Nallan
b88241d9e9
Bug fix: word suggestions were not showing up sorted on their document scores.
Somehow, std::max() on uint16_t does not seem to work. Using a MAX macro.
2016-02-21 19:21:20 +05:30
Kishore Nallan
8ff75e481d
Replace callbacks with a result vector.
Document IDs for the given search token will be populated into this result vector.
2016-02-20 23:14:17 +05:30
Kishore Nallan
1ffe38b5c8
Grow the forarray properly depending on the data stored.
2016-02-20 23:12:55 +05:30
Kishore Nallan
cb3b0e1a6e
Using a proper document struct when representing leaf values.
Removed experimental submodules. Only using `for` now (compressed array).
2016-01-31 11:20:07 +05:30