Kishore Nallan
8475cba007
Minor refactoring of collection manager.
2017-01-26 13:54:30 -06:00
Kishore Nallan
216ac7997a
Restore in-memory index on restart.
2017-01-24 22:13:49 -05:00
Kishore Nallan
a6cacf19d0
Return total number of results found in the API.
2017-01-22 21:39:46 +05:30
Kishore Nallan
da68fb17e8
Support LESS_THAN and GREATER_THAN.
2017-01-22 05:40:10 +05:30
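The commit message doesn't show how LESS_THAN and GREATER_THAN are evaluated. Below is a minimal sketch that assumes the numeric values sit in a sorted index. The hypothetical `NumComparator` enum and `int_filter` function use `std::multimap` as a stand-in for the ART, so each comparison becomes a scan over a contiguous key range.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical comparator names for illustration; the codebase's own enum may differ.
enum class NumComparator { LESS_THAN, GREATER_THAN };

// Ordered index from field value to document ID. std::multimap stands in for the
// ART: both keep keys sorted, so a comparison filter becomes a contiguous range scan.
using IntIndex = std::multimap<int32_t, uint32_t>;

inline std::vector<uint32_t> int_filter(const IntIndex& index, NumComparator op, int32_t operand) {
    std::vector<uint32_t> ids;
    if (op == NumComparator::LESS_THAN) {
        // Keys strictly below `operand` occupy [begin, lower_bound(operand)).
        const auto stop = index.lower_bound(operand);
        for (auto it = index.begin(); it != stop; ++it) ids.push_back(it->second);
    } else {
        // Keys strictly above `operand` occupy [upper_bound(operand), end).
        for (auto it = index.upper_bound(operand); it != index.end(); ++it) ids.push_back(it->second);
    }
    return ids;
}
```

Because the keys are ordered, neither branch looks at values outside the matching range. An unordered structure would need a full scan.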
Kishore Nallan
0fcdb6b479
Support signed ints in ART integer search.
2017-01-12 21:20:52 +05:30
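Tries such as the ART compare keys byte by byte. A plain two's-complement encoding would therefore sort negative numbers after positive ones. The log doesn't say how signed values are handled here. One common technique, shown purely as an assumption, is to flip the sign bit and store the value big-endian. `encode_int32` and `decode_int32` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// Flip the sign bit, then write big-endian: byte-wise (lexicographic) order of the
// result matches numeric order, with INT32_MIN first and INT32_MAX last.
inline std::array<unsigned char, 4> encode_int32(int32_t value) {
    const uint32_t u = static_cast<uint32_t>(value) ^ 0x80000000u;
    return {static_cast<unsigned char>(u >> 24), static_cast<unsigned char>(u >> 16),
            static_cast<unsigned char>(u >> 8), static_cast<unsigned char>(u)};
}

// Inverse of encode_int32.
inline int32_t decode_int32(const std::array<unsigned char, 4>& bytes) {
    const uint32_t u = (uint32_t{bytes[0]} << 24) | (uint32_t{bytes[1]} << 16) |
                       (uint32_t{bytes[2]} << 8) | uint32_t{bytes[3]};
    // Two's-complement conversion back to signed (guaranteed by the standard since C++20).
    return static_cast<int32_t>(u ^ 0x80000000u);
}
```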
Kishore Nallan
da263b69ea
Stable sort does not require key comparison.
2017-01-11 21:47:28 +05:30
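Here is a small illustration of this reasoning. It assumes each hit carries a key (the document's sequence ID) and a score. `std::stable_sort` already preserves the original relative order of equal-score hits, so the comparator only needs to look at the score. The `Hit` struct and `sort_hits` function are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Hit {
    uint32_t key;    // document sequence ID
    int64_t score;
};

// Best score first. std::stable_sort keeps equal-score hits in their incoming
// order, so the comparator needs no tie-break on `key` to be deterministic.
inline void sort_hits(std::vector<Hit>& hits) {
    std::stable_sort(hits.begin(), hits.end(),
                     [](const Hit& a, const Hit& b) { return a.score > b.score; });
}
```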
Kishore Nallan
a67e8f4caa
Refactor - use std::tie for comparator.
2017-01-11 20:43:55 +05:30
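`std::tie` turns a multi-field comparison into a single tuple comparison. The sketch below assumes a candidate is ranked first by a primary attribute, then by a secondary attribute (as with the custom ranking attributes from commit 8aaa9b174f), and finally by its sequence ID. `Candidate` and `ranks_higher` are hypothetical names.

```cpp
#include <cstdint>
#include <tuple>

struct Candidate {
    uint32_t seq_id;
    int64_t primary_attr;    // hypothetical, e.g. a text-match score
    int64_t secondary_attr;  // hypothetical, e.g. a user-chosen ranking field
};

// std::tie builds tuples of references, and tuple comparison is lexicographic:
// primary_attr first, then secondary_attr, then seq_id. Putting `b` on the left
// makes the order descending on every field.
inline bool ranks_higher(const Candidate& a, const Candidate& b) {
    return std::tie(b.primary_attr, b.secondary_attr, b.seq_id) <
           std::tie(a.primary_attr, a.secondary_attr, a.seq_id);
}
```

Passing `ranks_higher` to `std::sort` orders candidates best-first. The descending order on every field comes from swapping `a` and `b` between the two `std::tie` calls.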
Kishore Nallan
b7654baa74
Persist collection's next_seq_id.
2017-01-09 22:14:06 +05:30
Kishore Nallan
ce471c9bb1
Keep the hash set bounded by removing the heap element that is being replaced from the set.
2017-01-09 19:17:53 +05:30
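Below is a sketch of the bounded top-K idea. It assumes the topster keeps a min-heap of (score, ID) pairs plus a hash set of the IDs currently in the heap (the duplicate-ID detection from commit d831c49817). When a better entry displaces the heap's minimum, the displaced ID is erased from the set, so the set never grows beyond K. The `TopK` class is hypothetical. It ignores a repeated ID instead of updating that ID's score, and the real topster may behave differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

class TopK {
public:
    explicit TopK(size_t k) : k_(k) {}

    void add(uint32_t id, int64_t score) {
        if (k_ == 0 || ids_.count(id)) return;  // duplicate ID: ignored in this sketch
        if (heap_.size() < k_) {
            heap_.push_back({score, id});
            std::push_heap(heap_.begin(), heap_.end(), std::greater<>());
            ids_.insert(id);
            return;
        }
        if (score <= heap_.front().first) return;  // no better than the current minimum
        // Evict the minimum and drop its ID from the set so the set stays bounded by K.
        std::pop_heap(heap_.begin(), heap_.end(), std::greater<>());
        ids_.erase(heap_.back().second);
        heap_.back() = {score, id};
        std::push_heap(heap_.begin(), heap_.end(), std::greater<>());
        ids_.insert(id);
    }

    // Entries ordered by descending score.
    std::vector<std::pair<int64_t, uint32_t>> sorted() const {
        auto out = heap_;
        std::sort(out.begin(), out.end(), std::greater<>());
        return out;
    }

    size_t tracked_ids() const { return ids_.size(); }

private:
    size_t k_;
    std::vector<std::pair<int64_t, uint32_t>> heap_;  // min-heap via std::greater
    std::unordered_set<uint32_t> ids_;                // IDs currently in heap_
};
```

Once an ID has been evicted, it is no longer in the set, so the same ID can re-enter later with a better score.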
Kishore Nallan
3e8f9298a9
Remove redundant string conversion for collection_id.
2017-01-08 22:02:35 +05:30
Kishore Nallan
d831c49817
Move duplicate ID detection right inside topster.
2017-01-08 21:44:36 +05:30
Kishore Nallan
2f08eca12e
Initial sketch for persisting meta information about collections.
2017-01-08 19:47:17 +05:30
Kishore Nallan
2b6293650e
Search across multiple fields.
Need to write more tests.
2017-01-01 19:56:26 +05:30
Kishore Nallan
54a60398ab
Parameterize rank fields.
2016-12-29 21:45:38 +05:30
Kishore Nallan
473aa6d5f6
Basic test for topster.
2016-12-28 21:28:27 +05:30
Kishore Nallan
0b88e669f6
Make ART fuzzy_search take min_cost and max_cost instead of only max_cost.
2016-12-28 18:16:43 +05:30
Kishore Nallan
8aaa9b174f
Allow use of custom primary and secondary attributes for ranking.
2016-12-23 21:07:53 +05:30
Kishore Nallan
12276b651f
Base work for supporting multiple indexable fields.
2016-12-22 22:26:33 +05:30
Kishore Nallan
9b0c347334
ART - integer range search.
2016-12-11 13:47:43 +05:30
Kishore Nallan
9cc3e7e5ea
Fixed a bug in pagination.
2016-11-27 21:30:13 +05:30
Kishore Nallan
e1526319f7
Building up support for prefix-based searching and for ranking token suggestions by either frequency or max_score.
2016-11-27 14:56:15 +05:30
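Ranking token suggestions by either frequency or max_score comes down to choosing a comparator. The sketch below assumes each suggestion carries both values. The `TokenOrdering` enum, `TokenSuggestion` struct and `rank_suggestions` function are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical names for the two orderings named in the commit message.
enum class TokenOrdering { FREQUENCY, MAX_SCORE };

struct TokenSuggestion {
    std::string token;
    uint32_t frequency;  // how many documents contain the token (assumed)
    int64_t max_score;   // best score among those documents (assumed)
};

// Order suggestions best-first by the chosen criterion; ties keep their input order.
inline void rank_suggestions(std::vector<TokenSuggestion>& suggestions, TokenOrdering ordering) {
    std::stable_sort(suggestions.begin(), suggestions.end(),
                     [ordering](const TokenSuggestion& a, const TokenSuggestion& b) {
                         return ordering == TokenOrdering::FREQUENCY ? a.frequency > b.frequency
                                                                     : a.max_score > b.max_score;
                     });
}
```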
Kishore Nallan
db22d01b84
Added an ART search token cache.
To cache previous searches so that we don't repeatedly call ART search as we iterate through the corrections.
2016-11-26 17:57:05 +05:30
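Here is a minimal memoization sketch. It assumes the cache is keyed on the (token, cost) pair that an ART search is issued with. Repeated lookups while iterating through corrections then hit the map instead of the trie. `TokenSearchCache` and its callback type are hypothetical.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Memoizes trie lookups on (token, cost) so that the same search is not repeated
// while iterating through corrections.
class TokenSearchCache {
public:
    // Hypothetical signature: matches for `token` within `cost` edits.
    using Search = std::function<std::vector<std::string>(const std::string& token, int cost)>;

    explicit TokenSearchCache(Search search) : search_(std::move(search)) {}

    // std::map does not invalidate references on insertion, so returning one is safe.
    const std::vector<std::string>& get(const std::string& token, int cost) {
        auto key = std::make_pair(token, cost);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            // Cache miss: run the underlying search once and remember the result.
            it = cache_.emplace(key, search_(token, cost)).first;
        }
        return it->second;
    }

private:
    Search search_;
    std::map<std::pair<std::string, int>, std::vector<std::string>> cache_;
};
```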
Kishore Nallan
4e10fadeb7
Settle for partial matches when the whole query produces no results.
2016-11-26 17:13:16 +05:30
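The commit doesn't say which partial query is tried. One simple strategy, shown purely as an assumption, is to drop tokens from the end of the query until some prefix of it produces results. `QuerySearch` and `search_with_fallback` are hypothetical names.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical signature: document IDs matching all of the given tokens.
using QuerySearch = std::function<std::vector<uint32_t>(const std::vector<std::string>& tokens)>;

// Try the whole query first; if it matches nothing, retry with one token fewer
// (dropped from the end) until something matches or no tokens are left.
inline std::vector<uint32_t> search_with_fallback(std::vector<std::string> tokens,
                                                  const QuerySearch& search) {
    while (!tokens.empty()) {
        auto results = search(tokens);
        if (!results.empty()) return results;
        tokens.pop_back();
    }
    return {};
}
```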
Kishore Nallan
396e10be5d
Refactor collection's search method to be more judicious in using higher costs.
Earlier, even if only one token produced no results, ALL tokens were searched again with a higher cost. This change ensures that we first retry, with a larger cost, only the token that produced no results, before doing the same for the other tokens.
2016-11-24 21:39:20 +05:30
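The per-token retry strategy can be sketched as follows. The sketch assumes a hypothetical search callback that returns matches for a token at exactly a given edit cost, which corresponds to passing the same value as `min_cost` and `max_cost` in commit 0b88e669f6. Each token's cost is raised independently, so a token that already matched is never searched again at a higher cost. `TokenSearch` and `choose_costs` are illustrative names.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical signature: matches for `token` at exactly `cost` edits.
using TokenSearch = std::function<std::vector<std::string>(const std::string& token, int cost)>;

// Escalates the cost of each token independently: a token is retried at a higher
// cost only if it produced nothing at the lower one, and tokens that already
// matched are left alone. Returns the chosen cost per token, or -1 if no cost up
// to max_cost matched.
inline std::vector<int> choose_costs(const std::vector<std::string>& tokens, int max_cost,
                                     const TokenSearch& search) {
    std::vector<int> costs(tokens.size(), -1);
    for (size_t i = 0; i < tokens.size(); ++i) {
        for (int cost = 0; cost <= max_cost; ++cost) {
            if (!search(tokens[i], cost).empty()) {
                costs[i] = cost;
                break;
            }
        }
    }
    return costs;
}
```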
Kishore Nallan
5736888935
Tests for collection.
2016-11-13 21:59:32 +05:30
Kishore Nallan
18a4528540
Forarray tests.
2016-11-13 09:53:30 +05:30
Kishore Nallan
aab5912110
Fuzzy search tests.
2016-11-07 19:36:28 +05:30
Kishore Nallan
ee68da6f53
Also build RocksDB and H2O as part of the build process.
2016-10-21 09:18:13 +05:30
Kishore Nallan
c8eba7cf11
Adopt sequence ID as generated document ID, instead of using UUID.
2016-10-08 21:17:33 +05:30
Kishore Nallan
596430c036
Remove entries from RocksDB and the ART when required.
2016-10-05 21:24:40 +05:30
Kishore Nallan
ef105dcbd9
Reduce memory footprint.
2016-10-04 21:31:55 +05:30
Kishore Nallan
d8eee0d04a
Util for logging exec time.
2016-10-02 19:11:59 +05:30
Kishore Nallan
9d5a120dab
Replace unordered_map with sparsepp hashmap. Much faster!
2016-09-27 22:03:41 +05:30
Kishore Nallan
080eceea79
Remove bit packing - use proper struct.
2016-09-27 20:53:38 +05:30
Kishore Nallan
5cd8b72d0b
Fixed a bug in top-K sorting.
2016-09-25 13:10:34 +05:30
Kishore Nallan
e777afc97f
API for removing a document from index.
2016-09-24 18:08:57 +05:30
Kishore Nallan
e7c6c6d3cb
Fixed multi word queries.
2016-09-12 14:25:07 +05:30
Kishore Nallan
c96a9d9b35
Adopt Damerau–Levenshtein distance, instead of plain Levenshtein.
Specifically, we use the optimal string alignment distance. It counts a transposition of two adjacent characters as a single edit (cost 1) rather than two.
https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance#Optimal_string_alignment_distance
2016-09-10 16:19:04 +05:30
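The optimal string alignment distance can be computed with the usual Levenshtein dynamic program plus one extra case for swapping two adjacent characters. This is the standard textbook formulation, not necessarily the code in this commit.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Optimal string alignment (restricted Damerau–Levenshtein) distance: insertion,
// deletion, substitution and transposition of adjacent characters each cost 1,
// and no substring is edited more than once.
inline size_t osa_distance(const std::string& a, const std::string& b) {
    const size_t n = a.size(), m = b.size();
    std::vector<std::vector<size_t>> d(n + 1, std::vector<size_t>(m + 1));
    for (size_t i = 0; i <= n; ++i) d[i][0] = i;
    for (size_t j = 0; j <= m; ++j) d[0][j] = j;
    for (size_t i = 1; i <= n; ++i) {
        for (size_t j = 1; j <= m; ++j) {
            const size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1,          // deletion
                                d[i][j - 1] + 1,          // insertion
                                d[i - 1][j - 1] + cost}); // substitution
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);  // transposition
            }
        }
    }
    return d[n][m];
}
```

For example, "ca" and "ac" are one transposition apart, whereas plain Levenshtein gives 2. "ca" and "abc" come out at 3, because OSA cannot edit the swapped pair again. This is where it differs from unrestricted Damerau–Levenshtein, which gives 2.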
Kishore Nallan
1a53d5692e
Fuzzy match rewrite - still need to work on matching perf.
2016-09-04 12:22:16 +05:30
Kishore Nallan
334ce264a5
For now, don't treat prefix matches as whole matches.
2016-09-02 10:34:34 +05:30
Kishore Nallan
44da808f16
RocksDB based persistence.
2016-08-28 22:04:58 +05:30
Kishore Nallan
4d2ba27cab
Release the memory of the stored value when an ART node is destroyed.
2016-08-27 19:49:52 +05:30
Kishore Nallan
94db15b715
Fixed various issues flagged by Valgrind.
2016-08-27 13:44:53 +05:30
Kishore Nallan
1c09ec38a8
Removed redundant token_count field from the leaf.
2016-08-24 09:27:32 +05:30
Kishore Nallan
9b6547f050
Refactor index to be called collection.
2016-08-23 20:32:37 +05:30
Kishore Nallan
ae34ae3195
Add JSON dep.
2016-08-23 20:31:11 +05:30
Kishore Nallan
ba33da1d51
Lots of code clean up.
* Move stuff out of main to classes
* Standardize naming conventions.
2016-08-07 14:55:26 -07:00
Kishore Nallan
30cd057201
Split-up fuzzy lookup into separate stages.
1. Collect all the nodes where cost exceeds threshold.
2. Sort these nodes based on a score.
3. Perform top-k iteration to locate high scoring leaves.
This ensures that low-scoring leaves don't end up trumping leaves with higher scores, as had been observed.
2016-06-10 23:20:44 +05:30
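Below is a sketch of stages 2 and 3. It assumes stage 1 (collecting candidate nodes during the trie traversal) has already produced, for each node, its reachable leaves and an upper bound `max_score` on their scores. The loop visits nodes in descending `max_score` order, so it can stop as soon as no remaining node can beat the current k-th best leaf. `CandidateNode`, `Leaf` and `top_k_leaves` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Leaf {
    std::string key;
    int score;
};

// Hypothetical output of stage 1: a node's leaves plus an upper bound on their scores.
struct CandidateNode {
    int max_score;
    std::vector<Leaf> leaves;
};

inline std::vector<Leaf> top_k_leaves(std::vector<CandidateNode> nodes, size_t k) {
    if (k == 0) return {};
    // Stage 2: sort nodes so the most promising ones are visited first.
    std::stable_sort(nodes.begin(), nodes.end(),
                     [](const CandidateNode& a, const CandidateNode& b) { return a.max_score > b.max_score; });

    // Stage 3: top-k iteration over the sorted nodes.
    std::vector<Leaf> best;
    for (const auto& node : nodes) {
        // No leaf of this node, or of any later node, can displace the current k-th best.
        if (best.size() == k && node.max_score <= best.back().score) break;
        best.insert(best.end(), node.leaves.begin(), node.leaves.end());
        std::stable_sort(best.begin(), best.end(),
                         [](const Leaf& a, const Leaf& b) { return a.score > b.score; });
        if (best.size() > k) best.resize(k);
    }
    return best;
}
```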
Kishore Nallan
4face51091
Calculation of hits for a token had a bug.
Should use search rather than a prefix lookup to find the hits so far for the exact token.
2016-06-09 17:31:32 +05:30
Kishore Nallan
734640cd2a
Fix size calculation for unsorted append.
2016-06-09 17:28:50 +05:30