475 Commits

Author SHA1 Message Date
Kishore Nallan
8475cba007 Minor refactoring of collection manager. 2017-01-26 13:54:30 -06:00
Kishore Nallan
216ac7997a Restore in-memory index on restart. 2017-01-24 22:13:49 -05:00
Kishore Nallan
a6cacf19d0 Return total number of results found in the API. 2017-01-22 21:39:46 +05:30
Kishore Nallan
da68fb17e8 Support LESS_THAN and GREATER_THAN. 2017-01-22 05:40:10 +05:30
Kishore Nallan
0fcdb6b479 Support signed ints in art int search. 2017-01-12 21:20:52 +05:30
Kishore Nallan
da263b69ea Stable sort does not require key comparison. 2017-01-11 21:47:28 +05:30
Kishore Nallan
a67e8f4caa Refactor - use std::tie for comparator. 2017-01-11 20:43:55 +05:30
Kishore Nallan
b7654baa74 Persist collection's next_seq_id. 2017-01-09 22:14:06 +05:30
Kishore Nallan
ce471c9bb1 Keep the hashset bounded by deleting the element to be replaced in heap from the set. 2017-01-09 19:17:53 +05:30
Kishore Nallan
3e8f9298a9 Remove redundant string conversion for collection_id. 2017-01-08 22:02:35 +05:30
Kishore Nallan
d831c49817 Move duplicate ID detection right inside topster. 2017-01-08 21:44:36 +05:30
Kishore Nallan
2f08eca12e Initial sketch for persisting meta information about collections. 2017-01-08 19:47:17 +05:30
Kishore Nallan
2b6293650e Search across multiple fields.
Need to write more tests.
2017-01-01 19:56:26 +05:30
Kishore Nallan
54a60398ab Parameterize rank fields. 2016-12-29 21:45:38 +05:30
Kishore Nallan
473aa6d5f6 Basic test for topster. 2016-12-28 21:28:27 +05:30
Kishore Nallan
0b88e669f6 Make ART fuzzy_search take min_cost and max_cost instead of only max_cost. 2016-12-28 18:16:43 +05:30
Kishore Nallan
8aaa9b174f Allow use of custom primary and secondary attributes for ranking. 2016-12-23 21:07:53 +05:30
Kishore Nallan
12276b651f Base work for supporting multiple indexable fields. 2016-12-22 22:26:33 +05:30
Kishore Nallan
9b0c347334 ART - integer range search. 2016-12-11 13:47:43 +05:30
Kishore Nallan
9cc3e7e5ea Fixed a bug in pagination. 2016-11-27 21:30:13 +05:30
Kishore Nallan
e1526319f7 Building up support for prefix based searching and for ranking token suggestions by either frequency or max_score. 2016-11-27 14:56:15 +05:30
Kishore Nallan
db22d01b84 Added an ART search token cache.
Caches previous searches so that we don't repeatedly call ART search as we iterate through the corrections.
2016-11-26 17:57:05 +05:30
Kishore Nallan
4e10fadeb7 Settle for partial matches when the whole query produces no results. 2016-11-26 17:13:16 +05:30
Kishore Nallan
396e10be5d Refactor collection's search method to be more judicious in using higher costs.
Earlier, even if one token produced no result, ALL tokens were searched with a higher cost. This change ensures that we first retry only the token that did not produce results with a larger cost before doing the same for other tokens.
2016-11-24 21:39:20 +05:30
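The per-token retry described in 396e10be5d could look roughly like the sketch below. It is illustrative only and is not taken from the repository: `Lookup`, `resolve_costs`, and `max_cost` are hypothetical names, and the lookup callback stands in for whatever fuzzy search the collection actually performs. It shows only the first half of the idea (raise the cost for a token only when that token found nothing), not any later fallback for the remaining tokens.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback: returns candidate matches for `token` when up to
// `cost` edits are allowed. Stands in for the real fuzzy search.
using Lookup = std::function<std::vector<std::string>(const std::string&, int)>;

// For each token, find the lowest cost (0..max_cost) at which the lookup
// yields at least one candidate. A token that matches at cost 0 is never
// searched at a higher cost just because some other token failed.
// A value of -1 means the token produced nothing up to max_cost.
std::vector<int> resolve_costs(const std::vector<std::string>& tokens,
                               const Lookup& lookup,
                               int max_cost) {
    std::vector<int> costs(tokens.size(), -1);
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        for (int cost = 0; cost <= max_cost; ++cost) {
            if (!lookup(tokens[i], cost).empty()) {
                costs[i] = cost;  // cheapest cost that works for this token
                break;
            }
        }
    }
    return costs;
}
```

The design point is that each token carries its own cost, so one misspelled token does not widen the search for tokens that already matched exactly.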
Kishore Nallan
5736888935 Tests for collection. 2016-11-13 21:59:32 +05:30
Kishore Nallan
18a4528540 Forarray tests. 2016-11-13 09:53:30 +05:30
Kishore Nallan
aab5912110 Fuzzy search tests. 2016-11-07 19:36:28 +05:30
Kishore Nallan
ee68da6f53 Build RocksDB and H2O also as part of the build process. 2016-10-21 09:18:13 +05:30
Kishore Nallan
c8eba7cf11 Adopt sequence ID as generated document ID, instead of using UUID. 2016-10-08 21:17:33 +05:30
Kishore Nallan
596430c036 Remove entry from rocksdb and art when required. 2016-10-05 21:24:40 +05:30
Kishore Nallan
ef105dcbd9 Reduce memory footprint. 2016-10-04 21:31:55 +05:30
Kishore Nallan
d8eee0d04a Util for logging exec time. 2016-10-02 19:11:59 +05:30
Kishore Nallan
9d5a120dab Replace unordered_map with sparsepp hashmap. Much faster! 2016-09-27 22:03:41 +05:30
Kishore Nallan
080eceea79 Remove bit packing - use proper struct. 2016-09-27 20:53:38 +05:30
Kishore Nallan
5cd8b72d0b Fixed a bug in top-K sorting. 2016-09-25 13:10:34 +05:30
Kishore Nallan
e777afc97f API for removing a document from index. 2016-09-24 18:08:57 +05:30
Kishore Nallan
e7c6c6d3cb Fixed multi word queries. 2016-09-12 14:25:07 +05:30
Kishore Nallan
c96a9d9b35 Adopt Damerau–Levenshtein distance, instead of plain Levenshtein.
Specifically, we use the optimal string alignment distance. It treats transposition as a cost of 1, rather than 2.

https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance#Optimal_string_alignment_distance
2016-09-10 16:19:04 +05:30
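For reference, the optimal string alignment distance named in c96a9d9b35 can be computed with the textbook dynamic-programming recurrence. The sketch below is a generic version, not the repository's code; the function name `osa_distance` is an assumption, and the real implementation works over the ART rather than over two complete strings.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Optimal string alignment (restricted Damerau-Levenshtein) distance.
// Insertions, deletions, substitutions, and transpositions of two adjacent
// characters each cost 1. No substring may be edited more than once.
std::size_t osa_distance(const std::string& a, const std::string& b) {
    const std::size_t n = a.size();
    const std::size_t m = b.size();
    std::vector<std::vector<std::size_t>> d(n + 1, std::vector<std::size_t>(m + 1, 0));

    for (std::size_t i = 0; i <= n; ++i) d[i][0] = i;  // delete all of a's prefix
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = j;  // insert all of b's prefix

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1,           // deletion
                                d[i][j - 1] + 1,           // insertion
                                d[i - 1][j - 1] + cost});  // match or substitution
            // Adjacent transposition: "ab" -> "ba" costs 1 instead of 2.
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);
            }
        }
    }
    return d[n][m];
}
```

With this definition, `osa_distance("ab", "ba")` is 1, where plain Levenshtein gives 2. Because of the restriction on repeated edits, `osa_distance("ca", "abc")` is 3, whereas the unrestricted Damerau-Levenshtein distance for that pair is 2.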
Kishore Nallan
1a53d5692e Fuzzy match rewrite - still need to work on matching perf. 2016-09-04 12:22:16 +05:30
Kishore Nallan
334ce264a5 For now, stop treating prefix matches as whole matches. 2016-09-02 10:34:34 +05:30
Kishore Nallan
44da808f16 RocksDB based persistence. 2016-08-28 22:04:58 +05:30
Kishore Nallan
4d2ba27cab Release memory of value stored when art node is destroyed. 2016-08-27 19:49:52 +05:30
Kishore Nallan
94db15b715 Fixed various issues flagged by Valgrind. 2016-08-27 13:44:53 +05:30
Kishore Nallan
1c09ec38a8 Removed redundant token_count field from the leaf. 2016-08-24 09:27:32 +05:30
Kishore Nallan
9b6547f050 Refactor: rename index to collection. 2016-08-23 20:32:37 +05:30
Kishore Nallan
ae34ae3195 Add JSON dep. 2016-08-23 20:31:11 +05:30
Kishore Nallan
ba33da1d51 Lots of code cleanup.
* Move stuff out of main to classes
* Standardize naming conventions.
2016-08-07 14:55:26 -07:00
Kishore Nallan
30cd057201 Split up fuzzy lookup into separate stages.
1. Collect all the nodes where cost exceeds threshold.
2. Sort these nodes based on a score.
3. Perform top-k iteration to locate high scoring leaves.

This ensures that low-scoring leaves don't end up trumping higher-scoring leaves (as was observed).
2016-06-10 23:20:44 +05:30
Kishore Nallan
4face51091 Calculation of hits for a token had a bug.
Should use search rather than prefix lookup for finding the hits so far for the exact token.
2016-06-09 17:31:32 +05:30
Kishore Nallan
734640cd2a Fix size calculation for unsorted append. 2016-06-09 17:28:50 +05:30