3102 Commits

Author SHA1 Message Date
Kishore Nallan
4e10fadeb7 Settle for partial matches when the whole query produces no results. 2016-11-26 17:13:16 +05:30
Kishore Nallan
396e10be5d Refactor collection's search method to be more judicious in using higher costs.
Earlier, even if one token produced no result, ALL tokens were searched with a higher cost. This change ensures that we first retry only the token that did not produce results with a larger cost before doing the same for other tokens.
2016-11-24 21:39:20 +05:30
Kishore Nallan
44d55cb13d Fixed a search issue: tokens that are not found in the index should be skipped. 2016-11-19 16:56:59 +05:30
Kishore Nallan
5736888935 Tests for collection. 2016-11-13 21:59:32 +05:30
Kishore Nallan
ea0da73cfb Fix C++11 warnings. 2016-11-13 09:56:13 +05:30
Kishore Nallan
18a4528540 Forarray tests. 2016-11-13 09:53:30 +05:30
Kishore Nallan
9bb24331cc Fuzzy search test - multiple results. 2016-11-12 21:30:22 +05:30
Kishore Nallan
aab5912110 Fuzzy search tests. 2016-11-07 19:36:28 +05:30
Kishore Nallan
c7e58efafd Add some regression tests for checking out of bounds. 2016-11-06 08:30:00 +05:30
Kishore Nallan
7a0187e6b3 Import and port art tests. 2016-11-01 18:19:21 +05:30
Kishore Nallan
c229b715c5 Build RocksDB as a shared library. 2016-10-22 20:42:02 +05:30
Kishore Nallan
ee68da6f53 Build RocksDB and H2O also as part of the build process. 2016-10-21 09:18:13 +05:30
Kishore Nallan
a789137d55 Build libfor automatically as part of the build process. 2016-10-19 14:46:56 +05:30
Kishore Nallan
da4c31065a Download and build libfor right from CMake. 2016-10-16 22:22:09 +05:30
Kishore Nallan
d2b903a931 Set up Google Test. 2016-10-16 22:15:11 +05:30
Kishore Nallan
c8eba7cf11 Adopt sequence ID as generated document ID, instead of using UUID. 2016-10-08 21:17:33 +05:30
Kishore Nallan
596430c036 Remove entry from rocksdb and art when required. 2016-10-05 21:24:40 +05:30
Kishore Nallan
ef105dcbd9 Reduce memory footprint. 2016-10-04 21:31:55 +05:30
Kishore Nallan
3e3e08aeca Log resident memory right after indexing. 2016-10-02 19:12:26 +05:30
Kishore Nallan
d8eee0d04a Util for logging exec time. 2016-10-02 19:11:59 +05:30
Kishore Nallan
9d5a120dab Replace unordered_map with sparsepp hashmap. Much faster! 2016-09-27 22:03:41 +05:30
Kishore Nallan
080eceea79 Remove bit packing - use proper struct. 2016-09-27 20:53:38 +05:30
Kishore Nallan
1cf5eb9d9c Fix path to source directory for make. 2016-09-26 08:18:00 +05:30
Kishore Nallan
5cd8b72d0b Fixed a bug in top-K sorting. 2016-09-25 13:10:34 +05:30
Kishore Nallan
e777afc97f API for removing a document from index. 2016-09-24 18:08:57 +05:30
Kishore Nallan
9f75b70b07 Add document end-point. 2016-09-13 21:35:21 +05:30
Kishore Nallan
59f25dca39 Fix libfor repository URL - updated CMakeLists & README. 2016-09-13 18:22:46 +05:30
Kishore Nallan
e7c6c6d3cb Fixed multi word queries. 2016-09-12 14:25:07 +05:30
Kishore Nallan
2f26b95c5b Intermediate matching nodes should not be pushed to the results vector. 2016-09-11 12:13:04 +05:30
Kishore Nallan
c96a9d9b35 Adopt Damerau–Levenshtein distance, instead of plain Levenshtein.
Specifically, we use the optimal string alignment distance. It assigns a transposition of two adjacent characters a cost of 1, rather than 2.

https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance#Optimal_string_alignment_distance
2016-09-10 16:19:04 +05:30
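
For reference, the optimal string alignment distance named in this commit can be sketched as below. This is a textbook dynamic-programming version, not the repository's code: the function name `osa_distance` is an assumption made for illustration, and the actual implementation presumably works over the ART index rather than on two whole strings.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Optimal string alignment (restricted Damerau-Levenshtein) distance.
// Hypothetical helper for illustration only; not taken from the repository.
std::size_t osa_distance(const std::string& a, const std::string& b) {
    const std::size_t n = a.size();
    const std::size_t m = b.size();

    // d[i][j] = distance between the first i chars of a and the first j chars of b.
    std::vector<std::vector<std::size_t>> d(n + 1, std::vector<std::size_t>(m + 1, 0));
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = i;  // delete all i chars
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = j;  // insert all j chars

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1,          // deletion
                                d[i][j - 1] + 1,          // insertion
                                d[i - 1][j - 1] + cost}); // substitution or match
            // Adjacent transposition costs 1 instead of the 2 that plain
            // Levenshtein would charge for two substitutions.
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);
            }
        }
    }
    return d[n][m];
}
```

With this sketch, `osa_distance("ab", "ba")` is 1, whereas plain Levenshtein gives 2. Because no substring may be edited more than once, `osa_distance("ca", "abc")` is 3, while the unrestricted Damerau-Levenshtein distance is 2.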
Kishore Nallan
618a2020ed Delete the old fuzzy search implementation. 2016-09-06 17:43:25 +05:30
Kishore Nallan
c25f7ccfdb Parameterize the num_typos from the query end-point. 2016-09-05 19:13:44 +05:30
Kishore Nallan
93de59be29 Added some conditions for search space reduction that bring performance back to the level of the original implementation. 2016-09-05 10:58:32 +05:30
Kishore Nallan
1a53d5692e Fuzzy match rewrite - still need to work on matching perf. 2016-09-04 12:22:16 +05:30
Kishore Nallan
334ce264a5 For now, do not treat prefix matches as whole matches. 2016-09-02 10:34:34 +05:30
Kishore Nallan
aa46985bab Handle spaces in query string. 2016-08-30 22:07:17 +05:30
Kishore Nallan
44da808f16 RocksDB based persistence. 2016-08-28 22:04:58 +05:30
Kishore Nallan
1d3af330dd JSON document as input to collection.add method. 2016-08-28 09:23:30 +05:30
Kishore Nallan
2804b145dd Add OS X build instructions. 2016-08-27 22:48:01 +05:30
Kishore Nallan
4d2ba27cab Release memory of value stored when art node is destroyed. 2016-08-27 19:49:52 +05:30
Kishore Nallan
94db15b715 Fixed various issues flagged by Valgrind. 2016-08-27 13:44:53 +05:30
Kishore Nallan
1c36238f19 Fix debug flag. 2016-08-24 22:32:16 +05:30
Kishore Nallan
1e71058917 Length of char* was being calculated incorrectly.
The terminating null character needs to be counted.
2016-08-24 22:31:50 +05:30
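
A minimal sketch of the length fix described above, assuming the key is a C string whose terminator must be counted as part of the key. The helper name `key_length_with_terminator` is hypothetical and does not come from the repository.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: std::strlen excludes the trailing '\0',
// so add 1 when the terminator must be counted as part of the key.
std::size_t key_length_with_terminator(const char* s) {
    return std::strlen(s) + 1;
}
```

Counting the terminator is commonly done for radix-tree keys so that no key is a strict prefix of another. Whether that was the motivation for this commit is an assumption.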
Kishore Nallan
1c09ec38a8 Removed redundant token_count field from the leaf. 2016-08-24 09:27:32 +05:30
Kishore Nallan
2a77a1ad66 Removed redundant storage of length in offsets array. 2016-08-24 08:46:02 +05:30
Kishore Nallan
c079b22cbd Fix typo in test document harness.
Added better print debugging in the process.
2016-08-23 22:37:54 +05:30
Kishore Nallan
9b6547f050 Refactor: rename index to collection. 2016-08-23 20:32:37 +05:30
Kishore Nallan
ae34ae3195 Add JSON dep. 2016-08-23 20:31:11 +05:30
Kishore Nallan
7147fa7ed5 Added design and todo docs. 2016-08-16 21:17:37 +05:30
Kishore Nallan
0eeb75b385 Boost dep is not needed. 2016-08-16 14:57:29 +05:30