421 Commits

Author SHA1 Message Date
Kishore Nallan
c25f7ccfdb Parameterize the num_typos from the query end-point. 2016-09-05 19:13:44 +05:30
Kishore Nallan
93de59be29 Added some conditions for search space reduction that bring performance back to that of the original implementation. 2016-09-05 10:58:32 +05:30
Kishore Nallan
1a53d5692e Fuzzy match rewrite - still need to work on matching perf. 2016-09-04 12:22:16 +05:30
Kishore Nallan
334ce264a5 For now, disable prefix matches to be considered as whole matches. 2016-09-02 10:34:34 +05:30
Kishore Nallan
aa46985bab Handle spaces in query string. 2016-08-30 22:07:17 +05:30
Kishore Nallan
44da808f16 RocksDB based persistence. 2016-08-28 22:04:58 +05:30
Kishore Nallan
1d3af330dd JSON document as input to collection.add method. 2016-08-28 09:23:30 +05:30
Kishore Nallan
2804b145dd Add OS X build instructions. 2016-08-27 22:48:01 +05:30
Kishore Nallan
4d2ba27cab Release memory of value stored when art node is destroyed. 2016-08-27 19:49:52 +05:30
Kishore Nallan
94db15b715 Fixed various issues flagged by Valgrind. 2016-08-27 13:44:53 +05:30
Kishore Nallan
1c36238f19 Fix debug flag. 2016-08-24 22:32:16 +05:30
Kishore Nallan
1e71058917 Length of char* was being calculated wrongly.
Need to consider the terminating null character.
2016-08-24 22:31:50 +05:30
Kishore Nallan
1c09ec38a8 Removed redundant token_count field from the leaf. 2016-08-24 09:27:32 +05:30
Kishore Nallan
2a77a1ad66 Removed redundant storage of length in offsets array. 2016-08-24 08:46:02 +05:30
Kishore Nallan
c079b22cbd Fix typo in test document harness.
Added better print debugging in the process.
2016-08-23 22:37:54 +05:30
Kishore Nallan
9b6547f050 Refactor index to be called collection. 2016-08-23 20:32:37 +05:30
Kishore Nallan
ae34ae3195 Add JSON dep. 2016-08-23 20:31:11 +05:30
Kishore Nallan
7147fa7ed5 Added design and todo docs. 2016-08-16 21:17:37 +05:30
Kishore Nallan
0eeb75b385 Boost dep is not needed. 2016-08-16 14:57:29 +05:30
Kishore Nallan
e6306ac432 Remove crow as dep. 2016-08-14 15:37:45 +05:30
Kishore Nallan
4f10586d13 Add skeleton HTTP server for serving the RESTish API. 2016-08-14 12:20:41 +05:30
Kishore Nallan
a228d153a6 Update README. 2016-08-14 12:19:35 +05:30
Kishore Nallan
a927a32018 Breaking down the long search method into smaller chunks. 2016-08-07 15:59:49 -07:00
Kishore Nallan
ba33da1d51 Lots of code clean up.
* Move stuff out of main to classes
* Standardize naming conventions.
2016-08-07 14:55:26 -07:00
Kishore Nallan
6c2974aaeb Add crow as a dep - http framework. 2016-08-07 14:54:26 -07:00
Kishore Nallan
e1f4b3d513 Constantize arguments, some clean-up code. 2016-08-05 18:26:31 -07:00
Kishore Nallan
45f0814a7a Fixed a bug in fuzzy search.
When the term_len is reached during traversal, max_cost was being compared with the wrong value.
2016-06-11 22:16:49 +05:30
Kishore Nallan
30cd057201 Split-up fuzzy lookup into separate stages.
1. Collect all the nodes where cost exceeds threshold.
2. Sort these nodes based on a score.
3. Perform top-k iteration to locate high scoring leaves.

This ensures that low-scoring leaves don't end up trumping leaves with a higher score (as was observed).
2016-06-10 23:20:44 +05:30
Kishore Nallan
4face51091 Calculation of hits for a token had a bug.
Should use search rather than prefix lookup for finding the hits so far for the exact token.
2016-06-09 17:31:32 +05:30
Kishore Nallan
734640cd2a Fix size calculation for unsorted append. 2016-06-09 17:28:50 +05:30
Kishore Nallan
32cd67c9d1 The ART will store the frequency count in addition to the score.
In certain cases, the ability to identify the most similar tokens based on the popularity of the token is useful.
2016-06-08 22:18:52 +05:30
Kishore Nallan
bb0e7aefb9 Rename score to max_score for internal node and leaf structs. 2016-06-08 11:26:52 +05:30
Kishore Nallan
5591f564c8 Sort the results vector based on score finally.
Required when multiple leaves of a given node are candidate token suggestions.
2016-06-08 10:23:02 +05:30
Kishore Nallan
04d02919b2 Fix memory corruption during unsorted append. 2016-06-04 19:05:47 +05:30
Kishore Nallan
c029e620d9 Clean up the match scoring logic.
Added more comments to illustrate what's happening.
2016-05-31 19:03:40 +05:30
Kishore Nallan
80d9f57b7b Code clean-up. 2016-05-30 20:13:55 +05:30
Kishore Nallan
0f756efe74 Fix sorting - should be in ascending order. 2016-05-30 20:13:44 +05:30
Kishore Nallan
beba88c1da Positional offsets are unsorted, so should be using unsorted append. 2016-05-30 20:12:04 +05:30
Kishore Nallan
3dde71e72e Unsorted append to forarray. 2016-05-30 20:11:26 +05:30
Kishore Nallan
383212be46 Fix bugs in top-K implementation. 2016-05-15 09:01:05 +05:30
Kishore Nallan
884a83f53c Use lower bound search to implement indexOf() 2016-05-15 09:00:42 +05:30
Kishore Nallan
10ff747802 Minor refactoring. Adding more comments. 2016-04-26 20:49:24 +05:30
Kishore Nallan
c667ed5d10 Fix static linking with libfor. 2016-04-25 21:51:02 +05:30
Kishore Nallan
f0f57f2e2d Saving state. 2016-03-23 07:38:43 +05:30
Kishore Nallan
566c4ce666 Intersection of documents across the search tokens. 2016-02-29 19:47:05 +05:30
Kishore Nallan
47df6201b1 Append offset related fields to the art leaf during insertion. 2016-02-28 21:13:54 +05:30
Kishore Nallan
1a7350c0ec Cartesian product of word suggestions for each query token to form search phrases. 2016-02-28 09:24:23 +05:30
Kishore Nallan
71a9c2709b Bug fix: Wrong order of arguments when recursing. 2016-02-28 09:01:58 +05:30
Kishore Nallan
0ba5c4874f Parameterized the number of fuzzy matches that are returned for words with typos. 2016-02-21 19:51:57 +05:30
Kishore Nallan
b88241d9e9 Bug fix: word suggestions were not showing up sorted on their document scores.
Somehow, std::max() on uint16_t does not seem to work. Using a MAX macro.
2016-02-21 19:21:20 +05:30