Kishore Nallan
c25f7ccfdb
Make num_typos a parameter of the query endpoint.
2016-09-05 19:13:44 +05:30
Kishore Nallan
93de59be29
Added search-space reduction conditions that restore performance to that of the original implementation.
2016-09-05 10:58:32 +05:30
Kishore Nallan
1a53d5692e
Fuzzy match rewrite; matching performance still needs work.
2016-09-04 12:22:16 +05:30
Kishore Nallan
334ce264a5
For now, disable treating prefix matches as whole matches.
2016-09-02 10:34:34 +05:30
Kishore Nallan
aa46985bab
Handle spaces in query string.
2016-08-30 22:07:17 +05:30
Kishore Nallan
44da808f16
RocksDB based persistence.
2016-08-28 22:04:58 +05:30
Kishore Nallan
1d3af330dd
JSON document as input to collection.add method.
2016-08-28 09:23:30 +05:30
Kishore Nallan
2804b145dd
Add OS X build instructions.
2016-08-27 22:48:01 +05:30
Kishore Nallan
4d2ba27cab
Release the memory of the stored value when an ART node is destroyed.
2016-08-27 19:49:52 +05:30
Kishore Nallan
94db15b715
Fixed various issues flagged by Valgrind.
2016-08-27 13:44:53 +05:30
Kishore Nallan
1c36238f19
Fix debug flag.
2016-08-24 22:32:16 +05:30
Kishore Nallan
1e71058917
Length of char* was being calculated incorrectly.
The terminating null character must be taken into account.
2016-08-24 22:31:50 +05:30
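A minimal sketch of the off-by-one described above, assuming a C string is being copied into a freshly allocated buffer. `copy_key` is a hypothetical helper, not the project's code: std::strlen() does not count the terminator, so the copy needs one extra byte.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: duplicate a C string including its terminating '\0'.
char* copy_key(const char* key) {
    std::size_t len = std::strlen(key) + 1;  // +1 for the terminating null character
    char* buf = static_cast<char*>(std::malloc(len));
    if (buf != nullptr) {
        std::memcpy(buf, key, len);  // copies the '\0' as well
    }
    return buf;
}
```

Allocating only std::strlen(key) bytes would leave the copy unterminated, which is the kind of error a tool such as Valgrind reports as an invalid read.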
Kishore Nallan
1c09ec38a8
Removed redundant token_count field from the leaf.
2016-08-24 09:27:32 +05:30
Kishore Nallan
2a77a1ad66
Removed redundant storage of length in offsets array.
2016-08-24 08:46:02 +05:30
Kishore Nallan
c079b22cbd
Fix typo in test document harness.
Added better print debugging in the process.
2016-08-23 22:37:54 +05:30
Kishore Nallan
9b6547f050
Refactor index to be called collection.
2016-08-23 20:32:37 +05:30
Kishore Nallan
ae34ae3195
Add JSON dep.
2016-08-23 20:31:11 +05:30
Kishore Nallan
7147fa7ed5
Added design and todo docs.
2016-08-16 21:17:37 +05:30
Kishore Nallan
0eeb75b385
Boost dep is not needed.
2016-08-16 14:57:29 +05:30
Kishore Nallan
e6306ac432
Remove crow as dep.
2016-08-14 15:37:45 +05:30
Kishore Nallan
4f10586d13
Add skeleton HTTP server for serving the RESTish API.
2016-08-14 12:20:41 +05:30
Kishore Nallan
a228d153a6
Update README.
2016-08-14 12:19:35 +05:30
Kishore Nallan
a927a32018
Breaking down the long search method into smaller chunks.
2016-08-07 15:59:49 -07:00
Kishore Nallan
ba33da1d51
Lots of code clean-up.
* Move code out of main into classes.
* Standardize naming conventions.
2016-08-07 14:55:26 -07:00
Kishore Nallan
6c2974aaeb
Add crow (an HTTP framework) as a dep.
2016-08-07 14:54:26 -07:00
Kishore Nallan
e1f4b3d513
Make arguments const; some code clean-up.
2016-08-05 18:26:31 -07:00
Kishore Nallan
45f0814a7a
Fixed a bug in fuzzy search.
When the term_len is reached during traversal, max_cost was being compared with the wrong value.
2016-06-11 22:16:49 +05:30
Kishore Nallan
30cd057201
Split up fuzzy lookup into separate stages.
1. Collect all the nodes where cost exceeds threshold.
2. Sort these nodes based on a score.
3. Perform top-k iteration to locate high-scoring leaves.
This ensures that low-scoring leaves don't end up trumping higher-scoring leaves, as had been observed.
2016-06-10 23:20:44 +05:30
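Stages 2 and 3 above can be sketched as follows, assuming stage 1 has already collected the candidate nodes with a precomputed score. The `Candidate` struct and `top_k_candidates` function are hypothetical, and the ART traversal itself is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical candidate collected in stage 1.
struct Candidate {
    std::string token;
    int score;
};

// Stages 2 and 3: rank every candidate by descending score, then keep the
// first k. Because all candidates are ranked before any is kept, a
// low-scoring leaf reached early cannot crowd out a higher-scoring one.
std::vector<Candidate> top_k_candidates(std::vector<Candidate> candidates, std::size_t k) {
    std::stable_sort(candidates.begin(), candidates.end(),
                     [](const Candidate& a, const Candidate& b) { return a.score > b.score; });
    if (candidates.size() > k) {
        candidates.resize(k);  // drop everything after the top k
    }
    return candidates;
}
```

std::stable_sort keeps candidates with equal scores in discovery order; when k is much smaller than the candidate count, std::partial_sort would avoid ranking the whole list.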
Kishore Nallan
4face51091
Calculation of hits for a token had a bug.
Should use search rather than prefix lookup to find the hits so far for the exact token.
2016-06-09 17:31:32 +05:30
Kishore Nallan
734640cd2a
Fix size calculation for unsorted append.
2016-06-09 17:28:50 +05:30
Kishore Nallan
32cd67c9d1
The ART now stores the frequency count in addition to the score.
In certain cases, it is useful to identify the most similar tokens based on the token's popularity.
2016-06-08 22:18:52 +05:30
Kishore Nallan
bb0e7aefb9
Rename score to max_score for internal node and leaf structs.
2016-06-08 11:26:52 +05:30
Kishore Nallan
5591f564c8
Finally, sort the results vector by score.
Required when multiple leaves of a given node are candidate token suggestions.
2016-06-08 10:23:02 +05:30
Kishore Nallan
04d02919b2
Fix memory corruption during unsorted append.
2016-06-04 19:05:47 +05:30
Kishore Nallan
c029e620d9
Clean up the match scoring logic.
Added more comments to illustrate what's happening.
2016-05-31 19:03:40 +05:30
Kishore Nallan
80d9f57b7b
Code clean-up.
2016-05-30 20:13:55 +05:30
Kishore Nallan
0f756efe74
Fix sorting - should be in ascending order.
2016-05-30 20:13:44 +05:30
Kishore Nallan
beba88c1da
Positional offsets are unsorted, so should be using unsorted append.
2016-05-30 20:12:04 +05:30
Kishore Nallan
3dde71e72e
Unsorted append to forarray.
2016-05-30 20:11:26 +05:30
Kishore Nallan
383212be46
Fix bugs in top-K implementation.
2016-05-15 09:01:05 +05:30
Kishore Nallan
884a83f53c
Use lower bound search to implement indexOf()
2016-05-15 09:00:42 +05:30
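A sketch of indexOf() built on a lower-bound search, assuming the values are kept in ascending order. A plain std::vector<uint32_t> stands in for the compressed forarray, and `index_of` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// std::lower_bound returns the first element not less than value in O(log n);
// the lookup is a hit only if that element is exactly value.
int64_t index_of(const std::vector<uint32_t>& sorted, uint32_t value) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), value);
    if (it == sorted.end() || *it != value) {
        return -1;  // value is not present
    }
    return static_cast<int64_t>(it - sorted.begin());
}
```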
Kishore Nallan
10ff747802
Minor refactoring. Adding more comments.
2016-04-26 20:49:24 +05:30
Kishore Nallan
c667ed5d10
Fix static linking with libfor.
2016-04-25 21:51:02 +05:30
Kishore Nallan
f0f57f2e2d
Saving state.
2016-03-23 07:38:43 +05:30
Kishore Nallan
566c4ce666
Intersection of documents across the search tokens.
2016-02-29 19:47:05 +05:30
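A sketch of intersecting documents across search tokens, assuming each token's hits are available as a sorted list of document IDs. A plain std::vector<uint32_t> stands in for whatever array the index actually uses, and `intersect_all` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// A document survives only if its ID appears in every token's sorted list.
std::vector<uint32_t> intersect_all(const std::vector<std::vector<uint32_t>>& lists) {
    if (lists.empty()) return {};
    std::vector<uint32_t> result = lists[0];
    // Stop early once the running intersection becomes empty.
    for (std::size_t i = 1; i < lists.size() && !result.empty(); ++i) {
        std::vector<uint32_t> next;
        std::set_intersection(result.begin(), result.end(),
                              lists[i].begin(), lists[i].end(),
                              std::back_inserter(next));
        result = std::move(next);
    }
    return result;
}
```

Starting from the shortest list keeps the running intersection small, since the result can never be larger than its smallest input.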
Kishore Nallan
47df6201b1
Append offset-related fields to the ART leaf during insertion.
2016-02-28 21:13:54 +05:30
Kishore Nallan
1a7350c0ec
Cartesian product of word suggestions for each query token to form search phrases.
2016-02-28 09:24:23 +05:30
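A sketch of forming search phrases as a Cartesian product, assuming the word suggestions for each query token are already available as a list of strings. `cartesian_product` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Pick one suggestion per query token in every possible way; each
// combination is one search phrase.
std::vector<std::vector<std::string>> cartesian_product(
        const std::vector<std::vector<std::string>>& suggestions) {
    std::vector<std::vector<std::string>> phrases(1);  // start with one empty phrase
    for (const auto& token_suggestions : suggestions) {
        std::vector<std::vector<std::string>> extended;
        for (const auto& phrase : phrases) {
            for (const auto& word : token_suggestions) {
                std::vector<std::string> p = phrase;
                p.push_back(word);
                extended.push_back(std::move(p));
            }
        }
        phrases = std::move(extended);
    }
    return phrases;
}
```

The number of phrases is the product of the per-token suggestion counts, so it grows quickly. This is one reason to cap how many fuzzy matches are returned per word.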
Kishore Nallan
71a9c2709b
Bug fix: Wrong order of arguments when recursing.
2016-02-28 09:01:58 +05:30
Kishore Nallan
0ba5c4874f
Parameterized the number of fuzzy matches that are returned for words with typo.
2016-02-21 19:51:57 +05:30
Kishore Nallan
b88241d9e9
Bug fix: word suggestions were not sorted by their document scores.
std::max() on uint16_t did not seem to work, so a MAX macro is used instead.
2016-02-21 19:21:20 +05:30
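The commit does not say why std::max() failed. One well-known pitfall, offered here only as an assumption, is that std::max deduces a single type T from both arguments. A call such as std::max(score, 1) with a uint16_t score and an int literal therefore does not compile. The MAX definition and `bump_score` below are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// A MAX macro of the kind mentioned above (hypothetical definition).
// It evaluates the winning argument twice, so avoid side effects in a and b.
#define MAX(a, b) ((a) > (b) ? (a) : (b))

// std::max(score, 1) fails template deduction (uint16_t vs int);
// naming the template argument converts the literal to uint16_t instead.
uint16_t bump_score(uint16_t score) {
    return std::max<uint16_t>(score, 1);
}
```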