Kishore Nallan
da4c31065a
Download and build libfor right from CMake.
2016-10-16 22:22:09 +05:30
Kishore Nallan
d2b903a931
Set-up google test.
2016-10-16 22:15:11 +05:30
Kishore Nallan
c8eba7cf11
Adopt sequence ID as generated document ID, instead of using UUID.
2016-10-08 21:17:33 +05:30
Kishore Nallan
596430c036
Remove entry from rocksdb and art when required.
2016-10-05 21:24:40 +05:30
Kishore Nallan
ef105dcbd9
Reduce memory foot-print.
2016-10-04 21:31:55 +05:30
Kishore Nallan
3e3e08aeca
Log resident memory right after indexing.
2016-10-02 19:12:26 +05:30
Kishore Nallan
d8eee0d04a
Util for logging exec time.
2016-10-02 19:11:59 +05:30
Kishore Nallan
9d5a120dab
Replace unordered_map with sparsepp hashmap. Much faster!
2016-09-27 22:03:41 +05:30
Kishore Nallan
080eceea79
Remove bit packing - use proper struct.
2016-09-27 20:53:38 +05:30
Kishore Nallan
1cf5eb9d9c
Fix path to source directory for make.
2016-09-26 08:18:00 +05:30
Kishore Nallan
5cd8b72d0b
Fixed a bug in top-K sorting.
2016-09-25 13:10:34 +05:30
Kishore Nallan
e777afc97f
API for removing a document from index.
2016-09-24 18:08:57 +05:30
Kishore Nallan
9f75b70b07
Add document end-point.
2016-09-13 21:35:21 +05:30
Kishore Nallan
59f25dca39
Fix libfor repository URL - updated CMakeLists & README.
2016-09-13 18:22:46 +05:30
Kishore Nallan
e7c6c6d3cb
Fixed multi word queries.
2016-09-12 14:25:07 +05:30
Kishore Nallan
2f26b95c5b
Intermediate matching nodes should not be pushed to the results vector.
2016-09-11 12:13:04 +05:30
Kishore Nallan
c96a9d9b35
Adopt Damerau–Levenshtein distance, instead of plain Levenshtein.
Specifically, we use the optimal string alignment distance. It treats transposition as a cost of 1, rather than 2.
https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance#Optimal_string_alignment_distance
2016-09-10 16:19:04 +05:30
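The optimal string alignment distance named in commit c96a9d9b35 can be sketched as below. This is a minimal textbook version of the dynamic-programming recurrence, not the code from the commit itself; the function name `osa_distance` and the full-matrix layout are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Optimal string alignment (restricted Damerau-Levenshtein) distance.
// Hypothetical helper for illustration; not taken from the repository.
std::size_t osa_distance(const std::string& a, const std::string& b) {
    const std::size_t n = a.size();
    const std::size_t m = b.size();

    // d[i][j] = distance between the first i chars of a and first j chars of b.
    std::vector<std::vector<std::size_t>> d(n + 1, std::vector<std::size_t>(m + 1, 0));
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = i;  // delete i chars
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = j;  // insert j chars

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1,          // deletion
                                d[i][j - 1] + 1,          // insertion
                                d[i - 1][j - 1] + cost}); // substitution or match
            // Adjacent transposition counts as a single edit.
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);
            }
        }
    }
    return d[n][m];
}
```

With this recurrence, `osa_distance("ca", "ac")` is 1, whereas plain Levenshtein would count the swap as two substitutions.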
Kishore Nallan
618a2020ed
Delete the old fuzzy search implementation.
2016-09-06 17:43:25 +05:30
Kishore Nallan
c25f7ccfdb
Parameterize the num_typos from the query end-point.
2016-09-05 19:13:44 +05:30
Kishore Nallan
93de59be29
Added some conditions for search space reduction that puts performance back to original implementation.
2016-09-05 10:58:32 +05:30
Kishore Nallan
1a53d5692e
Fuzzy match rewrite - still need to work on matching perf.
2016-09-04 12:22:16 +05:30
Kishore Nallan
334ce264a5
For now, disable prefix matches to be considered as whole matches.
2016-09-02 10:34:34 +05:30
Kishore Nallan
aa46985bab
Handle spaces in query string.
2016-08-30 22:07:17 +05:30
Kishore Nallan
44da808f16
RocksDB based persistence.
2016-08-28 22:04:58 +05:30
Kishore Nallan
1d3af330dd
JSON document as input to collection.add method.
2016-08-28 09:23:30 +05:30
Kishore Nallan
2804b145dd
Add OS X build instructions.
2016-08-27 22:48:01 +05:30
Kishore Nallan
4d2ba27cab
Release memory of value stored when art node is destroyed.
2016-08-27 19:49:52 +05:30
Kishore Nallan
94db15b715
Fixed various issues flagged by Valgrind.
2016-08-27 13:44:53 +05:30
Kishore Nallan
1c36238f19
Fix debug flag.
2016-08-24 22:32:16 +05:30
Kishore Nallan
1e71058917
Length of char* was being calculated wrongly.
Need to consider the terminating null character.
2016-08-24 22:31:50 +05:30
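The kind of length bug described in commit 1e71058917 can be illustrated with a small sketch. It assumes the string is stored together with its terminator, which the commit message does not spell out; `stored_length` and `copy_c_string` are hypothetical names, not functions from the repository.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: bytes needed to store a C string including its '\0'.
// std::strlen alone excludes the terminator, hence the + 1.
std::size_t stored_length(const char* s) {
    return std::strlen(s) + 1;
}

// Copies the string, terminator included, into an owned buffer.
std::vector<char> copy_c_string(const char* s) {
    const std::size_t len = stored_length(s);
    std::vector<char> buf(len);
    std::memcpy(buf.data(), s, len);
    return buf;
}
```

Sizing the buffer with `std::strlen(s)` alone would drop the terminator, so a later read of the buffer as a C string could run past its end.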
Kishore Nallan
1c09ec38a8
Removed redundant token_count field from the leaf.
2016-08-24 09:27:32 +05:30
Kishore Nallan
2a77a1ad66
Removed redundant storage of length in offsets array.
2016-08-24 08:46:02 +05:30
Kishore Nallan
c079b22cbd
Fix typo in test document harness.
Added better print debugging in the process.
2016-08-23 22:37:54 +05:30
Kishore Nallan
9b6547f050
Refactor index to be called as collection.
2016-08-23 20:32:37 +05:30
Kishore Nallan
ae34ae3195
Add JSON dep.
2016-08-23 20:31:11 +05:30
Kishore Nallan
7147fa7ed5
Added design and todo docs.
2016-08-16 21:17:37 +05:30
Kishore Nallan
0eeb75b385
Boost dep is not needed.
2016-08-16 14:57:29 +05:30
Kishore Nallan
e6306ac432
Remove crow as dep.
2016-08-14 15:37:45 +05:30
Kishore Nallan
4f10586d13
Add skeleton HTTP server for serving the RESTish API.
2016-08-14 12:20:41 +05:30
Kishore Nallan
a228d153a6
Update README.
2016-08-14 12:19:35 +05:30
Kishore Nallan
a927a32018
Breaking down the long search method into smaller chunks.
2016-08-07 15:59:49 -07:00
Kishore Nallan
ba33da1d51
Lots of code clean up.
* Move stuff out of main to classes
* Standardize naming conventions.
2016-08-07 14:55:26 -07:00
Kishore Nallan
6c2974aaeb
Add crow as a dep - http framework.
2016-08-07 14:54:26 -07:00
Kishore Nallan
e1f4b3d513
Constantize arguments, some clean-up code.
2016-08-05 18:26:31 -07:00
Kishore Nallan
45f0814a7a
Fixed a bug in fuzzy search.
When the term_len is reached during traversal, max_cost was being compared with the wrong value.
2016-06-11 22:16:49 +05:30
Kishore Nallan
30cd057201
Split-up fuzzy lookup into separate stages.
1. Collect all the nodes where cost exceeds threshold.
2. Sort these nodes based on a score.
3. Perform top-k iteration to locate high scoring leaves.
This ensures that small scoring leaves don't end up trumping leaves with higher score (as it was noticed).
2016-06-10 23:20:44 +05:30
Kishore Nallan
4face51091
Calculation of hits for a token had a bug.
Should use search rather than prefix lookup for finding the hits so far for the exact token.
2016-06-09 17:31:32 +05:30
Kishore Nallan
734640cd2a
Fix size calculation for unsorted append.
2016-06-09 17:28:50 +05:30
Kishore Nallan
32cd67c9d1
The ART will store the frequency count in addition to the score.
In certain cases, the ability to identify the most similar tokens based on the popularity of the token is useful.
2016-06-08 22:18:52 +05:30
Kishore Nallan
bb0e7aefb9
Rename score to max_score for internal node and leaf structs.
2016-06-08 11:26:52 +05:30