29 Commits

Author SHA1 Message Date
Harpreet Sangar
0855aeff24 Union (#2051)
* Add `collection_search_args`.

* Remove `collection_name` argument. Use `Index::get_collection_name()` instead.

* Add `collection_search_args_t`, `Union_KV` and `Union_Topster`.

* Allow multiple sub-searches to the same collection in union.

* Fix union tests.

* Union pagination.

* Union sorting.

* Update test.

* Total document count of a particular collection should only be added once in `out_of`.

* Update error message.

* Make `Topster` generic.

* Refactor template declaration.

* Fix `CollectionFacetingTest, FacetCountsBool`.

* Rename parameter.

* Rename return parameters.

* Add `search_time_ms` and `page` return parameters.
2024-12-16 21:31:49 +05:30
Kishore Nallan
b9d70433b4 Revert "Do grouping in two pass (#1677)"
This reverts commit dccf6eb1864870cffc5ca71e3307e59b6ee5d9b2.

# Conflicts:
#	src/index.cpp
2024-05-16 17:37:54 +05:30
Krunal Gandhi
dccf6eb186 Do grouping in two pass (#1677)
* do grouping in two pass

* add hyperloglog count, use distinct_key in kv_map

* counter for groups found in current pass, update the tests

* increase hyperloglog threshold, refactor topster test
2024-04-24 22:25:58 +05:30
Kishore Nallan
cd5cfc5445 Merge branch 'v0.24-nested' into v0.25
# Conflicts:
#	include/collection.h
#	src/collection.cpp
#	src/collection_manager.cpp
#	src/index.cpp
2023-01-09 16:06:34 +05:30
Kishore Nallan
bc31be874a Add text match modes: max_score and max_weight. 2023-01-04 20:30:30 +05:30
Kishore Nallan
c6ea968f01 Merge branch 'v0.25' into bazel-build
# Conflicts:
#	.gitignore
2022-12-15 21:19:31 +05:30
0x2Adr1
bbebb1a567 Bazel (#736) 2022-12-15 21:09:06 +05:30
Kishore Nallan
c7f879bf30 Return vector distance in response. 2022-09-15 11:34:27 +05:30
Kishore Nallan
21c31de3b8 Ensure that topster is fully stable on equal values. 2022-04-15 11:03:46 +05:30
kishorenc
bc1d88f1eb Consider tokens matching across fields during ranking. 2020-12-28 19:20:00 +05:30
kishorenc
7e2b0fcdcb Match query tokens across multiple fields effectively. 2020-12-28 19:19:59 +05:30
kishorenc
b97c37215a Basic distinct test is passing. 2020-06-06 15:14:53 +05:30
kishorenc
8f458640fd Choose sift down/up based on array size. 2020-06-06 13:20:42 +05:30
kishorenc
5b2407433f Refactor topster to support grouping. 2020-06-05 20:41:51 +05:30
kishorenc
5e1c5f2093 Ditch use of number_t for sorting. 2020-03-05 08:03:01 +05:30
kishorenc
eed10d554d Sort results on custom order. 2020-03-04 20:27:33 +05:30
kishorenc
7b342c7c73 Refactor number_t to use a single int64_t as store. 2020-03-04 06:23:50 +05:30
kishorenc
fd285b6fbe Allow maximum hits returned to be configurable.
This obviously has a performance impact, but it might not be a big deal for most people and is now left to their discretion. The default of 500 results stays to maintain backward compatibility.
2020-02-10 20:54:38 +05:30
Kishore Nallan
56ed39e3ff Refactor topster to ensure that it handles insertion of duplicate keys.
Instead of ignoring a duplicate blindly, ignore it only when its match score is lower than that of the existing key.
2018-04-30 20:28:56 +05:30
Kishore Nallan
71f1fbb4aa Refactor query index logic. 2018-04-25 19:32:46 +05:30
Kishore Nallan
1d1cd2459b When multiple fields are searched, the same document should not be returned twice. 2018-04-24 17:49:04 +05:30
Kishore Nallan
c3298ba6d8 Address -Wall and -Wextra warnings. 2018-01-25 20:08:13 +05:30
Kishore Nallan
d351523655 Allow results to be sorted on a float field. 2017-08-20 21:15:48 +05:30
Kishore Nallan
3104dea42a Generify the topster container to hold both integer and float.
Benchmarked to ensure that performance is on par.
2017-08-20 15:25:11 +05:30
Kishore Nallan
6a6785ef74 Short circuit to speed up single token searches.
- Refactor token position population
- Store only the query index in topster instead of storing the full offsets.
- Calculate the offsets finally on the results that are to be returned.
2017-08-08 17:39:23 -04:00
Kishore Nallan
57e03efe1f Contextual snippet only for longer strings.
Strings under a defined constant token length will be fully highlighted, instead of showing a snippet of the relevant matching portion.
2017-06-14 08:53:23 +02:00
Kishore Nallan
1d5146f7ff Track best-matched token offsets needed for highlighting.
- We store the best matched token offset positions in Topster KV
- Using run-length encoding (via unions) to pack the offset diffs intelligently
2017-06-09 13:32:03 -05:00
Kishore Nallan
7b8452b7bf Fix failing test. 2017-04-02 07:25:02 +05:30
Kishore Nallan
473aa6d5f6 Basic test for topster. 2016-12-28 21:28:27 +05:30