mirror of
https://github.com/typesense/typesense.git
synced 2025-05-17 04:02:36 +08:00
Added design and todo docs.
parent
0eeb75b385
commit
7147fa7ed5
46
DESIGN.md
Normal file
@ -0,0 +1,46 @@
# Typesense: Design

## Motivation

Typesense's design is motivated by the following considerations:

- **Simplicity:** Typesense has to be super simple to set up and get started with. The default configuration
  *should just work* for the common search use cases.
- **Typo tolerance out-of-the-box:** Currently, it is hard to build typo-tolerant search on existing
  systems without a considerable speed/memory penalty. This has to change, given how common typographic errors are
  in the real world.
- **In-memory:** All primary data structures live in memory, with the disk used only for durability and for
  fetching large, unindexed fields.
- **Optimize for reads over writes:** A typical search index is written once and read many times. The system should be
  cognizant of this read/write pattern.
- **Fast, without sacrificing relevance:** While speed is important, it cannot come at the cost of the quality of the
  results returned. Remember that the average human reaction time to a visual stimulus is about 200 ms.
- **Laser-focused on search:** While there might be some overlap with what a relational database does, Typesense should
  focus primarily on search instead of becoming a generalized data store with a complex query language.
- **Availability over consistency:** In the event of a network partition, it's better to return slightly stale search
  results than to be unavailable. This is acceptable, given the inherently asynchronous nature of the indexing process.

## Overview

- At the heart of Typesense is a `token => documents` inverted index backed by an
  [Adaptive Radix Tree](https://db.in.tum.de/~leis/papers/ART.pdf) (ART), which is a memory-efficient implementation of
  the trie data structure. ART allows us to do fast fuzzy searches on a query.
- Typesense consumes JSON documents as input. Fields that should be indexed must be specified via a configuration file
  or through the API.
- The raw JSON documents are stored on disk using RocksDB. SSDs are highly recommended.
- Search results are ranked on the following factors:
    - Number of matching tokens
    - Proximity of search tokens within the documents that contain these tokens
    - User-specified static score for a document (e.g. the number of followers could be a static score for a
      Twitter user)
- A typical search query involves:
    - a search term (required; wildcard `*` search is not allowed)
    - filter fields (optional)
    - facet fields (optional)
    - sort fields (optional)
    - page
    - limit
- Typesense is exposed through a RESTful API, so that it can be consumed directly by web apps via AJAX requests.
- High availability is achieved using master-master replication. Every write to Typesense is written to and
  acknowledged by a second node before it is deemed a success. Clients can round-robin both reads and
  writes across the two nodes.
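
The index and ranking bullets above can be illustrated with a small, self-contained Python sketch. It is not Typesense's implementation: a plain dictionary-based trie stands in for the ART, the fuzzy lookup is a textbook Levenshtein walk over that trie, and every name (`TinyIndex`, `add`, `fuzzy`, `search`, `max_typos`) is hypothetical. Filters, facets, sorting and pagination are left out.

```python
from itertools import product


class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.postings = {}   # doc_id -> [token positions]; non-empty only at token ends


class TinyIndex:
    """Toy `token => documents` index. A plain trie stands in for the ART."""

    def __init__(self):
        self.root = TrieNode()
        self.static_score = {}   # doc_id -> user-specified static score

    def add(self, doc_id, text, score=0):
        self.static_score[doc_id] = score
        for pos, token in enumerate(text.lower().split()):
            node = self.root
            for ch in token:
                node = node.children.setdefault(ch, TrieNode())
            node.postings.setdefault(doc_id, []).append(pos)

    def fuzzy(self, word, max_typos=1):
        """Merge the postings of every indexed token within max_typos edits of word."""
        merged = {}
        first_row = list(range(len(word) + 1))   # edit distances from the empty prefix

        def walk(node, prev_row):
            for ch, child in node.children.items():
                row = [prev_row[0] + 1]
                for i in range(1, len(word) + 1):
                    cost = 0 if word[i - 1] == ch else 1
                    row.append(min(row[i - 1] + 1, prev_row[i] + 1, prev_row[i - 1] + cost))
                if child.postings and row[-1] <= max_typos:
                    for doc_id, positions in child.postings.items():
                        merged.setdefault(doc_id, []).extend(positions)
                if min(row) <= max_typos:   # prune subtrees that can no longer match
                    walk(child, row)

        walk(self.root, first_row)
        return merged

    def search(self, query, max_typos=1):
        per_doc = {}   # doc_id -> one position list per matched query token
        for word in query.lower().split():
            for doc_id, positions in self.fuzzy(word, max_typos).items():
                per_doc.setdefault(doc_id, []).append(positions)

        def rank_key(doc_id):
            hits = per_doc[doc_id]
            # Proximity: smallest window holding one position of each matched token.
            window = min(max(combo) - min(combo) for combo in product(*hits))
            # More matching tokens first, then tighter proximity, then higher static score.
            return (-len(hits), window, -self.static_score[doc_id])

        return sorted(per_doc, key=rank_key)


index = TinyIndex()
index.add(1, "adaptive radix tree", score=5)
index.add(2, "radix sort and a red black tree", score=9)
index.add(3, "binary tree", score=1)
print(index.search("radx tree"))   # → [1, 2, 3]
```

The misspelled `radx` still reaches `radix` because the trie walk carries one row of the edit-distance table per node and abandons a subtree as soon as every entry in the row exceeds `max_typos`. Documents 1 and 2 both match two tokens, so proximity decides between them; document 3 matches only one token and ranks last despite nothing else counting against it.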
@ -1,6 +1,6 @@
 # Typesense

-Typesense is an open source search engine for building delightful search experiences.
+Typesense is an open source search engine for building a delightful search experience.

 - **Typo tolerance:** Handles typographical errors out-of-the-box
 - **Tunable ranking + relevancy:** Tailor your search results to perfection

@ -16,7 +16,6 @@ TODO
* [libfor](https://github.com/cruppstahl/for/)
* [h2o](https://github.com/h2o/h2o)
* OpenSSL
* Boost

## Building `libfor`

27
TODO.md
Normal file
@ -0,0 +1,27 @@
# Typesense: TODO

## Pre-alpha

**Search index**

- Proper JSON as input
- Storing raw JSON input to RocksDB
- ART for every indexed field
- UTF-8 support for fuzzy search
- Facets
- Filters
- Support search operators like `+`, `-`, etc.

**API**

- Support the following operations:
    - create a new index
    - index a single document
    - bulk insert multiple documents
    - fetch a document by ID
    - delete a document by ID
    - query an index

**Clustering**

- Sync every incoming write with another Typesense server