tags: WebHack#44

WebHack#44 Quickwit: A Search Engine for Logs

Speaker: Paul, the CEO of Quickwit, Inc.
Slides: TBD

Talk

Agenda

Every single logging-related product.

Two types of search engines

  • Number of documents
  • Queries per second (QPS)

How is private search different?

  • Lower ratio of QPS to number of documents
  • Typically multiple indices, if not outright multitenant

Let's build a 10x search engine

  • tantivy (Rust) is ~2x faster than Lucene at both search and indexing
  • Rust gives lower latency variance than Java (no GC pauses)

Document replication is wasteful

  • Let's index once and replicate the segments!
  • Copy only the index
  • Indexing is a throughput game: a high load factor is a good thing
  • Search is a latency game: a high load factor is a bad thing
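To make "index once, replicate the segments" concrete, here is a minimal sketch under an assumed cost model: building the index for a document is CPU-heavy, while copying already-built segment bytes is cheap. The constants and function names are hypothetical and illustrative only, not measurements from Quickwit or tantivy.

```python
# Illustrative cost model (assumed numbers, not measurements): indexing a
# document is CPU-heavy, copying already-built segment bytes is cheap.
INDEX_COST_PER_DOC = 100  # arbitrary CPU units per document indexed
COPY_COST_PER_KIB = 1     # arbitrary CPU units per KiB of segment copied


def document_replication_cost(num_docs: int, replicas: int) -> int:
    # Each replica receives the raw documents and indexes them itself.
    return num_docs * INDEX_COST_PER_DOC * replicas


def segment_replication_cost(num_docs: int, replicas: int, segment_kib: int) -> int:
    # One node indexes once; the finished segment is copied to the other replicas.
    indexing = num_docs * INDEX_COST_PER_DOC
    copying = segment_kib * COPY_COST_PER_KIB * (replicas - 1)
    return indexing + copying
```

With 1,000 documents, 3 replicas and a 20,000 KiB segment, document replication costs 300,000 units while segment replication costs 140,000: the indexing work is paid once instead of once per replica.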

The promises of multitenancy

  • Shared-nothing architecture
    • Neither memory nor storage is shared among processors
    • Each node is in charge of a subset of the data
  • Shared-nothing says:
    • Move the query, not the data
    • Move the small thing, not the big one
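"Move the query, not the data" is usually implemented as scatter-gather. The sketch below assumes a toy two-node cluster (node names, documents and term-frequency scoring are all hypothetical): the query is sent to every node, each node searches only the documents it owns, and only the small per-node top-k lists travel back to be merged.

```python
import heapq

# Hypothetical shared-nothing cluster: each node owns a disjoint subset of docs.
NODES = {
    "node-a": {"doc1": "error disk full", "doc2": "user login ok"},
    "node-b": {"doc3": "error timeout", "doc4": "error error retry"},
}


def local_search(docs: dict, term: str, k: int) -> list:
    # Runs on the node that owns the data; score = term frequency (toy scoring).
    hits = [(text.split().count(term), doc_id) for doc_id, text in docs.items()]
    return heapq.nlargest(k, [hit for hit in hits if hit[0] > 0])


def distributed_search(nodes: dict, term: str, k: int) -> list:
    # Scatter: ship the (small) query to every node.
    partials = [local_search(docs, term, k) for docs in nodes.values()]
    # Gather: merge the small per-node top-k lists, never the raw documents.
    merged = [hit for part in partials for hit in part]
    return heapq.nlargest(k, merged)
```

For example, `distributed_search(NODES, "error", 2)` returns `[(2, 'doc4'), (1, 'doc3')]`; ties on score are broken by document id here, which is an arbitrary choice of this sketch.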

But networks are faster nowadays

Multitenancy

Cost with a shared-nothing architecture

  • Expensive
  • What if we could use a shared-disk architecture?
    • Storage is cheaper
  • The challenge: at first sight, object storage looks like a super slow spinning disk
  • Three problems:
    • True multitenancy would mean opening the index on every query
    • Insufficient throughput
    • ~70 ms latency per request
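The latency problem is why IO has to be planned ahead and issued concurrently. This minimal sketch assumes a fake in-memory object store (the class, method names and key are hypothetical) where every ranged GET pays the ~70 ms quoted in the talk. Fetching the planned byte ranges one at a time pays that latency once per request; issuing them together with `asyncio.gather` pays it roughly once in total.

```python
import asyncio
import time

LATENCY_S = 0.07  # ~70 ms per request, the figure quoted in the talk


class FakeObjectStore:
    """Hypothetical in-memory stand-in for S3: every ranged GET pays a fixed latency."""

    def __init__(self, objects: dict):
        self.objects = objects

    async def get_range(self, key: str, start: int, end: int) -> bytes:
        await asyncio.sleep(LATENCY_S)  # simulate the network round trip
        return self.objects[key][start:end]


async def fetch_sequential(store: FakeObjectStore, requests: list) -> list:
    # One request at a time: total time ~ len(requests) * LATENCY_S.
    return [await store.get_range(*req) for req in requests]


async def fetch_planned(store: FakeObjectStore, requests: list) -> list:
    # All byte ranges are known up front, so issue them concurrently:
    # total time ~ LATENCY_S, results come back in request order.
    return await asyncio.gather(*(store.get_range(*req) for req in requests))
```

This only hides latency; it does nothing for the throughput problem, which depends on how many bytes actually have to be read.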

We plan IO ahead and run it in parallel and asynchronously

  • Adding and removing servers in seconds
  • Sleep tight: replication is delegated to the object storage
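Because the data lives in object storage, a searcher node only needs to know which splits it is responsible for (mainly for cache locality), so nodes can join or leave quickly. One common way to assign work with minimal reshuffling is rendezvous (highest-random-weight) hashing; the sketch below assumes that technique and uses hypothetical node and split names. It is not presented as Quickwit's actual assignment code.

```python
import hashlib


def score(node: str, split: str) -> int:
    # Stable pseudo-random weight for each (node, split) pair.
    digest = hashlib.sha256(f"{node}:{split}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign(splits: list, nodes: list) -> dict:
    # Rendezvous hashing: each split goes to the node with the highest score.
    # Adding a node only moves the splits that node now wins; removing a node
    # only moves the splits it owned. No data is copied, since every node
    # reads from the same object storage.
    return {s: max(nodes, key=lambda n: score(n, s)) for s in splits}
```

Adding a fourth node to a three-node cluster moves only the splits the new node wins (about a quarter of them), and every moved split lands on the new node.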

Demo Time:

Q & A

Q: What is the trade-off between indexing and storage, and how does the system handle incremental index updates?
Ans:

  • Little's Law, illustrated with a Starbucks queue
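The notes do not record how the answer applied Little's Law, so this is only a reminder of the law itself: the average number of items in a system equals the arrival rate times the average time each item spends in it, L = λW. As a Starbucks example with hypothetical numbers, 2 customers arriving per minute who each stay 5 minutes means about 10 customers in the shop on average. One possible reading for indexing (our assumption, not the speaker's statement): at 10,000 docs/s, if each document waits on average 15 s in memory before being committed, about 150,000 documents are buffered at any moment, so longer commit intervals trade memory and freshness for fewer, larger segments.

```python
def little_l(arrival_rate: float, time_in_system: float) -> float:
    """Little's Law: average number in the system, L = lambda * W.
    Rate and time must use the same time unit (e.g. per second and seconds)."""
    return arrival_rate * time_in_system


def little_w(items_in_system: float, arrival_rate: float) -> float:
    """Rearranged: average time spent in the system, W = L / lambda."""
    return items_in_system / arrival_rate
```

Usage: `little_l(2, 5)` gives 10 (customers in the shop), `little_l(10_000, 15)` gives 150,000 (buffered documents, under the hypothetical numbers above).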

Q: If I understand correctly, you are storing index segments as S3 objects instead of files.
Do you use any object storage features (e.g. schema), or is your abstraction just key to byte[]?
Ans:

  • Just key to byte[]
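A "key to byte[]" abstraction can stay very small. The sketch below assumes a hypothetical interface with an in-memory implementation; the method names are illustrative, not Quickwit's API. The one addition beyond plain get/put is a ranged read, which maps onto the HTTP Range support of object stores such as S3 and lets a searcher read only the bytes it needs from a large segment file.

```python
from typing import Protocol


class Storage(Protocol):
    """Hypothetical minimal interface: keys map to opaque byte blobs."""

    def put(self, key: str, payload: bytes) -> None: ...
    def get_all(self, key: str) -> bytes: ...
    def get_slice(self, key: str, start: int, end: int) -> bytes: ...


class InMemoryStorage:
    """Dict-backed implementation, e.g. for tests; satisfies Storage structurally."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, payload: bytes) -> None:
        self._blobs[key] = bytes(payload)

    def get_all(self, key: str) -> bytes:
        return self._blobs[key]

    def get_slice(self, key: str, start: int, end: int) -> bytes:
        # Ranged read over [start, end), the analogue of an S3 Range GET.
        return self._blobs[key][start:end]
```

Keeping the abstraction this narrow is what makes it easy to swap backends: anything that can store a blob under a key and return a byte range can serve as the shared disk.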


If you want to contact the speaker:

Networking
