---
tags: cosc430 2020
---

# COSC430 discussion: Gorilla

## Key take-away points about the time series lecture

- ...

# Discussion of the Gorilla paper

## Breakout room 1

### Key problem under investigation?

- How to balance efficiency and scalability in the design of TSDBs.
- Handling failures ranging from a single node to an entire region.

### Key idea of the proposed solution?

- It uses an in-memory time series database as a write-through cache.

### How does it solve the problem?

- Fault tolerance was exercised through many large-scale simulated failures and disaster situations.
- By using finely tuned compression algorithms for timestamps.
- Compression of timestamps and values.
- By relaxing data consistency restrictions, it becomes highly available.

### Evaluation?

- Reduced query latency compared to the previous on-disk time series database.

### Drawbacks?

- A Gorilla host becomes available for reads before older data has been read off disk.
- Prioritizes recent data over historical data.
- To withstand single-host failures and disaster events, resource efficiency takes a hit.

## Breakout room 2

### Key problem under investigation?

- Storing measurements for monitoring purposes.
- Required properties:
  - writes dominate,
  - state transitions,
  - high availability,
  - fault tolerance.

### Key idea of the proposed solution?

- In-memory time series database that is used as a write-through cache of the most recent 26 hours of data.

### How does it solve the problem?

- Specialised and fine-tuned compression algorithms for timestamps and values are used so that the data fits in memory.
- Fault tolerance is achieved by saving data to disk and by replicating data across two datacenters in different geographical locations.
- Scalability is achieved by sharding data across multiple servers. Sharding is implemented using a time series map.
- High availability is achieved by loosening restrictions on data consistency, i.e., ACID properties are not guaranteed.

### Evaluation?

- Query latency: compared to HBase, Gorilla has provided a 73x - 350x improvement depending on query size.
- Queries per second: the previous system served 450 qps; Gorilla currently handles more than 5000 qps, peaking at one point at 40000 qps.
- High performance made it possible to create new tools such as correlation engines, visualisation and aggregation tools.
- Fault tolerance: Gorilla was successfully tested against network cuts, disasters, node failures, restarts, and release bugs.

### Drawbacks?

- Small amounts of data could be lost in case of failures.
- Poor data model (e.g., only real values, no units), no database query language.

## Breakout room 3

### Key problem under investigation?

- Key problem is how to strike the right balance between efficiency, scalability, and reliability in TSDBs.
- Writes dominate
- State transitions
- High availability
- Fault tolerance

### Key idea of the proposed solution?

- Leverage compression techniques such as delta-of-delta timestamps and XOR’d floating point values
- On-disk structures
- New time series compression algorithm
- High availability trumps resource efficiency

### How does it solve the problem?

- Compressing timestamps
- Compressing values

### Evaluation?

- Gorilla has allowed us to reduce our production query latency by over 70x when compared to the previous on-disk TSDB.
- Successfully doubled in size twice in this period without much operational effort, demonstrating the scalability of TSDBs.

### Drawbacks?

- Prioritize recent data over historical data.
- Read latency
- High availability trumps resource efficiency.
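
All three rooms point to timestamp and value compression as the core of the solution, so a small sketch may help make the idea concrete. The Python below is only an illustration, not the paper's bit-level encoding: it computes the delta-of-delta sequence for timestamps and the XOR of consecutive 64-bit float patterns, and shows that both are reversible. The function names (`delta_of_deltas`, `xor_values`, and so on) and the sample data are made up for this example; the variable-length bit packing that actually saves the space is deliberately left out.

```python
import struct


def delta_of_deltas(timestamps):
    """Split timestamps into (first value, first delta, delta-of-delta list)."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return timestamps[0], deltas[0], dods


def restore_timestamps(first, first_delta, dods):
    """Invert delta_of_deltas."""
    out = [first, first + first_delta]
    delta = first_delta
    for dod in dods:
        delta += dod
        out.append(out[-1] + delta)
    return out


def float_bits(x):
    """Reinterpret a float as its 64-bit IEEE 754 pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_float(b):
    """Reinterpret a 64-bit pattern as a float."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def xor_values(values):
    """Return the first value's bits and the XOR of each value with its predecessor."""
    bits = [float_bits(v) for v in values]
    return bits[0], [b ^ a for a, b in zip(bits, bits[1:])]


def restore_values(first_bits, xors):
    """Invert xor_values."""
    bits = [first_bits]
    for x in xors:
        bits.append(bits[-1] ^ x)
    return [bits_float(b) for b in bits]


if __name__ == "__main__":
    # Hypothetical samples taken roughly every 60 seconds, with one late arrival.
    ts = [1000, 1060, 1120, 1180, 1241, 1301]
    first, d0, dods = delta_of_deltas(ts)
    print(dods)  # [0, 0, 1, -1]: mostly zeros, which encode very compactly

    vals = [12.0, 12.0, 12.5, 12.5]
    first_bits, xors = xor_values(vals)
    print([format(x, "016x") for x in xors])  # repeated values XOR to zero
```

With regular sampling intervals most delta-of-delta values are zero, and slowly changing metrics produce XOR results that are zero or have few meaningful bits, which is why a compact variable-length encoding of these sequences can fit far more data in memory.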