# DSA
## Concepts
### Monotonic Stack
https://medium.com/@manuchaitanya/-stack-for-efficient-problem-solving-next-greater-next-smaller-previous-greater-and-6c63d0572644
Questions:
1. [Largest Rectangle](https://www.hackerrank.com/challenges/largest-rectangle/problem)
2. [Poisonous Plants](https://www.hackerrank.com/challenges/poisonous-plants/problem)
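A minimal Python sketch of the next-greater-element pattern from the article above (sample input is illustrative): the stack holds indices of elements still waiting for a greater value, so each index is pushed and popped at most once, giving O(n) overall.

```python
def next_greater(nums):
    """For each i, the first element to the right strictly greater
    than nums[i], or -1 if none exists."""
    res = [-1] * len(nums)
    stack = []  # indices with unknown next-greater; their values are decreasing
    for i, x in enumerate(nums):
        # the current element resolves every smaller element still on the stack
        while stack and nums[stack[-1]] < x:
            res[stack.pop()] = x
        stack.append(i)
    return res

print(next_greater([2, 1, 5, 3, 4]))  # [5, 5, -1, 4, -1]
```

The same skeleton solves previous-greater/next-smaller variants by flipping the comparison or scanning right to left; Largest Rectangle uses next/previous smaller indices to bound each bar.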
### Topological Sort
Questions:
1. a
2. b
3. c
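A minimal sketch of Kahn's algorithm (BFS-based topological sort) in Python; the node count, edge list, and function name are illustrative.

```python
from collections import deque

def topo_sort(n, edges):
    """Nodes are 0..n-1; edges are (u, v) pairs meaning u -> v.
    Returns a topological order, or None if the graph has a cycle."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    q = deque(i for i in range(n) if indeg[i] == 0)  # start with sources
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:  # all prerequisites of v are placed
                q.append(v)
    return order if len(order) == n else None

print(topo_sort(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # [0, 1, 2, 3]
```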
## Advanced Data Structures
### [Disjoint Set Union (DSU)](https://www.hackerearth.com/practice/notes/disjoint-set-union-union-find/)
Questions:
1. [Components in a graph](https://www.hackerrank.com/challenges/components-in-graph/problem?isFullScreen=false)
2. a
3. b
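A minimal DSU sketch in Python with path compression and union by size, the two optimizations the HackerEarth note above covers; method names are illustrative.

```python
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # path halving: point each visited node at its grandparent
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same component
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the bigger root
        self.size[ra] += self.size[rb]
        return True

dsu = DSU(5)
dsu.union(0, 1)
dsu.union(3, 4)
print(dsu.find(0) == dsu.find(1))  # True
print(dsu.find(1) == dsu.find(3))  # False
```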
### Binary Indexed Tree (BIT)
Questions:
1. a
2. b
3. c
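A minimal Fenwick tree sketch in Python (1-indexed, illustrative names): both point update and prefix-sum query walk the tree via the lowest set bit, so each is O(log n).

```python
class BIT:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-indexed

    def update(self, i, delta):
        """Add delta to position i."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)  # move to the next range covering i

    def query(self, i):
        """Prefix sum of positions 1..i."""
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)  # strip the lowest set bit
        return s

bit = BIT(5)
for idx, val in enumerate([3, 1, 4, 1, 5], start=1):
    bit.update(idx, val)
print(bit.query(3))                 # 3 + 1 + 4 = 8
print(bit.query(5) - bit.query(2))  # range sum [3..5] = 10
```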
### [Binary Heap / Priority Queue](https://www.digitalocean.com/community/tutorials/min-heap-binary-tree)
Questions:
1. [Implementation](https://www.hackerrank.com/challenges/qheap1/problem)
2. [Jesse and Cookies](https://www.hackerrank.com/challenges/jesse-and-cookies/problem?isFullScreen=true)
3. c
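As a concrete heap use, a sketch of the Jesse and Cookies pattern with Python's built-in heapq min-heap; the combine rule (least + 2 × second least) comes from that problem, and the sample input is illustrative.

```python
import heapq

def cookies(k, sweetness):
    """Operations needed until every value >= k, repeatedly merging
    the two smallest as a + 2*b; -1 if impossible. Mutates the input list."""
    heapq.heapify(sweetness)  # O(n) min-heap construction
    ops = 0
    while len(sweetness) > 1 and sweetness[0] < k:
        a = heapq.heappop(sweetness)   # smallest
        b = heapq.heappop(sweetness)   # second smallest
        heapq.heappush(sweetness, a + 2 * b)
        ops += 1
    return ops if sweetness[0] >= k else -1

print(cookies(7, [1, 2, 3, 9, 10, 12]))  # 2
```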
More practice: [CSES Problem Set](https://cses.fi/problemset/)
# Kafka vs RabbitMQ (RMQ)
1. RMQ has exchanges, which provide complex routing between producers and consumers, whereas Kafka has no such routing layer. An exchange can also route a single message to multiple queues.
2. RMQ supports priority queues; Kafka does not.
3. In RMQ, a message in a queue is consumed by a single consumer, whereas Kafka supports consumer groups (each group reads its own copy of the stream). To deliver a message to multiple consumers in RMQ, the exchange pushes it into multiple queues, each acked independently.
4. RMQ deletes a message as soon as it is acked, but Kafka retains it until its retention period expires, which is what enables replays.
5. RMQ is push-based; Kafka is pull-based.
6. "Kafka Streams" is a Java library for building stream processing applications that can perform complex operations like filtering, transformations, aggregations, joins, and windowing on data streams.
7. Kafka consumers do not scale well with Lambda functions where very high concurrency is required, because parallelism is capped by the number of partitions; a topic should not have thousands of partitions.
## Why is Kafka Fast AF?
1. Zero Copy: a performance optimization that minimizes the number of times data is copied in memory as it moves from disk to the network interface. Data is read from disk into the OS page cache (a kernel buffer) via Direct Memory Access (DMA), then copied straight from the page cache to the Network Interface Card (NIC) buffer, again via DMA, with minimal CPU involvement. DMA lets a device such as a disk controller or network card transfer data directly to or from memory without the CPU touching every byte, so a consumer read effectively goes from the log file to the wire without an intermediate copy through user space (a sketch follows after this list).
2. Sequential I/O and batching: the log is append-only, so writes are sequential, and producers batch messages to amortize network and disk overhead.
3. Distributed architecture: Data is partitioned and distributed across multiple servers (brokers). This allows for horizontal scaling, meaning you can add more brokers to handle more load.
4. Consumers are "cheap" in Kafka because they don't mutate the log files. Many consumers can read from the same topic concurrently without overwhelming the cluster, as they only perform sequential reads and maintain their own progress pointers (offsets).
5. RabbitMQ inherently supports Dead Letter Queues; Kafka has no broker-native equivalent.
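To make the zero-copy point concrete, here is a minimal sketch of the same principle using Linux's sendfile(2) via Python's os.sendfile. This is not Kafka's actual code (Kafka uses Java's FileChannel.transferTo); the function and its usage here are illustrative.

```python
import os
import socket

def serve_file_zero_copy(conn: socket.socket, path: str) -> None:
    """Stream a file to a connected socket without copying it
    through user-space buffers."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            # the kernel moves bytes page cache -> socket directly;
            # this process never touches the data
            sent = os.sendfile(conn.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:
                break
            offset += sent
```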
## Other Kafka Facts
1. ZooKeeper is also used by Kafka brokers for leader elections, in which a broker is chosen to lead the handling of client requests for a certain partition of a topic. Connecting to any broker will bring a client up to speed with the entire Kafka cluster. A minimum of three brokers should be used to achieve reliable failover; the higher the number of brokers, the more reliable the failover.
2. In-Sync Replica (ISR) is a replica that is up to date with the partition's leader.
3. A Schema Registry is present for both producers and consumers in a Kafka cluster, and it holds Avro schemas. For easy serialization and de-serialization, Avro schemas enable the configuration of compatibility parameters between producers and consumers. The Kafka Schema Registry is used to ensure that the schema used by the consumer and the schema used by the producer are identical. The producers just need to submit the schema ID and not the whole schema when using the Confluent schema registry in Kafka. The consumer looks up the matching schema in the Schema Registry using the schema ID.
4. Kafka does not allow you to reduce the number of partitions for a topic.
5. Kafka producers support acks modes (0, 1, all) when publishing messages; a producer sketch follows after this list.
6. One of the brokers in Kafka is elected as the active controller (via ZooKeeper or KRaft), which decides each partition's leader broker.
7. Each consumer group has its own coordinator. There is one dedicated broker that acts as the "Group Coordinator" for a specific consumer group. A single Kafka cluster will have multiple group coordinators running simultaneously, each managing a different consumer group.
i. Managing membership: tracking which consumers are part of the group via a heartbeat mechanism. If a consumer fails to send heartbeats, the coordinator marks it as inactive.
ii. Triggering rebalances: initiating a rebalance when consumers join, leave, or fail. This reassigns partitions to the active consumers in the group to ensure an even workload distribution.
iii. Tracking offsets: receiving and persisting committed offsets from consumers to the __consumer_offsets topic, which lets consumers resume consumption from where they left off after a crash or restart.
8. The leader of each partition tracks the ISR List by computing the lag of each replica from itself.
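To illustrate the acks modes from point 5, a minimal producer sketch using the kafka-python client (one client library among several); the broker address, topic, and payload are placeholders.

```python
from kafka import KafkaProducer

# acks=0: fire and forget; acks=1: leader persisted it;
# acks='all': every in-sync replica persisted it (strongest durability)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # placeholder broker
    acks="all",
)
future = producer.send("orders", b"order-created")  # placeholder topic/payload
metadata = future.get(timeout=10)  # blocks until the broker acks
print(metadata.partition, metadata.offset)
producer.flush()
```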
# ZooKeeper
Add more details here
It provides a key-value store and is used as a distributed configuration store.
### ZooKeeper Znodes
1. Persistent Znode: these znodes continue to exist even after the client that created them has disconnected. All znodes are persistent by default unless otherwise specified.
2. Ephemeral Znode: ephemeral znodes live only as long as the creating client's session. When that client disconnects from the ZooKeeper ensemble, they are removed automatically, which is why they play a significant role in leader election. A sketch of both node types follows.
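A minimal sketch of both znode types using the kazoo Python client; the ensemble address, paths, and payloads are placeholders.

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")  # placeholder ensemble address
zk.start()

# persistent znode: survives after this client disconnects
zk.create("/app/config", b"max_conn=100", makepath=True)

# ephemeral znode: deleted automatically when this session ends,
# which is what makes it useful for leader election and liveness checks
zk.create("/app/leader", b"worker-1", ephemeral=True)

data, stat = zk.get("/app/config")
print(data)  # b'max_conn=100'
zk.stop()    # session ends; /app/leader disappears, /app/config remains
```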
# Elasticsearch vs ClickHouse
1. ClickHouse is a columnar SQL database optimized for high-speed analytical (OLAP) queries. Elasticsearch is a NoSQL search engine for full-text search and real-time analytics on document data.
2. CH is better for aggregations, while ES is better for full-text search.
3. ES uses inverted indexes for searching, while CH relies on columnar scans (see the toy inverted index after this list).
4. CH supports compression.
5. ES uses more disk storage.
6. ES supports fuzzy search using edit-distance formulas (e.g., Levenshtein distance).
7. ES also supports ranking algorithms such as BM25.
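To illustrate point 3, a toy inverted index in Python: a map from term to the set of documents containing it, which is the structure Lucene/Elasticsearch builds at scale (the documents here are made up).

```python
from collections import defaultdict

docs = {
    1: "kafka is fast",
    2: "clickhouse is a columnar database",
    3: "kafka streams is a java library",
}

# inverted index: term -> ids of documents containing that term
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

# a term lookup is a dict hit instead of a scan over every document
print(sorted(index["kafka"]))                # [1, 3]
print(sorted(index["is"] & index["kafka"]))  # AND query: [1, 3]
```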