# **Kafka Tutorial – Part 4: Kafka Topics, Partitions & Replication Deep Dive – Mastering Scalability, Durability & Data Management**
**#ApacheKafka #KafkaTopics #Partitioning #Replication #DataRetention #LogCompaction #KafkaAdmin #KafkaTutorial #DataEngineering #StreamingArchitecture**
---
## **Table of Contents**
1. [Recap of Part 3](#recap-of-part-3)
2. [What Are Kafka Topics? A Deep Revisit](#what-are-kafka-topics-a-deep-revisit)
3. [Understanding Partitions: The Engine of Parallelism](#understanding-partitions-the-engine-of-parallelism)
4. [How to Choose the Right Number of Partitions](#how-to-choose-the-right-number-of-partitions)
5. [Replication: Ensuring Fault Tolerance & High Availability](#replication-ensuring-fault-tolerance--high-availability)
6. [In-Sync Replicas (ISR) & Leader Election](#in-sync-replicas-isr--leader-election)
7. [Critical Topic-Level Configurations](#critical-topic-level-configurations)
8. [Data Retention Policies: Time vs Size](#data-retention-policies-time-vs-size)
9. [Log Compaction vs Log Deletion](#log-compaction-vs-log-deletion)
10. [Managing Topics with Kafka CLI & Admin API](#managing-topics-with-kafka-cli--admin-api)
11. [Reassigning Partitions & Expanding Topics](#reassigning-partitions--expanding-topics)
12. [Monitoring Topics & Brokers](#monitoring-topics--brokers)
13. [Common Pitfalls & Best Practices](#common-pitfalls--best-practices)
14. [Visualizing Topic Architecture (Diagram)](#visualizing-topic-architecture-diagram)
15. [Summary & What's Next in Part 5](#summary--whats-next-in-part-5)
---
## **1. Recap of Part 3**
In **Part 3**, we mastered **Kafka Consumers**:
- Built robust consumers in **Java and Python**.
- Understood **consumer groups**, **rebalancing**, and **partition assignment**.
- Learned about **offsets**, **auto vs manual commits**, and **at-least-once delivery**.
- Handled **failures**, **duplicates**, and **consumer lag**.
- Explored performance tuning and monitoring.
Now, in **Part 4**, we go **deep into the storage layer** of Kafka: **Topics, Partitions, and Replication**.
You'll learn how Kafka ensures **durability**, **scalability**, and **fault tolerance**, and how to **configure topics like a pro**.
Let's dive in!
---
## **2. What Are Kafka Topics? A Deep Revisit**
A **topic** is more than just a "category" for messages; it's a **distributed, replicated, immutable commit log**.
> Think of a topic as a **"table"** in a distributed database, but optimized for **append-only, high-throughput writes**.
Each topic has:
- A **name** (e.g., `user-events`)
- **Partitions** (e.g., 6)
- **Replication factor** (e.g., 3)
- **Configuration** (retention, cleanup, etc.)
### Topic Anatomy:
```
Topic: user-events
├── Partition 0
│   ├── Leader: Broker 0
│   ├── Replicas: [0, 1, 2]
│   ├── ISR: [0, 1]
│   └── Data: [msg0, msg1, msg2, ...] (stored on disk)
├── Partition 1
│   ├── Leader: Broker 1
│   ├── Replicas: [1, 2, 0]
│   ├── ISR: [1, 2]
│   └── Data: [msg0, msg1, ...]
└── ...
```
> Topics are **the foundation** of Kafka's scalability and durability.
---
## **3. Understanding Partitions: The Engine of Parallelism**
### What is a Partition?
A **partition** is an **ordered, immutable sequence of messages** that is continually appended to.
> Each partition is stored on disk as a **directory of log segment files** under the broker's `log.dirs`.
Key properties:
- Messages in a partition are **ordered**.
- Each message has a unique **offset** (0, 1, 2, ...).
- Partitions are **independently replicated**.
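To make this concrete, here is a minimal producer sketch (the topic name and key are illustrative) showing that records with the same key always land in the same partition under the default partitioner, and that each write is assigned a monotonically increasing offset within that partition:

```java
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class PartitionDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 3; i++) {
                // Same key ("user1") -> same partition; offsets grow monotonically within it
                RecordMetadata md = producer
                        .send(new ProducerRecord<>("user-events", "user1", "event-" + i))
                        .get();
                System.out.printf("partition=%d offset=%d%n", md.partition(), md.offset());
            }
        }
    }
}
```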
### Why Partitions Matter
| Benefit | Description |
|--------|-------------|
| **Scalability** | More partitions = more parallelism (producers & consumers) |
| **Throughput** | Multiple producers/consumers can write/read simultaneously |
| **Load Distribution** | Partitions spread across brokers |
> But **too many partitions** can hurt performance; we'll cover the trade-offs below.
---
## **4. How to Choose the Right Number of Partitions**
This is one of the **most important design decisions** in Kafka.
### **Factors to Consider**
| Factor | Guidance |
|-------|---------|
| **Throughput Needs** | More partitions = higher max throughput |
| **Consumer Parallelism** | 1 consumer per partition max (in a group) |
| **Broker Resources** | Each partition uses file handles, memory, network |
| **Rebalance Time** | More partitions = longer rebalances |
| **ZooKeeper Load** | Metadata scales with partition count |
### **Rule of Thumb: Start Small, Scale Later**
| Use Case | Recommended Partitions |
|--------|------------------------|
| Dev / Testing | 1–3 |
| Medium App (10K msg/sec) | 6–12 |
| High Throughput (100K+ msg/sec) | 24–100+ |
> You can **increase partitions later**, but you can **never decrease** them.
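A common sizing heuristic is to work backwards from throughput: divide the target throughput for the topic by the measured per-partition producer throughput and by the per-consumer throughput, then take the larger of the two results. A minimal sketch with made-up numbers (measure your own before deciding):

```java
public class PartitionSizing {
    public static void main(String[] args) {
        // Hypothetical measurements; replace with numbers from your own benchmarks.
        double targetMBps = 100;           // desired topic throughput (MB/s)
        double producerPerPartition = 10;  // producer throughput per partition (MB/s)
        double consumerPerPartition = 5;   // consumer throughput per consumer (MB/s)

        int partitions = (int) Math.max(
                Math.ceil(targetMBps / producerPerPartition),
                Math.ceil(targetMBps / consumerPerPartition));

        System.out.println("Suggested partitions: " + partitions); // prints 20
    }
}
```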
---
### **Pitfall: Over-Partitioning**
Too many partitions cause:
- High memory usage on brokers.
- Slower leader elections.
- Longer consumer rebalances.
- Increased ZooKeeper load.
> **LinkedIn recommendation**: No more than **2000 partitions per broker**.
---
## **5. Replication: Ensuring Fault Tolerance & High Availability**
Kafka replicates partitions across brokers to **prevent data loss**.
### **Replication Factor**
Number of copies of each partition.
```bash
--replication-factor 3
```
> **Production: Always use a replication factor of 3** (1 leader + 2 followers).
With RF=3:
- One **leader** (handles reads/writes).
- Two **followers** (replicate data).
If the leader fails, one of the followers becomes the new leader.
---
## **6. In-Sync Replicas (ISR) & Leader Election**
### **What is ISR?**
**In-Sync Replicas (ISR)** are replicas that are **up-to-date** with the leader.
- If a follower falls behind (e.g., due to a network issue), it's removed from the ISR.
- Only ISR members can become leader.
### **Example:**
```
Partition 0:
Leader: Broker 0
Replicas: [0, 1, 2]
  ISR: [0, 1]  ← Broker 2 is lagging
```
If Broker 0 fails, **Broker 1 becomes the leader** (only ISR members are eligible).
> This prevents **data loss** during failover.
---
### **Critical Replication Settings**
| Config | Purpose |
|-------|--------|
| `replica.lag.time.max.ms` | Max time a follower can be behind (default: 30s) |
| `min.insync.replicas` | Min ISR size for acks=all to succeed (e.g., 2) |
| `unclean.leader.election.enable` | Allow non-ISR replicas to become leader (**set to false in prod**) |
> **Best Practice**:
> ```properties
> min.insync.replicas=2
> unclean.leader.election.enable=false
> ```
This ensures **durability** even during broker failures.
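These broker/topic settings only take effect on writes that request full ISR acknowledgement. A minimal sketch of the matching producer-side properties (assuming a plain Java producer; the values shown are common durability settings, not the only valid ones):

```java
import org.apache.kafka.clients.producer.ProducerConfig;
import java.util.Properties;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.ACKS_CONFIG, "all");                // wait for all in-sync replicas
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true"); // avoid duplicates on retry
props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
// With replication factor 3 and min.insync.replicas=2, a write succeeds only when the
// leader and at least one follower have it; otherwise the broker rejects the write
// (not enough replicas) and the producer retries.
```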
---
## **7. Critical Topic-Level Configurations**
You can set configs **per topic** or globally in `server.properties`.
### **Essential Topic Configs**
| Config | Default | Meaning |
|-------|--------|--------|
| `retention.ms` | 604800000 (7 days) | How long to keep messages |
| `retention.bytes` | -1 (unlimited) | Max size per partition |
| `cleanup.policy` | delete | delete or compact |
| `segment.bytes` | 1GB | Size of log segment file |
| `max.message.bytes` | 1MB | Max message size |
| `compression.type` | producer | none, gzip, snappy, lz4, zstd, or producer (keep the producer's codec) |
---
### **Example: Create a Topic with Custom Configs**
```bash
# retention.ms=259200000  -> 3 days
# cleanup.policy=compact  -> log compaction
# segment.bytes=536870912 -> 512MB segments
bin/kafka-topics.sh --create \
  --topic user-profiles \
  --bootstrap-server localhost:9092 \
  --partitions 6 \
  --replication-factor 3 \
  --config retention.ms=259200000 \
  --config cleanup.policy=compact \
  --config segment.bytes=536870912
```
> This topic uses **log compaction**. Note that with `cleanup.policy=compact` alone, `retention.ms` does not expire data; use `cleanup.policy=compact,delete` if you also want time-based deletion after 3 days.
---
## **8. Data Retention Policies: Time vs Size**
Kafka deletes old data based on:
- **Time**: `retention.ms`
- **Size**: `retention.bytes`
### **Retention by Time (Default)**
```properties
retention.ms=604800000 # 7 days
```
Old log segments are deleted once all messages in them are older than the retention period.
---
### **Retention by Size**
```properties
retention.bytes=1073741824 # 1GB per partition
```
Useful when disk space is limited.
> You can combine both; whichever limit is reached first triggers deletion.
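You can also change retention on an existing topic without recreating it. A minimal sketch using the Admin API's incremental config update (the topic name is illustrative, and the `.get()` calls throw checked exceptions you would normally handle):

```java
import org.apache.kafka.clients.admin.*;
import org.apache.kafka.common.config.ConfigResource;
import java.util.*;

try (AdminClient admin = AdminClient.create(Collections.singletonMap(
        AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {

    ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "user-events");

    // Set retention to 3 days (time) and 1 GB (size); whichever is hit first wins
    Collection<AlterConfigOp> ops = Arrays.asList(
            new AlterConfigOp(new ConfigEntry("retention.ms", "259200000"), AlterConfigOp.OpType.SET),
            new AlterConfigOp(new ConfigEntry("retention.bytes", "1073741824"), AlterConfigOp.OpType.SET));

    admin.incrementalAlterConfigs(Collections.singletonMap(topic, ops)).all().get();
}
```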
---
## **9. Log Compaction vs Log Deletion**
### **Log Deletion (Default)**
Messages are deleted after `retention.ms`.
> Good for **event streams** (e.g., clickstream, logs).
```
Partition: click-events
[click1] → [click2] → [click3] → [click4] → ... → DELETED after 7 days
```
---
### **Log Compaction**
Kafka keeps **only the latest value** for each key.
> Perfect for **state-based data** (e.g., user profiles, inventory).
#### Example:
Topic: `user-profiles` with key = `user_id`
```
Offset | Key | Value
-------|----------|-------------------
0 | user1 | {name: "Alice", status: "active"}
1 | user2 | {name: "Bob", status: "inactive"}
2      | user1    | {name: "Alice", status: "premium"}  ← Only this remains
```
After compaction:
- `user1`: latest value
- `user2`: latest value
> Consumers see the **final state** of each key.
Set:
```properties
cleanup.policy=compact
```
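One related detail: in a compacted topic, producing a record with a **null value** (a "tombstone") tells compaction to remove that key entirely once the tombstone itself ages out (`delete.retention.ms`). A minimal producer sketch (topic and key are illustrative, serializer setup as in earlier parts):

```java
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    // Normal update: the latest value for "user1" survives compaction
    producer.send(new ProducerRecord<>("user-profiles", "user1", "{\"status\": \"premium\"}"));
    // Tombstone: a null value marks "user1" for removal during compaction
    producer.send(new ProducerRecord<>("user-profiles", "user1", null));
    producer.flush();
}
```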
---
### **When to Use Which?**
| Use Case | Policy |
|--------|--------|
| Event Logs, Clickstreams | `delete` |
| User Profiles, Configs | `compact` |
| Both: Keep recent + latest | `compact,delete` |
> Example: `cleanup.policy=compact,delete` compacts the log **and** deletes segments older than the retention period.
---
## **10. Managing Topics with Kafka CLI & Admin API**
### **CLI: List, Describe, Delete Topics**
```bash
# List topics
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
# Describe topic
bin/kafka-topics.sh --describe --topic user-events --bootstrap-server localhost:9092
```
Output:
```
Topic: user-events PartitionCount: 3 ReplicationFactor: 3
Topic: user-events Partition: 0 Leader: 0 Replicas: 0,1,2 Isr: 0,1
Topic: user-events Partition: 1 Leader: 1 Replicas: 1,2,0 Isr: 1,2
```
```bash
# Delete topic
bin/kafka-topics.sh --delete --topic old-topic --bootstrap-server localhost:9092
```
> `delete.topic.enable=true` must be set in `server.properties` (it is the default in recent Kafka versions).
---
### **Java Admin API: Programmatic Control**
```java
import org.apache.kafka.clients.admin.*;
import java.util.*;

// Note: the .get() calls below throw InterruptedException/ExecutionException; handle or declare them.
try (AdminClient admin = AdminClient.create(Collections.singletonMap(
        AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {

    // Create topic: 6 partitions, replication factor 3
    NewTopic topic = new NewTopic("new-events", 6, (short) 3);
    admin.createTopics(Collections.singletonList(topic)).all().get(); // wait for completion

    // Describe topic
    DescribeTopicsResult result = admin.describeTopics(Arrays.asList("user-events"));
    Map<String, TopicDescription> desc = result.all().get();
    desc.forEach((name, description) -> System.out.println(name + ": " + description));
}
```
> Use the Admin API for automation and CI/CD.
---
## **11. Reassigning Partitions & Expanding Topics**
### **Increase Partitions**
You can **increase** partitions (but not decrease).
```bash
bin/kafka-topics.sh --alter \
--topic user-events \
--partitions 12 \
--bootstrap-server localhost:9092
```
> **Warning**: Adding partitions changes the key-to-partition mapping, so **key-based ordering is broken** for existing keys.
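The same operation is available from code; a minimal sketch using the Admin API (the target of 12 partitions mirrors the CLI example above, and `.get()` throws checked exceptions):

```java
import org.apache.kafka.clients.admin.*;
import java.util.Collections;

try (AdminClient admin = AdminClient.create(Collections.singletonMap(
        AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {

    // Grow user-events to 12 partitions (the partition count can only ever increase)
    admin.createPartitions(
            Collections.singletonMap("user-events", NewPartitions.increaseTo(12)))
         .all().get();
}
```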
---
### **Reassign Partitions (Move Data)**
When adding new brokers, redistribute partitions.
#### Step 1: Generate reassignment plan
```bash
bin/kafka-reassign-partitions.sh --generate \
--topics-to-move-json-file topics.json \
--broker-list "0,1,2,3" \
--zookeeper localhost:2181
```
`topics.json`:
```json
{
"version": 1,
"topics": [
{ "topic": "user-events" }
]
}
```
#### Step 2: Execute reassignment
Save the proposed assignment printed in Step 1 as `reassign.json`, then run:
```bash
bin/kafka-reassign-partitions.sh --execute --reassignment-json-file reassign.json --zookeeper localhost:2181
```
> This moves partitions to balance load across brokers. Note that newer Kafka versions use `--bootstrap-server` for this tool instead of the deprecated `--zookeeper` flag.
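On newer clusters, reassignment is also exposed through the Admin API, which is what the tool uses internally in recent versions. A minimal sketch moving one partition onto a new replica set; the broker IDs are illustrative:

```java
import org.apache.kafka.clients.admin.*;
import org.apache.kafka.common.TopicPartition;
import java.util.*;

try (AdminClient admin = AdminClient.create(Collections.singletonMap(
        AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {

    // Move user-events partition 0 to brokers 1, 2, 3 (first replica is the preferred leader)
    Map<TopicPartition, Optional<NewPartitionReassignment>> moves = Collections.singletonMap(
            new TopicPartition("user-events", 0),
            Optional.of(new NewPartitionReassignment(Arrays.asList(1, 2, 3))));

    admin.alterPartitionReassignments(moves).all().get(); // .get() throws checked exceptions
}
```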
---
## **12. Monitoring Topics & Brokers**
### **Key Metrics to Monitor**
| Metric | Tool | Why It Matters |
|-------|------|---------------|
| **Under-replicated partitions** | `kafka-topics.sh --describe` | Indicates replication issues |
| **Active controllers** | JMX / Prometheus | Should be 1 cluster-wide |
| **Request rate** | Kafka Exporter + Grafana | Detect traffic spikes |
| **Disk usage** | `df -h` / Prometheus | Avoid running out of space |
| **Consumer lag** | `kafka-consumer-groups.sh` | Processing delays |
---
### **Check Under-Replicated Partitions**
```bash
bin/kafka-topics.sh --describe --under-replicated-partitions --bootstrap-server localhost:9092
```
Output:
```
Topic: user-events Partition: 0 Leader: 0 Replicas: 0,1,2 Isr: 0,1
```
If the `Isr` list has fewer entries than the replication factor, investigate!
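The same check can be scripted: a partition is under-replicated when its ISR is smaller than its replica list. A minimal sketch against the Admin API (topic name is illustrative; checked exceptions from `.get()` omitted for brevity):

```java
import org.apache.kafka.clients.admin.*;
import org.apache.kafka.common.TopicPartitionInfo;
import java.util.*;

try (AdminClient admin = AdminClient.create(Collections.singletonMap(
        AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {

    TopicDescription desc = admin.describeTopics(Arrays.asList("user-events"))
                                 .all().get()
                                 .get("user-events");

    for (TopicPartitionInfo p : desc.partitions()) {
        if (p.isr().size() < p.replicas().size()) {
            System.out.printf("Partition %d is under-replicated: ISR=%s, replicas=%s%n",
                    p.partition(), p.isr(), p.replicas());
        }
    }
}
```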
---
## **13. Common Pitfalls & Best Practices**
### **Pitfall 1: Too Many Small Partitions**
- High overhead per partition.
- Slows down recovery.
**Fix**: Size partitions to be roughly **1–10 GB** each.
---
### **Pitfall 2: Using `unclean.leader.election.enable=true`**
- Allows **data loss** during failover.
**Fix**: Set it to `false` in production.
---
### **Pitfall 3: Ignoring Disk Space**
- Kafka stores data on disk.
- A full disk can crash the broker.
**Fix**: Monitor disk usage and keep `log.retention.check.interval.ms=300000` (5 min) so retention is enforced promptly.
---
### **Best Practice 1: Use 3x Replication**
- Survives 1 broker failure.
---
### **Best Practice 2: Use `min.insync.replicas=2`**
- With `acks=all`, ensures durability.
---
### **Best Practice 3: Plan Partitions Ahead**
- Hard to change later.
- Aim for **future scalability**.
---
### **Best Practice 4: Use Log Compaction for State Topics**
- Avoids reprocessing all events.
---
## **14. Visualizing Topic Architecture (Diagram)**
```
Topic: user-events (6 partitions, RF=3)
+------------------------------------------------------------+
| Partition 0: Leader=B0, Replicas=[B0,B1,B2], ISR=[B0,B1]   |
| Partition 1: Leader=B1, Replicas=[B1,B2,B0], ISR=[B1,B2]   |
| Partition 2: Leader=B2, Replicas=[B2,B0,B1], ISR=[B2,B0]   |
| ...                                                        |
+------------------------------------------------------------+
      ^               ^               ^
      |               |               |
 +----+-----+    +----+-----+    +----+------+
 | Broker 0 |    | Broker 1 |    | Broker 2  |
 | (Disk)   |    | (Disk)   |    | (Disk)    |
 +----------+    +----------+    +-----------+
```
> Data is **replicated**, **partitioned**, and **durable**.
---
## **15. Summary & What's Next in Part 5**
### **What You've Learned in Part 4**
- How **partitions** enable scalability and parallelism.
- How **replication** ensures fault tolerance.
- The role of **ISR** and safe leader election.
- **Topic configurations** for retention, compaction, and performance.
- How to **manage topics** via CLI and Admin API.
- **Log compaction** vs deletion for different use cases.
- Monitoring and best practices for production.
---
### **What's Coming in Part 5: Kafka Schema Management & Serialization (Avro, Protobuf, JSON Schema)**
In the next part, we'll explore:
- **Why Schemas Matter** in event-driven systems.
- **Schema Evolution**: Backward, Forward, Full Compatibility.
- **Confluent Schema Registry** setup and usage.
- **Avro, Protobuf, JSON Schema** comparison.
- **Serializers/Deserializers** with Schema Registry.
- **Integration with Kafka Producers & Consumers**.
- **Schema Validation & Governance**.
> **#KafkaSchema #Avro #Protobuf #SchemaRegistry #EventSourcing #KafkaTutorial**
---
## Final Words
You're now a **Kafka Storage Expert**. You understand how data is **organized, replicated, and retained**, the backbone of any reliable streaming system.
> **"A well-designed topic isn't just scalable; it's self-healing, durable, and future-proof."**
In **Part 5**, we tackle **data structure and schema management**, because in the real world, data isn't just bytes… it's **meaning**.
---
**Pro Tip**: Document your topic designs. Include partition count, retention, and compaction policy.
**Share this guide** with your team to build **production-grade Kafka architectures**.
---
**Image: Log Compaction vs Deletion**
*(Imagine two columns: one showing deletion over time, one showing compaction keeping latest per key)*
```
Log Deletion:              Log Compaction:
[msg1]                     [user1: v1]
[msg2]                     [user2: v1]
[msg3] → DELETED           [user1: v2] → [user1: v2]
[msg4]                     [user2: v2] → [user2: v2]
```
---
**You're now ready for Part 5!**
We're diving into **Kafka Schema Management**, the key to **evolvable, reliable, and interoperable** data systems.
#KafkaTutorial #LearnKafka #KafkaTopics #Partitioning #Replication #DataRetention #LogCompaction #ApacheKafka #DataEngineering #StreamingPlatform