# Apache Kafka
> Reference: apache-kafka
## Topics, Partitions, Offsets

- Once data is written to a partition, it cannot be changed (immutability)
- Data is only kept for a limited time (one week by default)
- An offset is only meaningful within a specific partition
- Ordering is guaranteed only within a partition, not across partitions
- Data is assigned to a partition at random, unless you specify a key
- Each topic can have many partitions
## Producers
- Write data into topics (a topic is made up of partitions)
- Know which partition to write to
- Automatically recover if a broker fails

#### Message Keys
- A producer can send a key along with each message written to a topic
- key = null: messages go round-robin across partitions (0, 1, 2, ...)
- key != null: all messages with that key are always sent to the same partition
- You typically send a key when you need message ordering for a specific field
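The key behaviour can be seen from the console producer. A minimal sketch, assuming a local broker on port 9092 and an existing `first_topic` (the topic name is a placeholder):

```shell
# parse.key / key.separator make the console producer split each input
# line into key:value; all messages with the same key go to one partition
kafka-console-producer --bootstrap-server localhost:9092 \
  --topic first_topic \
  --property parse.key=true \
  --property key.separator=:
# > user_1:first login
# > user_1:first purchase    (same key, same partition, so ordering is kept)
```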

#### Messages
- Messages are created by the producer

#### Serializer
- Kafka only accepts bytes as input from producers, and only sends bytes as output to consumers
- Messages therefore have to go through serialization (turning data into bytes)
- Serialization is applied to both the key and the value

## Consumers
- Read data from a topic
- Know which broker to read from
- Know how to recover if a broker fails
- Data is read in order from low to high offset, but only within each partition

#### Deserializer
- Deserialization turns the bytes back into objects/data
- Applied to both the key and the value
- The serialization/deserialization type must not change during a topic's lifecycle
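With the console consumer, the deserializers can be set explicitly; a sketch, assuming string keys and values on `first_topic`:

```shell
# Explicitly deserialize key and value as strings and print both
kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic first_topic --from-beginning \
  --property print.key=true \
  --key-deserializer org.apache.kafka.common.serialization.StringDeserializer \
  --value-deserializer org.apache.kafka.common.serialization.StringDeserializer
```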

#### Consumers groups
- Consumers in an application read as a consumer group
- Each consumer within a group reads from an exclusive set of partitions
- Multiple consumer groups can read from the same topic (groups are distinguished by group.id)
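Partition balancing can be observed by starting two console consumers with the same group. A sketch, assuming a topic with several partitions (names are placeholders):

```shell
# Run this same command in two terminals: with the same group,
# the topic's partitions are split between the two consumers
kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic second_topic --group my-first-group

# List all consumer groups known to the cluster
kafka-consumer-groups --bootstrap-server localhost:9092 --list
```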


#### Consumers offset
- Kafka stores offsets to track how far a consumer group has read
- Offsets are committed periodically and stored in the internal topic __consumer_offsets
- If a consumer dies, it resumes from the last committed offset
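Committed offsets per group can be inspected with `kafka-consumer-groups` (the group name is a placeholder):

```shell
# Per partition: CURRENT-OFFSET = last committed offset,
# LOG-END-OFFSET = latest offset in the partition, LAG = how far behind
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --describe --group my-first-group
```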

#### Delivery semantics
- At least once (usually preferred)
    - Offsets are committed after the message is processed
    - If processing goes wrong, the message is read again
    - This can cause duplicate processing, so processing should be idempotent
- At most once
    - Offsets are committed as soon as messages are received
    - If processing goes wrong, the message is lost
- Exactly once
    - Kafka-to-Kafka workflows: Transactional API
    - Kafka-to-external-system workflows: idempotent consumers
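Re-delivery can also be forced by hand by rewinding a group's committed offsets (group and topic names are placeholders; the group must be inactive):

```shell
# Reset the group's offsets to the beginning of the topic,
# so every message is read and processed again
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --group my-first-group --topic first_topic \
  --reset-offsets --to-earliest --execute
```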
### Brokers
- A Kafka cluster is composed of multiple brokers
- Each broker is identified by an ID
- Each broker holds a subset of the topic partitions
- Connecting to any one broker (a bootstrap server) connects you to the entire cluster
- 3 or more brokers is a good starting point


#### Brokers and topics
- Example, with 3 brokers: Topic-A with 3 partitions gets one partition on each broker
- Topic-B with 2 partitions lives on only 2 of the 3 brokers

### Topic replication
- The replication factor must be > 1 (usually 2 or 3)
- With a replication factor of N, you can lose up to N-1 brokers and still recover the data
- If a broker is down, another broker can still serve the data

#### Concept of Leader for a Partition
- At any given time, only one broker can be the leader for a given partition
- Producers can only send data to the broker that is the leader of that partition
- The other brokers replicate the data from the leader
- Each partition therefore has one leader and multiple ISRs (in-sync replicas)

- Default behaviour: producers write to, and consumers read from, the partition leader

> Note: since Kafka 2.4, consumers can be configured to read from the closest replica to improve latency
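Leaders and in-sync replicas are visible in `kafka-topics --describe`; a sketch (the exact broker IDs depend on the cluster):

```shell
# The output lists, per partition: Leader (broker ID), Replicas, and Isr
kafka-topics --bootstrap-server localhost:9092 \
  --describe --topic third_topic
```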
### Producer Acknowledgements(acks)
- Producers can choose which acknowledgement policy to use:
- acks=0: the producer does not wait for acknowledgement (possible data loss)
- acks=1: the producer waits for the leader's acknowledgement (limited data loss)
- acks=all: the leader and all replicas acknowledge (no data loss)
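With the console producer, acks can be set as a producer property; a minimal sketch:

```shell
# Wait for the leader and all in-sync replicas to acknowledge each message
kafka-console-producer --bootstrap-server localhost:9092 \
  --topic first_topic --producer-property acks=all
```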

### Zookeeper
- Manages brokers and keeps a list of them
- Performs leader election for partitions
- Sends notifications to Kafka (e.g. a broker dies, a broker comes up, a topic is created or deleted)
- Zookeeper does not store consumer offsets; Kafka stores them itself
- Runs an odd number of servers (1, 3, 5, 7)
- One leader handles writes; the rest of the servers are followers that handle reads
> KIP-500: since Kafka 3.x, Zookeeper can be replaced by a Raft-based quorum (KRaft)
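The broker list kept by Zookeeper can be inspected with the bundled shell (assuming Zookeeper on its default port 2181):

```shell
# Each active broker registers an ephemeral node under /brokers/ids
zookeeper-shell localhost:2181 ls /brokers/ids
```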
## Kafka CLI
### Installation
```shell=
brew install kafka
/usr/local/bin/zookeeper-server-start /usr/local/etc/zookeeper/zoo.cfg
/usr/local/bin/kafka-server-start /usr/local/etc/kafka/server.properties
```

### Topic CLI
```shell=
kafka-topics --bootstrap-server localhost:9092 --topic first_topic --create
kafka-topics --bootstrap-server localhost:9092 --topic second_topic --create --partitions 3
kafka-topics --bootstrap-server localhost:9092 --topic third_topic --create --partitions 3 --replication-factor 2
# the replication factor cannot be larger than the number of brokers
kafka-topics --bootstrap-server localhost:9092 --topic third_topic --create --partitions 3 --replication-factor 1
kafka-topics --bootstrap-server localhost:9092 --list
kafka-topics --bootstrap-server localhost:9092 --describe
kafka-topics --bootstrap-server localhost:9092 --topic second_topic --delete
```
### Producers CLI

- If the topic does not exist, producing to it on localhost auto-creates it (broker default `auto.create.topics.enable=true`)
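A minimal sketch, assuming the local broker started in the installation steps above:

```shell
# Produce messages interactively, one per line (Ctrl+C to exit)
kafka-console-producer --bootstrap-server localhost:9092 --topic first_topic
# > hello
# > world

# Producing to a missing topic auto-creates it with broker defaults
kafka-console-producer --bootstrap-server localhost:9092 --topic new_topic
```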


### Consumer CLI
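A minimal sketch, mirroring the producer commands above:

```shell
# Read only messages produced from now on
kafka-console-consumer --bootstrap-server localhost:9092 --topic first_topic

# Read the whole topic from offset 0 (order is guaranteed only per partition)
kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic first_topic --from-beginning
```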
