# Apache Kafka

> Reference: apache-kafka

## Topics, Partitions, Offsets

![image](https://hackmd.io/_uploads/BJls2n4op.png)

- Once data is written to a partition, it cannot be changed (immutability)
- Data is only kept for a limited time (one week by default)
- An offset only has meaning within a specific partition
- Ordering is guaranteed only within the same partition
- Data is assigned to a partition randomly, unless a key is provided
- Each topic can have many partitions

## Producers

- Write data to topics (a topic is made up of partitions)
- Know in advance which partition to write to
- Recover automatically if a broker fails

![Screenshot 2024-02-13 at 2.15.23 PM](https://hackmd.io/_uploads/Sy_ZtYdsp.png)

#### Message Keys

- A producer can send a key along with each message
- key=null: messages are distributed round robin (partition 0, 1, 2, ...)
- key!=null: all messages with that key always go to the same partition
- A key is typically sent when you need ordering for messages that share a specific field

![Screenshot 2024-02-13 at 2.21.36 PM](https://hackmd.io/_uploads/BJ2d9Fusa.png)

#### Messages

- Message created by the producer

![Screenshot 2024-02-13 at 2.25.46 PM](https://hackmd.io/_uploads/ryDujYOoa.png)

#### Serializer

- Kafka only accepts bytes as input from producers and only sends bytes as output
- Messages therefore go through serialization (converting data into bytes)
- Serializers are used on both the key and the value

![Screenshot 2024-02-13 at 2.29.44 PM](https://hackmd.io/_uploads/BkBvhF_sp.png)

### Consumers

- Read data from topics
- Know which broker to read from
- Know how to recover if a broker fails
- Data is read in order from low to high offset, and this ordering only exists within each partition

![Screenshot 2024-02-13 at 2.31.53 PM](https://hackmd.io/_uploads/SkOk6F_oT.png)

#### Deserializer

- Deserializes the bytes back into objects/data
- Used on both the key and the value
- The type cannot be changed during the topic's lifecycle

![Screenshot 2024-02-13 at 2.34.23 PM](https://hackmd.io/_uploads/HJkYTYui6.png)

#### Consumer Groups

- Consumers in an application read as a group
- Each consumer within a group reads from an exclusive set of partitions
- Multiple consumer groups can read from the same topic (groups are distinguished by `group.id`)

![Screenshot 2024-02-13 at 2.37.01 PM](https://hackmd.io/_uploads/SyaMRY_iT.png)

![Screenshot 2024-02-13 at 2.42.41 PM](https://hackmd.io/_uploads/BJpwkqOop.png)

#### Consumer Offsets

- Kafka stores offsets so it knows how far a consumer group has read
- Offsets are committed periodically and stored in the internal topic `__consumer_offsets`
- If a consumer dies, it resumes from the last committed offset

![Screenshot 2024-02-13 at 2.48.23 PM](https://hackmd.io/_uploads/HkzTeqOsp.png)
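Committed offsets can be inspected from the command line with the `kafka-consumer-groups` tool that ships with Kafka. A minimal sketch, assuming the broker from the CLI section below (`localhost:9092`) and a hypothetical group named `my-first-group` that has been reading `first_topic`:

```shell=
# list every consumer group known to the cluster
kafka-consumer-groups --bootstrap-server localhost:9092 --list

# show, per partition, the committed offset, the log-end offset and the lag
# ("my-first-group" is only an example group name)
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --describe --group my-first-group

# rewind the group's committed offsets to the beginning of first_topic
# (the group must have no active members while resetting)
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --group my-first-group --reset-offsets --to-earliest --topic first_topic --execute
```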
#### Delivery Semantics

- At least once (usually preferred)
    - The offset is committed after the message is processed
    - If processing goes wrong, the message will be read again
    - This can result in duplicate processing, so processing should be idempotent
- At most once
    - The offset is committed as soon as the message is received
    - If processing goes wrong, the message is lost
- Exactly once
    - Kafka-to-Kafka workflows: use the Transactional API
    - Kafka-to-external-system workflows: use idempotent consumers

### Brokers

- A Kafka cluster is composed of multiple brokers
- Each broker is identified by an ID
- Each broker holds only certain topic partitions
- Connecting to any single broker (a bootstrap server) connects you to the entire cluster
- 3 or more brokers is a good starting point

![Screenshot 2024-02-13 at 2.57.35 PM](https://hackmd.io/_uploads/H1cJX9doT.png)

![Screenshot 2024-02-13 at 3.07.09 PM](https://hackmd.io/_uploads/SJ_NBc_op.png)

#### Brokers and topics

- Topic-A with 3 partitions
- Topic-B with 2 partitions

![Screenshot 2024-02-13 at 2.57.02 PM](https://hackmd.io/_uploads/ry5af5_sp.png)

### Topic replication

- A replication factor > 1 is a must
- With a replication factor of N, you can lose N-1 brokers and still recover the data
- If a broker is down, another broker can serve the data

![Screenshot 2024-02-13 at 2.58.55 PM](https://hackmd.io/_uploads/r1nEXcui6.png)

#### Concept of Leader for a Partition

- At any time, only one broker can be the leader for a given partition
- Producers can only send data to the broker that is the leader of that partition
- The other brokers replicate the data
- Therefore each partition has one leader and multiple ISRs (in-sync replicas)

![Screenshot 2024-02-13 at 3.01.37 PM](https://hackmd.io/_uploads/rkykVcdip.png)

- By default, producers write to the leader and consumers read from the leader

![Screenshot 2024-02-13 at 3.02.39 PM](https://hackmd.io/_uploads/r13fVqOj6.png)

> Note: since Kafka 2.4, consumers can be configured to read from the closest replica to improve latency

### Producer Acknowledgements (acks)

- Producers can choose the acknowledgement policy for their writes
- acks=0: the producer does not wait for any acknowledgement (possible data loss)
- acks=1: the producer waits for the leader's acknowledgement (limited data loss)
- acks=all: the leader and all replicas acknowledge (no data loss)

![Screenshot 2024-02-13 at 3.05.49 PM](https://hackmd.io/_uploads/SycAVqusa.png)

### Zookeeper

- Manages brokers (keeps a list of them)
- Helps perform leader election for partitions
- Sends notifications to Kafka (e.g. a broker goes down, a new topic is created)
- Zookeeper does not store consumer offsets; Kafka does
- Runs with an odd number of servers (1, 3, 5, 7)
- One leader handles writes, the rest of the servers are followers and handle reads

> KIP-500: from Kafka 3.x onward, Zookeeper can be replaced by the Raft protocol (KRaft)

## Kafka CLI

### Installation

```shell=
brew install kafka
/usr/local/bin/zookeeper-server-start /usr/local/etc/zookeeper/zoo.cfg
/usr/local/bin/kafka-server-start /usr/local/etc/kafka/server.properties
```

![Screenshot 2024-02-14 at 4.21.38 PM](https://hackmd.io/_uploads/BJJmue5o6.png)

### Topic CLI

```shell=
kafka-topics --bootstrap-server localhost:9092 --topic first_topic --create
kafka-topics --bootstrap-server localhost:9092 --topic second_topic --create --partitions 3
kafka-topics --bootstrap-server localhost:9092 --topic third_topic --create --partitions 3 --replication-factor 2
# the replication factor cannot be larger than the number of brokers
kafka-topics --bootstrap-server localhost:9092 --topic third_topic --create --partitions 3 --replication-factor 1
kafka-topics --bootstrap-server localhost:9092 --list
kafka-topics --bootstrap-server localhost:9092 --describe
kafka-topics --bootstrap-server localhost:9092 --topic second_topic --delete
```

### Producers CLI

![Screenshot 2024-02-14 at 4.41.03 PM](https://hackmd.io/_uploads/r1as2gqia.png)

The topic does not exist

![Screenshot 2024-02-14 at 4.41.26 PM](https://hackmd.io/_uploads/HJ4Tne9ja.png)

![Screenshot 2024-02-14 at 4.42.02 PM](https://hackmd.io/_uploads/B1F16g5j6.png)

On localhost the topic is created automatically

![Screenshot 2024-02-14 at 4.42.55 PM](https://hackmd.io/_uploads/Syjf6eqip.png)

![Screenshot 2024-02-14 at 4.47.06 PM](https://hackmd.io/_uploads/r1KfRgqop.png)
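The screenshots above correspond to variations of the `kafka-console-producer` command. A minimal sketch, assuming `first_topic` from the Topic CLI section; the `:` key separator is an arbitrary choice:

```shell=
# produce to first_topic; each line typed on stdin becomes one message
kafka-console-producer --bootstrap-server localhost:9092 --topic first_topic

# wait for the leader and all in-sync replicas to acknowledge each write
kafka-console-producer --bootstrap-server localhost:9092 --topic first_topic \
  --producer-property acks=all

# send keyed messages: input is "key:value", so "user_1:hello" always lands
# in the same partition as every other message keyed user_1
kafka-console-producer --bootstrap-server localhost:9092 --topic first_topic \
  --property parse.key=true --property key.separator=:
```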
### Consumer CLI

![Screenshot 2024-02-14 at 4.58.39 PM](https://hackmd.io/_uploads/BkxAgZ9jT.png)

![Screenshot 2024-02-14 at 4.59.39 PM](https://hackmd.io/_uploads/HJtbZZ5s6.png)

![Screenshot 2024-02-14 at 5.01.29 PM](https://hackmd.io/_uploads/ByHd--9jp.png)

![Screenshot 2024-02-14 at 5.04.09 PM](https://hackmd.io/_uploads/Hk8zzWqoT.png)

![Screenshot 2024-02-14 at 5.07.16 PM](https://hackmd.io/_uploads/SJZ0fW5sa.png)
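For reference, the screenshots above correspond to variations of the `kafka-console-consumer` command. A minimal sketch, again assuming `first_topic`; the group name `my-first-group` is only an example:

```shell=
# read new messages from first_topic as they arrive
kafka-console-consumer --bootstrap-server localhost:9092 --topic first_topic

# replay the topic from offset 0 in every partition
kafka-console-consumer --bootstrap-server localhost:9092 --topic first_topic --from-beginning

# also print the key alongside each value
kafka-console-consumer --bootstrap-server localhost:9092 --topic first_topic --from-beginning \
  --property print.key=true

# consume as part of a group; partitions are split between the group's members
kafka-console-consumer --bootstrap-server localhost:9092 --topic first_topic --group my-first-group
```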