---
title: Practical Project - 02
tags: data-engineer
---

# Overview

This exercise teaches you how to use Kafka to ingest streaming data and process that data to detect anomalies.

**Level:** `Intermediate`

**Estimated time:** 15 hours

**Prerequisites:**

- Understand basic Kafka concepts
- Have basic programming knowledge
- Know how to run Docker (we suggest you fire up your own Kafka cluster on your local machine; a single node is good enough)

# Requirements

In this exercise, you will build a simple system that processes bank customers' transactions and detects abnormal ones, as follows:

![](https://trello-attachments.s3.amazonaws.com/5ed87460fa64a11bc8eb0883/600e965d151383601fc2e05c/1cf6172c0f6a7a5a3d48bd4ac9d0aae9/exe2-flow.png)

There are two main tasks:

1. Build a data generator, which should:
    - Produce a stream of transaction data at a realistic rate (100 transactions per second)
    - Produce data with the following structure:
    ```json=
    {
        "transactionId": "93151357815SJFHB",
        "accountId": "19084637648936",
        "customerId": "0931573195", // each customer can have multiple accounts
        "targetAccountId": "8913850315984579",
        "serviceId": "8356",
        "amount": "10500000",
        "currency": "VND"
    }
    ```
    - Publish the data to a Kafka topic
2. Build an anomaly detector, which should:
    - Read transaction data from Kafka and process it to detect abnormal transactions
    - Write the abnormal transactions to a different Kafka topic

**When is a transaction labeled abnormal?**

- Any transaction whose amount exceeds 200 million VND.
- Any account that makes more than 10 transactions per minute with a total amount exceeding 200 million VND.
- Any customer who makes more than 20 transactions per minute with a total amount exceeding 200 million VND.

# Guide

<p>
<details>
<summary>Click this to collapse/fold.</summary>

> Hints

</details>
</p>
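
To make task 1 concrete, here is a minimal sketch of a generator in Python. The schema follows the sample record above; everything else (the `kafka-python` client, the `transactions` topic name, the `localhost:9092` broker address, and the exact field formats) is an assumption, not part of the exercise:

```python
import json
import random
import time
import uuid


def make_transaction():
    """Build one fake transaction matching the required schema.

    Field lengths/formats are assumptions inferred from the sample record.
    """
    return {
        "transactionId": uuid.uuid4().hex[:16].upper(),
        "accountId": str(random.randint(0, 10**14 - 1)).zfill(14),
        "customerId": str(random.randint(0, 10**10 - 1)).zfill(10),
        "targetAccountId": str(random.randint(0, 10**16 - 1)).zfill(16),
        "serviceId": str(random.randint(1000, 9999)),
        "amount": str(random.randint(10_000, 500_000_000)),
        "currency": "VND",
    }


def main():
    # kafka-python is assumed here; any Kafka client library works the same way.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    while True:
        for _ in range(100):  # roughly 100 transactions per second
            producer.send("transactions", make_transaction())
        producer.flush()
        time.sleep(1)


if __name__ == "__main__":
    main()
```

The crude `sleep(1)` pacing is good enough for this exercise; a more precise generator would spread the 100 sends across the second.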
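
For task 2, the three rules can be expressed as checks over a one-minute sliding window keyed by account and by customer. The sketch below is pure Python so the logic is easy to test; the class and function names are hypothetical, and in a real pipeline you would more likely use the windowing primitives of Kafka Streams, Faust, or Spark Structured Streaming:

```python
from collections import deque

THRESHOLD = 200_000_000  # 200 million VND
WINDOW_SECONDS = 60


class WindowCounter:
    """Tracks (timestamp, amount) pairs per key over a sliding one-minute window."""

    def __init__(self):
        self.events = {}  # key -> deque of (timestamp, amount)

    def add(self, key, ts, amount):
        q = self.events.setdefault(key, deque())
        q.append((ts, amount))
        # Evict events that fell out of the window.
        while q and q[0][0] <= ts - WINDOW_SECONDS:
            q.popleft()
        return len(q), sum(a for _, a in q)


def is_abnormal(tx, ts, accounts, customers):
    """Apply the three labeling rules; True if any rule fires."""
    amount = int(tx["amount"])
    acc_count, acc_total = accounts.add(tx["accountId"], ts, amount)
    cus_count, cus_total = customers.add(tx["customerId"], ts, amount)
    return (
        amount > THRESHOLD                              # rule 1: single large transaction
        or (acc_count > 10 and acc_total > THRESHOLD)   # rule 2: busy account
        or (cus_count > 20 and cus_total > THRESHOLD)   # rule 3: busy customer
    )
```

To finish the task, wrap `is_abnormal` in a consume/produce loop: read each record from the input topic, call the check with the record's timestamp, and publish flagged records to the output topic.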