# Data Engineering Challenge

## Challenge 1: System Design

Suppose you have to create a service that processes event data in JSON format:

```
{
    "event_ts": "yyyy-mm-dd hh:mm:ss",
    "event_source": "source_name",
    "event_name": "my_event",
    "event_data": {...}
}
```

You can receive different types of events:

* Events triggered by frontend & backend software (e.g., *new user account created*).
* Incoming events from database triggers (e.g., *inserts, updates, deletes on different tables*).
* etc.

**You have to help design a data platform that allows data analysts and scientists to answer business questions.**

* Assuming your input consistently consists of these JSON-based events, how would you design the data platform?
* What architectural components would you consider? (e.g., *how to store & process the data? what services would you build/implement?*)
* (Bonus) How would you handle changes in the schema?

## Challenge 2: Python Coding

----

Recommended online Python interpreters:

* https://www.online-python.com/
* https://www.pythonanywhere.com/try-ipython/

:::warning
Consider saving your solution in an external file to avoid data loss.
:::

----

Assume that you are given a Python dictionary that cannot be empty and can only contain the following data types:

* Strings (keys)
* Integers (values)
* Dictionaries (values)

Example of a compliant dictionary:

```python=
my_dictionary = {
    "a": 1,
    "b": 2,
    "c": {"d": 3, "e": 4, "f": 5}
}
```

* Note: Inner dictionaries can themselves be nested and may contain any of the valid data types listed above.

**Challenge**: Create a function called `flatten` that returns the flat version of a dictionary (i.e., all the keys collapsed to the top level, with nested keys joined by `.`).
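The input rules above (string keys, integer or dictionary values, no empty dictionaries) can be made concrete with a small checker. This is only a minimal sketch and not part of the challenge: `is_compliant` is a hypothetical helper name, and it assumes that the "cannot be empty" rule also applies to nested dictionaries and that booleans do not count as integers.

```python
def is_compliant(dictionary) -> bool:
    """Return True if `dictionary` follows the input rules of the challenge."""
    # Must be a dict and must not be empty (assumed to hold at every level).
    if not isinstance(dictionary, dict) or not dictionary:
        return False
    for key, value in dictionary.items():
        # Keys must be strings.
        if not isinstance(key, str):
            return False
        if isinstance(value, dict):
            # Nested dictionaries must follow the same rules.
            if not is_compliant(value):
                return False
        # bool is a subclass of int in Python, so it is excluded explicitly
        # (an assumption made for this sketch).
        elif isinstance(value, bool) or not isinstance(value, int):
            return False
    return True


print(is_compliant({"a": 1, "b": 2, "c": {"d": 3, "e": 4, "f": 5}}))  # True
print(is_compliant({"a": "text"}))  # False
```

A check like this could run before `flatten` in the test cases, so that malformed input is reported separately from a bug in the solution.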
Example:

Input:

```text
{
    "a": 1,
    "b": {
        "c": 2,
        "d": 3
    }
}
```

Output:

```text
{
    "a": 1,
    "b.c": 2,
    "b.d": 3
}
```

---

Imports and helper functions:

```python=
import json


def flatten(dictionary: dict):
    # TODO: Implement this function
    return dictionary


def display(dictionary: dict):
    print(json.dumps(dictionary, indent=4))
```

Test cases:

`Test 1`: The solution should work with an already flat dictionary.

```
test_1 = {
    "a": 1,
    "b": 2,
    "c": 3
}
res_1 = flatten(test_1)
display(res_1)
```

```
{
    "a": 1,
    "b": 2,
    "c": 3
}
```

`Test 2`: The solution should work with nested dictionaries (at least one level deep).

```
test_2 = {
    "a": 1,
    "b": {
        "c": 2,
        "d": 3
    }
}
res_2 = flatten(test_2)
display(res_2)
```

```
{
    "a": 1,
    "b.c": 2,
    "b.d": 3
}
```

`Test 3`: The solution should work for an arbitrary number of nested dictionaries.

```text=
test_3 = {
    "a": 1,
    "b": {
        "c": {
            "d": 2
        },
    },
    "e": {
        "f": 3,
        "g": 4
    }
}
res_3 = flatten(test_3)
display(res_3)
```

```text=
{
    "a": 1,
    "b.c.d": 2,
    "e.f": 3,
    "e.g": 4
}
```

## Challenge 3

```
r"""
Given a tree structure like this:

Example 1:

         3
      /  |  \
     1   5   10
    /   / \    \
   6   1   4    5

Example 2:

     1
   / |
  1  20
    /  \
   10   6

Example 3:

     7
     |
     15
   / | \
  1  6  4

Calculate the sum per level:

         3            => 3
      /  |  \
     1   5   10       => 16
    /   / \    \
   6   1   4    5     => 16

Note: you can have unbalanced trees!

a) Create a data structure that allows you to represent a generic tree
   similar to the provided examples.
   * Use an OOP approach (create a class to model this problem).
   * Provide an instance of your data structure.

b) Complete the following function so that it returns the sum per level.
   * Test the `sum_levels` function on your instance.
   * You can print the result or return a dictionary.
   * Return example: {0: 3, 1: 16, 2: 16}

def sum_levels(tree):
    pass
"""
import json

# A) CREATE A DATA STRUCTURE TO REPRESENT THE PROBLEM
tree_instance = None


# B) IMPLEMENT THE sum_levels FUNCTION.
def sum_levels(tree):
    # Complete this function with your solution
    pass


res = sum_levels(tree=tree_instance)
print(json.dumps(res, indent=4))
```
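One possible way to approach parts a) and b) is sketched below. It is an illustration under assumptions, not the expected answer: the `TreeNode` class, its `value` and `children` attributes, and the breadth-first traversal are all choices made for this sketch. The instance encodes Example 1, reading the bottom row as 6 under 1, 1 and 4 under 5, and 5 under 10; the level sums do not depend on that reading.

```python
import json
from collections import deque


class TreeNode:
    """A node of a generic tree: a value plus any number of children."""

    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []


def sum_levels(tree):
    """Return {level: sum of node values on that level}; the root is level 0."""
    sums = {}
    if tree is None:
        return sums
    queue = deque([(tree, 0)])  # (node, depth) pairs, visited breadth-first
    while queue:
        node, depth = queue.popleft()
        sums[depth] = sums.get(depth, 0) + node.value
        for child in node.children:
            queue.append((child, depth + 1))
    return sums


# Example 1 from the challenge text.
tree_instance = TreeNode(3, [
    TreeNode(1, [TreeNode(6)]),
    TreeNode(5, [TreeNode(1), TreeNode(4)]),
    TreeNode(10, [TreeNode(5)]),
])

res = sum_levels(tree_instance)
print(json.dumps(res, indent=4))
```

Because each node carries its own depth through the queue, unbalanced trees need no special handling. Note that `json.dumps` renders the integer level keys as strings (`"0"`, `"1"`, `"2"`), while the returned dictionary itself keeps integer keys.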