# Data Engineering Challenge
## Challenge 1: System Design
Suppose you have to build a service that processes event data in the following JSON format:
```
{
    "event_ts": "yyyy-mm-dd hh:mm:ss",
    "event_source": "source_name",
    "event_name": "my_event",
    "event_data": {...}
}
```
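To make the event shape above concrete, here is a minimal sketch that parses one such event and checks its top-level fields. The names `parse_event`, `REQUIRED_FIELDS`, and `TS_FORMAT` are hypothetical, the sample values are made up for illustration, and the `strptime` pattern is an assumption derived from the `yyyy-mm-dd hh:mm:ss` placeholder.

```python
import json
from datetime import datetime

# Assumption: the four top-level fields shown in the event format are all required.
REQUIRED_FIELDS = ("event_ts", "event_source", "event_name", "event_data")
# Assumption: "yyyy-mm-dd hh:mm:ss" corresponds to this strptime pattern.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_event(raw: str) -> dict:
    """Parse one JSON event and check that it has the expected top-level fields."""
    event = json.loads(raw)
    missing = [field for field in REQUIRED_FIELDS if field not in event]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Replace the timestamp string with a datetime object.
    event["event_ts"] = datetime.strptime(event["event_ts"], TS_FORMAT)
    return event


# Hypothetical sample event, invented for this illustration.
sample = json.dumps({
    "event_ts": "2024-01-31 12:30:00",
    "event_source": "backend",
    "event_name": "new_user_account_created",
    "event_data": {"user_id": 42},
})
event = parse_event(sample)
print(event["event_name"], event["event_ts"].isoformat())
```

This only illustrates the input format; it is not meant as an answer to the design questions below.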
You can receive different types of events:
* Events triggered by frontend & backend software (e.g., *new user account created*).
* Incoming events from database triggers (e.g., *insert, updates, deletes from different tables*).
* etc
**You have to help design a data platform that allows data analysts and scientists to answer business questions.**
* Assuming your input always consists of these JSON-based events, how would you design the data platform?
* What architectural components would you consider? (e.g., *how to store & process the data? what services would you build/implement?*)
* (Bonus) How would you handle changes in the schema?
## Challenge 2: Python Coding
----
Recommended online Python interpreters:
* https://www.online-python.com/
* https://www.pythonanywhere.com/try-ipython/
:::warning
Consider saving your solution in an external file to avoid data loss.
:::
----
Assume that you are given a Python dictionary that cannot be empty and can only contain the following data types:
* Strings (keys)
* Integers (values)
* Dictionaries (values)
Example of a compliant dictionary:
```python=
my_dictionary = {
    "a": 1,
    "b": 2,
    "c": {"d": 3, "e": 4, "f": 5}
}
```
* Note: The inner dictionary can itself be nested and can contain any of the data types listed above.
**Challenge**: Create a function called `flatten` that returns the flat version of a dictionary (i.e., all keys collapsed onto the top level, with nested keys joined by a `.`). Example:
Input:
```text
{
    "a": 1,
    "b": {
        "c": 2,
        "d": 3
    }
}
```
Output:
```text
{
    "a": 1,
    "b.c": 2,
    "b.d": 3
}
```
---
Imports and helper functions:
```python=
import json


def flatten(dictionary: dict) -> dict:
    # TODO: Implement this function
    return dictionary


def display(dictionary: dict) -> None:
    # Pretty-print the dictionary as indented JSON
    print(json.dumps(dictionary, indent=4))
```
Test cases:
`Test 1`: The solution should work with an already flat dictionary.
```
test_1 = {
    "a": 1,
    "b": 2,
    "c": 3
}
res_1 = flatten(test_1)
display(res_1)
```
```
{
    "a": 1,
    "b": 2,
    "c": 3
}
```
`Test 2`: The solution should work with nested dictionaries (at least one level of nesting).
```
test_2 = {
    "a": 1,
    "b": {
        "c": 2,
        "d": 3
    }
}
res_2 = flatten(test_2)
display(res_2)
```
```
{
    "a": 1,
    "b.c": 2,
    "b.d": 3
}
```
`Test 3`: The solution should work for dictionaries nested to an arbitrary depth.
```text=
test_3 = {
    "a": 1,
    "b": {
        "c": {
            "d": 2
        },
    },
    "e": {
        "f": 3,
        "g": 4
    }
}
res_3 = flatten(test_3)
display(res_3)
```
```text=
{
    "a": 1,
    "b.c.d": 2,
    "e.f": 3,
    "e.g": 4
}
```
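As an optional convenience, the three test cases above can also be checked automatically instead of comparing printed output by eye. The sketch below is one way to do that; `CASES` and `check_flatten` are hypothetical names that are not part of the challenge, and the inputs and expected outputs are copied from Tests 1 to 3.

```python
# Inputs and expected outputs copied from Tests 1-3 above.
CASES = [
    ({"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 2, "c": 3}),
    ({"a": 1, "b": {"c": 2, "d": 3}}, {"a": 1, "b.c": 2, "b.d": 3}),
    (
        {"a": 1, "b": {"c": {"d": 2}}, "e": {"f": 3, "g": 4}},
        {"a": 1, "b.c.d": 2, "e.f": 3, "e.g": 4},
    ),
]


def check_flatten(flatten_fn) -> list:
    """Return the 1-based numbers of the tests that flatten_fn fails."""
    failed = []
    for number, (given, expected) in enumerate(CASES, start=1):
        if flatten_fn(given) != expected:
            failed.append(number)
    return failed


# A placeholder that returns its input unchanged passes only Test 1.
print(check_flatten(lambda dictionary: dictionary))  # prints [2, 3]
```

Passing your own `flatten` to `check_flatten` should return an empty list once all three tests pass.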
## Challenge 3
```
r"""
Given a tree structure like this:
Example 1:
        3
      / | \
    1   5   10
   /   / \    \
  6   1   4    5
Example 2:
1
/ |
1 20
/ \
10 6
Example 3:
        7
        |
        15
      / | \
    1   6   4
Calculate the sum per level:
        3           => 3
      / | \
    1   5   10      => 16
   /   / \    \
  6   1   4    5    => 16
Note: you can have unbalanced trees!
a) Create a data structure that allows you to represent a generic tree similar to the provided examples.
    * Use an OOP approach (create a class to model this problem).
    * Provide an instance of your data structure.
b) Complete the following function such that you get the sum per level.
    * Test the `sum_levels` function over your instance.
    * You can print it out or return a dictionary.
    * Return example: {0: 3, 1: 16, 2: 16}
def sum_levels(tree):
    pass
"""
import json
# A) CREATE A DATA STRUCTURE TO REPRESENT THE PROBLEM
tree_instance = None
# B) IMPLEMENT THE sum_levels FUNCTION.
def sum_levels(tree):
    # Complete this function with your solution
    pass
res = sum_levels(tree=tree_instance)
print(json.dumps(res, indent=4))
```
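Only the first example comes with its expected sums. As a reference for checking results on the other two, the sketch below reads the node values row by row from the three example diagrams and sums each row. It deliberately does not build a tree, so it is not a solution to the challenge; `EXAMPLE_LEVELS` and `expected_level_sums` are hypothetical names introduced here.

```python
# Node values per level, read row by row from the three example diagrams.
EXAMPLE_LEVELS = {
    "example_1": [[3], [1, 5, 10], [6, 1, 4, 5]],
    "example_2": [[1], [1, 20], [10, 6]],
    "example_3": [[7], [15], [1, 6, 4]],
}


def expected_level_sums(levels: list) -> dict:
    """Map each level index (root = 0) to the sum of the values on that level."""
    return {depth: sum(values) for depth, values in enumerate(levels)}


for name, levels in EXAMPLE_LEVELS.items():
    print(name, expected_level_sums(levels))
```

For Example 1 this reproduces the `{0: 3, 1: 16, 2: 16}` return example given in the task description.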