Collaborative notes start below
Modern data processing:
Goal:
The hero of this talk: Apache Beam
"Table" becomes "Collection"
If you are asking this question, you are likely better off using a cloud solution.
Of the three goals mentioned earlier, this one is about "correctness". So how correct do you want to be? If you have to be highly correct, you have to sacrifice something else, e.g. it will take longer. The papers mentioned cover many scenarios for thinking about this trade-off.
We use a whiteboard: write down what data we have and in what format, and work from that. This part is mostly a one-off and has to be thought through carefully. Once it is done, you can implement it. Unfortunately there are no other suggestions that just "solve" this.
My company has many teams working on pipeline optimisation. There is currently no general solution. Some thoughts, though: can some data be dropped earlier, so there is less to process? But there is nothing very generic.
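The "drop data earlier" idea can be sketched in plain Python. This is only an illustration, not the speaker's pipeline and not Apache Beam code: the record layout, the `parse` and `enrich` helpers, and the country filter are all hypothetical. It assumes the filter depends only on fields that the expensive step does not change, which is what makes moving it earlier safe.

```python
from typing import Dict, List, Tuple

# Hypothetical raw records: "user,country,amount"
RAW = ["alice,TW,12.50", "bob,US,3.00", "carol,TW,7.25", "dave,JP,1.10"]


def parse(record: str) -> Dict:
    """Cheap step: split a raw line into named fields."""
    user, country, amount = record.split(",")
    return {"user": user, "country": country, "amount": float(amount)}


def enrich(event: Dict) -> Dict:
    """Stand-in for an expensive step (lookup, join, model call...)."""
    return {**event, "amount_cents": round(event["amount"] * 100)}


def filter_late(records: List[str], country: str) -> Tuple[List[Dict], int]:
    """Enrich every record, then keep only the wanted country."""
    enrich_calls = 0
    kept = []
    for record in records:
        event = enrich(parse(record))
        enrich_calls += 1
        if event["country"] == country:
            kept.append(event)
    return kept, enrich_calls


def filter_early(records: List[str], country: str) -> Tuple[List[Dict], int]:
    """Drop unwanted records first, so the expensive step sees less data."""
    enrich_calls = 0
    kept = []
    for record in records:
        event = parse(record)
        if event["country"] != country:
            continue  # dropped before the expensive step
        kept.append(enrich(event))
        enrich_calls += 1
    return kept, enrich_calls


if __name__ == "__main__":
    late, late_calls = filter_late(RAW, "TW")
    early, early_calls = filter_early(RAW, "TW")
    print(late == early, late_calls, early_calls)
```

Both versions return the same records, but the early filter calls the expensive step only for records that survive. As the answer notes, whether such a reordering is valid depends on the specific pipeline, so this is not a generic optimisation.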
Below are the parts of the slides that the speaker updated or corrected after the talk