General info
Materials for the talk: https://github.com/simo-tuomisto/data-format-tests
This is the third event in the Nordic RSE seminar series.
In computing, I/O bandwidth is just as much of a consumable resource as CPU and memory. On one's own computer this is often not the most pressing consideration, but on a cluster with shared storage (or in very I/O-intensive individual projects) it becomes very important. This talk will present lessons and tools that RSEs should have in their toolbox, as we have learned at Aalto Scientific Computing over the years.
Expected schedule: ~15 minutes of introductions, ~45 minutes of hands-on presentation, ~30 minutes of discussion.
The "I hope I didn't forget anything" problem refers mainly to pre-processing data to only submit the relevant parts, right?
But wouldn't it make sense to convert your data into the formats used for visualisation during the visualisation step?
Do you have a code example or tutorial on how to split data into shards for PyTorch? I assume it might require some work to make it fit the "load_item" paradigm used in PyTorch datasets.
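One possible way to approach the sharding question is sketched below. It is not taken from the talk materials: the names `write_shards` and `ShardedDataset` are hypothetical, and pickle is used as the shard format only to keep the example dependency-free. The class implements `__len__` and `__getitem__`, which is the protocol PyTorch's map-style datasets use; in real use it would presumably subclass `torch.utils.data.Dataset`, which is left out here so the sketch runs without PyTorch installed.

```python
import bisect
import pickle
from pathlib import Path


def write_shards(samples, shard_dir, shard_size):
    """Write `samples` into pickle files holding at most `shard_size` items each.

    Hypothetical helper: a few large files put far less load on a shared
    file system than one small file per sample.
    """
    shard_dir = Path(shard_dir)
    shard_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(samples), shard_size):
        path = shard_dir / f"shard-{len(paths):05d}.pkl"
        with open(path, "wb") as f:
            pickle.dump(samples[start:start + shard_size], f)
        paths.append(path)
    return paths


class ShardedDataset:
    """Map-style dataset that maps a global index to (shard, offset in shard)."""

    def __init__(self, shard_paths):
        self.shard_paths = list(shard_paths)
        # Cumulative end offsets, e.g. shards of size 4, 4, 2 give [4, 8, 10].
        # Assumption: reading every shard once at start-up is acceptable;
        # a small index file written next to the shards would avoid it.
        self.ends = []
        total = 0
        for path in self.shard_paths:
            with open(path, "rb") as f:
                total += len(pickle.load(f))
            self.ends.append(total)
        self._cached_index = None
        self._cached_shard = None

    def __len__(self):
        return self.ends[-1] if self.ends else 0

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the first shard whose end offset is greater than `index`.
        shard_index = bisect.bisect_right(self.ends, index)
        # Keep only the most recently used shard in memory.
        if shard_index != self._cached_index:
            with open(self.shard_paths[shard_index], "rb") as f:
                self._cached_shard = pickle.load(f)
            self._cached_index = shard_index
        start = self.ends[shard_index - 1] if shard_index > 0 else 0
        return self._cached_shard[index - start]
```

With sequential access each shard is read from disk once per pass. Fully random shuffling would defeat the single-shard cache, so a sampler that shuffles the order of shards and then shuffles within each shard is likely a better fit for this layout.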