General info

I/O monitoring and optimization

Materials for the talk: https://github.com/simo-tuomisto/data-format-tests

Icebreaker question

  • What kind of I/O problems have you experienced in the past?
    • Writing a simulation state to a Lustre disk took a long time (but we just did it less often rather than worry about it)
    • Code slowed down by re-reading the same data multiple times

About the series

This is the third event in the Nordic RSE seminar series.

About the Nordic RSE

  • Represents Research Software Engineers in the Nordics.
  • Check out nordic-rse.org for other activities.
  • Registering as an association this fall.

Speaker: Simo Tuomisto

  • System specialist at Aalto University

Abstract

In computing, I/O bandwidth is just as much of a consumable resource as CPU and memory. On one's own computer this is often not the most pressing consideration, but on a cluster with shared storage (or in very intensive individual projects) it is very important to consider. This talk will present lessons and tools that RSEs should have in their toolbox, as we have learned at Aalto Scientific Computing over the years.

Expected schedule: ~15 minutes introductions, ~45 minutes hands-on presentation, ~30 minutes discussion.

Ask your questions here

  • The "I hope I didn't forget anything" problem refers mainly to pre-processing data to only submit the relevant parts, right?

  • But wouldn't it make sense to convert your data into the formats used for visualisation during the visualisation step?

  • Do you have a code example or tutorial on how to split data into shards for PyTorch? I assume that it might require some work to make it fit the "load_item" paradigm used in PyTorch datasets.

  • This is how I ask a question?

    • Yes it is
      • And this is another comment
      • Everyone can ask and answer questions :)
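
On the question above about splitting data into shards for PyTorch: the following is only a minimal sketch, not the speaker's method and not taken from the talk materials. It assumes samples fit in a Python list, uses pickle as a stand-in shard format, and the names `write_shards` and `ShardedDataset` are hypothetical. The class only implements `__len__` and `__getitem__`, which is the map-style dataset protocol, so it should be possible to subclass `torch.utils.data.Dataset` with the same body. The idea is that many small samples are packed into a few larger files, and the most recently used shard is kept in memory so that consecutive indices do not hit the file system again.

```python
import pickle
import tempfile
from pathlib import Path


def write_shards(samples, out_dir, shard_size):
    """Pack samples into pickle files of at most `shard_size` items each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(samples), shard_size):
        path = out_dir / f"shard_{start // shard_size:05d}.pkl"
        with open(path, "wb") as f:
            pickle.dump(samples[start:start + shard_size], f)
        paths.append(path)
    return paths


class ShardedDataset:
    """Map-style dataset (__len__ / __getitem__) backed by shard files.

    Hypothetical helper for illustration; assumes every shard except
    possibly the last holds exactly `shard_size` samples.
    """

    def __init__(self, shard_paths, shard_size, n_samples):
        self.shard_paths = list(shard_paths)
        self.shard_size = shard_size
        self.n_samples = n_samples
        self._cached_id = None  # index of the shard currently in memory
        self._cached = None     # contents of that shard
        self.shard_reads = 0    # counts file reads, for illustration only

    def __len__(self):
        return self.n_samples

    def __getitem__(self, idx):
        if not 0 <= idx < self.n_samples:
            raise IndexError(idx)
        # Map the global sample index to (shard file, position in shard).
        shard_id, offset = divmod(idx, self.shard_size)
        if shard_id != self._cached_id:
            with open(self.shard_paths[shard_id], "rb") as f:
                self._cached = pickle.load(f)
            self._cached_id = shard_id
            self.shard_reads += 1
        return self._cached[offset]


def demo():
    """Write 10 toy samples into shards of 4 and read them back in order."""
    samples = [(i, i * i) for i in range(10)]
    with tempfile.TemporaryDirectory() as tmp:
        paths = write_shards(samples, tmp, shard_size=4)
        ds = ShardedDataset(paths, shard_size=4, n_samples=len(samples))
        items = [ds[i] for i in range(len(ds))]
        return items == samples, len(paths), ds.shard_reads
```

Note that the single-shard cache only helps when indices arrive roughly in order. With fully random shuffling almost every access would reload a shard, so one common compromise is to shuffle the order of shards and then shuffle samples within each shard.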