---
tags: seminar series
title: I/O monitoring and optimization
---

:::danger
**General info**
- **Video connection details:**
    - Zoom ID: 662 0907 5434
    - Zoom password: rse
    - Zoom invite link: https://uwasa.zoom.us/j/66209075434?pwd=VmRBaFRVOXNKNFRYb1NDRGY5SXZndz09
- **Contact:**
- **Date and time**: Tuesday, October 26th 2021 13:00 CEST
- **This page:** https://hackmd.io/@nordic-rse/io-monitoring-optimization
:::

# I/O monitoring and optimization

Materials for the talk: https://github.com/simo-tuomisto/data-format-tests

## Icebreaker question

- What kind of IO problems have you experienced in the past?
    - Taking a long time to write a simulation state to a Lustre disk (but we just did it less often rather than worry about it)
    - Slowdown of code by multiple re-reading of data

## About the series

This is the third event in the Nordic RSE seminar series.

* Reminder about starting recording
* Find out about future events:
    * Check https://nordic-rse.org/events/seminar-series/.
    * Previous seminar talks videos available at [YouTube channel](https://www.youtube.com/channel/UC8OyVrmJEuT2lrH7zXoBrhQ)
    * Follow [@nordic_rse](https://twitter.com/nordic_rse) on Twitter for announcements
    * Join the [Nordic RSE stream](https://coderefinery.zulipchat.com/#narrow/stream/213720-nordic-rse) of the CodeRefinery chat
* Suggest speakers:
    * on the [Nordic RSE stream](https://coderefinery.zulipchat.com/#narrow/stream/213720-nordic-rse)
    * by creating an issue on the [Nordic RSE website repository](https://github.com/nordic-rse/nordic-rse.github.io/issues)

## About the Nordic RSE

* Represents Research Software Engineers in the Nordics.
* Check out [nordic-rse.org](https://nordic-rse.org/) for other activities.
* Registering as an association this fall.
* To become a member, fill in the [membership form](https://forms.gle/qCVVRGXPi3Hq7inW6).

## Speaker: Simo Tuomisto

- System specialist at Aalto University

## Abstract

In computing, I/O bandwidth is just as much of a consumable resource as CPU and memory. On one's own computer this is often not the most pressing consideration, but on a cluster with shared storage (or in very I/O-intensive individual projects) it is important to take into account. This talk will present lessons and tools that RSEs should have in their toolbox, as we have learned at Aalto Scientific Computing over the years.

Expected schedule: ~15 minutes introductions, ~45 minutes hands-on presentation, ~30 minutes discussion

## Ask your questions here

* The "I hope I didn't forget anything" problem refers mainly to pre-processing data to only submit the relevant parts, right?
* But wouldn't it make sense to convert your data into formats used for visualisation during the visualisation step?
* Do you have a code example or tutorial on how to split data into shards for PyTorch? I assume that it might require some work to make it work with the "load_item" paradigm used in PyTorch datasets.
    * This is work in progress, but maybe have a look at https://github.com/AaltoRSE/ImageNetTools
    * Also this HackMD might give some hints: https://hackmd.io/_1A94N_0RN2FCnCMGD2FpA
    * and the direct link to webdataset https://github.com/webdataset/webdataset
* This is how I ask a question?
    * Yes it is
    * And this is another comment
* Everyone can ask and answer questions :)
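
As a rough illustration of the sharding question above, here is a minimal sketch that packs many small samples into a few tar files, so shared storage sees a handful of large sequential reads instead of thousands of small ones. It is not code from the talk or from the linked repositories. The function names `write_shards` and `read_shard`, the `shard-000000.tar` naming and the `samples_per_shard` parameter are all hypothetical. It assumes each sample is a `(key, {extension: bytes})` pair, that keys contain no dots, and that files belonging to one sample share a basename inside the tar.

```python
import io
import tarfile
from pathlib import Path


def write_shards(samples, out_dir, samples_per_shard=1000, prefix="shard"):
    """Pack (key, {extension: bytes}) samples into numbered tar files.

    Hypothetical helper: names and layout are illustrative assumptions.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    shard_paths = []
    tar = None
    for index, (key, fields) in enumerate(samples):
        # Start a new shard every `samples_per_shard` samples.
        if index % samples_per_shard == 0:
            if tar is not None:
                tar.close()
            path = out_dir / f"{prefix}-{len(shard_paths):06d}.tar"
            tar = tarfile.open(path, "w")
            shard_paths.append(path)
        # Store each field as "<key>.<extension>" so fields stay adjacent.
        for extension, payload in fields.items():
            info = tarfile.TarInfo(name=f"{key}.{extension}")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    if tar is not None:
        tar.close()
    return shard_paths


def read_shard(path):
    """Yield (key, {extension: bytes}) by grouping consecutive tar members."""
    with tarfile.open(path, "r") as tar:
        current_key, fields = None, {}
        for member in tar:
            if not member.isfile():
                continue
            # Assumes keys contain no dots: split at the first one.
            key, _, extension = member.name.partition(".")
            if current_key is not None and key != current_key:
                yield current_key, fields
                fields = {}
            current_key = key
            fields[extension] = tar.extractfile(member).read()
        if current_key is not None:
            yield current_key, fields
```

A PyTorch `IterableDataset` could iterate over `read_shard` for each shard assigned to a worker, rather than loading one item by index. The basename grouping is meant to mirror the layout described in the webdataset README linked above, but compatibility with that library is an assumption to verify, not something tested here.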