# RT 293074

## User inquiry

From: "..." <...@medsci.uu.se>

Dear support services,

I am encountering a memory issue while attempting to align two metabolomics liquid chromatography-mass spectrometry measurements for SIMPLER. I am using the metabCombiner R function (https://www.bioconductor.org/packages/devel/bioc/vignettes/metabCombiner/inst/doc/metabCombiner_vignette.html) to group features from both datasets by m/z and construct a metabCombiner object on the fat node of Bianca (UPPMAX), but its 512 GB of memory is insufficient for the large number of metabolic features and participants.

The dataset characteristics are as follows:

SMC characteristics:
- Total samples: 4982
- Feature counts: 3474

COSM characteristics:
- Total samples: 7708
- Feature counts: 6052

I was, however, able to run the function successfully on a smaller subset of the data:

SMC characteristics:
- Total samples: 4982
- Feature counts: 1696

COSM characteristics:
- Total samples: 7708
- Feature counts: 6052

I kindly request your assistance in finding a solution. Is it possible to allocate more memory? Alternatively, can we explore other resources to accommodate the full dataset?

Best regards,
...

## Reply draft

Dear ...,

I have discussed your inquiry with my colleagues. Here are some questions, comments, and suggestions.

Are you running this analysis through the RStudio GUI, or as an R script? Are both options feasible for your analysis? How many such runs do you need to perform? Do you have an estimate of the memory and compute time needed?

Some alternatives for decreasing the RAM usage would be:

- work with subsets of the data at a time and combine the results
- write partial results to file instead of keeping them in memory
- deallocate variables that are no longer needed, for example with the rm R function

(A sketch illustrating this pattern follows at the end of this message.)

These alternatives may not be feasible; it really depends on your data, and I do not know metabCombiner well enough to say what would be appropriate. The developers of metabCombiner might be able to provide recommendations. You can find Hani Habra's email address at https://www.bioconductor.org/packages/release/bioc/html/metabCombiner.html.

Are the input files required for the analysis all in one directory? One possibility would be for the system experts to run your analysis on another cluster that has a few TB of RAM. To ensure the data remains secure, the run would have to be done by them; this would be easier if the data is confined to a few directories that can be securely copied to that cluster.

Another alternative would be to use swap memory local to the compute node. This requires a compute-node re-configuration by the system experts. Note that using swap in addition to RAM degrades performance; in other words, your analysis would run much slower.

Best regards,
Diana, UPPMAX application expert
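
P.S. Below is a minimal, generic R sketch of the subset/write/deallocate pattern from the list above. It is only an illustration: the object names (smc_data, cosm_data), the column name mz, the m/z range, and the windowing scheme are all assumptions on my part, and whether splitting the data into m/z windows is appropriate for metabCombiner is something the package developers would be best placed to confirm. The placeholder line would be replaced by your actual metabCombiner calls.

```r
## Hypothetical stand-ins for the real feature tables; replace with
## your own data (only the m/z column is used in this sketch).
smc_data  <- data.frame(mz = runif(3474, 50, 1200))
cosm_data <- data.frame(mz = runif(6052, 50, 1200))

mz_breaks <- seq(0, 1200, by = 300)   # assumed m/z windows

for (i in seq_len(length(mz_breaks) - 1)) {
  lo <- mz_breaks[i]
  hi <- mz_breaks[i + 1]

  ## Work with one m/z window of each dataset at a time.
  smc_sub  <- subset(smc_data,  mz >= lo & mz < hi)
  cosm_sub <- subset(cosm_data, mz >= lo & mz < hi)

  ## ... run the metabCombiner steps on the subsets here, producing
  ##     some result object `res` (placeholder below) ...
  res <- list(window = c(lo, hi),
              n_smc = nrow(smc_sub), n_cosm = nrow(cosm_sub))

  ## Write the partial result to file instead of keeping it in memory.
  saveRDS(res, file = sprintf("combined_window_%02d.rds", i))

  ## Deallocate objects that are no longer needed and run the
  ## garbage collector to free the memory before the next window.
  rm(smc_sub, cosm_sub, res)
  gc()
}

## The partial results can later be read back and combined:
partial <- lapply(Sys.glob("combined_window_*.rds"), readRDS)
```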