Josh: would be good to know how much of this could be moved to either
Comments from Petr for integrating:
A summarised version of https://github.com/IDR/SubmissionWorkflow/blob/master/zarr.md (private) that will be used to construct a workflow.
Input parameters:
IMAGE_ID: OMERO Image ID
IMAGE_PATH: Path to raw data on NFS
OUTPUT_DIR: Output parent directory for Zarrs

Temporary files passed between steps:
SERIES_CSV: CSV including image IDs
ROIS_CSV: CSV including ROI IDs

Conda Environment:
Current workflow:
Get the ID and series of every image in the same fileset as ${IMAGE_ID}, save to ${SERIES_CSV}:
omero hql --style=plain -q "select i.id, i.series from Image i where i.fileset.id = (select i.fileset.id from Image i where i.id = ${IMAGE_ID}) order by i.series asc" | cut -f2,3 -d, | tee ${SERIES_CSV}
bioformats2raw --file_type=zarr ${SERIES_CSV} ${IMAGE_PATH} ${OUTPUT_DIR}
curl -o- http://idr.openmicroscopy.org/webclient/imgData/${IMAGE_ID}/ > ${OUTPUT_DIR}/${IMAGE_ID}.zarr/omero.json
(Presumably need to loop through all image IDs in ${SERIES_CSV})
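A minimal sketch of such a loop (assuming the id,series column order produced by the cut above):

# Sketch only: fetch rendering settings for each image ID in ${SERIES_CSV}
while IFS=, read -r id series; do
    curl -o- http://idr.openmicroscopy.org/webclient/imgData/${id}/ > ${OUTPUT_DIR}/${id}.zarr/omero.json
done < ${SERIES_CSV}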
merge.py: https://github.com/IDR/idr-zarr-tools/blob/master/merge.py
./merge.py ${OUTPUT_DIR}/${IMAGE_ID}.zarr
(Should we delete ${OUTPUT_DIR}/${IMAGE_ID}.zarr/omero.json?)
omero hql --style=plain "select distinct s.textValue, s.roi.id from Shape s where s.roi.image.id = ${IMAGE_ID}" --limit=-1 | tee ${ROIS_CSV}
(Presumably need to loop through all image IDs in ${SERIES_CSV})
omero zarr masks Image:${IMAGE_ID} --mask-map=${ROIS_CSV}
Looks like steps 3-6 should be in a loop over the output of step 2. One question is whether to keep the loop in the same workflow, or to branch into a different workflow.
Based on https://github.com/ome/omero-cli-zarr/pull/29#issuecomment-713516616 and https://github.com/ome/omero-cli-zarr/pull/38#pullrequestreview-518083982
Or use … if you have:
Can run from your home directory…
NB: Paste in this Dockerfile, editing the omero-cli-zarr branch you want to use:
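The original file isn't captured here; a minimal sketch of what it might contain (the base image and branch name are assumptions):

FROM continuumio/miniconda3
RUN conda install -y -c conda-forge omero-py
# Edit this line to point at the omero-cli-zarr branch under test (placeholder)
RUN pip install git+https://github.com/ome/omero-cli-zarr.git@master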
Create a companion export.omero file
And add export commands to connect to idr-testing…
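For example (a sketch; the variable names and host are assumptions, not the original file):

# export.omero - sourced before running the export (placeholder credentials)
export OMERO_USER=public
export OMERO_PASSWORD=public
export OMERO_HOST=idr-testing.openmicroscopy.org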
Then build with
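Presumably something like (the tag name here is illustrative, not from the original notes):

docker build -t zarr-export .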
We want files to end up at e.g. https://minio-dev.openmicroscopy.org/idr/idr0033-rohban-pathways/41744_illum_corrected/pr35/5966.zarr (directory named after the current or last-merged PR). Location is on /uod/idr/objectstore/minio/idr/, but this is now accessible from ome-zarr-dev1.openmicroscopy.org.
Make sure target location exists…
Want to run with your own user ID, so you can modify output files
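e.g. (a sketch; the image tag follows the build sketch above and the mount target is an assumption):

# Run as your own UID/GID so output files on the mounted volume stay writable by you
docker run --rm --user $(id -u):$(id -g) -v /uod/idr/objectstore/minio/idr:/output zarr-export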
NB: had some permissions problems (fixed) but the session created when the Dockerfile was built above then expired, so needed to run the container interactively and re-login to idr:
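Something along these lines (sketch, reusing the image tag from above):

# Start an interactive shell in the container instead of the default entrypoint
docker run -it --user $(id -u):$(id -g) --entrypoint bash -v /uod/idr/objectstore/minio/idr:/output zarr-export
# then inside the container, re-create the session:
omero login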
Should now be able to see files appearing on idr0-slot3 e.g. to see how many Wells have been exported:
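e.g. (a sketch; assumes the row/column/field plate layout, so depth-2 directories are Wells, and uses the example plate path from above):

find /uod/idr/objectstore/minio/idr/idr0033-rohban-pathways/41744_illum_corrected/pr35/5966.zarr -mindepth 2 -maxdepth 2 -type d | wc -l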
Export and then downsample, working with Temp state of PR: https://github.com/ome/ome-zarr-py/pull/71#issuecomment-759404371
Dockerfile:
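(Not captured here; a sketch of the shape it likely took, with the base image and PR ref as assumptions:)

FROM continuumio/miniconda3
RUN conda install -y -c conda-forge omero-py
# Temp state of ome-zarr-py PR #71; installing from the PR head is an assumption
RUN pip install git+https://github.com/ome/ome-zarr-py.git@refs/pull/71/head
RUN pip install git+https://github.com/ome/omero-cli-zarr.git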
Create an export_zarr bash script:
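The script itself isn't recorded here; a sketch of the export-then-downsample shape (the paths, --output flag and ome_zarr scale invocation are assumptions):

#!/usr/bin/env bash
set -e
IMAGE_ID=$1
# Export the image with omero-cli-zarr, then downsample with the PR state of ome-zarr-py
omero zarr export Image:${IMAGE_ID} --output /output
ome_zarr scale /output/${IMAGE_ID}.zarr /output/${IMAGE_ID}_scaled.zarr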
Build and run
Use:
Or with this config:
Use:
First install Conda, using the first link in the list at https://docs.conda.io/en/latest/miniconda.html#linux-installers:
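i.e. something like:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh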
After re-opening a Terminal:
Using environment.yml at https://github.com/ome/NGFF-ELMI-2021-Workshop/blob/main/binder/environment.yml …
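e.g. (fetching the raw file first):

wget https://raw.githubusercontent.com/ome/NGFF-ELMI-2021-Workshop/main/binder/environment.yml
conda env create -f environment.yml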
Seemed to work:
Activate and install version of omero-cli-zarr we want to use
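Sketch only (the env name comes from environment.yml and the branch is a placeholder):

conda activate zarr
pip install git+https://github.com/ome/omero-cli-zarr.git@master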
Available at:
Ended with:
Seb: looks like we have used all the inodes on the objectstore GPFS fileset at UoD, so options are: 1) clean up data on GPFS, 2) increase the number of inodes, 3) use EBI S3 rather than our minio.
Others:
{"bioformats2raw.layout" : 3}
Checkout https://github.com/hms-dbmi/vizarr/pull/43
$ cd vizarr
$ npm run export   # to /out directory
Using pilot-zarr1-dev (ngff data directory: /data) or pilot-zarr2-dev (/data/ngff).
Install bioformats2raw via conda:
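e.g. (sketch; the ome conda channel and env name are assumptions):

conda create -n bioformats2raw -c ome bioformats2raw
conda activate bioformats2raw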
This is actually just for getting the dependencies installed. Get the actual bioformats2raw from this PR and just unzip it into your home directory: https://github.com/IDR/bioformats2raw/pull/1
Create a directory for the idr project and memo files (if it's not already there), and change into the idr directory. For example for idr0051:
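e.g. (paths are illustrative only):

mkdir -p ~/data/idr0051 ~/data/memo
cd ~/data/idr0051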
Find out where the pattern, screen or companion files are. For example:
/nfs/bioimage/drop/idr0051-fulton-tailbudlightsheet/patterns/
Then run the conversion (using the bioformats2raw from the PR!):
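The command itself wasn't captured; a sketch consistent with the parenthetical below (the unzip location of the PR build is an assumption):

for i in /nfs/bioimage/drop/idr0051-fulton-tailbudlightsheet/patterns/*.pattern; do
    ~/bioformats2raw/bin/bioformats2raw "$i" "${i%.*}.ome.zarr"
done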
($i is the pattern file; ${i%.*}.ome.zarr strips the .pattern file extension and adds .ome.zarr; this should work for pattern, screen and also companion file extensions)
Install minio and configure…
Also installed on idrtesting-pilot to allow direct upload without conversion…
See https://forum.image.sc/t/converting-other-idr-images-into-public-zarr/44025/12. Want to convert /uod/idr/filesets/idr0048-abdeladim-chroms/20181217-ftp/Astrop65_BDV/ (astroP65.h5, 73 GB, and astroP65.xml).
Seb: "pilot-zarr1-dev
or pilot-zarr2-dev
that has the advantage of being on EBI Embassy (so easier to upload to S3)"
Installed conda and bioformats2raw as above in /home/wmoore/miniconda, and installed & configured the mc client as above.
conda install …
Rsync local data to minio:
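e.g. with the mc client configured above, mc mirror being its rsync-like command (sketch; the alias and bucket paths are placeholders):

mc mirror ./local-data/ minio/idr/idr0048-abdeladim-chroms/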
Seb: set the policy on the bucket as described in https://github.com/IDR/deployment/blob/master/docs/object-store.md#policy; this will be mandatory for accessing it from the pilots (without keys)
Will: also need to configure CORS as described there too.
Update - April 2025
Minio hosted at ome-dckr-ap1, which is at…
NB: Dom had to move the data there, as when I put it at /uod/idr/objectstore/minio/idr/ome-ngff-tools/sample_files.zarr/ it wasn't accessible via minio!
NB: tried to create a new top-level bucket (sibling to idr) but couldn't fix CORS… as I don't know the aws credentials!