# Planemo run experience
Feedback on my experience running workflows on public Galaxy instances with Planemo.
Link to the docs: https://planemo.readthedocs.io/en/latest/running.html#the-basics
Tutorial: https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/workflow-automation/tutorial.html
## Command lines
### Planemo run
**Warning**: Create the output directory beforehand!
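Until planemo creates it automatically (see the suggestions below), a `mkdir -p` with the directory name used in the commands here avoids the problem:
````bash!
mkdir -p test_results2/
````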
#### Testing
````bash!
planemo run Galaxy-Workflow-Testing.ga testing.yml --engine external_galaxy --galaxy_url https://usegalaxy.eu/ --galaxy_user_key $EUKEY --download_outputs --output_directory test_results2/ --output_json output.json &
````
Trying with the workflow ID instead:
````bash!
planemo run 906e9eaf45fa9782 testing.yml --engine external_galaxy --galaxy_url https://usegalaxy.eu/ --galaxy_user_key $EUKEY --download_outputs --output_directory test_results2/ --output_json output.json &
````
#### Files
Here are the contents of the files I used to test `planemo run`.
<details>
<summary>test-data/Busco Summary Hap1.txt</summary>
````
# BUSCO version is: 5.3.2
# The lineage dataset is: vertebrata_odb10 (Creation date: 2021-02-19, number of genomes: 67, number of BUSCOs: 3354)
# Summarized benchmarking in BUSCO notation for file /scratch4/nekrut/galaxy/main/staging/52665917/inputs/dataset_faca886a-1e96-4793-ba6e-a9b5f4e64600.dat
# BUSCO was run in mode: genome
# Gene predictor used: metaeuk
	***** Results: *****

	C:1.0%[S:1.0%,D:0.0%],F:0.4%,M:98.6%,n:3354
	36	Complete BUSCOs (C)
	35	Complete and single-copy BUSCOs (S)
	1	Complete and duplicated BUSCOs (D)
	15	Fragmented BUSCOs (F)
	3303	Missing BUSCOs (M)
	3354	Total BUSCO groups searched

Dependencies and versions:
	hmmsearch: 3.1
	metaeuk: 5.34c21f2
````
</details>
<details>
<summary>testing.yml</summary>
````yaml
File to head:
  class: File
  path: test-data/Busco Summary Hap1.txt
````
</details>
<details>
<summary>Galaxy-Workflow-Testing.ga</summary>
````json
{
"a_galaxy_workflow": "true",
"annotation": "",
"format-version": "0.1",
"name": "Testing",
"steps": {
"0": {
"annotation": "",
"content_id": null,
"errors": null,
"id": 0,
"input_connections": {},
"inputs": [
{
"description": "",
"name": "File to head"
}
],
"label": "File to head",
"name": "Input dataset",
"outputs": [],
"position": {
"left": 0.0,
"top": 0.0
},
"tool_id": null,
"tool_state": "{\"optional\": false, \"tag\": null}",
"tool_version": null,
"type": "data_input",
"uuid": "6cada1bd-e96e-43d2-a41b-2f396c9ad2ca",
"when": null,
"workflow_outputs": []
},
"1": {
"annotation": "",
"content_id": "toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_head_tool/1.1.0",
"errors": null,
"id": 1,
"input_connections": {
"infile": {
"id": 0,
"output_name": "output"
}
},
"inputs": [
{
"description": "runtime parameter for tool Select first",
"name": "infile"
}
],
"label": null,
"name": "Select first",
"outputs": [
{
"name": "outfile",
"type": "input"
}
],
"position": {
"left": 296.265625,
"top": 57.5703125
},
"post_job_actions": {},
"tool_id": "toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_head_tool/1.1.0",
"tool_shed_repository": {
"changeset_revision": "ddf54b12c295",
"name": "text_processing",
"owner": "bgruening",
"tool_shed": "toolshed.g2.bx.psu.edu"
},
"tool_state": "{\"complement\": \"\", \"count\": \"3\", \"infile\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
"tool_version": "1.1.0",
"type": "tool",
"uuid": "97a55f13-2d30-4ecd-a3fb-d8d433d3b3bb",
"when": null,
"workflow_outputs": []
}
},
"tags": [],
"uuid": "204f8f2e-57d9-4e2e-af72-fd2bdbc7fd0b",
"version": 2
}
````
</details>
#### Other interesting options for `planemo run`
````
--simultaneous_uploads When uploading files to Galaxy for tool or
workflow tests or runs, upload multiple
files simultaneously without waiting for the
previous file upload to complete.
--check_uploads_ok / --no_check_uploads_ok
When uploading files to Galaxy for tool or
workflow tests or runs, check that the
history is in an 'ok' state before beginning
tool or workflow execution.
--history_name TEXT Name to give a Galaxy history, if one is
created.
--no_wait After invoking a job or workflow, do not
wait for completion.
````
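For instance, these could be combined with the options used above (a sketch, untested; the history name is just an example). I left out `--download_outputs` here because of the open question below:
````bash!
planemo run 906e9eaf45fa9782 testing.yml --engine external_galaxy \
    --galaxy_url https://usegalaxy.eu/ --galaxy_user_key $EUKEY \
    --history_name "Planemo testing" --simultaneous_uploads --no_wait
````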
::::info
**Questions**
- Is the option `--no_wait` compatible with `--download_outputs`?
- What is the default for the `--check_uploads_ok / --no_check_uploads_ok` option?
::::
### Check invocations
**Warning**: You need to create a profile (see below) to check the list of invocations.
::::info
**Questions**
- How do you get the ID of a workflow from the command line?
- If you share a workflow, is the ID accessible to everyone?
::::
I got the ID by going to the workflow view and using the ID shown in the URL: `https://usegalaxy.eu/published/workflow?id=906e9eaf45fa9782`
````bash!
planemo list_invocations 906e9eaf45fa9782 --profile galaxyeu
````
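For the first question, I have not found a planemo command that lists workflow IDs; a possible workaround is to query the Galaxy API directly (a sketch, assuming `curl` and `jq` are available and that the instance accepts the API key as a `key` query parameter):
````bash!
# List name and ID of the workflows visible to this account
curl -s "https://usegalaxy.eu/api/workflows?key=$EUKEY" | jq '.[] | {name, id}'
````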
<details>
<summary><b>Create profile</b></summary>
````bash!
planemo profile_create galaxyeu --galaxy_url https://usegalaxy.eu/ --galaxy_user_key $EUKEY
````
</details>
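Once the profile exists, it should also be usable in place of the explicit URL/key pair in the run commands above (a sketch based on the tutorial linked at the top, untested here):
````bash!
planemo run 906e9eaf45fa9782 testing.yml --profile galaxyeu \
    --download_outputs --output_directory test_results2/ --output_json output.json
````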
## Feedback
### Suggestions
#### Important for intensive command-line use
1. Return the invocation ID and workflow ID when running `planemo run` from a file: this allows checking on the status.
2. Being able to fetch the outputs using an invocation ID: this would fix connection timeout problems.
    - Related: being able to check the status of an invocation using the invocation ID.
3. Having the option to download either all files produced (those shown in the history) or only the workflow outputs; even better if both can be downloaded into different folders.
    - Note: downloading the full history can be done by checking the invocations and getting the history URL (see suggestion 4), but it would be nice to get both in one command, to be sure not to mix up invocations.
4. Have an option for `planemo list_invocations` to generate a parsable file.
    - Would allow flagging all invocations of a workflow that are in error.
    - Would allow downloading the invocation report and history.
5. For `planemo list_invocations`: include info about start and end times. This would make it possible to automate regular status checks and flag what's new.
6. Being able to use `planemo run` with a public workflow ID, a Dockstore link... something stable, so that different users can be sure they are using the same workflow.
#### Would be nice
1. Job init: include the optional inputs with their default values. I think this would reduce the risk of user errors when parameters need to be modified.
2. Have the option to download outputs as they are produced, instead of waiting for the end of the run.
3. Create the output directory if it doesn't exist, or return an error from the start saying that it doesn't exist. Right now it tells me it downloaded the files into a folder that doesn't exist.
4. Being able to create a job file from an invocation, with the input dataset fields filled in with the dataset IDs.

One day, far away:
- In the workflow building interface: provide a dirname/path for an output, so that outputs downloaded through planemo or the interface are organized into directories.
### Issues
#### Empty JSON output
I ran `planemo run` with the option `--output_json output.json`.
Result:
````bash
╰─ more output.json
{}
````
-> It is only filled if the workflow has specified outputs and if the outputs are downloaded. Note that in `Galaxy-Workflow-Testing.ga` above, both steps have `"workflow_outputs": []`, which is consistent with the empty result here.
### Parsing `tool_test_output.json`
Loading the JSON into a dictionary:
````python
import json

# Load the test results produced by planemo into a dictionary
with open("results_info_WF1.json") as wf1json:
    reswf1 = json.load(wf1json)
````
Get the invocation details:
````python
reswf1["tests"][0]["data"]["invocation_details"]["details"]
````
Get the Genomescope outputs:
````python
reswf1["tests"][0]["data"]["invocation_details"]["steps"]["6. Genomescope"]["outputs"].keys()
````
Get the model_params ID:
````python
reswf1["tests"][0]["data"]["invocation_details"]["steps"]["6. Genomescope"]["outputs"]["model_params"]["id"]
````
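The same lookups can also be done from the shell with `jq` (assuming the JSON structure shown above; the file and step names are the ones from my run):
````bash!
# Step names of the invocation (e.g. "6. Genomescope")
jq '.tests[0].data.invocation_details.steps | keys' results_info_WF1.json
# ID of the Genomescope model_params output
jq -r '.tests[0].data.invocation_details.steps["6. Genomescope"].outputs.model_params.id' results_info_WF1.json
````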