# Query "chain-of-repos provenances"

## Example use case within omnibenchmark

[This method project](https://renkulab.io/projects/omnibenchmark/omni_batch/mnn-omni-batch) runs a method called *mnn* as part of omnibenchmark. As input, it uses preprocessed files (e.g., normalized counts). These files are generated in [this preprocessing project](https://renkulab.io/projects/omnibenchmark/omni_data/omni-batch-processed) and imported as the renku dataset **omni_batch_processed**. Preprocessing is applied in a standardized way to all renku datasets with the keyword *omni_batch*. Examples are the **cellbench** and **csf_patient** datasets, which are generated [here](https://renkulab.io/projects/omnibenchmark/omni_data/cellbench) and [here](https://renkulab.io/projects/omnibenchmark/omni_data/csf-patients), respectively.

Besides the processed count files, the **omni_batch_processed** dataset contains a metadata file for each of the original datasets:

``` bash
renku dataset ls-files omni_batch_processed
omni_batch_processed 2021-06-04 12:59:51 175 KB data/omni_batch_processed/meta_csf_patient.json
omni_batch_processed 2021-06-04 12:59:51 102 KB data/omni_batch_processed/meta_cellbench.json
omni_batch_processed 2021-06-04 12:59:51 44 MB data/omni_batch_processed/norm_counts_cellbench.mtx.gz
omni_batch_processed 2021-06-04 12:59:51 9.2 MB data/omni_batch_processed/norm_counts_csf_patient.mtx.gz
```

In the method project, we want to run *mnn* on every `norm_counts_*.mtx.gz` file together with its corresponding `meta_*.json` file. We generate one renku workflow per original dataset, which means one workflow per `norm_counts_*.mtx.gz`. At the moment, we find the corresponding metadata file by matching the names of `meta_*.json` and `norm_counts_*.mtx.gz`. However, we cannot be sure that our users comply with our naming scheme. A cross-repository query would be more robust. It would identify the original dataset for each `norm_counts_*.mtx.gz`, and through that dataset, the corresponding `meta_*.json`.
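The current name-based matching can be sketched as follows. This is a minimal illustration, not the project's actual code. The helper `pair_by_name` is hypothetical. It assumes that file names follow the `norm_counts_<name>.mtx.gz` / `meta_<name>.json` scheme and that paths look like those in the `ls-files` listing.

```python
import re
from pathlib import PurePosixPath

# Hypothetical helper illustrating the current name-based matching.
NORM_RE = re.compile(r"^norm_counts_(?P<name>.+)\.mtx\.gz$")
META_RE = re.compile(r"^meta_(?P<name>.+)\.json$")


def pair_by_name(paths):
    """Map each norm_counts_*.mtx.gz path to its meta_*.json path.

    Pairing relies only on the shared <name> part of the file names:
    a counts file without a matching meta file maps to None, and files
    that do not follow the scheme at all are ignored.
    """
    counts, metas = {}, {}
    for p in paths:
        fname = PurePosixPath(p).name
        if m := NORM_RE.match(fname):
            counts[m["name"]] = p
        elif m := META_RE.match(fname):
            metas[m["name"]] = p
    return {c: metas.get(name) for name, c in counts.items()}


files = [
    "data/omni_batch_processed/meta_csf_patient.json",
    "data/omni_batch_processed/meta_cellbench.json",
    "data/omni_batch_processed/norm_counts_cellbench.mtx.gz",
    "data/omni_batch_processed/norm_counts_csf_patient.mtx.gz",
]
for counts_file, meta_file in pair_by_name(files).items():
    print(counts_file, "->", meta_file)
```

The sketch shows why the name-based approach is fragile. A counts file named `normalized_cellbench.mtx.gz` is silently skipped. A metadata file named `meta_CellBench.json` does not pair with `norm_counts_cellbench.mtx.gz`.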
This is even more relevant for the final output files of omnibenchmark, where we would like to trace back which dataset and method a result file originates from.

### Example query

From within [mnn_omni_batch](https://renkulab.io/projects/omnibenchmark/omni_batch/mnn-omni-batch), find the **dataset id** of the dataset that was used as input to generate `data/omni_batch_processed/norm_counts_cellbench.mtx.gz`. The query should return the **dataset_id** of the *cellbench* dataset.

### Scheme

![](https://i.imgur.com/J2c4wv4.png)
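The example query amounts to walking the provenance chain upstream across project boundaries. The sketch below does not query renku's knowledge graph. Instead, it models the chain as a small in-memory graph. The following names are hypothetical placeholders:

- the classes `FileNode` and `ProvenanceGraph`;
- the function `source_dataset_ids`;
- the input path `data/cellbench/counts.mtx.gz`;
- the dataset ids.

Only the project names are taken from the URLs above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileNode:
    """A file identified by the project it lives in and its path there."""
    project: str
    path: str


@dataclass
class ProvenanceGraph:
    """Hypothetical, simplified view of the cross-project provenance."""
    # file -> the file in another project it was imported from (dataset import)
    imported_from: dict = field(default_factory=dict)
    # file -> input files of the workflow run that generated it
    generated_from: dict = field(default_factory=dict)
    # file -> id of the dataset the file belongs to
    dataset_of: dict = field(default_factory=dict)


def source_dataset_ids(graph: ProvenanceGraph, start: FileNode) -> set:
    """Follow imports and workflow inputs upstream until reaching files that
    were neither imported nor generated, and collect their dataset ids."""
    found, seen, stack = set(), set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in graph.imported_from:
            stack.append(graph.imported_from[node])
        elif node in graph.generated_from:
            stack.extend(graph.generated_from[node])
        elif node in graph.dataset_of:
            found.add(graph.dataset_of[node])
    return found


# Hypothetical chain mirroring the use case (input path and ids are placeholders).
MNN = "omni_batch/mnn-omni-batch"
PRE = "omni_data/omni-batch-processed"
CB = "omni_data/cellbench"

target = FileNode(MNN, "data/omni_batch_processed/norm_counts_cellbench.mtx.gz")
processed = FileNode(PRE, "data/omni_batch_processed/norm_counts_cellbench.mtx.gz")
cb_in_pre = FileNode(PRE, "data/cellbench/counts.mtx.gz")
cb_orig = FileNode(CB, "data/cellbench/counts.mtx.gz")

graph = ProvenanceGraph(
    imported_from={target: processed, cb_in_pre: cb_orig},
    generated_from={processed: [cb_in_pre]},
    dataset_of={
        processed: "<omni_batch_processed-dataset-id>",
        cb_orig: "<cellbench-dataset-id>",
    },
)
print(source_dataset_ids(graph, target))  # {'<cellbench-dataset-id>'}
```

The walk records a dataset only when its file has no further upstream, meaning the file was neither imported nor generated. Intermediate datasets such as **omni_batch_processed** are therefore passed through rather than returned. In practice, the three relations would have to be populated from the provenance that renku stores for each project. Obtaining them across projects is the cross-repository query this section asks for.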