# Collection Intro

'Collection' is a term coined for a Belle II-specific use case. A collection is a single path containing datasets of interest. The initial idea was to build a single path containing the whole set of datasets intended for conferences, which is defined and fixed by the DP team. Since then, the idea has spread to further use cases where collections are useful.

# Types of Collection

Four types of collection can currently be created. These types are defined in the configuration system (CS) under `Operations/Defaults/DatasetCollection`:

![](https://i.imgur.com/hLt0H2L.png)

The types of collection are as follows:

1. **general**: This collection is intended for general analysis purposes.
   * **It must start with `/belle/collection/general`**
2. **BG**: This collection defines a single path containing background overlay datasets. Its purpose is to help with the data production workflow.
   * **It must start with `/belle/collection/BG`**
3. **hRaw**: This collection provides a single LPN containing hRaw datasets (which may be combined with datasets under `/belle/Data`).
   * **It must start with `/belle/collection/hRaw`**
4. **test**: This collection is for testing. It lets you build a collection before publishing it, so that it can be tested and validated.
   * **It must start with `/belle/collection/test`**

To use the following tools, you need BelleDIRAC from the certification server. Instructions for setting up BelleDIRAC for certification: [Cert@BNL](https://confluence.desy.de/display/BI/Getting+A+BelleDIRAC+Client+Installed+For+The+BNL+Validation+Server)

# Collection's tools guide

## Collection Management Tools: gb2_ds_collection

**Permission required: belle_dataprod**

This is the single command for collection management (create/publish/update/delete). It has a sub-namespace for each of these management tasks.
![](https://i.imgur.com/xTZoGs7.png)

### gb2_ds_collection create

This tool is used to create all types of collection.

![](https://i.imgur.com/PwEcHdC.png)

1. **For general collections, use the `--input_ds_search` option. This option queries the dataset searcher and retrieves the datasets to be put into the collection.** You can use any of the metadata attributes to search for datasets in the dataset searcher:

   ```
   $ gb2_ds_collection create /belle/collection/general/<collection_name> --input_ds_search 'attr1=value;attr2=value' --description 'some short description' --int_lum <integrated luminosity in /fb>
   ```

   The `attribute=value` pairs are separated by **;**. The available attributes are:

   * data_type=
   * campaign=
   * beam_energy=
   * data_level=
   * general_skim=
   * release=
   * bkg_level=
   * mc_event=
   * skim_decay=
   * global_tag=
   * run=low:high
   * exp=low:high

   Among these, **campaign, data_type, data_level, and general_skim are required** (skim_decay is also required if data_level=udst).

   Datasets from multiple campaigns are allowed in a single collection by giving comma-separated values for campaign: **`campaign=proc12,bucket16,bucket17`**

   You can use the `--dryrun` option with `-o <outputfile_name>` to get the list of datasets and check it before creating the actual collection.

2. **For BG/hRaw collections**: These two types of collection are created using the `-i` option:

   ```
   $ gb2_ds_collection create /belle/collection/BG/<collection_name> -i <input_file> --description 'some short description'
   ```

   ```
   $ gb2_ds_collection create /belle/collection/hRaw/<collection_name> -i <input_file> --description 'some short description'
   ```

   where `<input_file>` is a text file containing the dataset LPNs, one per line. These LPNs must satisfy the following restrictions:

   a. Each LPN must start with one of the paths listed under **contentLPNs** in the corresponding section of the CS.

   b. All LPNs must have the same values for the metadata attributes listed under **MetadataAttributes** in the corresponding section of the CS.

3. 
   **For test collections**: A test collection is created using the `-i` option:

   ```
   $ gb2_ds_collection create /belle/collection/test/<collection_name> -i <input_file> --description 'some short description' --int_lum <integrated luminosity in /fb>
   ```

   A test collection can be published as another type of collection (general/hRaw/BG) using `gb2_ds_collection publish`, as shown below.

### gb2_ds_collection publish

Once a test collection has been made, it can be published as another type of collection.

![](https://i.imgur.com/A6Qt7at.png)

```
$ gb2_ds_collection publish /belle/collection/general/<published_collection_name> --source /belle/collection/test/<name>
$ gb2_ds_collection publish /belle/collection/hRaw/<published_collection_name> --source /belle/collection/test/<name>
```

### gb2_ds_collection update

Once a collection is created, its content, i.e. the dataset LPNs, should NOT be edited. This keeps collections consistent. An update can therefore only change the 'description' and/or 'int_lum' fields.

![](https://i.imgur.com/1jDety5.png)

```
$ gb2_ds_collection update <collection_path> --int_lum 200
```

```
$ gb2_ds_collection update <collection_path> --description 'updated description of collection'
```

### gb2_ds_collection delete

This command is used to delete any collection.

![](https://i.imgur.com/9bGLuEV.png)

```
$ gb2_ds_collection delete <collection_path>
```

## Collection Search tools: `gb2_ds_search collection`

The `gb2_ds_search` command is used to search metadata and datasets in the dataset searcher. It has been extended with a new sub-namespace, **gb2_ds_search collection**, to search for information related to collections.

![](https://i.imgur.com/sVCfIjS.png)

This tool has three options.

a. **`--list_all_collection`**: Lists all available collections.
```
$ gb2_ds_search collection --list_all_collection /belle/collection/*
$ gb2_ds_search collection --list_all_collection /belle/collection/general/*
$ gb2_ds_search collection --list_all_collection /belle/collection/hRaw/*
$ gb2_ds_search collection --list_all_collection /belle/collection/BG/*
```

b. **`--get_metadata`**: Gives the metadata of a collection.

```
$ gb2_ds_search collection --get_metadata <collection_name>
```

c. **`--list_datasets`**: Gives the LPNs of all datasets in a collection.

```
$ gb2_ds_search collection --list_datasets <collection_name>
```

## gb2_ds_list

Collections are compatible with `gb2_ds_list`. You can use a collection path as you would any other path.

```
$ gb2_ds_list <collection>
$ gb2_ds_list -lg <collection>
```

## gbasf2 with collection

You can use a collection as you would any other LPN/LFN when submitting gbasf2 jobs. **Multiple collections in a single project are not permitted.**

```
$ gbasf2 <my_script>.py -i <collection> ...
```

**For cross-campaign LPNs, the gbasf2 option `-n <number_of_files_in_a_job>` will result in a globaltag failure.**
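
Because only one collection is allowed per project, it can help to check the input before submitting. The following is a minimal sketch of such a guard. The function name `build_gbasf2_command` is hypothetical and not part of BelleDIRAC. The function only prints the `gbasf2 ... -i ...` part of the command shown above and does not submit anything; any further gbasf2 options still have to be added by hand.

```shell
# Hypothetical helper, NOT part of BelleDIRAC: a sketch of a pre-submission check.
# It accepts exactly one steering script and exactly one collection path,
# and prints (does not run) the corresponding gbasf2 command line.
build_gbasf2_command() {
    # Exactly two arguments: the script and a single collection.
    if [ "$#" -ne 2 ]; then
        echo "usage: build_gbasf2_command <script.py> <collection> (one collection per project)" >&2
        return 1
    fi

    # The input must live under /belle/collection/ to count as a collection.
    case "$2" in
        /belle/collection/?*) ;;
        *)
            echo "error: '$2' is not under /belle/collection/" >&2
            return 1
            ;;
    esac

    # Print only the part of the command documented above.
    printf 'gbasf2 %s -i %s\n' "$1" "$2"
}
```

For example, `build_gbasf2_command <my_script>.py /belle/collection/general/<collection_name>` prints the command line, while passing two collections, or a path outside `/belle/collection/`, returns a non-zero status with an error message.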