# Collection Intro
'Collection' is a term coined for a Belle II specific use case. A collection is a single path containing the datasets of interest.
The idea started with building a single path containing the whole set of datasets intended for conferences, which is defined and fixed by the DP team. Since then, the idea has spread to further use cases where collections are useful.
# Types of Collection
There are currently 4 types of collection that can be created. These types are defined in the configuration system (CS) under `Operations/Defaults/DatasetCollection`.

The types of collection are as follows:
1. **general**:
This collection is intended for general analysis purposes.
The requirement of a general collection:
* **It must start with `/belle/collection/general`**
2. **BG**:
This collection defines a single path containing background overlay datasets. Its purpose is to help with the data production workflow.
The requirement of a BG collection:
* **It must start with `/belle/collection/BG`**
3. **hRaw**:
This collection makes a single LPN containing hRaw datasets (it may be combined with datasets under `/belle/Data`).
The requirement of an hRaw collection:
* **It must start with `/belle/collection/hRaw`**
4. **test**:
This collection is for testing. It lets you make a collection and then test and validate it before publishing.
The requirement of a test collection:
* **It must start with `/belle/collection/test`**
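
The prefix rules above can be checked with a few lines of Python before calling any tool. This is only a minimal sketch: the prefixes are copied from the list above, while the authoritative definitions live in the CS. The function name `collection_type` is hypothetical, and requiring a `/` after the prefix is an assumption made here so that a look-alike path is not misclassified.

```python
# Prefixes copied from the list above; the CS remains the authoritative source.
COLLECTION_PREFIXES = {
    "general": "/belle/collection/general",
    "BG": "/belle/collection/BG",
    "hRaw": "/belle/collection/hRaw",
    "test": "/belle/collection/test",
}


def collection_type(path):
    """Return the collection type for a path, or None if no prefix matches."""
    for name, prefix in COLLECTION_PREFIXES.items():
        # Assumption: accept the prefix itself or anything below it, so that
        # /belle/collection/generalX is not treated as a general collection.
        if path == prefix or path.startswith(prefix + "/"):
            return name
    return None


print(collection_type("/belle/collection/BG/my_overlay"))
print(collection_type("/belle/Data/some/dataset"))
```

A path outside `/belle/collection/` returns `None`, which makes it easy to reject it before attempting to create a collection.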
To use the following tools, you need BelleDIRAC from the certification server. Instructions for setting up BelleDIRAC for certification: [Cert@BNL](https://confluence.desy.de/display/BI/Getting+A+BelleDIRAC+Client+Installed+For+The+BNL+Validation+Server)
# Collection's tools guide
## Collection Management Tools: gb2_ds_collection
**Permission required: belle_dataprod**
This single command handles all collection management (create/publish/update/delete), with one sub-command for each task.

### gb2_ds_collection create
This tool is used to create all types of collection.

1. **For general collection**:
Use the `--input_ds_search` option. This option queries the dataset searcher and retrieves the datasets to be put into the collection.
You can use any of the metadata attributes to search for datasets in the dataset searcher, as:
```
$ gb2_ds_collection create /belle/collection/general/<collection_name> \
    --input_ds_search 'attr1=value;attr2=value' \
    --description 'some short description' \
    --int_lum <integrated luminosity in /fb>
```
-> The attribute=value pairs are separated by **;**. The available attributes are listed below.
* data_type=
* campaign=
* beam_energy=
* data_level=
* general_skim=
* release=
* bkg_level=
* mc_event=
* skim_decay=
* global_tag=
* run=low:high
* exp=low:high
Among these, **campaign, data_type, data_level and general_skim are required** (skim_decay is also required if data_level=udst).
Datasets from multiple campaigns are allowed in a single collection by giving comma-separated (,) values for campaign, as:
**`campaign=proc12,bucket16,bucket17`**
You can use the `--dryrun` option with `-o <outputfile_name>` to get the list of datasets and check it before making the actual collection.
2. **For BG/hRaw collection**:
These two types of collection can be created using the `-i` option, as:
```
$ gb2_ds_collection create /belle/collection/BG/<collection_name> -i <input_file> --description 'some short description'
```
```
$ gb2_ds_collection create /belle/collection/hRaw/<collection_name> -i <input_file> --description 'some short description'
```
where `<input_file>` is a text file containing the LPNs of the datasets, one per line.
These dataset LPNs must satisfy the following restrictions:
a. Each LPN must start with what is listed under **contentLPNs** in the corresponding section of the CS.
b. All LPNs must have the same value for the metadata attributes listed under **MetadataAttributes** in the corresponding section of the CS.
3. **For test collection**:
This type of collection can be created using the `-i` option, as:
```
$ gb2_ds_collection create /belle/collection/test/<collection_name> -i <input_file> --description 'some short description' --int_lum <integrated luminosity in /fb>
```
A test collection can be published as another type of collection (general/hRaw/BG) with `gb2_ds_collection publish`, described in the next section.
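
For the general-collection case above, the value passed to `--input_ds_search` can be assembled and checked in Python before calling the tool. The sketch below is illustrative only: `build_search_query` is a hypothetical helper, not part of BelleDIRAC, and the attribute values in the usage lines (`data`, `mdst`, `hadron`) are placeholders rather than confirmed valid values. It encodes only the rules stated above: the four required attributes, `skim_decay` for `data_level=udst`, `;` between pairs and `,` between campaigns.

```python
REQUIRED = ("campaign", "data_type", "data_level", "general_skim")


def build_search_query(**attrs):
    """Join attribute=value pairs with ';' after checking the required ones."""
    missing = [name for name in REQUIRED if name not in attrs]
    # skim_decay becomes mandatory when data_level is udst.
    if attrs.get("data_level") == "udst" and "skim_decay" not in attrs:
        missing.append("skim_decay")
    if missing:
        raise ValueError("missing required attributes: " + ", ".join(missing))

    parts = []
    for name, value in attrs.items():
        # Several campaigns are written as one comma-separated value.
        if name == "campaign" and isinstance(value, (list, tuple)):
            value = ",".join(value)
        parts.append(f"{name}={value}")
    return ";".join(parts)


# Placeholder values, for illustration only.
query = build_search_query(
    campaign=["proc12", "bucket16", "bucket17"],
    data_type="data",
    data_level="mdst",
    general_skim="hadron",
)
print(query)
```

The resulting string can be pasted, in single quotes, after `--input_ds_search`. Combining it with `--dryrun` remains the safest way to confirm which datasets are actually selected.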
### gb2_ds_collection publish
Once the test collection has been made, it can be published as another type of collection.

```
$ gb2_ds_collection publish /belle/collection/general/<published_collection_name> --source /belle/collection/test/<name>
$ gb2_ds_collection publish /belle/collection/hRaw/<published_collection_name> --source /belle/collection/test/<name>
```
### gb2_ds_collection update
Once created, the content of a collection, i.e. its dataset LPNs, should NOT be edited. This keeps collections consistent. An update can therefore only change the 'description' and/or 'int_lum' fields.

```
$ gb2_ds_collection update <collection_path> --int_lum 200
```
```
$ gb2_ds_collection update <collection_path> --description 'updated description of collection'
```
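
When updates are scripted, the argument list can be built in Python so that only the two editable fields are ever passed. This is a minimal sketch under stated assumptions: `update_args` is a hypothetical helper, it uses only the `--int_lum` and `--description` options shown above, and it assumes both may be given in the same call (the text says "and/or", but the examples show them separately).

```python
def update_args(collection_path, description=None, int_lum=None):
    """Build the argument list for `gb2_ds_collection update`."""
    if description is None and int_lum is None:
        raise ValueError("nothing to update: give description and/or int_lum")
    args = ["gb2_ds_collection", "update", collection_path]
    if int_lum is not None:
        args += ["--int_lum", str(int_lum)]
    if description is not None:
        # Kept as one list element, so no shell quoting is needed.
        args += ["--description", description]
    return args


print(update_args("<collection_path>", int_lum=200))
```

The returned list can be handed to `subprocess.run(args, check=True)` in an environment where BelleDIRAC is set up. Passing a list rather than a single string avoids quoting problems with descriptions that contain spaces.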
### gb2_ds_collection delete
This command is used to delete any collection.

```
$ gb2_ds_collection delete <collection_path>
```
## Collection Search tools: `gb2_ds_search collection`
The `gb2_ds_search` command is used to search for metadata and datasets in the dataset searcher. The tool has been extended to search for information related to collections through a new sub-command:
**gb2_ds_search collection**

This tool has three options.
a. **`--list_all_collection`**
List all available collections.
```
$ gb2_ds_search collection --list_all_collection /belle/collection/*
$ gb2_ds_search collection --list_all_collection /belle/collection/general/*
$ gb2_ds_search collection --list_all_collection /belle/collection/hRaw/*
$ gb2_ds_search collection --list_all_collection /belle/collection/BG/*
```
b. **`--get_metadata`**
Gives the metadata of a collection.
```
$ gb2_ds_search collection --get_metadata <collection_name>
```
c. **`--list_datasets`**
Gives the LPNs of all datasets in a collection.
```
$ gb2_ds_search collection --list_datasets <collection_name>
```
## gb2_ds_list
Collections are compatible with `gb2_ds_list`.
You can use a collection path as you would any other path.
```
$ gb2_ds_list <collection>
$ gb2_ds_list -lg <collection>
```
## gbasf2 with collection
You can use a collection as you would any other LPN/LFN when submitting gbasf2 jobs.
**Multiple collections in a single project are not permitted.**
```
$ gbasf2 <my_script>.py -i <collection> ...
```
**For cross-campaign LPNs, the gbasf2 option `-n <number_of_files_in_a_job>` will result in a globaltag failure.**
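
The single-collection restriction above can be checked in a submission script before gbasf2 is called. The following is a rough sketch: `check_single_collection` is a hypothetical helper, and it assumes that any path under `/belle/collection/` is a collection. It only counts collections and makes no claim about whether a collection may be mixed with ordinary LPNs in one project.

```python
COLLECTION_ROOT = "/belle/collection/"


def check_single_collection(input_paths):
    """Raise ValueError if more than one input path is a collection."""
    # Assumption: every path under /belle/collection/ is a collection.
    collections = [p for p in input_paths if p.startswith(COLLECTION_ROOT)]
    if len(collections) > 1:
        raise ValueError(
            "only one collection is allowed per project, got: "
            + ", ".join(collections)
        )
    return collections


print(check_single_collection(["/belle/collection/general/<collection_name>"]))
```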