# MetaLad Hackathon
## Computing playground
(For those with a ``juseless`` account)
```
ssh -J <user>@juseless.inm7.de <user>@lad1
cd /playground
```
## Initial demo by Christian (9:30-11:00 AM CET)
```
# create a nested dataset structure
datalad create ds1 && cd ds1
echo '"3456"' > x.txt
echo '"xxxxxx"' > f1.txt
datalad create -d . sub1
echo '"1234567"' > sub1/x.txt
datalad create -d . sub2
echo '"1234567"' > sub2/sub2_f1.txt
datalad save -r
# extract dataset metadata
datalad meta-extract -d . metalad_core | jq
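# note: meta-extract only writes the record to stdout, nothing is stored yet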
# try to dump metadata; this returns "NoMetadataStoreFound"
datalad meta-dump .
# add metadata by piping
datalad meta-extract -d . metalad_core | \
datalad meta-add -
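# meta-add stores the record in the dataset's metadata store
# (kept under .git/refs/datalad, see the removal step further below)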
# add more metadata with another extractor
datalad meta-extract -d . metalad_example_dataset | \
datalad meta-add -
datalad meta-dump -d .
# repeat the metadata extraction with
# metalad_example_dataset; the resulting
# metadata changes because it contains a timestamp
datalad meta-extract -d . metalad_example_dataset | \
datalad meta-add -
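# dataset-id, dataset-version, extractor-name, and extractor-version are identical,
# so the new record overwrites the previous metalad_example_dataset record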
# add metadata for a single file
datalad meta-extract -d . metalad_core x.txt | datalad meta-add -
# the subdatasets have no metadata stores yet
cd sub1
datalad meta-dump .
# add metadata to subdataset
datalad meta-extract -d . metalad_core x.txt | datalad meta-add -
# dump the file-level metadata. THIS REQUIRES -r!
# (otherwise only dataset-level metadata would be returned, of which this subdataset has none)
datalad meta-dump . -r
# or datalad meta-dump '.:x.txt'
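# (the dump pattern is <dataset-path>:<file-path>)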
# aggregate subds metadata into the super
cd ..
datalad meta-aggregate sub1
datalad meta-dump -r
# one can automate the subds extraction and superds aggregation with meta-conduct
datalad meta-conduct extract_metadata --pipeline-help
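# the extract_metadata pipeline chains a traverser, an extractor, and an adder;
# their parameters are passed as <element>.<parameter>=<value>, as in the invocation further below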
# save current metadata
datalad meta-dump -r > before.json
# remove the stored metadata (reclaiming the space would also need a git gc)
rm -rf .git/refs/datalad
datalad meta-conduct extract_metadata \
    traverser.top_level_dir=$(pwd) \
    traverser.traverse_sub_datasets=True \
    traverser.item_type=file \
    extractor.extractor_type=file \
    extractor.extractor_name=metalad_core \
    adder.aggregate=True
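# (optional, not part of the original demo) dump again and diff against before.json
datalad meta-dump -r > after.json
diff before.json after.json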
```
# Discussions
### Topic 1: Metadata has no history - performance discussion
Metadata records are identified by the following chain of properties:
``dataset-id``, ``dataset-version``, ``extractor-name``, ``extractor-version``.
If metadata is added with the exact same dataset-id, dataset-version, extractor-name, and extractor-version, the new record overwrites the old one.
Stephan asks whether it would not be more performant to skip meta-add when the metadata already exists; he has a use case involving the institute archive with large amounts of file-level metadata, where meta-conduct takes days. Christian cautions that the required look-up would be a larger performance hit and suspects that the slow behaviour is mainly due to slow executor performance.
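A minimal sketch of this overwrite behaviour, reusing the commands from the demo above (assuming the ``ds1`` dataset created there):
```
cd ds1
# add the same dataset-level record twice
datalad meta-extract -d . metalad_core | datalad meta-add -
datalad meta-extract -d . metalad_core | datalad meta-add -
# the dump still contains a single metalad_core record for this
# dataset-id / dataset-version / extractor-name / extractor-version combination
datalad meta-dump -d .
```
Only the most recent record per key chain is kept, which is why re-running an extractor that embeds a timestamp (as in the demo) silently replaces the earlier result.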
### Topic 2: the --recursive flag
- better name?
- under which circumstances should file-level metadata be returned? (see the sketch below)
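For reference, the behaviour under discussion as it appears in the demo above (assuming the ``ds1``/``sub1`` layout created there):
```
cd ds1/sub1
# without -r: only dataset-level metadata is returned (sub1 has none)
datalad meta-dump .
# with -r: file-level records such as the one for x.txt are returned as well
datalad meta-dump . -r
# a single file can also be addressed directly via <dataset-path>:<file-path>
datalad meta-dump '.:x.txt'
```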
## TODO
- Add to docs:
- suggestions for improving performance when handling large amounts of metadata
- Explain the `-r` switch and dump patterns extremely well
- `meta-conduct`