Starling UA Archive Data Model
==============================
## Configurations
Organizations and their Collections are configured in **config.json** as follows:
```
{
  "organizations": [
    {
      "id": "starling-lab",
      "collections": [
        {
          "id": "starling-capture-bosnia",
          "asset_extensions": [
            "jpg",
            "jpeg"
          ],
          "actions": [
            {
              "name": "c2pa-create",
              "params":
              {
                "signer": "starlinglab-c2pa"
              }
            }
          ]
        },
        {
          "id": "web-archives-personal",
          "asset_extensions": [
            "wacz"
          ],
          "actions": [
            {
              "name": "archive",
              "params":
              {
                "authsigner": "starlinglab-authsign",
                "encryption":
                {
                  "algo": "aes-256-cbc",
                  "key": "starlinglab-aes-256"
                },
                "registration_policies":
                {
                  "opentimestamps":
                  {
                    "active": true
                  },
                  "iscn":
                  {
                    "active": true
                  },
                  "numbersprotocol":
                  {
                    "active": true
                  }
                }
              }
            }
          ]
        }
      ]
    }
  ]
}
```
Collection IDs follow the same rules as Organization IDs. Examples of Collection IDs:
- web-archives-witness-halasystems
- tips-archives-signal-halasystems
- chat-archives-slack-halasystems-workspace0
- chat-archives-telegram-halasystems-bot0
- chat-archives-telegram-syrianarchive-bot0
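As a minimal sketch of how this configuration could be consumed, the snippet below flattens a parsed **config.json** into one row per Collection. The function name `list_collections` and the inlined `EXAMPLE` (a trimmed copy of the configuration above) are hypothetical and only serve the illustration.

```python
import json


def list_collections(config: dict) -> list[tuple[str, str, list[str], list[str]]]:
    """Return (organization_id, collection_id, asset_extensions, action names) rows."""
    rows = []
    for org in config.get("organizations", []):
        for coll in org.get("collections", []):
            action_names = [action["name"] for action in coll.get("actions", [])]
            rows.append(
                (org["id"], coll["id"], coll.get("asset_extensions", []), action_names)
            )
    return rows


# Trimmed copy of the example configuration, inlined so the sketch is self-contained.
EXAMPLE = json.loads("""
{
  "organizations": [
    {
      "id": "starling-lab",
      "collections": [
        {
          "id": "starling-capture-bosnia",
          "asset_extensions": ["jpg", "jpeg"],
          "actions": [
            {"name": "c2pa-create", "params": {"signer": "starlinglab-c2pa"}}
          ]
        },
        {
          "id": "web-archives-personal",
          "asset_extensions": ["wacz"],
          "actions": [
            {"name": "archive", "params": {"authsigner": "starlinglab-authsign"}}
          ]
        }
      ]
    }
  ]
}
""")

for row in list_collections(EXAMPLE):
    print(row)
```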
## Access control
### API with JWTs
One way to allow access and self-identification is through JSON Web Tokens (JWTs). Multiple JWTs can be issued per Organization and Collection to grant write permission. Author information, which may become part of the Content Metadata, can also be bundled in the JWT:
```
{
  "organization_id": "starling-lab",
  "collection_id": "create",
  "author": {
    "type": "Person",
    "identifier": "https://hypha.coop",
    "name": "Benedict Lau"
  },
  "twitter": {
    "type": "Organization",
    "identifier": "https://hypha.coop",
    "name": "HyphaCoop"
  },
  "copyright": "Copyright (C) 2021 Hypha Worker Co-operative. All Rights Reserved."
}
```
This allows HTTP APIs to receive an asset and its metadata from an HTTP POST. Other methods such as Dropbox Sync and Local Services (Docker containers) are also supported.
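To make the JWT flow concrete, here is a minimal sketch of issuing and verifying such a token using only the Python standard library. The note does not say which signing algorithm or key type is used, so HS256 with a shared secret is an assumption, as are the helper names `issue_jwt` and `verify_jwt`. The sketch does not handle expiry or other registered claims.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 with the trailing "=" padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped during encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_jwt(claims: dict, secret: bytes) -> str:
    """Sign the claims with HMAC-SHA256 (assumed algorithm, not confirmed by the note)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_jwt(token: str, secret: bytes) -> dict:
    """Return the claims if the signature is valid, otherwise raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    return json.loads(_b64url_decode(payload_b64))


claims = {
    "organization_id": "starling-lab",
    "collection_id": "create",
    "author": {"type": "Person", "identifier": "https://hypha.coop", "name": "Benedict Lau"},
}
token = issue_jwt(claims, b"example-secret")  # hypothetical secret for illustration
print(verify_jwt(token, b"example-secret")["organization_id"])
```

An API handler could use the verified `organization_id` and `collection_id` claims to decide which Collection the uploaded asset belongs to.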
### Local services
Based on the configuration in **config.json**, a folder is created for each Collection within its Organization at `./internal/organization_id/collection_id`. For example:
- `./internal/starling-lab/starling-capture-bosnia`
- `./internal/starling-lab/web-archives-personal`
Assets and their associated metadata are dropped into `./input` within these folders with the following requirements:
1. One ZIP file per asset, ending in `.zip`.
2. The ZIP file must be dropped atomically (i.e. it must not be visible in the folder while it is still being uploaded, copied, or generated). The client may build the ZIP in a temporary folder and then rename it into place, or build it in the input folder under a temporary extension and then rename it to the `.zip` extension.
3. The ZIP contains three files, named as such:
    - `sha256(asset).ext`: the asset file with `ext` matching one of the Collection's `asset_extensions`
    - `sha256(asset)-meta-content.json`: the metadata associated with the asset file
    - `sha256(asset)-meta-recorder.json`: the metadata associated with the recorder of the asset
4. The name of the ZIP file should be `sha256(zip)`. (However, the Action does not need to verify this.)
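The requirements above can be sketched from the client's side as follows. This is one possible approach under stated assumptions: it builds the ZIP inside the input folder under a temporary `.part` extension (so the file stays on the same filesystem) and then renames it to `sha256(zip).zip` with `os.replace`. The function name `drop_asset`, the `.part` suffix, and the JSON serialization of the metadata are hypothetical choices, not part of the original note.

```python
import hashlib
import json
import os
import tempfile
import zipfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large assets are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def drop_asset(asset_path: Path, content_meta: dict, recorder_meta: dict,
               input_dir: Path) -> Path:
    """Package one asset with its metadata and place it in input_dir atomically."""
    asset_hash = sha256_file(asset_path)
    ext = asset_path.suffix.lstrip(".")
    input_dir.mkdir(parents=True, exist_ok=True)

    # Temporary name that does not match the watched "*.zip" pattern.
    fd, tmp_name = tempfile.mkstemp(suffix=".part", dir=input_dir)
    os.close(fd)
    tmp_path = Path(tmp_name)
    try:
        with zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.write(asset_path, arcname=f"{asset_hash}.{ext}")
            zf.writestr(f"{asset_hash}-meta-content.json", json.dumps(content_meta))
            zf.writestr(f"{asset_hash}-meta-recorder.json", json.dumps(recorder_meta))
        # Name the finished ZIP after its own SHA-256, then rename it into place.
        final_path = input_dir / f"{sha256_file(tmp_path)}.zip"
        os.replace(tmp_path, final_path)
    except BaseException:
        tmp_path.unlink(missing_ok=True)
        raise
    return final_path
```

Because the rename happens within a single directory, a watcher matching `*.zip` should only ever see the completed file.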
These input folders (e.g. `./internal/starling-lab/starling-capture-bosnia/input`) are watched with a pattern match on `*.zip` and picked up by the Actions associated with the folder in **config.json** for further processing.
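The note does not say how the folders are watched, so the following is only a sketch that assumes simple polling with a glob on `*.zip`. The names `scan_once` and `watch`, and the `handle` callback standing in for the configured Action, are hypothetical.

```python
import time
from pathlib import Path
from typing import Callable


def scan_once(input_dir: Path, seen: set[str], handle: Callable[[Path], None]) -> int:
    """Pass every not-yet-seen *.zip in input_dir to handle; return how many were new."""
    new_count = 0
    for zip_path in sorted(input_dir.glob("*.zip")):
        if zip_path.name in seen:
            continue
        seen.add(zip_path.name)
        handle(zip_path)
        new_count += 1
    return new_count


def watch(input_dir: Path, handle: Callable[[Path], None],
          interval_seconds: float = 1.0) -> None:
    """Poll input_dir forever, handing each new ZIP to handle exactly once per run."""
    seen: set[str] = set()
    while True:
        scan_once(input_dir, seen, handle)
        time.sleep(interval_seconds)
```

Files with a temporary extension are ignored by the glob, which is why the atomic-rename requirement matters for this kind of watcher.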
## Archive action
1. Copy ZIP file to a temporary directory for processing (e.g. unzip, append to ZIP, etc.)
2. Verify asset file, reject otherwise:
    - name matches `sha256(asset)`
    - extension is contained in `asset_extensions` for Collection
3. Generate [ CIDv1, SHA256, MD5 ] hashes for the asset file
4. Register on OpenTimestamps and download proof file into `$ZIP_ROOT/proofs/`
5. Take final archive ZIP:
    - generate [ CIDv1, SHA256, MD5 ] hashes
    - store as `./internal/organization_id/collection_id/action-archive/sha256(hash).zip`
6. Encrypt final archive ZIP with `encryption` params, take the encrypted output:
    - generate [ CIDv1, SHA256, MD5 ] hashes
    - store as `./internal/organization_id/collection_id/action-archive/sha256(hash).encrypted`
7. (TBD) Generate registration record JSON for Numbers Protocol:
    - generate JSON with `numbersprotocol` params
    - sign JSON with `authsigner` params
    - store as `./internal/organization_id/collection_id/action-archive/sha256(hash)-numbersprotocol-record.json`
    - register with Numbers Protocol module
8. (TBD) Generate registration record JSON for ISCN:
    - generate JSON with `iscn` params
    - sign JSON with `authsigner` params
    - store as `./internal/organization_id/collection_id/action-archive/sha256(hash)-iscn-record.json`
    - register with ISCN module
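The verification and hashing steps at the start of this list can be sketched as below. The function names `file_hashes` and `verify_asset` are hypothetical. Only SHA256 and MD5 are computed, because CIDv1 generation needs an IPFS or multiformats library that the note does not name; OpenTimestamps registration, encryption, and record signing are likewise left out.

```python
import hashlib
from pathlib import Path


def file_hashes(path: Path, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Compute SHA256 and MD5 in a single pass over the file."""
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
            md5.update(chunk)
    return {"sha256": sha256.hexdigest(), "md5": md5.hexdigest()}


def verify_asset(path: Path, asset_extensions: list[str]) -> dict[str, str]:
    """Reject the asset unless its name and extension satisfy the Collection rules."""
    extension = path.suffix.lstrip(".")
    if extension not in asset_extensions:
        raise ValueError(f"extension {extension!r} not allowed for this Collection")
    hashes = file_hashes(path)
    if path.stem != hashes["sha256"]:
        raise ValueError("file name does not match sha256(asset)")
    return hashes
```

Returning the hashes from `verify_asset` avoids reading the asset a second time for the hash-generation step.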
### ISCN
```
{
  "recordNotes": "Initial registration.",
  "contentFingerprints": [
    "hash://sha256/9564b85669d5e96ac969dd0161b8475bbced9e5999c6ec598da718a3045d6f2e",
    "ipfs://bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"
  ],
  "contentMetadata": {
    "@context": "https://schema.org",
    "@type": "CreativeWork",
    "name": "<organization_id>_<collection_id>_<date>",
    "description": "Encrypted archive of <organization_id>_<collection_id> on <date>",
    "datePublished": "<date>",
    "version": 1,
    "author": "https://starlinglab.org",
    "keywords": "<organization_id>,<collection_id>",
    "conditionsOfAccess": "Encrypted with AES-256."
  }
}
```
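As an illustration, the template above could be filled in by a small helper like the one below. The function name `build_iscn_record` and its parameters are hypothetical; the date format is not specified in the note, so the date is passed through as an opaque string, and the `conditionsOfAccess` text is supplied by the caller rather than derived from the `encryption` params.

```python
import json


def build_iscn_record(organization_id: str, collection_id: str, date: str,
                      sha256_hex: str, ipfs_cid: str,
                      conditions_of_access: str,
                      record_notes: str = "Initial registration.") -> dict:
    """Fill the ISCN record template for one encrypted archive."""
    label = f"{organization_id}_{collection_id}"
    return {
        "recordNotes": record_notes,
        "contentFingerprints": [
            f"hash://sha256/{sha256_hex}",
            f"ipfs://{ipfs_cid}",
        ],
        "contentMetadata": {
            "@context": "https://schema.org",
            "@type": "CreativeWork",
            "name": f"{label}_{date}",
            "description": f"Encrypted archive of {label} on {date}",
            "datePublished": date,
            "version": 1,
            "author": "https://starlinglab.org",
            "keywords": f"{organization_id},{collection_id}",
            "conditionsOfAccess": conditions_of_access,
        },
    }


record = build_iscn_record(
    organization_id="starling-lab",
    collection_id="web-archives-personal",
    date="2021-01-01",  # placeholder date for illustration
    sha256_hex="9564b85669d5e96ac969dd0161b8475bbced9e5999c6ec598da718a3045d6f2e",
    ipfs_cid="bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi",
    conditions_of_access="Encrypted.",  # placeholder text for illustration
)
print(json.dumps(record, indent=2))
```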