---
title: Identifier Generation
tags: identifier, minting, isamples
---

# Identifier Generation

Assignment of identifiers to physical samples collected in the field.

## General Pattern

Field work generally follows a sequence like the one below.

```plantuml
@startuml
actor "Field\nResearcher" as field
entity "Local\nIdentifier\nAuthority" as auth1
participant "The Natural\nEnvironment" as env
database "Sample\nCollection" as coll
entity "Global\nIdentifier\nAuthority" as auth2
participant iSamples

activate field
field --> env: collect sample
env --> field: unidentified sample
field --> auth1: get identifier
auth1 --> field: id-01
field --> field: Create record\nwith id-01
deactivate field
... Back to office ...
field --> coll: store sample id-01
activate coll
coll --> auth2: get identifier
auth2 --> coll: id-02
coll --> coll: preserveRecord(id-02)
coll --> field: ok
deactivate coll
iSamples --> coll: Harvest Records
field --> iSamples: get id-01
iSamples --> field: Huh?
@enduml
```

Samples are collected, documented, and assigned a "field identifier" that comes from some local authority (e.g. a sequential list). The samples are eventually accessioned to a collection, where the collection management system assigns a "real" identifier. The researcher then tries to find their records using the field identifier, which fails because the harvested records carry only the collection-assigned identifier.

## Preserving local identifiers

```plantuml
@startuml
actor "Field\nResearcher" as field
entity "Local\nIdentifier\nAuthority" as auth1
participant "The Natural\nEnvironment" as env
database "Sample\nCollection" as coll
entity "Global\nIdentifier\nAuthority" as auth2
participant iSamples

activate field
field --> env: collect sample
env --> field: unidentified sample
field --> auth1: get identifier
auth1 --> field: id-01
field --> field: Create record\nwith id-01
deactivate field
... Back to office ...
field --> coll: store sample id-01
activate coll
coll --> auth2: get identifier
auth2 --> coll: id-02
coll --> coll: preserveRecord(id-02, alt=id-01)
coll --> field: ok
deactivate coll
iSamples --> coll: Harvest Records
field --> iSamples: get id-01
iSamples --> iSamples: search any\nidentifier = id-01
iSamples --> field: OK, but there's lots of results
@enduml
```

Preserving the original field identifier in the collection, and ensuring records are discoverable by any of their assigned identifiers, improves recall at the expense of precision, since many samples may share the same field identifier value.

This approach is [used by GEOME](https://fims.readthedocs.io/en/latest/fims/identifiers.html).

GEOME example:

- Field id = `CMU_161`
- ARK root = `ark:/21547/mx2`
- Full identifier = `ark:/21547/mx2CMU_161`

## Delegation to local authority

```plantuml
@startuml
actor "Field\nResearcher" as field
entity "Local\nIdentifier\nAuthority" as auth1
participant "The Natural\nEnvironment" as env
database "Sample\nCollection" as coll
entity "Global\nIdentifier\nAuthority" as auth2
participant iSamples

field --> auth2: get identifier delegates
activate field
auth2 --> field: identifiers[]
field --> auth1: identifiers[]
auth1 --> field: Ready
deactivate field
... Start field work ...
field --> env: collect sample
activate field
env --> field: unidentified sample
field --> auth1: get identifier
auth1 --> field: id-01
field --> field: Create record\nwith id-01
deactivate field
... Back to office ...
field --> coll: store sample id-01
activate coll
coll --> coll: Existing identifier is good
coll --> coll: preserveRecord(id-01)
coll --> field: ok
deactivate coll
iSamples --> coll: Harvest Records
field --> iSamples: get id-01
iSamples --> field: OK!
@enduml
```

One option is to generate a global identifier early and maintain its use for the life of the object. Supporting this requires two changes: the sample collection process must use the (possibly verbose) pre-generated identifier, and the collection management system must recognize and keep the existing identifier rather than assigning a new one.

A variation of this pattern would provide a prefix to which field-generated identifiers could be appended. The shorter field-generated ids could be used for identifying the samples in the field, and expanded to the full form when loaded into the collection.
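
The prefix variation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `expand_field_id` and `shorten_identifier` are hypothetical, the prefix and field id are taken from the GEOME example above, and simple concatenation with no separator is assumed because that is what the GEOME example shows.

```python
def expand_field_id(prefix: str, field_id: str) -> str:
    """Expand a short field id to its full form by appending it to a prefix.

    Hypothetical helper: assumes plain concatenation with no separator.
    """
    if not field_id or any(ch.isspace() for ch in field_id):
        raise ValueError(f"invalid field id: {field_id!r}")
    return f"{prefix}{field_id}"


def shorten_identifier(prefix: str, identifier: str) -> str:
    """Recover the short field id from a full identifier under the prefix."""
    if not identifier.startswith(prefix):
        raise ValueError(f"{identifier!r} is not under prefix {prefix!r}")
    return identifier[len(prefix):]


# Values from the GEOME example; the prefix is handed out before field work.
ARK_ROOT = "ark:/21547/mx2"

full_id = expand_field_id(ARK_ROOT, "CMU_161")
print(full_id)                                # ark:/21547/mx2CMU_161
print(shorten_identifier(ARK_ROOT, full_id))  # CMU_161
```

Because the mapping is reversible, the researcher can keep writing the short id on labels and in notebooks, while the collection and any harvester see a single full identifier.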