# Serialization and External References

From [External Map Entry Usage](https://hackmd.io/Xbvo1jjKSwqGLnGeL3bWZg):

> Unresolved concerns:
> - A good standard for URI. For example:
>     - http://spdx.org/spdxdocs/spdx-example-444504E0-4F89-41D3-9A0C-0305E82C3301
>     - doc://spdx.org/spdx-example-444504E0-4F89-41D3-9A0C-0305E82C3301
>     - org.spdx.document.spdx-example-444504E0-4F89-41D3-9A0C-0305E82C3301
> - Current SPDX document construction uses sections: https://spdx.github.io/spdx-spec/v3-draft/composition-of-an-SPDX-document/
>     - ```json
>       {
>         "Document": {},
>         "Package": {},
>         "File": {},
>         "Snippet": {},
>         "Licensing": {},
>         "Relationships": {},
>         "Annotations": {}
>       }
>       ```
>     - ```json
>       {
>         "Elements": [],
>         "Relationships": [],
>         "ExternalMap": [],
>         "Annotations": []
>       }
>       ```

## SPDX v3 Ontology

![](https://i.imgur.com/dA1cGED.jpg)

The ontology indicates that everything (including relationships and annotations) is a specialization of **Element**. Given that an SPDX document is a collection of elements, the **Document** element (not the collection) may be serialized as:

```json
{
  "id": "0",
  "namespace": "<URI>",
  "specVersion": "...",
  "profiles": ["..."],
  "dataLicense": "CC0-1.0",
  "name": "...",
  "summary": "..."
}
```

One attribute (namespace) is unique to the Document element. Since each SPDX document has exactly one Document element, the rootElement attribute is known to be the id of that element and does not need to be serialized separately.

The next three attributes (specVersion, profiles, and dataLicense) apply to the document as a whole and are candidates for being moved from Element to Document. The remaining attributes are common to all subtypes of Element.

## SPDX Document

Each instantiated Element has a type, currently one of {Document, Identity, Artifact, Relationship, Annotation}.
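The Element hierarchy described so far, including the type list just given, can be sketched in Python. This is a minimal illustration under assumptions, not the SPDX model: `Element` carries an abbreviated set of common attributes, `Document` holds `namespace` plus the three document-wide attributes on the assumption that they move from Element to Document, and the hypothetical helper `root_element` shows why rootElement can be derived rather than serialized.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum


class ElementType(Enum):
    # The instantiable Element types listed above.
    DOCUMENT = "Document"
    IDENTITY = "Identity"
    ARTIFACT = "Artifact"
    RELATIONSHIP = "Relationship"
    ANNOTATION = "Annotation"


@dataclass
class Element:
    # Attributes shared by every subtype of Element (abbreviated for the sketch).
    id: str
    name: str = ""
    summary: str = ""


@dataclass
class Document(Element):
    # namespace is unique to Document. specVersion, profiles, and dataLicense are
    # placed here on the ASSUMPTION that they move from Element to Document.
    namespace: str = ""
    specVersion: str = ""
    profiles: list[str] = field(default_factory=list)
    dataLicense: str = "CC0-1.0"


@dataclass
class Artifact(Element):
    # Artifact-specific attributes are omitted in this sketch.
    pass


def root_element(elements: list[Element]) -> str:
    """Derive rootElement: the id of the one Document element in the collection."""
    documents = [e for e in elements if isinstance(e, Document)]
    if len(documents) != 1:
        raise ValueError(f"expected exactly one Document element, found {len(documents)}")
    return documents[0].id


doc = Document(id="0", name="...", summary="...", namespace="<URI>",
               specVersion="...", profiles=["..."])
print(json.dumps(asdict(doc)))
print(root_element([Artifact(id="1", name="libfoo"), doc]))  # 0
print(ElementType(type(doc).__name__))                       # ElementType.DOCUMENT
```

Because exactly one Document element exists per document, a reader can always recompute rootElement, so serializing it would only add a value that could disagree with the collection.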
An SPDX document is a collection of Elements that can be serialized in multiple ways, including a flat list of individually-typed elements:

```
SpdxDocument = ArrayOf(TypedElement)

TypedElement = Choice
   1 document      Document
   2 artifact      Artifact
   3 identity      Identity
   4 relationship  Relationship
   5 annotation    Annotation
```

or separate lists for each type of element:

```
SpdxDocument = Record
   1 document       Document
   2 artifacts      Artifact [0..*]
   3 identities     Identity [0..*]
   4 relationships  Relationship [0..*]
   5 annotations    Annotation [0..*]
```

These would be serialized in JSON format as:

```json
[
  {"document": {}},
  {"artifact": {}},
  {"relationship": {}},
  {"artifact": {}},
  {"artifact": {}},
  {"identity": {}}
]
```

or:

```json
{
  "document": {},
  "artifacts": [{}, {}, {}],
  "identities": [{}, {}],
  "relationships": [{}, {}],
  "annotations": [{}]
}
```

Both represent a collection of Elements. The primary difference is how much information about an Element is carried in the IdString used to reference it:

| flat list | section lists   |
| --------- | --------------- |
| "0"       | "document"      |
| "139"     | "identities/15" |

## Internal and External Element IDs

Each Element has an "id" attribute of type IdString. This string could contain text such as "SPDXRef-45", as in v2.2, but could instead be an [RFC 6901](https://datatracker.ietf.org/doc/html/rfc6901) pointer, which provides flexibility in serialization formats and allows elements to be accessed directly rather than by indexing or searching for an id. (Strictly, an RFC 6901 pointer begins with "/", e.g. "/0" or "/identities/15"; the examples in this document omit the leading slash.)

Once a URI format has been decided, an external reference to an Element in another document would use the id attribute directly as the URI [fragment identifier](https://datatracker.ietf.org/doc/html/rfc8820#section-2.5):

- http://spdx.org/spdxdocs/spdx-example-444504E0-4F89-41D3-9A0C-0305E82C3301#0

Encoding the document identifier in base64url rather than hex is more compact (43 characters instead of 64 for a SHA-256 value) and allows the URI to be a verifiable cryptographic hash of the document, as opposed to an independently assigned UUID.
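A minimal Python sketch of the id ideas in this section, under stated assumptions: ids are written as RFC 6901 pointers with the leading "/" that the RFC requires; `resolve_pointer`, `resolve_reference`, and the `documents` lookup table are hypothetical helpers; and placeholder bytes are hashed in place of a canonicalized document. It resolves the same element in the flat-list and section-list layouts, compares the hex and base64url lengths of a SHA-256 digest, and follows an external `urn:sha256:...#pointer` reference.

```python
import base64
import hashlib
from urllib.parse import unquote


def resolve_pointer(doc, pointer: str):
    """Resolve an RFC 6901 JSON Pointer such as "/identities/0" against parsed JSON."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"JSON Pointer must start with '/': {pointer!r}")
    node = doc
    for token in pointer[1:].split("/"):
        # RFC 6901 unescaping: "~1" becomes "/" first, then "~0" becomes "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            # Array indices are ASCII decimal integers without leading zeros.
            if not (token.isascii() and token.isdigit()) or (len(token) > 1 and token[0] == "0"):
                raise KeyError(f"invalid array index {token!r}")
            node = node[int(token)]
        else:
            node = node[token]
    return node


def resolve_reference(ref: str, local_doc, documents: dict):
    """Resolve an unqualified local id ("/2") or an external "uri#/pointer" reference.

    `documents` is a HYPOTHETICAL map from document URI to parsed document;
    how external documents are actually retrieved is out of scope here.
    """
    uri, sep, fragment = ref.partition("#")
    if not sep:
        return resolve_pointer(local_doc, ref)  # no fragment: a local reference
    target = documents[uri] if uri else local_doc  # "#/..." also means this document
    return resolve_pointer(target, unquote(fragment))  # fragments may be percent-encoded


# Flat-list and section-list serializations of the same three elements.
flat = [{"document": {"name": "example"}},
        {"artifact": {"name": "libfoo"}},
        {"identity": {"name": "Alice"}}]
sections = {"document": {"name": "example"},
            "artifacts": [{"name": "libfoo"}],
            "identities": [{"name": "Alice"}]}

print(resolve_pointer(flat, "/2"))                 # {'identity': {'name': 'Alice'}}
print(resolve_pointer(flat, "/2/identity"))        # {'name': 'Alice'}
print(resolve_pointer(sections, "/identities/0"))  # {'name': 'Alice'}

# A SHA-256 digest is 64 characters in hex but 43 in unpadded base64url.
# Placeholder bytes stand in for a canonicalized document.
digest = hashlib.sha256(b"placeholder document bytes").digest()
doc_id = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
print(len(digest.hex()), len(doc_id))              # 64 43

external = f"urn:sha256:{doc_id}#/1/artifact"
print(resolve_reference(external, sections, {f"urn:sha256:{doc_id}": flat}))  # {'name': 'libfoo'}
```

The base64url alphabet (RFC 4648 section 5) avoids "+" and "/", so the encoded hash never collides with the "/" separators of the pointer in the fragment.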
Generating a hash requires canonicalizing the content to be hashed and excluding the hash value from that content. For use cases that do not require validating the document, the "hash" value can simply be a random number of the appropriate size.

With flat-list serialization, an external reference could point to the {type: element} pair:

- urn:sha256:bOGhYYoUzwbyxTu72a4rLWPPEPPGywP3IFpX9jhc96s#0
- urn:sha256:bOGhYYoUzwbyxTu72a4rLWPPEPPGywP3IFpX9jhc96s#37

or directly to the element:

- urn:sha256:bOGhYYoUzwbyxTu72a4rLWPPEPPGywP3IFpX9jhc96s#0/document
- urn:sha256:bOGhYYoUzwbyxTu72a4rLWPPEPPGywP3IFpX9jhc96s#37/artifact

Section-based serialization uses a pointer to the element within its type's section:

- urn:sha256:bOGhYYoUzwbyxTu72a4rLWPPEPPGywP3IFpX9jhc96s#document
- urn:sha256:bOGhYYoUzwbyxTu72a4rLWPPEPPGywP3IFpX9jhc96s#artifacts/14

### Element Example

As an example, a Relationship element would use an unqualified idString for internal references or a namespaced idString (uri#fragment) for external references. No separate ExternalMap type is needed, since an id reference can be either local or external.

Relationship with Concrete Element:

```json
{
  "element": {
    "id": "72",
    "name": "...",
    "summary": "...",
    "description": "...",
    "annotations": ["63", "91"]
  },
  "from": "15",
  "to": ["urn:sha256:bOGhYYoUzwbyxTu72a4rLWPPEPPGywP3IFpX9jhc96s#46"],
  "type": "DEPENDS_ON",
  "completeness": "INCOMPLETE"
}
```

Relationship with Abstract Element:

```json
{
  "id": "72",
  "name": "...",
  "summary": "...",
  "description": "...",
  "annotations": ["63", "91"],
  "from": "15",
  "to": ["urn:sha256:bOGhYYoUzwbyxTu72a4rLWPPEPPGywP3IFpX9jhc96s#46"],
  "type": "DEPENDS_ON",
  "completeness": "INCOMPLETE"
}
```

## Serialization Alternatives

1. Should the top-level SPDX Document be:
    a. a single list of individual subtypes
    b. separate lists for each subtype
    c. something else
2. Should Element instantiation be concrete or abstract?
    a. concrete means each subtype contains a distinct "element" property
    b. abstract means the properties of Element are mixed in with the properties of each subtype, which could result in property name collisions
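The two options in question 2 can be compared with a short Python sketch. It assumes that Element's properties are exactly those shown in the Relationship examples (`ELEMENT_KEYS` is an assumption, not a normative list), and `to_abstract` and `to_concrete` are hypothetical converters. Flattening the concrete form works until a subtype defines a property with the same name as an Element property, which is the collision risk noted above.

```python
# ASSUMPTION: Element's properties are those shown in the Relationship examples.
ELEMENT_KEYS = {"id", "name", "summary", "description", "annotations"}


def to_abstract(concrete: dict) -> dict:
    """Mix the nested "element" properties into the subtype's own properties.

    Raises ValueError when a key occurs both inside "element" and at the subtype
    level, which is the collision risk of the abstract option.
    """
    element = concrete.get("element", {})
    own = {k: v for k, v in concrete.items() if k != "element"}
    collisions = sorted(element.keys() & own.keys())
    if collisions:
        raise ValueError(f"property name collision: {collisions}")
    return {**element, **own}


def to_concrete(abstract: dict, element_keys: set[str] = ELEMENT_KEYS) -> dict:
    """Split an abstract instance back into an "element" property plus subtype properties.

    The abstract form does not record which keys came from Element, so the
    caller must supply that set (here assumed to be ELEMENT_KEYS).
    """
    element = {k: v for k, v in abstract.items() if k in element_keys}
    own = {k: v for k, v in abstract.items() if k not in element_keys}
    return {"element": element, **own}


concrete = {
    "element": {"id": "72", "name": "...", "summary": "...", "description": "...",
                "annotations": ["63", "91"]},
    "from": "15",
    "to": ["urn:sha256:bOGhYYoUzwbyxTu72a4rLWPPEPPGywP3IFpX9jhc96s#46"],
    "type": "DEPENDS_ON",
    "completeness": "INCOMPLETE",
}
abstract = to_abstract(concrete)
print(abstract["type"], abstract["id"])   # DEPENDS_ON 72
print(to_concrete(abstract) == concrete)  # True

# A hypothetical subtype that defines its own "name" cannot be flattened safely.
clash = {"element": {"id": "99", "name": "pkg"}, "name": "subtype-specific name"}
try:
    to_abstract(clash)
except ValueError as err:
    print(err)  # property name collision: ['name']
```

One consequence of the design choice: the concrete form is self-describing, while turning abstract data back into concrete form requires knowing, from the schema, which keys belong to Element.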