# GitBOM
Several SBOM approaches already exist. This proposal does not seek to replace any of them, and it is agnostic among them.
This proposal focuses on making it possible to track the tree of artifacts inherent to an SBOM through the entire supply chain and to associate those artifacts with metadata. From its perspective, GitBOM treats all of the other SBOM approaches as metadata.
# SBOM is an artifact tree + metadata
A Software Bill of Materials (SBOM) for an artifact is fundamentally a tree of artifacts and their associated metadata.
## Artifact Tree Examples:
### C/C++



### Go

### Java

### Python

### Generic artifact tree

# Learning from Git
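Git specifies a simple, generalizable 'object' format consisting of the object type, a space, the size of the data in bytes written as decimal digits, a NUL character ('\0'), and then the data itself:
```
${objectType} ${size of data in bytes, as decimal digits}${NUL character '\0'}${data}
```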
Git allows us to assign an **identifier** (git ref) to any object by computing the SHA-1 hash of the serialized object.
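The identifier of a 'blob' can be computed with nothing more than a SHA-1 implementation. The following is a minimal Python sketch using only the standard library; the function name `git_blob_id` is illustrative, not part of any existing API. It assumes git's default SHA-1 object format (repositories configured for SHA-256 produce different identifiers), and for the same bytes it should agree with `git hash-object`.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the git identifier (SHA-1, lowercase hex) of `data` stored as a blob."""
    # Serialize as a git object: type, space, decimal byte count, NUL, then the data.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Same result as `printf 'hello\n' | git hash-object --stdin`.
    print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```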
This proposal specifies the fundamental objects of the BOM in git object format.
## Every artifact is a 'blob'
Git defines its most fundamental object as a 'blob,' which is simply an array of bytes. Git stores the contents of every file in a repository as a 'blob.' Because a 'blob' is a git object, every 'blob' has an identifier (git ref) that is the same no matter which repository it is in or where in the repository it appears.
Since every artifact is an array of bytes, every artifact is a 'blob.'
## A GitBOM is an artifact tree
A GitBOM is a simple document describing the immediate children of an artifact.
It consists of a series of lines separated by a newline character ('\n'), each of the form
```
blob ${identifier1} bom ${identifier2}
```
Where ```${identifier1}``` is the identifier of a child artifact of the subject of the BOM
and ```${identifier2}``` is the identifier of the BOM for that child artifact. The lines are sorted in lexical order.
If the blob is a leaf in an artifact tree (i.e., it has no children), then ```bom``` is omitted, and the line becomes
```
blob ${identifier1}
```
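A GitBOM document can be assembled directly from this description. The sketch below is illustrative only: the helper names are hypothetical, identifiers are assumed to be lowercase hex SHA-1 values as produced by git, and because the text says only that lines are "separated by" newlines, terminating the last line with '\n' as well is an assumption of this sketch. It includes a small blob-hash helper so that it runs on its own.

```python
import hashlib
from typing import Iterable, Optional, Tuple


def git_blob_id(data: bytes) -> str:
    """Git blob identifier (SHA-1, lowercase hex) of `data`."""
    return hashlib.sha1(b"blob " + str(len(data)).encode("ascii") + b"\0" + data).hexdigest()


def make_gitbom(children: Iterable[Tuple[bytes, Optional[str]]]) -> bytes:
    """Build a GitBOM from (child artifact bytes, child GitBOM identifier or None) pairs."""
    lines = []
    for artifact, child_bom_id in children:
        line = "blob " + git_blob_id(artifact)
        if child_bom_id is not None:  # non-leaf child: also reference its GitBOM
            line += " bom " + child_bom_id
        lines.append(line)
    lines.sort()  # lines are kept in lexical order
    # Assumption: every line, including the last, is terminated by '\n'.
    return "".join(line + "\n" for line in lines).encode("ascii")


if __name__ == "__main__":
    main_c = b"#include \"util.h\"\nint main(void) { return util(); }\n"
    util_h = b"int util(void);\n"
    main_o = b"<bytes of main.o>"  # hypothetical placeholder for the compiled object file

    # main.o was built from two leaf sources.
    obj_bom = make_gitbom([(main_c, None), (util_h, None)])
    # The executable's only child is main.o, which has its own GitBOM.
    exe_bom = make_gitbom([(main_o, git_blob_id(obj_bom))])
    print(git_blob_id(exe_bom))
```

Because the executable's GitBOM embeds the identifier of the object file's GitBOM, changing any source byte changes every identifier on the path up to the root.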
Since a GitBOM is an array of bytes, it is a 'blob' and can be assigned an immutable identifier (git ref). The references in the GitBOM are all to identifiers (git refs) for either child artifacts or the GitBOM of child artifacts. Given the identifier of a GitBOM associated with an artifact, the entire subtree under that artifact can be validated.
Each GitBOM only supplies information about an artifact's immediate child artifacts and their GitBOMs. Supplying all the GitBOMs for a tree allows full verifiable identification of all artifacts in that tree and the tree's structure down to source files. Partial disclosure of the GitBOMs for a tree allows partial disclosure of the tree without compromising the verifiability of the portion of the tree provided.
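The verification property can be illustrated with a small sketch. Assume a consumer trusts a root GitBOM identifier and has been handed some set of disclosed objects (GitBOMs and, optionally, artifacts), modeled here as a dictionary from identifier to bytes; the function names and that dictionary are hypothetical. Every disclosed object reachable from the root is checked against its identifier, and GitBOMs that were not disclosed are reported rather than treated as failures, mirroring partial disclosure.

```python
import hashlib
from typing import Dict, List, Optional, Set, Tuple


def git_blob_id(data: bytes) -> str:
    """Git blob identifier (SHA-1, lowercase hex) of `data`."""
    return hashlib.sha1(b"blob " + str(len(data)).encode("ascii") + b"\0" + data).hexdigest()


def parse_gitbom(doc: bytes) -> List[Tuple[str, Optional[str]]]:
    """Parse GitBOM lines into (child identifier, child GitBOM identifier or None)."""
    lines = doc.decode("ascii").splitlines()
    if lines != sorted(lines):
        raise ValueError("GitBOM lines are not in lexical order")
    entries = []
    for line in lines:
        parts = line.split(" ")
        if len(parts) == 2 and parts[0] == "blob":
            entries.append((parts[1], None))
        elif len(parts) == 4 and parts[0] == "blob" and parts[2] == "bom":
            entries.append((parts[1], parts[3]))
        else:
            raise ValueError("malformed GitBOM line: %r" % line)
    return entries


def verify_tree(root_bom_id: str, disclosed: Dict[str, bytes]) -> Set[str]:
    """Verify every disclosed object reachable from `root_bom_id`.

    Raises ValueError if any disclosed object does not hash to its identifier.
    Returns the identifiers of referenced GitBOMs that were not disclosed.
    """
    undisclosed: Set[str] = set()
    seen: Set[str] = set()
    stack = [root_bom_id]
    while stack:
        bom_id = stack.pop()
        if bom_id in seen:  # shared subtrees only need checking once
            continue
        seen.add(bom_id)
        doc = disclosed.get(bom_id)
        if doc is None:
            undisclosed.add(bom_id)
            continue
        if git_blob_id(doc) != bom_id:
            raise ValueError("GitBOM %s does not match its contents" % bom_id)
        for child_id, child_bom_id in parse_gitbom(doc):
            artifact = disclosed.get(child_id)
            if artifact is not None and git_blob_id(artifact) != child_id:
                raise ValueError("artifact %s does not match its contents" % child_id)
            if child_bom_id is not None:
                stack.append(child_bom_id)
    return undisclosed
```

Because each identifier is a hash of the referenced bytes, altering any disclosed GitBOM changes its identifier and breaks the chain back to the trusted root.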
# Embedding GitBOM identifiers in artifacts
### ELF Files (executables, .so, and .o files)
Embed the identifier of the GitBOM in an ELF section named '.bom'.
### ar Files (.a static libraries)
Embed the identifier of the GitBOM in an archive entry named '.bom'.
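For ar archives, one way to add such an entry is to append a member to the archive. The sketch below assumes the common GNU/System V ar layout (an `!<arch>\n` magic string followed by members, each with a fixed 60-byte ASCII header and data padded to an even length) and stores the identifier as ASCII hex text. The helper names, the zeroed mtime/uid/gid, and the 644 mode are illustrative choices, not part of the proposal, and whether a given linker tolerates an extra non-object member is an assumption worth testing.

```python
from typing import Optional

AR_MAGIC = b"!<arch>\n"


def ar_member(name: str, data: bytes) -> bytes:
    """Encode one archive member: a 60-byte header plus data padded to an even length."""
    header = (
        (name + "/").ljust(16)       # GNU-style short name terminated by '/'
        + "0".ljust(12)              # mtime (zeroed for reproducibility)
        + "0".ljust(6)               # owner uid
        + "0".ljust(6)               # group gid
        + "100644".ljust(8)          # file mode, in octal
        + str(len(data)).ljust(10)   # data size in bytes, decimal
        + "`\n"                      # header terminator
    ).encode("ascii")
    if len(header) != 60:
        raise ValueError("member name or size too long for a 60-byte header")
    return header + data + (b"\n" if len(data) % 2 else b"")


def add_bom_member(archive: bytes, bom_id: str) -> bytes:
    """Append a '.bom' member holding `bom_id`, leaving existing members untouched."""
    if not archive.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    return archive + ar_member(".bom", bom_id.encode("ascii"))


def read_bom_member(archive: bytes) -> Optional[str]:
    """Return the contents of the '.bom' member, or None if there is none."""
    pos = len(AR_MAGIC)
    while pos + 60 <= len(archive):
        header = archive[pos:pos + 60]
        name = header[:16].decode("latin-1").rstrip()
        size = int(header[48:58].decode("ascii"))
        if name in (".bom/", ".bom"):  # GNU and BSD spellings of the short name
            return archive[pos + 60:pos + 60 + size].decode("ascii")
        pos += 60 + size + size % 2  # skip data plus any padding byte
    return None
```

Appending, rather than rewriting, leaves any existing symbol table and its member offsets valid.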
### General Archive files (tar, gzip-compressed tar, etc.)
Embed the identifier of the GitBOM in an archive entry named '.bom'.
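For tar-based archives, Python's standard `tarfile` module is enough to sketch the idea. Because compressed tar streams cannot be appended to in place, this sketch rewrites the archive, copying every existing entry and adding a regular-file entry named `.bom` whose content is the ASCII identifier. The function names and the `compression` parameter are hypothetical.

```python
import io
import tarfile
from typing import Optional


def add_bom_entry(archive: bytes, bom_id: str, compression: str = "") -> bytes:
    """Return a copy of a tar archive with a '.bom' entry holding `bom_id`.

    `compression` is "" for a plain tar, or "gz", "bz2", "xz" for the output stream.
    """
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as old, \
            tarfile.open(fileobj=out, mode="w:" + compression) as new:
        for member in old.getmembers():
            if member.name == ".bom":  # replace any existing identifier
                continue
            # Only regular files carry a data stream; directories, links, etc. do not.
            data = old.extractfile(member) if member.isfile() else None
            new.addfile(member, data)
        payload = bom_id.encode("ascii")
        info = tarfile.TarInfo(name=".bom")
        info.size = len(payload)
        new.addfile(info, io.BytesIO(payload))
    return out.getvalue()


def read_bom_entry(archive: bytes) -> Optional[str]:
    """Return the identifier stored in the '.bom' entry, or None if absent."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        try:
            member = tar.getmember(".bom")
        except KeyError:
            return None
        return tar.extractfile(member).read().decode("ascii")
```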
### Java class file
Embed the identifier of the GitBOM in a Java annotation named @BOM in the .class file.
### Python .pyc files
Embed the identifier of the GitBOM in a ```__bom__``` attribute in the .pyc file.
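The proposal does not specify how `__bom__` is represented inside a .pyc file. One simple possibility, assumed here and not taken from the proposal, is a module-level string assignment such as `__bom__ = "<identifier>"` that ends up in the compiled module code object. The hypothetical function below reads such a value back from a .pyc without executing it, by scanning the module's top-level bytecode with `dis`. It assumes the PEP 552 header layout used since Python 3.7 and a .pyc produced by the same interpreter version, since marshal data is version specific.

```python
import dis
import importlib.util
import marshal
from typing import Optional


def read_pyc_bom(path: str) -> Optional[str]:
    """Return the string assigned to module-level `__bom__` in a .pyc, if any."""
    with open(path, "rb") as f:
        data = f.read()
    # PEP 552 header: 4-byte magic, 4-byte flags, 8 bytes of mtime/size or source hash.
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("pyc was produced by a different Python version")
    code = marshal.loads(data[16:])
    previous = None
    # Look for the `LOAD_CONST <value>; STORE_NAME __bom__` pair at module level.
    for instr in dis.get_instructions(code):
        if (instr.opname == "STORE_NAME" and instr.argval == "__bom__"
                and previous is not None and previous.opname == "LOAD_CONST"
                and isinstance(previous.argval, str)):
            return previous.argval
        previous = instr
    return None
```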
# Toolchain integration
## Compiler/Linker Integration
Compilers and linkers could easily be augmented to embed the identifier (git ref) of the GitBOM in each generated artifact (ELF files, ar archives, class files) and to output the GitBOM for every generated artifact. Building GitBOM identifiers into generated artifacts means that each artifact can be mapped back to its GitBOM, and the artifact tree can be constructed down to the source files.
Note: LLVM, GCC, and javac all have plugin mechanisms that could be used to build a proof of concept (POC) of compiler/linker integration without forking those compilers. The -toolexec flag of go build could likewise be used for a POC with Go.
### Special consideration for languages with #include
C/C++ compilers such as GCC and LLVM/Clang run a preprocessor that expands each '#include' directive, pulling the contents of the included .h files into the C source.
As part of this, they [routinely record metadata about those files](https://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html). The preprocessor and compiler could easily be augmented to automatically add the '#include'd artifacts to the GitBOM for the .o files.
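GCC's preprocessor output (for example from `gcc -E`, unless `-P` is given) marks every file transition with a linemarker of the form `# linenum "filename" flags`, where flag `1` indicates the start of a new file. As a rough sketch of how a tool might recover the `#include`d files from that output, the hypothetical function below collects every file entered with flag `1`, skipping pseudo-files such as `<built-in>`. A real integration would more likely hook the preprocessor directly; each collected path would then be read and hashed as a blob to produce a leaf entry in the .o file's GitBOM.

```python
import re
from typing import List

# Linemarker format documented for GCC: # linenum "filename" flags
# (flags: 1 = entering a new file, 2 = returning to a file,
#  3 = system header, 4 = treat as extern "C").
LINEMARKER = re.compile(r'^# (\d+) "((?:[^"\\]|\\.)*)"((?: [1-4])*)\s*$')


def included_files(preprocessed: str) -> List[str]:
    """Return files entered via #include, in first-seen order, from preprocessor output."""
    files: List[str] = []
    seen = set()
    for line in preprocessed.splitlines():
        match = LINEMARKER.match(line)
        if match is None:
            continue
        # Undo the C-style escaping of backslashes and quotes in the file name.
        name = re.sub(r"\\(.)", r"\1", match.group(2))
        flags = match.group(3).split()
        if "1" in flags and not name.startswith("<") and name not in seen:
            seen.add(name)
            files.append(name)
    return files
```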
# Benefits of Approach
## Git
Git is ubiquitous and is supported by an enormous number of tools. Most languages already have libraries that support working with git objects.
The git CLI tool even provides assistance. For example,
```
git hash-object ${filename}
```
will output the identifier (git ref) for ```${filename}```.
Most code in the modern era is stored in git. Identifying source code artifacts by their git refs makes it easy to correlate 'leaf' artifacts with their actual source code, including by searching large stores of git data such as GitHub.
## Separation of artifact tree from metadata
The GitBOM itself intentionally contains no metadata. This is because an artifact is defined by the child artifacts that went into its creation. An executable is not different because it has different contacts associated with it. A source file is not different because it resides in a different git repo. Any needed metadata references the artifact or GitBOM by its identifier (git ref).
## GitBOMs can simply be part of every build
Making GitBOMs part of every build ensures their ubiquity, and makes every artifact identifiable.
## Extensibility of Metadata
GitBOMs only make the artifact tree identifiable and verifiable. Many types of metadata, from SBOMs to attestations to vulnerability disclosures, can then reference that artifact tree.
## Compatibility with other SBOM approaches
Several existing approaches produce SBOMs. Any or all of them could be treated as metadata of their own type, utilizing GitBOM for identification and verification of the artifact tree, and added to the Object Archive either directly or via a 'location' entry.
## Traceability of Vulnerabilities
GitBOMs allow traceability from source to artifact (library, executable, running executable). A CVE could record the git refs of the source files in which a vulnerability originates, which would allow tracing exposure to the vulnerability throughout the supply chain.
Providers of an artifact (library, executable, running executable) whose artifact tree contains such a source file could declare that their artifact does not express the vulnerability, that it requires a mitigation, etc. These declarations could also be associated with the CVE.
The net result is that a consumer who has the executable can identify its GitBOM. Given the GitBOMs, the consumer can determine whether the executable might be vulnerable, has been declared not vulnerable, has a mitigation, etc.
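As a sketch of that tracing, suppose a consumer has indexed the GitBOMs it holds as a dictionary from GitBOM identifier to parsed entries (child artifact identifier, child GitBOM identifier or None); this index and the function name are hypothetical. Given the git ref of a source file named in a CVE, the walk below returns every chain of artifacts leading from the executable down to that file, plus the GitBOMs it could not inspect. An empty result is only conclusive when nothing was left undisclosed.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical index: GitBOM identifier -> [(child artifact id, child GitBOM id or None)].
BomIndex = Dict[str, List[Tuple[str, Optional[str]]]]


def trace_artifact(root_bom_id: str, target_id: str,
                   index: BomIndex) -> Tuple[List[List[str]], Set[str]]:
    """Find every path of artifact identifiers from the root artifact down to `target_id`.

    Returns (paths, unknown), where `unknown` holds GitBOM identifiers that were
    referenced but are not in `index`, so their subtrees could not be searched.
    """
    paths: List[List[str]] = []
    unknown: Set[str] = set()

    def walk(bom_id: str, prefix: List[str]) -> None:
        entries = index.get(bom_id)
        if entries is None:
            unknown.add(bom_id)
            return
        for child_id, child_bom_id in entries:
            path = prefix + [child_id]
            if child_id == target_id:
                paths.append(path)
            if child_bom_id is not None:
                walk(child_bom_id, path)

    walk(root_bom_id, [])
    return paths, unknown
```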
## Extensibility to larger systems
As noted, a running executable may have shared libraries, not just executable files, in its tree. This means GitBOMs can be used to construct artifact trees of entire running systems.
Another good example might be the GitBOM for a running Kubernetes Pod. It would have as its children the containers in the Pod, the Pod spec, the ConfigMaps, etc.
Further, information about hardware could be incorporated into the GitBOM by representing that hardware, its firmware, etc. as artifacts in the tree.
## Utility for attestation of identity
Tools like [SPIRE](https://spiffe.io/docs/latest/spire-about/spire-concepts/) that attest running systems to provide [SPIFFE identities](https://spiffe.io/) could base their attestation on the GitBOM artifact tree and other associated metadata. The extensibility to larger systems is critical to doing this meaningfully.
## Utility in forensics
Because GitBOM identifiers are embedded in the artifacts themselves, GitBOMs could be used forensically to trace known exploits of a system back to the artifacts from which they originate.
# Other References
- [Ideas on Storing and Publishing GitBoms](https://hackmd.io/ExsL_iGvSp6pbFmFSxr4Vg?view)