# alma-sbom Modularization and Refactoring
## Abstract of This Document
The current alma-sbom has become bloated: apart from the libsbom directory, features have been added to alma_sbom.py through a series of ad-hoc patches. Even within the libsbom directory, classes and packages are not separated well enough, which makes the source code very difficult to manage.
These conditions have a particularly negative impact on my own work when submitting patches, and I want to resolve them.
In addition, alma-sbom has several issues with its Python packaging and project structure, which I want to address properly as part of this refactoring.
PR: https://github.com/AlmaLinux/alma-sbom/pull/49
## Draft Design Document
### 1. Abstract
alma-sbom is a Python package that generates SBOMs (Software Bill of Materials) based on package and build information.
**Note:** This document has been prepared for the purpose of this refactoring. It is not intended to replace existing documentation.
#### 1.1 Features
- Package SBOM: Generates an SBOM for a single package
- Build (Job) SBOM: Generates an SBOM containing all packages related to a build by specifying the build job ID
Additionally, the following features are provided utilizing the existing git_notarize.py:
- AlmaLinux Git Notarization Tool: This utility allows you to manually notarize AlmaLinux git sources using the ImmudbWrapper. (https://github.com/AlmaLinux/alma-sbom?tab=readme-ov-file#almalinux-os-sbom-data-management-utilities)
#### 1.2 Supported Formats
- SPDX
- CycloneDX
#### 1.3 Data Sources
- immudb: Primary source for package and build information
- RPM package: Supplementary source for package information
### 2. Architecture
#### 2.1 Overall Structure
Note: This is an overview of the structure and does not list every file.
```
alma-sbom/
├── alma_sbom/
│   ├── _version.py              # Version info
│   ├── constants.py             # Constant definitions
│   ├── type.py                  # Type definitions
│   ├── data/                    # Data acquisition and processing
│   │   ├── models/              # Base data models
│   │   │   ├── package.py       # Package data model
│   │   │   └── build.py         # Build data model
│   │   ├── attributes/          # Attributes
│   │   │   └── property.py      # Package property
│   │   └── collectors/          # Data collection
│   │       ├── immudb/          # from immudb
│   │       ├── albs.py          # from ALBS
│   │       └── rpm.py           # from RPM package
│   ├── formats/                 # Implementation for each SBOM format
│   │   ├── document.py          # Base Document class definition
│   │   ├── spdx/
│   │   │   └── document.py      # SPDX Document class definition
│   │   └── cyclonedx/
│   │       └── document.py      # CycloneDX Document class definition
│   └── cli/                     # Command-line interface
│       ├── main.py              # Main class and cli_main
│       ├── logging.py           # CLI logging setup class
│       ├── config/              # CLI config (options)
│       │   ├── config.py        # Common config (CommonConfig class)
│       │   └── commands/        # CLI config (options) for subcommands
│       │       ├── package.py   # Options for the package subcommand
│       │       └── build.py     # Options for the build subcommand
│       ├── commands/
│       │   ├── package.py       # package command
│       │   └── build.py         # build command
│       └── factory/
│           ├── documents/       # Factory that generates Document instances defined in the formats layer
│           └── collector/       # Factory that generates Collector instances defined in the data layer
```
#### 2.2 Responsibilities of Each Layer
##### 2.2.1 Data Layer
Responsible for retrieving and processing information from external data sources.
**models**
Defines the base data models.
- package.py: Package Data Model
- build.py: Build Data Model
- utils.py: Common Model Utilities
- merge.py: Data Merge Function (could this be implemented as a class method instead?)
**collectors**
Responsible for collecting data.
- immudb/: from immudb
- albs.py: from ALBS
- rpm.py: from rpm package
**attributes**
Defines the attribute data models used by the models.
Currently there is only property.py, which defines the properties used in the package class.
To add further kinds of data, define them in this layer.
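As an illustration of how the pieces of the data layer might fit together, here is a minimal sketch rather than the actual alma-sbom code. The field names (`name`, `version`, `properties`), the shape of `Property`, and the `merge_packages` helper are all assumptions made for this example; only the file roles (package.py, property.py, merge.py) and the primary/supplementary relationship between immudb and the RPM package come from this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Property:
    """Hypothetical attribute model (role of attributes/property.py)."""
    name: str
    value: str


@dataclass
class Package:
    """Hypothetical package model (role of models/package.py)."""
    name: str
    version: Optional[str] = None
    properties: List[Property] = field(default_factory=list)


def merge_packages(primary: Package, secondary: Package) -> Package:
    """Hypothetical merge (role of models/merge.py).

    Values from `primary` win; `secondary` only fills in missing values
    and contributes properties whose names are not already present.
    """
    merged_props = list(primary.properties)
    known = {prop.name for prop in merged_props}
    for prop in secondary.properties:
        if prop.name not in known:
            merged_props.append(prop)
            known.add(prop.name)
    return Package(
        name=primary.name or secondary.name,
        version=primary.version or secondary.version,
        properties=merged_props,
    )


# Example: immudb is the primary source, the RPM file fills the gaps.
from_immudb = Package("bash", None, [Property("build_id", "123")])
from_rpm = Package(
    "bash", "5.1.8", [Property("build_id", "999"), Property("arch", "x86_64")]
)
merged = merge_packages(from_immudb, from_rpm)
```

In this sketch the merge is a plain function; it could equally be a method on the model, which is the open question noted for merge.py above.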
##### 2.2.2 Formats Layer
Provides implementations specific to each SBOM format.
document.py at the top of the formats layer defines the abstract Document class.
Each format-specific sub-layer (currently SPDX and CycloneDX) has its own document.py, which defines the format-specific Document class based on the abstract Document class.
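The relationship between the abstract Document class and the format-specific classes could look roughly like the sketch below. The class names `SPDXDocument` and `CycloneDXDocument`, the constructor argument, and the JSON payloads are placeholders invented for this example (the output is not valid SPDX or CycloneDX); only the abstract base class and the `write()` method are taken from this document.

```python
import json
from abc import ABC, abstractmethod
from io import StringIO
from typing import TextIO


class Document(ABC):
    """Stand-in for the abstract class in formats/document.py."""

    def __init__(self, package_name: str) -> None:
        self.package_name = package_name

    @abstractmethod
    def write(self, out: TextIO) -> None:
        """Serialize the document to the given stream."""


class SPDXDocument(Document):
    """Stand-in for formats/spdx/document.py (placeholder payload)."""

    def write(self, out: TextIO) -> None:
        json.dump({"format": "spdx", "name": self.package_name}, out)


class CycloneDXDocument(Document):
    """Stand-in for formats/cyclonedx/document.py (placeholder payload)."""

    def write(self, out: TextIO) -> None:
        json.dump({"format": "cyclonedx", "name": self.package_name}, out)


# Callers only depend on the abstract interface.
buffer = StringIO()
document: Document = SPDXDocument("bash")
document.write(buffer)
```

Because `write()` is abstract, instantiating `Document` directly raises `TypeError`, so every format is forced to provide its own serialization.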
##### 2.2.3 CLI Layer
Provides the command line interface.
The following files are at the top of the CLI layer.
- main.py: Defines the main entry point (cli_main) and the Main class
- logging.py: Defines the Logging class that sets up logging, and the logging-related options
There are three sub-layers.
**config**
The config layer is responsible for configuration within the CLI layer.
- config.py: Defines the CommonConfig class, which holds the common configuration items that do not depend on subcommands (except logging settings)
- commands/:
  - package.py: Defines the PackageConfig class, which holds the package subcommand configuration
  - build.py: Defines the BuildConfig class, which holds the build subcommand configuration
**commands**
The commands layer implements each subcommand (currently package and build).
- commands.py: Defines the abstract subcommand class
- package.py: Defines the package subcommand class
- build.py: Defines the build subcommand class
**factory**
The factory layer provides factory classes that generate instances and act as the interface to the other layers.
- documents/: Defines the DocumentFactory class, which generates Document instances defined in the formats layer (the interface to the formats layer)
- collector/: Defines the CollectorFactory class, which generates Collector instances defined in the data layer (the interface to the data layer)
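To make the role of the factory layer more concrete, the sketch below shows one way a DocumentFactory might map a `--file-format` value such as `spdx-json` to a Document class. The `create` method name, the `_formats` table, and the placeholder document classes are assumptions for illustration; the real classes live in the formats layer and may have different constructors.

```python
from typing import Dict


class SPDXDocument:
    """Placeholder for the class defined in formats/spdx/document.py."""

    def __init__(self, file_type: str) -> None:
        self.file_type = file_type


class CycloneDXDocument:
    """Placeholder for the class defined in formats/cyclonedx/document.py."""

    def __init__(self, file_type: str) -> None:
        self.file_type = file_type


class DocumentFactory:
    """Hypothetical factory: the CLI asks for a format, not for a class."""

    _formats: Dict[str, type] = {
        "spdx": SPDXDocument,
        "cyclonedx": CycloneDXDocument,
    }

    @classmethod
    def create(cls, file_format: str):
        # "spdx-json" is split into the SBOM format and the file type.
        sbom_format, _, file_type = file_format.partition("-")
        try:
            document_class = cls._formats[sbom_format]
        except KeyError:
            raise ValueError(f"Unsupported SBOM format: {file_format}") from None
        return document_class(file_type)


doc = DocumentFactory.create("cyclonedx-xml")
```

With this shape, the commands never import a format-specific class directly, which keeps the CLI layer independent of the individual formats.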
### 3. Data Flow
1. The CLI layer receives a command.
2. It gets a Collector instance (the data layer API) via CollectorFactory and uses it to collect the SBOM data.
3. It gets a Document instance (the formats layer API) via DocumentFactory, passing in the data obtained in the previous step.
4. It outputs the document using Document.write().
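The four steps above can be sketched end to end as follows. Everything here is a stub written for this example: the `collect_package` method, the dictionary used as SBOM data, and the factory signatures are assumptions, and no immudb or RPM access takes place. Only the order of the steps and the names CollectorFactory, DocumentFactory, and `write()` come from this document.

```python
import io
import json


class StubCollector:
    """Stands in for a data-layer collector (for example the immudb one)."""

    def collect_package(self, package_hash: str) -> dict:
        return {"name": "example-package", "hash": package_hash}


class StubDocument:
    """Stands in for a formats-layer Document."""

    def __init__(self, data: dict) -> None:
        self.data = data

    def write(self, out) -> None:
        json.dump(self.data, out)


class CollectorFactory:
    @staticmethod
    def create() -> StubCollector:
        return StubCollector()


class DocumentFactory:
    @staticmethod
    def create(data: dict) -> StubDocument:
        return StubDocument(data)


def run_package_command(package_hash: str, out) -> None:
    """Step 1: the CLI layer has received the `package` command."""
    collector = CollectorFactory.create()            # step 2: get a Collector
    data = collector.collect_package(package_hash)   # step 2: collect SBOM data
    document = DocumentFactory.create(data)          # step 3: get a Document
    document.write(out)                              # step 4: output it


output = io.StringIO()
run_package_command("abc123", output)
```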
### 4. Usage Examples
#### 4.1 Package SBOM Generation
```
$ alma-sbom package --rpm-package(-hash) (..other options..)
```
#### 4.2 Build SBOM Generation
```
$ alma-sbom build --build-id (..other options)
```
#### 4.3 Usage from Help Message
```
$ alma-sbom --help
usage: alma-sbom [-h] [--output-file OUTPUT_FILE] [--file-format {spdx-json,spdx-xml,spdx-yaml,spdx-tagvalue,cyclonedx-json,cyclonedx-xml}] [--albs-url ALBS_URL] [--immudb-username IMMUDB_USERNAME]
[--immudb-password IMMUDB_PASSWORD] [--immudb-database IMMUDB_DATABASE] [--immudb-address IMMUDB_ADDRESS] [--immudb-public-key-file IMMUDB_PUBLIC_KEY_FILE] [--verbose] [--debug]
{package,build} ...
alma-sbom
positional arguments:
{package,build}
package Generate package SBOM
build Generate build SBOM
optional arguments:
-h, --help show this help message and exit
--output-file OUTPUT_FILE
Full path to an output file with SBOM. Output will be to stdout if the parameter is absent or emtpy
--file-format {spdx-json,spdx-xml,spdx-yaml,spdx-tagvalue,cyclonedx-json,cyclonedx-xml}
Generate SBOM in one of format mode (default: spdx-json)
--albs-url ALBS_URL Override ALBS url
--immudb-username IMMUDB_USERNAME
Provide your immudb username if not set as an environmental variable
--immudb-password IMMUDB_PASSWORD
Provide your immudb password if not set as an environmental variable
--immudb-database IMMUDB_DATABASE
Provide your immudb database if not set as an environmental variable
--immudb-address IMMUDB_ADDRESS
Provide your immudb address if not set as an environmental variable
--immudb-public-key-file IMMUDB_PUBLIC_KEY_FILE
Provide your immudb public key file if not set as an environmental variable
--verbose Print verbose output
--debug Print debug log
```
**package subcommand usage**
```
$ alma-sbom package --help
usage: alma-sbom package [-h] (--rpm-package-hash RPM_PACKAGE_HASH | --rpm-package RPM_PACKAGE)
optional arguments:
-h, --help show this help message and exit
--rpm-package-hash RPM_PACKAGE_HASH
SHA256 hash of an RPM package
--rpm-package RPM_PACKAGE
path to an RPM package
```
**build subcommand usage**
```
$ alma-sbom build --help
usage: alma-sbom build [-h] --build-id BUILD_ID
optional arguments:
-h, --help show this help message and exit
--build-id BUILD_ID SHA256 hash of an RPM package
```
### 5. Future Extensibility
#### 5.1 Adding New Data Sources
**Steps to add new data source**
1. Add new collectors to data/collectors/
2. Update cli/factory so that the new collector instance can be generated from the CLI
3. Update cli/commands to use the new data source
For example, if we implement deployed SBOMs in the future and add rpmdb as a new data source, it can be implemented separately from the other data source implementations.
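As a rough sketch of these steps, a new collector could be added as its own class and then made available through the factory. The `Collector` base class, the `collect_package_names` method, the registry-style `CollectorFactory`, and the `RpmdbCollector` (which reads from an in-memory list here instead of a real rpmdb) are all hypothetical and only illustrate the separation described above.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Collector(ABC):
    """Hypothetical common interface for data-layer collectors."""

    @abstractmethod
    def collect_package_names(self) -> List[str]:
        """Return the names of the packages known to this data source."""


class RpmdbCollector(Collector):
    """Step 1: a new collector, e.g. data/collectors/rpmdb.py (hypothetical)."""

    def __init__(self, installed: List[str]) -> None:
        # A real implementation would query the system rpmdb instead.
        self._installed = installed

    def collect_package_names(self) -> List[str]:
        return sorted(self._installed)


class CollectorFactory:
    """Step 2: the factory learns how to create the new collector."""

    _registry: Dict[str, Type[Collector]] = {}

    @classmethod
    def register(cls, name: str, collector_class: Type[Collector]) -> None:
        cls._registry[name] = collector_class

    @classmethod
    def create(cls, name: str, *args, **kwargs) -> Collector:
        if name not in cls._registry:
            raise ValueError(f"Unknown data source: {name}")
        return cls._registry[name](*args, **kwargs)


CollectorFactory.register("rpmdb", RpmdbCollector)

# Step 3: a command asks the factory for the new data source.
collector = CollectorFactory.create("rpmdb", ["zlib", "bash"])
names = collector.collect_package_names()
```

The existing collectors are not touched in this sketch; only the registration line and the command that uses the new source change.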
#### 5.2 Adding New Features (SBOM Types)
**Steps to add new features(SBOM types)**
1. Add new model to models/
2. Add new data collector method to get data as new model
3. Add new implementation to formats/ to process and output the new model in each format
4. Add the new model to the CLI as a subcommand
For example, when ISO SBOM support is implemented in the future, it can be implemented separately from the other functionality.
Also, as shown in "4. Usage Examples", each type of SBOM is provided as a subcommand, so new SBOM types can be added as new subcommands.
**Note:** Even if we need to change how commands are provided, since implementations are separated by functionality, structural changes should not be as costly as before.
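On the CLI side, adding a new SBOM type as a subcommand could look like the following argparse sketch. The `package` and `build` subcommands and their options mirror the help output in "4. Usage Examples"; the `iso` subcommand and its `--iso-image` option are purely hypothetical and do not exist in alma-sbom.

```python
import argparse


def create_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="alma-sbom")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Existing subcommands (options as shown in the help output).
    package = subparsers.add_parser("package", help="Generate package SBOM")
    source = package.add_mutually_exclusive_group(required=True)
    source.add_argument("--rpm-package-hash")
    source.add_argument("--rpm-package")

    build = subparsers.add_parser("build", help="Generate build SBOM")
    build.add_argument("--build-id", required=True)

    # Hypothetical new SBOM type: only this block is added, the
    # existing subcommands above stay unchanged.
    iso = subparsers.add_parser("iso", help="Generate ISO SBOM (hypothetical)")
    iso.add_argument("--iso-image", required=True)

    return parser


args = create_parser().parse_args(["iso", "--iso-image", "AlmaLinux.iso"])
```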
#### 5.3 Adding New SBOM Formats
**Steps to add new SBOM format**
1. Add new format to formats/
2. Update cli/factory so that instances of the new format's Document class can be generated from the CLI
3. Update cli/commands to add an option for the new format
While there are no candidates at present, if we need to support formats other than SPDX and CycloneDX in the future, we can implement them separately from other formats.
**Note:** Since SPDX 3.0 has a significantly different structure from previous SPDX 2.x versions, when implementing support for it, it might be better to implement it separately from the existing SPDX implementation. This design allows for such clear separation in implementation.
### 6. Other Implementations (Logging, Error Handling)
#### 6.1 Logging
Logger instances will be obtained and used in all files through ``logging.getLogger(__name__)``.
When implementing the CLI, root logger configuration will be set on the command side if needed. (already done in alma_sbom.py)
#### 6.2 Error Handling
No custom exceptions are defined at this time.
Each layer and function raises the appropriate standard exception when an error occurs.
In the CLI implementation, everything is wrapped in a try block that catches all exceptions and simply outputs the exception content.
For example:
```
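# cli/main.py
import logging
import sys

_logger = logging.getLogger(__name__)

def main():
    args = None  # stays None if an error occurs before parsing finishes
    try:
        parser = create_parser()  # defined elsewhere in the CLI layer
        args = parser.parse_args()
        args.func(args)
    except Exception as e:
        # With --debug, re-raise so that the full traceback is shown
        if getattr(args, 'debug', False):
            raise
        # Otherwise only output the exception content and exit with an error
        _logger.error("Error: %s", e)
        sys.exit(1)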
```
## Old
Content that has been removed from the design above is kept here for reference.
### Regarding the Utilization of Current alma-sbom Source Code
The current alma-sbom has the following structure.
```
alma-sbom/
├── alma_sbom.py
├── git_notarize.py
├── version.py
├── setup.py
├── README.md
├── LICENSE
├── .gitignore
└── libsbom/
├── __init__.py
├── common.py
├── constants.py
├── cyclonedx.py
└── spdx.py
```
This chapter describes which parts of each file can be reused and migrated into the post-refactoring design.
#### Python Scripts for CommandLine
**alma_sbom.py**
The implementations in this script can be divided into the following categories. Each part can be utilized in different sections of the new structure.
- Database Queries Using immudb_wrapper
-> Can be utilized within data/collectors/immudb
- Information retrieval from packages using the rpm module
-> Can be utilized within data/collectors/rpm
- Various utilities (_generate_cpe, generatepurl, etc.)
-> Can be moved to models/utils.py and utilized as common utilities for models
**git_notarize.py**
This script is a tool for notarizing AlmaLinux git sources and is not related to any SBOM generation functionality.
Therefore, this script will be included in the repository as-is as a standalone script.
(Since alma-sbom has become bloated as an SBOM generation tool, it might be more appropriate to separate this script into a different project.)
#### Under the libsbom Directory
**spdx.py**
Many SBOM generation-related components can be utilized within formats/spdx
**cyclonedx.py**
Many SBOM generation-related components can be utilized within formats/cyclonedx
**common.py**
Many utilities can be reused. Their destination in the new structure needs to be considered.
Either they should be migrated to models/utils.py and handled on the model side, or something like utils.py should be placed under the formats directory to handle the processing for each format.
**constants.py**
We want to reuse the defined values as much as possible. They can be utilized under the constants directory.
#### Other Files
**README.md, LICENSE**
Will be utilized as-is for the most part.
Only README.md might need some partial changes to align with the current modifications.
**setup.py**
Due to the changed directory structure, modifications will be made from a Python package perspective.
Some modifications are needed as the CLI delivery method will also change.
Others will basically be utilized as they are.
**version.py**
Will be utilized as-is, but the version number will be incremented.
The file name might be renamed to _version.py.
## Changelog:
- 2024/11/12: Initial document published
- 2024/11/20: Remove SBOM layer
- 2024/11/28: Edit '2.1. Overall Structure'
  - Remove data/{service.py, aggregator, convertor}
  - Remove models layer and move data models into data layer.
  - Add data merge function (previously planned for data/aggregator) to data/models/merge.py
  - Remove formats/*/factory.py and move its contents to each format's models.py
  - Remove formats/*/generator and move its contents to each format's models.py
- 2024/11/29:
  - Make cli/commands to define some subcommands
- 2024/12/03:
  - Make cli/config to implement options
  - Add config layer (Move cli/config to config (top layer))
- 2025/1/29:
  - Update '2.1 Overall Structure' to reflect current implementation
    - Move config layer to cli/config
    - Add cli/factory
    - Add/Remove/Rename files
      - add type.py
      - data layer:
        - add models/utils.py
        - add attributes/property.py
        - add collectors/albs.py
      - formats layer:
        - add document.py
        - rename spdx/models.py to spdx/document.py
        - rename cyclonedx/models.py to cyclonedx/document.py
      - cli layer:
        - add logging.py
        - cli/config:
          - delete config/env.py
          - rename config/models to config/commands
  - Delete '2.2.3 Config Layer' and reorganize chapter numbers
    - Add config/ descriptions in new '2.2.3 CLI Layer'
  - Update the following chapters to match the changes in '2.1 Overall Structure'
    - '2.2.1 Data Layer'
    - '2.2.2 Formats Layer' description
    - '2.2.3 CLI Layer' contents
  - Update '3. Data Flow': remove the sub-chapters and simplify the description
  - Update '4. Usage Examples':
    - Remove '4.2 Examples of Programmatic Usage(like API)'
    - Change chapter structure to show only command-line usage simply
  - Remove '5. Regarding the Utilization of Current alma-sbom Source Code' as it is no longer needed due to progress in implementation. The old section is moved to the end of this document.
  - Update description of '5. Future Extensibility'
  - Update '6.2 Error Handling' not to use custom exceptions