Machine Learning Model Extension Specification
Contributors
- Centre de Recherche Informatique de Montréal (CRIM)
- Canada Centre for Mapping and Earth Observation (CCMEO)
Description
The STAC Machine Learning Model (MLM) Extension provides a standard set of fields to describe machine learning models trained on overhead imagery and enable running model inference.
The main objective of the extension is to record the information needed to make ML models searchable and reusable.
The MLM specification is biased towards providing metadata fields for supervised machine learning models. However, fields that relate to supervised ML are optional and users can use the fields they need for different tasks.
See Best Practices for guidance on what other STAC extensions you should use in conjunction with this extension. The Machine Learning Model Extension purposely omits and delegates some definitions to other STAC extensions to favor reusability and avoid metadata duplication whenever possible. A properly defined MLM STAC Item/Collection should almost never have the Machine Learning Model Extension exclusively in `stac_extensions`.

For details about the earlier (legacy) version of the MLM Extension, formerly known as the Deep Learning Model Extension (DLM), please refer to the DLM LEGACY document. DLM was renamed to the current MLM Extension and refactored to form a cohesive definition across all machine learning approaches, regardless of whether the approach constitutes a deep neural network or other statistical approach. It also combines multiple definitions from the predecessor ML-Model extension to synthesize common use cases into a single reference for Machine Learning Models.
For more details about the `stac-model` Python package, which provides definitions of the MLM extension using both `Pydantic` and `PySTAC` connectors, please refer to the STAC Model document.

Item Properties and Collection Fields
The MLM fields can be used in various parts of STAC documents, such as Collection summaries, Item properties, and Assets. Among these fields, note the following details:

- The `mlm:framework_version` indicates the `framework` library version. Some models require a specific version of the machine learning `framework` to run.
- If the model was pretrained, consider providing the `pretrained_source` if it is known. If the model was not pretrained (`pretrained = false`), the `null` value should be set explicitly.
- If `mlm:accelerator` is set to `null` explicitly, the model does not require any specific accelerator.
- The `mlm:accelerator_constrained` field indicates whether the intrinsic `accelerator` is the only `accelerator` that can run inference. If undefined, it should be assumed `false`.
- The `mlm:accelerator_summary` provides a high-level description of the `accelerator`, such as its specific generation, or other relevant inference details.
- The `mlm:accelerator_count` indicates the minimum number of `accelerator` instances required to run the model.

To decide whether the above fields should be applied under Item `properties` or under respective Assets, the context of each field must be considered. For example, the `mlm:name` should always be provided in the Item `properties`, since it relates to the model as a whole. In contrast, some models could support multiple `mlm:accelerator`, which could be handled by distinct source code represented by different Assets. In such a case, `mlm:accelerator` definitions should be nested under their relevant Asset. If a field is defined at both the Item and Asset level, the value at the Asset level is considered for that specific Asset, and the value at the Item level is used for other Assets that did not override it. For some of the fields, further details are provided in the following sections to clarify potentially ambiguous use cases.

In addition, fields from the multiple relevant extensions should be defined as applicable. See Best Practices - Recommended Extensions to Compose with the ML Model Extension for more details.
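To illustrate the Item-level versus Asset-level placement described above, here is a minimal hypothetical sketch (the identifiers, URLs, and values are invented for illustration, not taken from this specification):

```python
# Hypothetical sketch of MLM field placement: "mlm:name" describes the model
# as a whole, so it belongs in the Item "properties"; a per-Asset value of
# "mlm:accelerator" overrides the Item-level default for that Asset only.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-model",
    "properties": {
        "mlm:name": "Example Model",
        "mlm:accelerator": "cuda",  # default accelerator for all Assets
    },
    "assets": {
        "model-cpu": {
            "href": "https://example.com/model-cpu.onnx",
            "roles": ["mlm:model"],
            "mlm:accelerator": "amd64",  # Asset-level override
        },
    },
}

def accelerator_for(item: dict, asset_key: str) -> str:
    """Resolve the effective accelerator: the Asset-level value wins over the Item-level one."""
    asset = item["assets"][asset_key]
    return asset.get("mlm:accelerator", item["properties"]["mlm:accelerator"])
```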
For the Extent Object in STAC Collections and the corresponding spatial and temporal fields in Items, please refer to section Best Practices - Using STAC Common Metadata Fields for the ML Model Extension.
Model Architecture
In most cases, this should correspond to common architecture names defined in the literature, such as `ResNet`, `VGG`, `GAN` or `Vision Transformer`. For more examples of proper names (including casing), the Papers With Code - Computer Vision Methods can be used. Note that this field is not an explicit "Enum", and is used only as an indicator of common architecture occurrences. If no specific or predefined architecture can be associated with the described model, simply employ `unknown` or another custom name as deemed appropriate.

Task Enum
It is recommended to define `mlm:tasks` of the entire model at the STAC Item level, and `tasks` of each respective Model Output Object, with the following values. Although other values are permitted to support more use cases, they should be used sparingly to allow better interoperability of models and their representation.

As a general rule of thumb, if a task is not represented below, an appropriate name can be formulated by taking definitions listed in Papers With Code. The names should be normalized to lowercase and use hyphens instead of spaces.
| Task Name | Corresponding `label:tasks` |
|---|---|
| `regression` | `regression` |
| `classification` | `classification` |
| `scene-classification` | *n/a* |
| `detection` | `detection` |
| `object-detection` | *n/a* |
| `segmentation` | `segmentation` |
| `semantic-segmentation` | *n/a* |
| `instance-segmentation` | *n/a* |
| `panoptic-segmentation` | *n/a* |
| `similarity-search` | *n/a* |
| `generative` | *n/a* |
| `image-captioning` | *n/a* |
| `super-resolution` | *n/a* |
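The normalization rule above can be sketched in plain Python (a hypothetical helper, not part of the specification):

```python
def normalize_task_name(title: str) -> str:
    """Normalize a task title to lowercase with hyphens instead of spaces."""
    return title.strip().lower().replace(" ", "-")

normalize_task_name("Semantic Segmentation")  # "semantic-segmentation"
```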
If the task falls within the category of supervised machine learning and uses labels during training, this should align with the `label:tasks` values defined in the STAC Label Extension for relevant STAC Collections and Items published with the model described by this extension.

It is to be noted that multiple "generic" task names (`classification`, `detection`, etc.) are defined to allow correspondence with `label:tasks`, but these can lead to some ambiguity depending on context. For example, a model that supports `classification` could mean that the model can predict patch-based classes over an entire scene (i.e.: `scene-classification` for a single prediction over an entire area of interest as a whole), or that it can predict pixel-wise "classifications", such as land-cover labels for every single pixel coordinate over the area of interest. Perhaps counter-intuitively to some users, such a model that produces pixel-wise "classifications" should be attributed the `segmentation` task (and more specifically `semantic-segmentation`) rather than `classification`. To avoid this kind of ambiguity, it is strongly recommended that `tasks` always aim to provide the most specific definitions possible to explicitly describe what the model accomplishes.

Framework
This should correspond to the common library name of a well-established ML framework. No "Enum" is enforced to allow easy addition of newer frameworks, but it is STRONGLY recommended to use common names when applicable. Below are a few notable entries:

- PyTorch
- TensorFlow
- scikit-learn
- Hugging Face
- Keras
- ONNX
- rgee
- spatialRF
- JAX
- MXNet
- Caffe
- PyMC
- Weka
Accelerator Type Enum
It is recommended to define `accelerator` with one of the following values:

- `amd64`: models compatible with AMD or Intel CPUs (no hardware-specific optimizations)
- `cuda`: models compatible with NVIDIA GPUs
- `xla`: models compiled with XLA; models trained on TPUs are typically compiled with XLA
- `amd-rocm`: models trained on AMD GPUs
- `intel-ipex-cpu`: models optimized with IPEX for Intel CPUs
- `intel-ipex-gpu`: models optimized with IPEX for Intel GPUs
- `macos-arm`: models trained on Apple Silicon

Warning

If `mlm:accelerator = amd64`, this explicitly indicates that the model does not (and will not try to) use any accelerator, even if some are available in the runtime environment. This is to be distinguished from the value `mlm:accelerator = null`, which means that the model could make use of some accelerators if provided, but is not constrained by any specific one. To improve comprehension by users, it is recommended that any model using `mlm:accelerator = amd64` also explicitly set `mlm:accelerator_constrained = true` to illustrate that the model WILL NOT use accelerators, although the hardware resolution should be identical nonetheless.

When `mlm:accelerator = null` is employed, the value of `mlm:accelerator_constrained` can be ignored, since even if set to `true`, there would be no `accelerator` to constrain against. To avoid confusion, it is suggested to set `mlm:accelerator_constrained = false` or omit the field entirely in this case.

Model Input Object
"RGB Time Series"
) can be used instead.statistics
of same dimensionality and order as thebands
field in this object.null
when none applies. Consider usingpre_processing_function
for custom implementations or more complex combinations.norm_type = "clip"
, this array supplies the value for eachbands
item, which is used to divide each band before clipping values between 0 and 1.null
when none applies. Consider usingpre_processing_function
for custom implementations or more complex combinations.pre_processing_function
should be applied over all availablebands
. For respective band operations, see Model Band Object.Fields that accept the
null
value can be considerednull
when omitted entirely for parsing purposes. However, settingnull
explicitly when this information is known by the model provider can help users understand what is the expected behavior of the model. It is therefore recommended to providenull
explicitly when applicable.Bands and Statistics
Depending on the supported `stac_version` and other `stac_extensions` employed by the STAC Item using MLM, the STAC 1.1 - Band Object, the STAC Raster - Band Object or the STAC EO - Band Object can be used for representing bands information, including notably the `nodata` value, the `data_type` (see also Data Type Enum), and Common Band Names.

Warning

Only versions `v1.x` of `eo` and `raster` are supported to provide `mlm:input` band references. Versions `2.x` of those extensions rely on the STAC 1.1 - Band Object instead. If those versions are desired, consider migrating your MLM definition to use the STAC 1.1 - Band Object as well for referencing `mlm:input` with band names.

Note

Due to how the schema for `eo:bands` is defined, it is not sufficient to only provide the `eo:bands` property at the STAC Item level. The schema validation of the EO extension explicitly looks for a corresponding set of bands under an Asset, and if none is found, it disallows `eo:bands` in the Item properties. Therefore, `eo:bands` should either be specified only under the Asset containing the `mlm:model` role (see Model Asset), or defined both under the Asset and Item properties. If the second approach is selected, it is recommended that the `eo:bands` under the Asset contain only the `name` or the `common_name` property, such that all other details about the bands are defined at the Item level. An example of such a representation is provided in examples/item_eo_bands_summarized.json.

For an example where `eo:bands` are entirely defined in the Asset on their own, please refer to examples/item_eo_bands.json instead.

For more details, refer to stac-extensions/eo#12.
Note
When using `raster:bands`, an additional `name` parameter MUST be provided for each band. This parameter is not defined in the `raster` extension itself, but is permitted. This addition is required to ensure that `mlm:input` bands referenced by name can be associated to their respective `raster:bands` definitions.

Only bands used as input to the model should be included in the MLM `bands` field. To avoid duplicating the information, MLM only uses the `name` of whichever "Band Object" is defined in the STAC Item. An input's `bands` definition can either be a plain `string` or a Model Band Object. When a `string` is employed directly, the value should be implicitly mapped to the `name` property of the explicit object representation.

One distinction from the STAC 1.1 - Band Object in MLM is that the Statistics object (or the corresponding STAC Raster - Statistics for STAC 1.0) is not defined at the "Band Object" level, but at the Model Input level. This is because, in machine learning, it is common to need overall statistics for the dataset used to train the model to normalize all bands, rather than normalizing the values over a single product. Furthermore, statistics could be applied differently for distinct Model Input definitions, in order to adjust for intrinsic properties of the model.
Model Band Object
Within a Model Band Object, the `format` is required whenever an `expression` property is provided, and the `expression` must follow the `format` specified. The expression can be applied to any data type and depends on the `format` given.

Note

Although `format` and `expression` are not individually required in this context, they are mutually dependent on each other.

See also Processing Expression for more details and examples.
The `format` and `expression` properties can serve multiple purposes:

1. Applying a band-specific pre-processing step, in contrast to `pre_processing_function` applied over all bands. For example, reshaping a band to align its dimensions with other bands before stacking them.
2. Defining a derived-band operation or a calculation that produces a virtual band from other band references. For example, computing an index that applies an arithmetic combination of other bands.
For a concrete example, see examples/item_bands_expression.json.
Data Type Enum
When describing the `data_type` provided by a Band, whether for defining the Input Structure or the Result Structure, the Data Types from the STAC Raster extension should be used for STAC 1.0 or earlier, while the Data Types from STAC 1.1 Core can be used for later versions. Both definitions should define equivalent values.

Input Structure Object
The `dim_order` field describes the ordering of the `shape` dimensions by name.

A common use of `-1` for one dimension of `shape` is to indicate a variable batch-size. However, this value is not strictly reserved for the `b` dimension. For example, if the model is capable of automatically adjusting its input layer to adapt to the provided input data, then the corresponding dimensions that can be adapted can employ `-1` as well.

Dimension Order
Recommended values should use common names as much as possible to allow better interpretation by users and scripts that could need to resolve the dimension ordering for reshaping requirements according to the ML framework employed.
Below are some notable common names recommended for use, but others can be employed as needed.
- `batch`
- `channel`
- `time`
- `height`
- `width`
- `depth`
- `token`
- `class`
- `score`
- `confidence`
For example, a tensor of multiple RGB images represented as \(B \times C \times H \times W\) should indicate `dim_order = ["batch", "channel", "height", "width"]`.
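As a small illustration of resolving dimension ordering, the following plain-Python sketch (with hypothetical shapes) permutes a shape from one `dim_order` to another:

```python
# Hypothetical layouts: permute a tensor shape from a "channel-first"
# dim_order to a "channel-last" dim_order by matching dimension names.
src_order = ["batch", "channel", "height", "width"]
dst_order = ["batch", "height", "width", "channel"]

shape = (2, 3, 64, 64)  # B x C x H x W
perm = [src_order.index(dim) for dim in dst_order]
new_shape = tuple(shape[i] for i in perm)  # (2, 64, 64, 3)
```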
Normalize Enum
Select one option from:
- `min-max`
- `z-score`
- `l1`
- `l2`
- `l2sqr`
- `hamming`
- `hamming2`
- `type-mask`
- `relative`
- `inf`
- `clip`
See OpenCV - Normalization Flags for details about the relevant methods. Equivalent methods from other packages are applicable as well.
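For instance, the `min-max` and `z-score` techniques can be sketched as follows (values are hypothetical; in practice the parameters come from the input's Statistics):

```python
# Illustrative only: min-max and z-score normalization of a single value,
# using parameters that would come from the input's Statistics object.
def min_max(value: float, minimum: float, maximum: float) -> float:
    """Scale a value into [0, 1] from its known minimum and maximum."""
    return (value - minimum) / (maximum - minimum)

def z_score(value: float, mean: float, stddev: float) -> float:
    """Standardize a value given its known mean and standard deviation."""
    return (value - mean) / stddev

min_max(50.0, 0.0, 100.0)  # 0.5
z_score(12.0, 10.0, 2.0)   # 1.0
```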
When a normalization technique is specified, it is expected that the corresponding Statistics parameters necessary to perform it would be provided for the corresponding input. For example, the `min-max` normalization would require that at least the `minimum` and `maximum` statistic properties are provided, while the `z-score` would require `mean` and `stddev`.

If none of the above values applies, `null` (literal, not string) can be used instead. If a custom normalization operation, or a combination of operations (with or without Resize), must be defined instead, consider using a Processing Expression reference.

Resize Enum
Select one option from:
- `crop`
- `pad`
- `interpolation-nearest`
- `interpolation-linear`
- `interpolation-cubic`
- `interpolation-area`
- `interpolation-lanczos4`
- `interpolation-max`
- `wrap-fill-outliers`
- `wrap-inverse-map`
See OpenCV - Interpolation Flags for details about the relevant methods. Equivalent methods from other packages are applicable as well.
If none of the above values applies, `null` (literal, not string) can be used instead. If a custom rescaling operation, or a combination of operations (with or without Normalization), must be defined instead, consider using a Processing Expression reference.

Processing Expression
Taking inspiration from the Processing Extension - Expression Object, the processing expression defines at the very least a `format` and the applicable `expression` for it to perform pre/post-processing operations on MLM inputs/outputs. The `format` is required whenever an `expression` property is provided, and the `expression` must follow the `format` specified. The expression can be any data type and depends on the `format` given, e.g. string or object.

On top of the examples already provided by the Processing Extension - Expression Object, the following formats are recommended as alternative scripts and function references:
- `python`: `my_package.my_module:my_processing_function` or `my_package.my_module:MyClass.my_method`
- `docker`: `ghcr.io/NAMESPACE/IMAGE_NAME:latest`
- `uri`: `{"href": "https://raw.githubusercontent.com/ORG/REPO/TAG/package/cli.py", "type": "text/x-python"}`
Note
Above definitions are only indicative, and more can be added as desired with even more custom definitions. It is left as an implementation detail for users to resolve how these expressions should be handled at runtime.
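For example, one possible way (an implementation choice, not mandated by this specification) to resolve a `python` format expression at runtime is:

```python
import importlib

def resolve_python_expression(expression: str):
    """Resolve a 'module.path:callable' style reference into a Python callable."""
    module_name, _, attr_path = expression.partition(":")
    obj = importlib.import_module(module_name)
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj

# Using a standard-library reference of the same shape for illustration:
join = resolve_python_expression("os.path:join")
```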
Warning
See also discussion regarding additional processing expressions: stac-extensions/processing#31
Model Output Object
"CLASSIFICATION"
) can be used instead.mlm:tasks
defined under the Itemproperties
as applicable.While only
tasks
is a required field, all fields are recommended for tasks that produce a fixed shape tensor and have output classes. Outputs that have variable dimensions, can define theresult
with the appropriate dimension value-1
in theshape
field. When the model does not produce specific classes, such as forregression
,image-captioning
,super-resolution
and somegenerative
tasks, to name a few, theclassification:classes
can be omitted.Result Structure Object
The `dim_order` field describes the ordering of the `shape` dimensions by name for the result array.

Class Object
See the documentation for the Class Object.
Model Hyperparameters Object
The hyperparameters are an open JSON object definition that can be used to provide relevant configurations for the model. Those can combine training details, inference runtime parameters, or both. For example, training hyperparameters could indicate the number of epochs that were used, the optimizer employed, the number of estimators contained in an ensemble of models, or the random state value. For inference, parameters such as the model temperature, a confidence cut-off threshold, or a non-maximum suppression threshold to limit proposals could be specified. The specific parameter names, and how they should be employed by the model, are specific to each implementation.
Following is an example of what the hyperparameters definition could look like:
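A hypothetical sketch (all parameter names and values are illustrative, since the object is free-form):

```python
# Hypothetical "mlm:hyperparameters" content: the object is free-form JSON,
# so any implementation-specific names are permitted.
hyperparameters = {
    "learning_rate": 0.001,   # training: optimizer step size
    "batch_size": 32,         # training: samples per gradient update
    "number_of_epochs": 50,   # training: passes over the dataset
    "optimizer": "adam",      # training: optimization algorithm
    "nms_threshold": 0.5,     # inference: non-maximum suppression cut-off
}
```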
Assets Objects
It is recommended that the Assets defined in a STAC Item using MLM extension use the above field property names for nesting the Assets in order to improve their quick identification, although the specific names employed are left up to user preference. However, the MLM Asset definitions MUST include the appropriate MLM Asset Roles to ensure their discovery.
MLM Asset Roles
Asset `roles` should include relevant names that describe them. This does not only include the Recommended Asset Roles from the core specification, such as `data` or `metadata`, but also descriptors such as `mlm:model`, `mlm:weights` and so on, as applicable for the relevant MLM Assets being described. Please refer to the following sections for `roles` requirements by specific MLM Assets.

Note that `mlm:`-prefixed roles are used for identification purposes of the Assets, but non-prefixed roles can be provided as well to offer generic descriptors. For example, `["mlm:model", "model", "data"]` could be considered for the Model Asset.

In order to provide more context, the following roles are also recommended where applicable:
| Asset Role | Additional Roles |
|---|---|
| `mlm:inference-runtime` (*) | `runtime` |
| `mlm:training-runtime` (*) | `runtime` |
| `mlm:checkpoint` (*) | `weights`, `checkpoint` |
| `mlm:weights` | `weights`, `checkpoint` |
| `mlm:model` | `model` |
| `mlm:source_code` | `code` |
Note
(*) These roles are offered as direct conversions from the previous extension that provided ML-Model Asset Roles to provide easier upgrade to the MLM extension.
Model Asset
The Asset `roles` must include `mlm:model`, and can include `["mlm:weights", "mlm:checkpoint"]` as applicable.

Recommended Asset `roles` include `mlm:weights` or `mlm:checkpoint` for model weights that need to be loaded by a model definition, and `mlm:compiled` for models that can be loaded directly without an intermediate model definition. In each case, the `mlm:model` role should be applied as well to indicate that this asset represents the model.

It is also recommended to make use of the file extension for this Asset, as it can provide useful information to validate the contents of the model definition, by comparison with fields such as `file:checksum` and `file:size`.

Model Artifact Media-Type
Very few ML frameworks, libraries or model artifacts provide an explicit IANA-registered media-type to represent the contents they handle. When those are not provided, custom media-types can be considered. However, "unofficial but well-established" parameters should be reused over custom media-types when possible.

For example, the unofficial `application/octet-stream; framework=pytorch` definition is appropriate to represent a PyTorch `.pt` file, since its underlying format is a serialized pickle structure, and its `framework` parameter provides a clearer indication about the targeted ML framework and its contents. Since artifacts will typically be downloaded using a request stream into a runtime environment in order to employ the model, the `application/octet-stream` media-type is relevant for representing this type of arbitrary binary data. Being an official media-type, it also has the benefit of increasing the chances that HTTP clients will handle the download of the contents appropriately when performing requests. In contrast, custom media-types such as `application/x-pytorch` have higher chances of being considered unacceptable (HTTP 406 Not Acceptable) by servers, which is why they should preferably be avoided.

Users can consider adding more parameters to provide additional context, such as `profile=compiled` to provide an additional hint that the specific PyTorch Ahead-of-Time Compilation profile is used for the artifact described by the media-type. However, users need to remember that those parameters are not official. In order to validate the specific framework and artifact type employed by the model, the MLM properties `mlm:framework` (see MLM Fields) and `mlm:artifact_type` (see Model Asset) should be employed instead to perform this validation if needed.

Artifact Type Enum
This value can be used to provide additional details about the specific model artifact being described. For example, PyTorch offers various strategies for providing model definitions, such as Pickle (`.pt`), `TorchScript`, or the PyTorch Ahead-of-Time Compilation (`.pt2`) approach. Since they all refer to the same ML framework, the Model Artifact Media-Type can be insufficient in this case to detect which strategy should be used.

Following are some proposed Artifact Type values for corresponding approaches, but other names are permitted as well. Note that the names are selected using the framework-specific definitions to help users understand the source explicitly, although this is not strictly required either.

- `torch.save`: the model is saved in Pickle format (i.e.: `.pt`).
- `torch.jit.script`: the model is exported as `TorchScript`.
- `torch.export`: the model is exported with `torch.export` (i.e.: `.pt2`).
- `torch.compile`: the model is optimized with `torch.compile`.

Source Code Asset
The Asset `roles` can include `["model", "code", "metadata"]`.
If the referenced code does not directly offer a callable script to run the model, the `mlm:entrypoint` field should be added to the Asset Object in order to provide a pointer to the inference function to execute the model. For example, `my_package.my_module:predict` would refer to the `predict` function located in `my_module` inside the `my_package` library provided by the repository.

It is strongly recommended to use a specific media-type such as `text/x-python` if the source code refers directly to a script of a known programming language. Using the HTML rendering of that source file, such as through GitHub for example, should be avoided. Using the "Raw Contents" endpoint for such cases is preferable. The `text/html` media-type should be reserved for cases where the URI generally points at a Git repository. Note that a URI including the specific commit hash, release number or target branch should be preferred over other means of referring to checkout procedures, although this specification does not prohibit the use of additional properties to better describe the Asset.

Since the source code of a model provides useful examples of how to use it, it is also recommended to define relevant references to documentation using the `example` extension. See the Best Practices - Example Extension section for more details.

Recommended asset `roles` include `code` and `metadata`, since the source code asset might also refer to more detailed metadata than this specification captures.

Container Asset
The recommended media-type for a Container Asset is `application/vnd.oci.image.index.v1+json`. The Asset `roles` should include `["runtime"]` and any other custom roles.

If you're unsure how to containerize your model, we suggest starting from the latest official container image for your framework that works with your model and pinning the container version.
Examples:
Using a base image for a framework looks like:
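A minimal hypothetical sketch (the base image tag and paths are illustrative assumptions, not prescribed by this specification):

```dockerfile
# Start from an official, version-pinned framework image and add the model code.
FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime
COPY . /opt/model
RUN pip install -r /opt/model/requirements.txt
```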
You can also use other base images. PyTorch and TensorFlow offer Docker images for serving models for inference.
Relation Types
The following types should be used as applicable `rel` types in the Link Object of STAC Items describing Band Assets that result from the inference of a model described by the MLM extension.

It is recommended that a link using `derived_from` to refer to another STAC definition using the MLM extension specifies the `mlm:name` value to make the derived reference more explicit.

Note that a derived product from model inference described by STAC should also consider using additional indications that it came from a model, such as described by the Best Practices - Processing Extension.
Contributing
All contributions are subject to the STAC Specification Code of Conduct. For contributions, please follow the STAC specification contributing guide. Instructions for running tests are copied here for convenience.
Running tests
The same checks that run on PRs are part of the repository and can be run locally to verify that changes are valid. To run tests locally, you'll need `npm`, which is a standard part of any node.js installation.

First, install everything with npm once. Navigate to the root of this repository and on your command line run `npm install`.

Then, to check Markdown formatting and test the examples against the JSON schema, run `npm test`. This will output the same messages that you see online, and you can then go and fix your Markdown or examples.

If the tests reveal formatting problems with the examples, you can fix them with `npm run format-examples`.