Status: discussion
Contributors:
CDEvents are generated by tools that are used in CD pipelines and, more generically, as part of the software lifecycle, from the concept, through production and consumption.
The software lifecycle includes various workflows that may overlap, intersect and exist at different levels of abstraction.
Some high level examples could be:
A single event may belong to several workflows.
CDEvents should provide means to store workflow-relevant data, and could recommend effective ways to identify all events in a workflow.
CDEvents today include subjects and predicates; subjects can be globally identified via a combination of id and source. This allows CDEvents users to extract specific workflows, such as the entire lifecycle of a specific subject, through the various predicates executed over time.
For instance, an environment may be created and deleted. An artifact may be packaged and published. Timestamps and the semantics attached to the predicates allow building subject-specific workflows.
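A subject-specific workflow of this kind can be sketched as a grouping step followed by a sort. The snippet below is only an illustration: the event layout (`context.type`, `context.timestamp`, `subject.id`, `subject.source`) is a simplified assumption, and the source and timestamp values are made up.

```python
from collections import defaultdict


def subject_lifecycles(events):
    """Group events per subject and list their event types in time order."""
    grouped = defaultdict(list)
    for event in events:
        subject = event["subject"]
        # A subject is globally identified by the (source, id) pair.
        grouped[(subject["source"], subject["id"])].append(event)
    return {
        key: [
            e["context"]["type"]
            # Assumes RFC 3339 UTC timestamps in one format, so string order is time order.
            for e in sorted(group, key=lambda e: e["context"]["timestamp"])
        ]
        for key, group in grouped.items()
    }


# Illustrative events only; the source and timestamp values are made up.
ARTIFACT_ID = "pkg:golang/mygit.com/myorg/myapp@234fd47e07d1004f0aed9c"
events = [
    {"context": {"type": "dev.cdevents.artifact.published", "timestamp": "2023-02-27T06:05:00Z"},
     "subject": {"id": ARTIFACT_ID, "source": "/ci/build"}},
    {"context": {"type": "dev.cdevents.artifact.packaged", "timestamp": "2023-02-27T06:00:00Z"},
     "subject": {"id": ARTIFACT_ID, "source": "/ci/build"}},
]
lifecycles = subject_lifecycles(events)
```

This only recovers one workflow per subject; it cannot split a subject's events into several workflows or join events across subjects.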
Splitting events associated with the same subject into different workflows, and including events associated with multiple subjects in a single workflow, are both difficult to achieve in CDEvents v0.1, as both operations may require assumptions and guesswork based on the data available in CDEvents today.
We sometimes use the words "tracing" and "traceability" as synonyms, but I think that causes some confusion. "Tracing" is what is provided by e.g. OpenTelemetry, and "traceability" is the possibility to track a certain activity/-ies up- or down-stream to another activity/-ies. We could take some inspiration from this Wikipedia article about Tracing and specifically the table on "Event logging versus tracing".
The W3C trace-context recommendation defines a way to propagate a global trace ID over HTTP, along with vendor-specific context, across several tools. The trace context format can be transported over HTTP as well as CloudEvents.
CDEvents could use the trace context format to indicate the workflow they belong to. If no incoming trace context exists, a new trace context is generated.
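As a rough sketch of that rule, a producer could derive the `traceparent` value for an outgoing event as shown below. The function name `next_traceparent` is hypothetical, and setting the "sampled" flag on a newly started trace is an assumption rather than something this proposal decides. The header layout (version `00`, 32 hex digit trace ID, 16 hex digit parent ID, 2 hex digit flags) follows the W3C recommendation.

```python
import re
import secrets

# traceparent, version 00: "00-<trace-id>-<parent-id>-<trace-flags>"
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def next_traceparent(incoming=None):
    """Return the traceparent for an outgoing event (hypothetical helper)."""
    match = TRACEPARENT_RE.match(incoming) if incoming else None
    # All-zero trace or parent IDs are invalid and are treated as "no context".
    if match and match.group(1) != "0" * 32 and match.group(2) != "0" * 16:
        trace_id, flags = match.group(1), match.group(3)  # continue the trace
    else:
        # No usable incoming context: start a new trace.
        trace_id, flags = secrets.token_hex(16), "01"  # "01" = sampled (assumption)
    parent_id = secrets.token_hex(8)  # fresh ID for this producer's own step
    return f"00-{trace_id}-{parent_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
continued = next_traceparent(incoming)
started = next_traceparent()
```

Because the value carries exactly one trace ID, an event produced this way can only ever point at a single workflow.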
The trace context approach works well for the first three requirements, but it does not align well with requirement #4 (multiple workflows). As for requirement #5, trace context supports fan-out but not fan-in.
What does trace context provide?
The workflow data approach consists of extending the CDEvents context by adding one or more workflow IDs that represent the workflow(s) that the event belongs to.
An event producer MUST honour incoming workflow IDs when sending events as a reaction to incoming ones. We would need to better define the semantics of these "reactions" and whether we should encode them in the context as well.
The workflow ID MUST be independent from the event ID and from the subject ID. A workflow ID can be associated with a subset of the events for a specific subject ID, and can be used in events that span multiple subject IDs.
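A minimal sketch of a producer that follows these two rules is shown below. The field name `workflows`, the helper `react`, and the choice to start a new workflow when no incoming ID exists are all assumptions made for illustration; none of them is defined by CDEvents.

```python
import uuid


def react(incoming_events, event_type, subject_id, start_new_workflow=False):
    """Build an outgoing event that honours incoming workflow IDs (hypothetical)."""
    workflow_ids = []
    for event in incoming_events:
        # "workflows" is a hypothetical context field holding workflow IDs.
        for workflow_id in event["context"].get("workflows", []):
            if workflow_id not in workflow_ids:
                workflow_ids.append(workflow_id)
    if start_new_workflow or not workflow_ids:
        # The workflow ID is generated independently of event and subject IDs.
        workflow_ids.append(str(uuid.uuid4()))
    return {
        "context": {
            "id": str(uuid.uuid4()),
            "type": event_type,
            "workflows": workflow_ids,
        },
        "subject": {"id": subject_id},
    }


# Two incoming events from parallel activities that share one workflow.
test_done = {"context": {"id": "e-1", "workflows": ["wf-a"]}, "subject": {"id": "s-1"}}
scan_done = {"context": {"id": "e-2", "workflows": ["wf-a", "wf-b"]}, "subject": {"id": "s-2"}}
published = react([test_done, scan_done], "dev.cdevents.artifact.published", "s-3")
first = react([], "dev.cdevents.artifact.packaged", "s-3")
```

Reacting to several incoming events at once is the fan-in case: the outgoing event simply carries the union of the incoming workflow IDs, which a single trace ID cannot express.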
Emil: we need semantics associated to the input/output associations
Ben: what will CDEvents look like for specific use cases like fan in, fan out etc
Similar to a trace ID, but allows multiple workflows and lives in the CDEvents context
Split pipeline execution (fan-out), then fan-in waits for all the parallel activities to complete before proceeding
Q: How can workflow ids be indexed efficiently in a database?
An event link is a reference from one event to a previous (target) event. It is created by adding the target event's context.id (UUID) to a dedicated section in the current event.
The example below shows how an artifact.published event references an artifact.packaged event by adding a reference to the context.id of the artifact.packaged event to its links object.
{
"context": {
"id" : "A234-1234-1234",
"type" : "dev.cdevents.artifact.packaged",
...
},
"subject" : {
"id": "pkg:golang/mygit.com/myorg/myapp@234fd47e07d1004f0aed9c",
"content": { ... }
}
}
{
"context": {
"id" : "AD34-7234-1238",
"type" : "dev.cdevents.artifact.published",
...
},
"subject" : {
"content": { ... }
},
"links": [
{
"type": "CAUSE", // What caused the activity behind this event to happen?
"target": {
"context": {
"id": "A234-2345-4567" // E.g. a testsuiterun.finished event id
}
}
},
{
"type": "ARTIFACT", // What artifact was published?
"target": {
"context": {
"id": "A234-1234-1234"
}
}
}
]
}
{
"context": {
"id" : "AD34-7234-1238",
"type" : "dev.cdevents.artifact.published",
...
},
"subject" : {
"id": "pkg:golang/myarm.io/myproj/myapp@mytag", // Optionally used to
// declare a new uri onto which the artifact has been uploaded
"content": { ... }
},
"links": [
{
"type": "CAUSE", // What caused the activity behind this event to happen?
"target": {
"context": {
"id": "A234-2345-4567" // E.g. a testsuiterun.finished event id
}
}
},
{
"type": "ARTIFACT", // What artifact was published?
"target": {
"context": {
"id": "A234-1234-1234",
"domain": "<some unique domain identifier to make the target unique>" // Optional; not needed if events are not expected to be federated out of the domain
},
"subject": {
// Optionally added fields from the target's subject object,
// for convenience to not need to look up all data from an event
// store
"id": "pkg:golang/mygit.com/myorg/myapp@234fd47e07d1004f0aed9c"
}
}
}
]
}
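The link objects in the examples above could be assembled with a small helper such as the one sketched here. `with_link` and its parameters are hypothetical names; only the resulting shape (`type`, `target.context.id`, and the optional `target.context.domain` and `target.subject.id`) mirrors the examples.

```python
import copy


def with_link(event, link_type, target, domain=None, embed_subject_id=False):
    """Return a copy of `event` with one more link to `target` (hypothetical)."""
    linked = copy.deepcopy(event)
    target_ref = {"context": {"id": target["context"]["id"]}}
    if domain is not None:
        # Only needed when events are federated outside their domain.
        target_ref["context"]["domain"] = domain
    if embed_subject_id:
        # Convenience copy, so consumers need not look the target up in a store.
        target_ref["subject"] = {"id": target["subject"]["id"]}
    linked.setdefault("links", []).append({"type": link_type, "target": target_ref})
    return linked


packaged = {
    "context": {"id": "A234-1234-1234", "type": "dev.cdevents.artifact.packaged"},
    "subject": {"id": "pkg:golang/mygit.com/myorg/myapp@234fd47e07d1004f0aed9c"},
}
published = {
    "context": {"id": "AD34-7234-1238", "type": "dev.cdevents.artifact.published"},
    "subject": {"id": "pkg:golang/myarm.io/myproj/myapp@mytag"},
}
published_linked = with_link(published, "ARTIFACT", packaged, embed_subject_id=True)
```

The helper copies the event instead of mutating it, so an event that has already been sent is never changed after the fact.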
I believe so, in order to distinguish between multiple links in a links array.
The links object is an array of trace links to other Eiffel events, which always reference backwards in time. Each trace link is an object consisting of a type, a UUID corresponding to the meta.id of the target event, and optionally the id of the domain where the target event was published.
Software traceability is the practice of tracking changes, documents, and other engineering artifacts throughout the development process, and Eiffel events are used to represent these artifacts and their relationships to each other. By analyzing the resulting graphs, any number of questions related to the development process can be answered.
Based on graph theory
Can be used in combination with a graph database to store and efficiently process events
Used with good results in the Eiffel event protocol
Uniform way to relate all types of events
Linking events would let us "tell a story" through the data
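To make the graph aspect concrete, the sketch below walks links backwards from one event to collect everything upstream of it. It assumes an in-memory dict keyed by `context.id` in place of a real event store or graph database; the function name `upstream` and the link on the test suite event are hypothetical.

```python
def upstream(event_id, store, link_types=None):
    """Collect IDs of all events reachable by following links backwards."""
    seen = []
    stack = [event_id]
    while stack:
        current = stack.pop()
        for link in store.get(current, {}).get("links", []):
            if link_types is not None and link["type"] not in link_types:
                continue  # follow only the requested kinds of link
            target_id = link["target"]["context"]["id"]
            if target_id not in seen:
                seen.append(target_id)
                stack.append(target_id)
    return seen


def link(link_type, target_id):
    return {"type": link_type, "target": {"context": {"id": target_id}}}


# In-memory stand-in for an event store, keyed by context.id.
store = {
    "A234-1234-1234": {"links": []},  # artifact.packaged
    "A234-2345-4567": {"links": [link("ARTIFACT", "A234-1234-1234")]},  # testsuiterun.finished
    "AD34-7234-1238": {  # artifact.published
        "links": [link("CAUSE", "A234-2345-4567"), link("ARTIFACT", "A234-1234-1234")]
    },
}
everything = upstream("AD34-7234-1238", store)
artifacts_only = upstream("AD34-7234-1238", store, link_types={"ARTIFACT"})
```

Filtering by link type is what lets a consumer answer different questions ("what caused this?" versus "which artifact was this?") over the same graph.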
A hybrid approach would combine the benefits of Eiffel-style links with fast lookups for downstream consumers by attaching a global ID to all the linked events that belong together. A single query on the global ID would return everything associated with it, while the links would describe how those events relate to each other in the graph, which the frontend could then lay out.
The payload below represents a single event, pipelineRun, associated with some global ID:
{
"id": "namespace/pipelinerun-1234",
"source": "",
"pipeline_name": "my-pipeline",
"outcome": "success",
"url": "http://mypipeline.com/my-pipeline",
"context": {
// having a global ID, would let us make a single query to return
// all events associated with this global ID.
"global_id": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
// timestamp can be used to do some basic sorting of when events
// occurred, which can alleviate some stress from the frontend when
// traversing Eiffel links to construct the whole graph
"timestamp": 1677477232395,
"links": [ // Eiffel links
]
}
}
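The lookup side of the hybrid approach can be sketched as an index from global ID to time-ordered events. The snippet keeps the index in a plain dict as a stand-in for a database; the helper name `index_by_global_id` and the sample IDs and timestamps are assumptions, while `context.global_id` and `context.timestamp` follow the payload above.

```python
from collections import defaultdict


def index_by_global_id(events):
    """Map each global ID to its events, sorted by timestamp (hypothetical)."""
    index = defaultdict(list)
    for event in events:
        index[event["context"]["global_id"]].append(event)
    for bucket in index.values():
        # Coarse ordering only; links still define the actual relationships.
        bucket.sort(key=lambda e: e["context"]["timestamp"])
    return index


# Sample events; IDs and timestamps are made up for illustration.
events = [
    {"id": "namespace/pipelinerun-1234",
     "context": {"global_id": "g-1", "timestamp": 1677477232395, "links": []}},
    {"id": "namespace/taskrun-1",
     "context": {"global_id": "g-1", "timestamp": 1677477200000, "links": []}},
    {"id": "namespace/pipelinerun-9999",
     "context": {"global_id": "g-2", "timestamp": 1677477100000, "links": []}},
]
index = index_by_global_id(events)
ordered_ids = [e["id"] for e in index["g-1"]]
```

One lookup returns the whole set for a global ID, and the frontend then only has to resolve links within that set rather than query the store once per link.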