Try   HackMD

Is Arazzo Turing Complete?

An investigation into the possible Turing Completeness of the Arazzo Specification.

Contact, and More details at https://captnemo.in/

References:

Basics

An Arazzo Specification builds a workflow(s) on top of operations that are defined in an OpenAPI Specification. It has some control flow, and variable storage, but is just a GitHub Actions Workflow Syntax like DSL that is written primarily in YAML.

Since Arazzo is not a real programming language, this is a fun investigation to see how much computation can you in such an environment? Relying on any specific HTTP API Behaviour (say using the repl.it API) would count as cheating, but relying on standard HTTP behaviour seems like a fair game.

Strategies

Interesting starting points:

Criterion Objects

Criterion are expressions used to decide whether a given Step (which converts to an API call via an OpenAPI Operation) was successful or not. You can write these conditions in many ways, all of which are great for control flow:

  • JSONPath
  • Regex
  • Simple expressions, which support some basic operations:
    <,<=,>,>,==,!=,!,&&,||
    () Logical Grouping
    [] (Index (0-based)
    a.b Property de-reference) on top of basic literals (number, null, string, true, false)
  • XPath

JSONPath in particular comes with a small list of functions which can be helpful (length, count, match, search, value)

Runtime Expressions

A runtime expression allows values to be defined based on information that will be available within an HTTP message, an event message, and within objects serialized from the Arazzo document such as workflows or steps.

These are simple strings like $request.header.accept, $workflows.foo.inputs.username, or $response.body#/status (#/status is a JSON Pointer) that can be used to "pluck" a specific value.

Runtime Expressions are used in multiple places:

  1. workflow.dependsOn could point to other dynamically generated workflow reference. It is unclear at what point this is evaluated in the spec, since a workflow could be called multiple times, and the context could change over time.
  2. step.operation[Id|Path]: A step can refer to an OpenAPI Operation dynamically.
  3. step.workflowId: A step doesn't have to invoke an operation, it can just point to a workflow.
  4. step.outputs: Step outputs are generated using runtime expressions. More specifically, an output is a map of string:runtime-expression. Key names can't be dynamic, and there's no way to generate arrays inside outputs, without such array being plucked directly from a "source" (request/response body etc)

There's also a few other places, but I don't think these are essential components, since the inputs/outputs system can do whatever the other parts can:

  1. Parameter values can be generated using runtime-expressions
  2. Reuseable Objects, that can be referenced using runtime expressions

goto

We can write loops easily using the goto construct in Arazzo:

name: JoinWaitingList
type: goto
stepId: joinWaitingListStep
criteria:
    # assertions to determine if this success action should be executed
    - context: $response.body
      condition: $[?count(@.pets) > 0]
      type: jsonpath

The above is an example of a [Success|Failure]Action object, which can either mark an end or goto Step|Workflow.

There is also "Retry", which you can use to build loops as well, along with retryLimit. You can force a failing API Operation by marking the successCriteria as false, and then the step will automatically be retried to the retryLimit.

HTTP

There's also a bunch of HTTP+OpenAPI behaviour that we can rely upon. Examples:

  1. HTTP Trace, which returns the request headers in the response body.
  2. Common replies for Invalid HTTP Semantics, such as expecting a 405 error code by sending invalid HTTP methods.

Loops

Attempt 1

steps:
  - stepId: repeatStep
    operationId: updateCounterOperation
    description: Repeat this step until the counter reaches N
    outputs:
      counter: $steps.repeatStep.outputs.counter + 1

successActions:
  - name: continueLoop
    type: goto
    stepId: repeatStep
    criteria:
      condition: $steps.repeatStep.outputs.counter < $inputs.N

counter: $steps.repeatStep.outputs.counter + 1 is unfortunately invalid, since the output can only be a runtime-expression, and while $steps.repeatStep.outputs.counter is a valid RE, $steps.repeatStep.outputs.counter+1 is not.

Attempt 2

steps:
  - stepId: repeatStep
    operationId: updateCounterOperation
    description: Repeat this step until the counter reaches N
    outputs:
      counter: $steps.repeatStep.outputs.counter
    criteria:
      condition: false # This step always fails
    onFailure:
      type: retry
      stepId: repeatStep
      retryLimit: 5

This repeats repeatStep 5 times. retryLimit is an integer, so it must be provided in the doc.

Storage

Loops seem doable, but its storage that is much tricker.

We have assignments, but they are not arbitary - We can use step outputs to assign a dictionary mapping of strings to runtime expressions. These expressions can not involve arbitary expressions. They do support JSON-Pointer, but not JSONPath (so we don't get the JSONPath methods). The list of sources is quite decent:

expression = ( "$url" / "$method" / "$statusCode" / 
    "$request." source / "$response." source / 
    "$inputs." name / "$outputs." name / "$steps." name /
    "$workflows." name / "$sourceDescriptions." name)
parameter-name = name ; Reuses 'name' rule for parameter names
source = ( header-reference / query-reference / path-reference / body-reference )
header-reference = "header." token
query-reference = "query." name
path-reference = "path." name
body-reference = "body" ["#" json-pointer ]
json-pointer    = *( "/" reference-token )

So we can pluck out specific elements via JSON Pointers from the request or response body. We can also get a specific output or input value. An output value can refer to itself as well.

But there is no direct way for an output to be set to an operation result.

Final Thoughts

Still WIP, but the easy way out is to use criterion to set (and as a corollary) not set specific outputs.

The perfect solution would be an arbitarily large implementation of Rule 110 that is only bound by memory. Some imperfect solutions seem easily within reach:

  1. A small 5 block Rule 110 implementation seems doable by just hardcoding each pair-wise operation.
  2. There's some possibilities of chaining up workflows as well, by defining a separate workflow for each "computation". So you get N workflows to build a N-block implementation.
  3. If none of these work out, generating a really large workflow is still polynomial time, and considered okay. So given N, we can statically generate a really large Arazzo Workflow that unrolls everything on compile time. This is technically not cheating, but not in the spirit of what we want to achieve.

Given enough time, I would also like to try similar investigations in the CI/CD Space against:

  1. GitHub Actions/Tekton/Jenkins Workflow Syntax
  2. Ansible YAML Templating
  3. Helm Templating

Aside

I mentioned in my talk that "trying to use something for unintended purposes" is a great way of learning it. That's very much the case here - I've found half a dozen corrections or clarifications in the spec, as a lot of what I'm trying to do searches for unintended behaviour.