Data Architecture

tags: digital-twin-working-group

Types

  • A class-based object that is a subtype of one of:
  1. Extract: takes data into the system
  2. Transform: transforms data supplied from an extract function
  3. Aggregation: aggregates the outputs of multiple pipes
  4. Load: endpoint that loads data to an external source
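The four subtypes above could be sketched as a small class hierarchy. This is a minimal sketch, not a settled design; the class and method names (`Pipe`, `extract`, `transform`, `aggregate`, `load`) are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class Pipe(ABC):
    """Hypothetical base class shared by all pipe types."""


class Extract(Pipe):
    """Takes data into the system; has no upstream pipe."""

    @abstractmethod
    def extract(self) -> Any: ...


class Transform(Pipe):
    """Transforms data supplied from an extract function."""

    @abstractmethod
    def transform(self, payload: Any) -> Any: ...


class Aggregation(Pipe):
    """Aggregates the outputs of multiple upstream pipes."""

    @abstractmethod
    def aggregate(self, payloads: List[Any]) -> Any: ...


class Load(Pipe):
    """Endpoint that loads data to an external source; no downstream pipe."""

    @abstractmethod
    def load(self, payload: Any) -> None: ...
```

Using abstract methods means a concrete pipe cannot be instantiated until it implements its one required method, which keeps each subtype's contract explicit.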

Mechanisms

Within a class, we can think of the following types of mechanisms:

  1. A linking mechanism declaring that this pipe runs after certain other pipes have been executed.
  2. Integrations with existing pipeline software such as Airflow.
  3. Integrations with testing, so that each pipe's tests can be defined within its class.
  4. Integrations with documentation tools, to aid automated documentation or make it easier to write strong documentation.
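Mechanisms 1 and 3 could both live on the class itself: a `runs_after` attribute for the linking metadata and a `tests()` hook for pipe-local tests. This is a rough sketch under assumed names; `CleanReadings` and `extract_readings` are hypothetical examples.

```python
class Pipe:
    """Minimal sketch: link metadata and tests are carried by the class."""

    name: str = ""
    runs_after: tuple = ()  # mechanism 1: names of pipes this one depends on

    def run(self, payload):
        raise NotImplementedError

    @classmethod
    def tests(cls):
        """Mechanism 3: each pipe defines its own tests."""
        return []


class CleanReadings(Pipe):
    name = "clean_readings"
    runs_after = ("extract_readings",)  # runs after the extract pipe

    def run(self, payload):
        # Drop missing values from the upstream payload.
        return [x for x in payload if x is not None]

    @classmethod
    def tests(cls):
        return [lambda: cls().run([1, None, 2]) == [1, 2]]
```

Keeping the dependency names as plain strings would also make it straightforward to translate a set of pipe classes into, say, Airflow task dependencies later.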

Orchestrator

Is there a need for an orchestrator function that links all the pipelines together?
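If one were needed, a minimal orchestrator could just be a topological walk over the declared links. The sketch below assumes each pipe is a plain function plus a tuple of upstream names; the pipe names are hypothetical.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+


def orchestrate(pipes):
    """Hypothetical orchestrator: run every pipe in dependency order.

    `pipes` maps a pipe name to (function, tuple of upstream pipe names);
    each function receives its upstream results as positional arguments.
    """
    graph = {name: deps for name, (_fn, deps) in pipes.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        fn, deps = pipes[name]
        results[name] = fn(*(results[d] for d in deps))
    return results


# Hypothetical two-step pipeline: an extract feeding a transform.
pipes = {
    "extract": (lambda: [1, 2, 3], ()),
    "double": (lambda xs: [2 * x for x in xs], ("extract",)),
}
print(orchestrate(pipes)["double"])  # [2, 4, 6]
```

An alternative is to skip a bespoke orchestrator entirely and compile the same dependency metadata into an existing scheduler such as Airflow (mechanism 2 above).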

Inputs/Outputs

Considerations:

  1. Should there be a generalized payload (any) type passed between pipes, a single data type (defined for each pipeline) or a payload with defined data types?
  2. Extract would have no input pipeline, Load would have no output pipeline, and the other two would have both.
  3. Where might credentials go into the extract function?
  4. With strong typing and linting (React-style), the process might be more bulletproof.
  5. Strong typing can automate pieces of the documentation.
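For considerations 1 and 5, a payload with defined data types could be a dataclass, whose annotations can then be read back out to generate a field table for the docs. `SensorPayload` and its fields are made-up examples, not an agreed schema.

```python
from dataclasses import dataclass, fields


@dataclass
class SensorPayload:
    """Hypothetical typed payload passed between pipes."""

    sensor_id: str
    readings: list[float]


def describe(payload_cls):
    """Strong typing lets a docs field table be generated automatically."""
    return "\n".join(f"{f.name}: {f.type}" for f in fields(payload_cls))
```

The generalized `any` payload would skip this boilerplate but gives up both the linting safety in consideration 4 and the automated documentation in consideration 5.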