# Smart Data Specification
<br>
# Nautirust
---
## Context
Provenance of data.

---
## p-plan

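```turtle
# Input entity
ex:Square a ex:Shape;
a p-plan:Entity.
# Activity that consumes the input
ex:Star1 a p-plan:Activity;
rdfs:comment "Transforms shape";
p-plan:used ex:Square.
# Output entity, linked back to the activity that produced it
ex:Circle a ex:Shape;
a p-plan:Entity;
p-plan:wasGeneratedBy ex:Star1.
```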
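
A minimal Python sketch of how this provenance could be walked, assuming the triples above are held as plain tuples. The helper names `objects` and `sources_of` are illustrative only and are not part of p-plan or SDS.

```python
# Triples mirroring the Turtle above, as (subject, predicate, object) tuples.
TRIPLES = [
    ("ex:Square", "rdf:type", "p-plan:Entity"),
    ("ex:Star1", "rdf:type", "p-plan:Activity"),
    ("ex:Star1", "p-plan:used", "ex:Square"),
    ("ex:Circle", "rdf:type", "p-plan:Entity"),
    ("ex:Circle", "p-plan:wasGeneratedBy", "ex:Star1"),
]


def objects(triples, subject, predicate):
    """Return every object of the triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def sources_of(triples, entity):
    """Follow wasGeneratedBy, then used, to find what an entity was derived from."""
    sources = []
    for activity in objects(triples, entity, "p-plan:wasGeneratedBy"):
        sources.extend(objects(triples, activity, "p-plan:used"))
    return sources


print(sources_of(TRIPLES, "ex:Circle"))  # ['ex:Square']
```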
---

```turtle
# sds:Stream is a subclass of p-plan:Entity
ex:SquareStream a sds:Stream;
a p-plan:Entity.
ex:Star1 a p-plan:Activity;
rdfs:comment "Transforms shape";
p-plan:used ex:SquareStream.
ex:CircleStream a sds:Stream;
a p-plan:Entity;
p-plan:wasGeneratedBy ex:Star1.
```
---
## SDS
Metadata ontology for data on a stream
<br>
Including:
- provenance
- kind of data on stream
---
### Structure
What data is on this stream?
---
The `sds:carries` property states what kind of data a stream carries.
```ttl
ex:CsvStream a sds:Stream;
sds:carries sds:Record.
ex:CsvToMember a p-plan:Activity;
p-plan:used ex:CsvStream.
ex:MemberStream a sds:Stream;
p-plan:wasGeneratedBy ex:CsvToMember;
sds:carries sds:Member;
sds:shape <person.sh>.
```
---
Data on a stream has a link to the originating stream.
CSV row
```turtle
[] a sds:Record;
sds:payload "42,43"^^csvw:Row;
sds:stream ex:CsvStream.
```
TREE Member
```turtle
[] a sds:Member;
sds:payload ex:member1;
sds:stream ex:MemberStream.
ex:member1 a ex:Person;
foaf:name "Arthur".
```
---
## SDS also includes
- `sds:dataset`: dataset-level metadata, such as the licence, for the data on the stream.
- `sds:bucket`: a stream can be split up into buckets or partitions.
---
# Nautirust
### An orchestrator for workflows
---
#### Data processing

---
#### Complex setups of processing units
`Source -> [ MapperA, MapperB ] -> Aggregator ...`
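
As an illustration of this shape, here is a small Python sketch of a fan-out/fan-in setup. The functions and sample values are made up for the example; in a real setup each unit would be a separate process.

```python
def source():
    """Yield the raw items that enter the pipeline."""
    yield from [1, 2, 3]


def mapper_a(item):
    """First mapper: doubles the item and tags the result."""
    return ("A", item * 2)


def mapper_b(item):
    """Second mapper: adds ten to the item and tags the result."""
    return ("B", item + 10)


def aggregator(results):
    """Group mapper outputs by the mapper that produced them."""
    grouped = {}
    for name, value in results:
        grouped.setdefault(name, []).append(value)
    return grouped


def run_pipeline():
    # Fan out: every source item is sent to both mappers.
    results = [mapper(item) for item in source() for mapper in (mapper_a, mapper_b)]
    # Fan in: the aggregator sees the combined output.
    return aggregator(results)


print(run_pipeline())  # {'A': [2, 4, 6], 'B': [11, 12, 13]}
```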
---
### Reality of setups
- Bunch of bash scripts (yuck!)
- Different commands for different runs
- Coordination between components
---
### Case study (RMLStreamer benchmark)

- Nested bash scripts
- Difficult to switch out components
- Engine dependent CLI args
- Local and global level mixed
  - Local level: application
  - Global level: workflow pipeline
---
## Nautirust to the rescue!
---
## What is Nautirust?
- An orchestrator for data-processing workflows
---
## Why Nautirust?
- Language independent
- Data provenance
- Reproducible workflow
- Separation of focus
---
### Language independent
- We support **ANY** language you love!
- Mix and match languages
---
### Data provenance

Create a stream of processes
that transform data and metadata.
---
### Reproducible workflow
A Nautirust pipeline configuration starts
the same pipeline every time.
---
### Separation of focus
- Local level
  - Application execution
- Global level
  - Workflow pipeline
  - Source -> Mapper -> Aggregator -> ...
---
## Components of Nautirust
* Runner
* Step
* Channel
---
### Runner
Executor of your step based on
config provided by Nautirust
* Config file
* Injector required (e.g. RML config injector)
* Available channels
---
### Step
- Your application step (mapping, LDES server)
- Configure a step with parameters.
- Nautirust requests these parameters from the user.
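
A hedged Python sketch of the idea: a step declares its parameters and the orchestrator collects a value for each one. The `STEP` layout and `resolve_parameters` are hypothetical and do not reflect Nautirust's actual configuration format.

```python
# Hypothetical step description; Nautirust's real format may differ.
STEP = {
    "id": "mapping",
    "parameters": ["mapping_file", "output_channel"],
}


def resolve_parameters(step, ask):
    """Collect a value for every parameter the step declares.

    `ask` is any callable that takes a parameter name and returns its value,
    for example the built-in `input` for an interactive prompt.
    """
    return {name: ask(name) for name in step["parameters"]}


# Canned answers stand in for the user here.
answers = {"mapping_file": "mapping.ttl", "output_channel": "file"}
config = resolve_parameters(STEP, answers.__getitem__)
print(config)  # {'mapping_file': 'mapping.ttl', 'output_channel': 'file'}
```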
---
### Channels
A way to transfer data from step A to step B.
For example: Kafka, TCP, files ...
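
A minimal Python sketch of the channel idea, assuming a channel only needs to send and receive messages. The classes below are illustrative and not Nautirust code; the point is that the steps stay the same when the channel is swapped.

```python
import os
import queue
import tempfile


class QueueChannel:
    """In-memory channel; stands in for something like Kafka or TCP."""

    def __init__(self):
        self._queue = queue.Queue()

    def send(self, message):
        self._queue.put(message)

    def receive_all(self):
        messages = []
        while not self._queue.empty():
            messages.append(self._queue.get())
        return messages


class FileChannel:
    """File-backed channel: one message per line."""

    def __init__(self, path):
        self._path = path

    def send(self, message):
        with open(self._path, "a", encoding="utf-8") as handle:
            handle.write(message + "\n")

    def receive_all(self):
        if not os.path.exists(self._path):
            return []
        with open(self._path, encoding="utf-8") as handle:
            return [line.rstrip("\n") for line in handle]


def step_a(channel):
    """Producer step: writes CSV rows to the channel."""
    for row in ["42,43", "44,45"]:
        channel.send(row)


def step_b(channel):
    """Consumer step: reads rows from the channel and splits them into fields."""
    return [row.split(",") for row in channel.receive_all()]


def run(channel):
    """Run step A, then step B, over the given channel."""
    step_a(channel)
    return step_b(channel)


queue_result = run(QueueChannel())
with tempfile.TemporaryDirectory() as directory:
    file_result = run(FileChannel(os.path.join(directory, "channel.txt")))
print(queue_result == file_result)  # True
```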
---
## Configuration Example
[Nautirust](https://github.com/ajuvercr/nautirust)
[Nautirust-config](https://github.com/ajuvercr/nautirust-configs)