# processing pipeline:
a config approach.
**these** slides: https://hackmd.io/@rig/rkiSkume9
prev. concept: https://hackmd.io/@rig/Hy37BuqdY#/
---
# topics
* what's new (a look into the code)
* status / TODO
* current questions
* contributions
---
# News
* **class methods** are now _callable_ as pipeline strategies
* **registration** mostly automated
---
# pipeline configuration
what can a configuration `my_pipeline_config.yaml` look like?
----
## config example w/ class methods as strategies
```yaml
# file: my_pipeline_config.yaml
environment:
  input_path: "root/of/project/data"
  cache_path: "defined/by/default/below/input_path/"
  output_path: "defined/by/default/below/input_path/"
  # other things that might become necessary
  define: stuff
  that: "i/need.csv"
  example: /a/folder/somewhere/
pipeline:
  - "my_job1":               # specified as a sequence of steps
      - step1:               # a standard strategy (function) call
          arg1: 'thisfile.stuff'
          arg2: 'that_col'
      - my_step2:            # my special step's strategy (function)
          #input: ~          # default: previous step's output
          arg2: 5
          arg3: 'yes'
          arg_n: foo
      - step_s:
          #input: ~          # default: previous step's output
      # the last step's output is stored to "context:my_job1"
  - "my_job2":
      - Trips.read:          # class method as strategy function
          file: some/data_file.csv
          # usually returns a Trips instance
      - Trips.filter_xyz:
          on_what: "context:my_job1"  # from the context store
```
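note the `context:` convention: each job's final output lands in the context store under `context:<job_name>`, so a later job (here `my_job2`) can pick it up, e.g. via `on_what: "context:my_job1"`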
---
# register strategy
mostly _automated_; it requires the following things:
1. define the function or method (code it)
2. plain function declarations work just as before
3. methods are declared as strategies via a decorator (see the sketch below)
4. add the function or method name to the module-level variable `__strategy_methods__`, which lists all _strategy names_ (`type str`)
5. include that module in the pipeline execution script as a module that provides strategies
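a minimal sketch of the method case; `strategy_method` below is only a stand-in placeholder (the real decorator is provided by the library and may be named differently), and the `Trips.read` naming is inferred from the config example:
```python=
# "trips.py" -- illustrative sketch only
def strategy_method(func):
    """Placeholder standing in for the library's strategy decorator."""
    return func

# near head of file: all strategy names this module provides
__strategy_methods__ = [
    'Trips.read',       # naming convention assumed from the config example
    'Trips.filter_xyz',
]

class Trips:
    @classmethod
    @strategy_method
    def read(cls, *, file=None):
        # read trip data from `file` and return a Trips instance
        ...
```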
----
## define (procedural) strategy function:
```python=
# "my_module.py"
# near head of file:
__strategy_methods__ = [
'my_step2',
'my_other_strategy_function',
]
# a regular function of your own, used as a pipeline strategy
# (right now) keyword-only arguments
# `input` defaults to None, so it can receive the previous step's result
def my_step2(*, input=None, arg2='foo_default', arg3=None):
    my_result = ...  # my magic happens here
    return my_result
```
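the keyword-only signature is deliberate: the argument names of a YAML step map directly onto these keywords, and `input` receives the previous step's output unless the config sets it explicitly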
----
## registration
providing the pipeline context with my strategies:
```python=
# "my_pipeline.py"
import matsim_salapym
from matsim_salapym import pipeline_strategies
from matsim_salapym.formats import *
import my_module
# near head of file:
all_strategy_modules = [
    pipeline_strategies,
    trips,
    my_module,
]

# %% example application core
def execute(filename: str):
    with PipelineContext(filename) as pc:
        # TODO: implement as auto-import on instantiation of PC
        #       for all available strategy methods
        for strat_module in all_strategy_modules:
            lprint("+ registering strategies from module: ", strat_module.__name__)
            # register all mentioned strategy methods
            pc.register_all_module_strategies(strat_module)
```
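with the modules listed, a run can then be kicked off with the config file from above (assuming `execute` is the entry point):
```python=
if __name__ == '__main__':
    execute('my_pipeline_config.yaml')
```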
---
# DEMO time
how that looks in the code
---
# status / TODO
---
## current questions
**where to usually implement** analysis / aggregation / strategies?
(corresponding to the registration methods)
* as methods of a class?
  e.g. `Trips.produce_some_stats()`
* as a class in the module?
  e.g. class `trips.StatsProducer`
* as strategy functions in the module?
  e.g. `trips.produce_some_stats()`
* completely outside, in a new module?
---
## current questions
how to **keep results**?
* by scenario (most likely):
  allows running the pipeline per scenario, even in parallel
* keep results as pickles of classes
  (e.g. `StatsProducer`)
  OR in some dict (the more general approach), as sketched below
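a minimal sketch of the dict-plus-pickle variant; the paths and key names are assumptions for illustration:
```python=
import os
import pickle

# one plain results dict per scenario (names assumed) ...
out_dir = 'output/scenario_a'
os.makedirs(out_dir, exist_ok=True)
results = {'trip_count': 12345, 'mean_distance_km': 3.2}

# ... pickled below that scenario's output path
with open(os.path.join(out_dir, 'results.pkl'), 'wb') as f:
    pickle.dump(results, f)
```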
---
## TODO: multiple parameter sets
running Cartesian products of parameters over
* many scenarios
* multiple filters
* multiple parametrized aggregators

one idea (sketched below):
* specify them as an extra block in the config
* that could be loaded in addition to a normal pipeline config
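a hypothetical shape for such a block; all keys and values here are assumptions:
```yaml
# extra block, loaded on top of a normal pipeline config
parameter_sets:
  scenarios: [base, scenario_a, scenario_b]
  filters: [by_mode, by_region]
  aggregators:
    - {name: histogram, bins: 10}
    - {name: histogram, bins: 50}
```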
---
## TODO: filter / selector implementation
should be straightforward (see the sketch below)
* by pandas Series or DataFrames
* of indices or of the data itself
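a hedged sketch of such a filter strategy in the keyword-only style from above; the function and argument names are assumptions:
```python=
import pandas as pd

def filter_isin(*, input=None, column=None, values=None):
    """Keep only the rows of `input` whose `column` value is in `values`.

    `input` is the previous step's output, assumed to be a DataFrame;
    the boolean mask is a pandas Series used to select the rows.
    """
    mask: pd.Series = input[column].isin(values)
    return input[mask]
```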
---
# contributions call
* get set up with some data to work on
* port analysis functions
  from the old Levitate pipeline
---
# The END
{"metaMigratedAt":"2023-06-16T20:08:19.914Z","metaMigratedFrom":"YAML","title":"Pipeline update 2022-02-23","breaks":true,"description":"View with \"Slide Mode\".","slideOptions":"{\"transition\":\"slide\"}","contributors":"[{\"id\":\"27dbcc5c-a89f-4466-8618-5052f83fe9b6\",\"add\":3673,\"del\":3335}]"}