# processing pipeline:
a config approach.
**these** slides: https://hackmd.io/@rig/rkiSkume9
prev. concept: https://hackmd.io/@rig/Hy37BuqdY#/
---
# topics
* what's new (a look into the code)
* status / TODO
* current questions
* contributions
---
# News
* **class methods** are now _callable_ as pipeline strategies
* **registration** mostly automated
---
# pipeline configuration
what can a configuration `my_pipeline_config.yaml` look like?
----
## config example w/ class methods as strategies
```yaml
# file: my_pipeline_config.yaml
environment:
  input_path: "root/of/project/data"
  cache_path: "defined/by/default/below/input_path/"
  output_path: "defined/by/default/below/input_path/"
  # other things that might become necessary
  define: stuff
  that: "i/need.csv"
  example: /a/folder/somewhere/
pipeline:
  - "my_job1":               # specified as a sequence of steps
      - step1:               # a standard strategy (function) call
          arg1: 'thisfile.stuff'
          arg2: 'that_col'
      - my_step2:            # my special step's strategy (function)
          #input: ~          # default: previous step's output
          arg2: 5
          arg3: 'yes'
          arg_n: foo
      - step_s:
          #input: ~          # default: previous step's output
      # the last step's output is stored to "context:my_job1"
  - "my_job2":
      - Trips.read:          # class method as strategy function
          file: some/data_file.csv
          # usually returns a Trips instance
      - Trips.filter_xyz:
          on_what: "context:my_job1"  # from the context store
```
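note the `context:` convention: each job's final output lands in the context store under `context:<job_name>`, so a later job (here `my_job2`) can pick it up, e.g. via `on_what: "context:my_job1"`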
---
# register strategy
mostly _automated_; it requires the following things:
1. define the function or method (code it)
2. plain function declarations work just as before
3. methods are declared as strategies via a decorator (see the sketch below)
4. add the function or method name to the module-level variable `__strategy_methods__`, which lists all _strategy names_ (`type str`)
5. include that module in the pipeline execution script as a module that provides strategies
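a minimal sketch of the method case; `strategy_method` below is only a stand-in placeholder (the real decorator is provided by the library and may be named differently), and the `Trips.read` naming is inferred from the config example:
```python=
# "trips.py" -- illustrative sketch only
def strategy_method(func):
    """Placeholder standing in for the library's strategy decorator."""
    return func

# near head of file: all strategy names this module provides
__strategy_methods__ = [
    'Trips.read',       # naming convention assumed from the config example
    'Trips.filter_xyz',
]

class Trips:
    @classmethod
    @strategy_method
    def read(cls, *, file=None):
        # read trip data from `file` and return a Trips instance
        ...
```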
----
## define (procedural) strategy function:
```python=
# "my_module.py"
# near head of file:
__strategy_methods__ = [
'my_step2',
'my_other_strategy_function',
]
# a regular function of your own, used as a pipeline strategy
# (right now) keyword-only arguments
# `input` defaults to None, so it can receive the previous step's result
def my_step2(*, input=None, arg2='foo_default', arg3=None):
    my_result = ...  # my magic happens here
    return my_result
```
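the keyword-only signature is deliberate: the argument names of a YAML step map directly onto these keywords, and `input` receives the previous step's output unless the config sets it explicitly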
----
## registration
providing the pipeline context with my strategies:
```python=
# "my_pipeline.py"
import matsim_salapym
from matsim_salapym import pipeline_strategies
from matsim_salapym.formats import *
import my_module
# near head of file:
all_strategy_modules = [
    pipeline_strategies,
    trips,
    my_module,
]

# %% example application core
def execute(filename: str):
    with PipelineContext(filename) as pc:
        # TODO: implement as auto-import on instantiation of PC
        #       for all available strategy methods
        for strat_module in all_strategy_modules:
            lprint("+ registering strategies from module: ", strat_module.__name__)
            # register all mentioned strategy methods
            pc.register_all_module_strategies(strat_module)
```
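with the modules listed, a run can then be kicked off with the config file from above (assuming `execute` is the entry point):
```python=
if __name__ == '__main__':
    execute('my_pipeline_config.yaml')
```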
---
# DEMO time
how that looks in the code
---
# status / TODO
---
## current questions
**where to usually implement** analysis / aggregation / strategies?
(corresponding to the registration methods)
* as methods of a class?
  e.g. `Trips.produce_some_stats()`
* as a class in the module?
  e.g. class `trips.StatsProducer`
* as strategy functions in the module?
  e.g. `trips.produce_some_stats()`
* completely outside, in a new module?
---
## current questions
how to **keep results**?
* by scenario (most likely):
  allows running the pipeline per scenario, even in parallel
* keep results as pickles of classes
  (e.g. `StatsProducer`)
  OR in some dict (the more general approach), as sketched below
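a minimal sketch of the dict-plus-pickle variant; the paths and key names are assumptions for illustration:
```python=
import os
import pickle

# one plain results dict per scenario (names assumed) ...
out_dir = 'output/scenario_a'
os.makedirs(out_dir, exist_ok=True)
results = {'trip_count': 12345, 'mean_distance_km': 3.2}

# ... pickled below that scenario's output path
with open(os.path.join(out_dir, 'results.pkl'), 'wb') as f:
    pickle.dump(results, f)
```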
---
## TODO: multiple parameter sets
running Cartesian products of parameters over
* many scenarios
* multiple filters
* multiple parametrized aggregators

one idea (sketched below):
* specify them as an extra block in the config
* that could be loaded in addition to a normal pipeline config
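a hypothetical shape for such a block; all keys and values here are assumptions:
```yaml
# extra block, loaded on top of a normal pipeline config
parameter_sets:
  scenarios: [base, scenario_a, scenario_b]
  filters: [by_mode, by_region]
  aggregators:
    - {name: histogram, bins: 10}
    - {name: histogram, bins: 50}
```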
---
## TODO: filter / selector implementation
should be straightforward (see the sketch below)
* by pandas Series or DataFrames
* of indices or of the data itself
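a hedged sketch of such a filter strategy in the keyword-only style from above; the function and argument names are assumptions:
```python=
import pandas as pd

def filter_isin(*, input=None, column=None, values=None):
    """Keep only the rows of `input` whose `column` value is in `values`.

    `input` is the previous step's output, assumed to be a DataFrame;
    the boolean mask is a pandas Series used to select the rows.
    """
    mask: pd.Series = input[column].isin(values)
    return input[mask]
```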
---
# contributions call
* get set up with some data to work on
* port analysis functions
  from the old Levitate pipeline
---
# The END
{"metaMigratedAt":"2023-06-16T20:08:19.914Z","metaMigratedFrom":"YAML","title":"Pipeline update 2022-02-23","breaks":true,"description":"View with \"Slide Mode\".","slideOptions":"{\"transition\":\"slide\"}","contributors":"[{\"id\":\"27dbcc5c-a89f-4466-8618-5052f83fe9b6\",\"add\":3673,\"del\":3335}]"}