
<style> .reveal { font-size: 32px; }  .small { font-size: 24px; } .red {color: red;} code {color: red;} strong { color: green; font-weight: 900 } span.slide-number-a { font-size: 32px; margin: 16px; } .two-column-layout { columns: 2; /* Set column number */ font-size: 28px; max-width: 100%; overflow: hidden; } </style>

# Histopathology Process Simulation

Yin-Chi Chan <[ycc39@cam.ac.uk](mailto:ycc39@cam.ac.uk)>

Institute for Manufacturing

![](https://www.ifm.eng.cam.ac.uk/files/images/templates/logo-footer.png)&nbsp;&nbsp;&nbsp;&nbsp;<img src="https://hackmd.io/_uploads/B1k9SX6RR.png" alt="drawing" width="200"/>

University of Cambridge

---

Link to this presentation: https://hackmd.io/@elasticdt/Hkua1daaR

<img src="https://hackmd.io/_uploads/SJ_K3RHkJx.png" alt="drawing" width="400"/>

---

## Background

- Collaboration with the Histopathology department of Addenbrooke's Hospital
    - Part of Cambridge University Hospitals NHS Foundation Trust
- Model the process of creating *stained glass slides* from biological specimens
    - For example, for the diagnosis of cancer and other diseases
- Key performance indicators (KPIs):
    - **Main:** Turnaround time (*laboratory*, overall)
    - Staff/machine utilization
    - Specimens in progress

---

## Histopathology

- Analysis of tissue specimens for the study/diagnosis of disease
- Specimens &longrightarrow; blocks &longrightarrow; slides
- Slides are then stained and digitally scanned for analysis by a histopathologist
- Current focus on **routine** stains using the dyes hematoxylin and eosin (H&E)

---

## Research Flowchart

<span class=small>Research process steps for defining digitalization opportunities in a healthcare setting</span>

![image](https://hackmd.io/_uploads/rJa8KOT60.png)

<span class=small>Source: [Moretti *et al.*, LoDiSA 2023](https://doi.org/10.1049/icp.2023.1735)</span>

This presentation focuses on the steps in <span class=red>red</span>:

- **2\.** Process logic modeling
- **4c\.** Process simulation
- **5c\.** Process performance analysis

---

## Process modeling

- **von Neumann**: "truth [...] is much too complicated to allow anything but approximations"
- **George Box**: "All models are wrong, but some are useful"

<hr/>

- Identify high-level tasks; abstract away how they are performed
    - Example: the current model iteration ignores staff specializations based on tissue type
- Simple 3-point estimates for task durations: low, most likely (mode), high
- Focus on core processes; minimize decision branches

---

## Identified process stages

```mermaid
flowchart TB
    subgraph x3["<b>3. Processing</b>"]
        direction TB
        decalc["3a. Decalcification (optional)"] --> proc["3b. Processing machine"]
    end
    subgraph one[" "]
        direction LR
        start(START) --> reception["1. Specimen Reception"] --> cutup["2. Cut-up"] --> x3 --> embedding["4. Embedding"]
    end
    subgraph two[" "]
        direction LR
        micro["5. Microtomy"] --> stain["6. Staining"] --> label["7. Cover-slipping"] --> scan["8. Digital scanning"]
    end
    subgraph three[" "]
        direction LR
        collate["9. Collation"] --> qc["10. Block & quality check"] --> alloc["11. Case allocation"] --> report["12. Reporting"] --> stop(END)
    end
    one --> two --> three
    style start fill:green,color:white
    style stop fill:red,color:white
```

---

## Simulation features

- Entities are hierarchical
    - Specimen &longrightarrow; Block &longrightarrow; Slide
- Steps in each stage can operate on specimens, blocks, slides, or **batches** of these
- Flow control:
    - **Branching:** different paths based on entity attributes
    - **Batching:** forming groups of like entities
    - **Collation:** grouping only entities with the same parent entity
        - e.g. collect all slides of a specimen before continuing
    - **Timed gates:** only start certain jobs when a timed event is triggered (e.g. at 4:30PM daily)
- **Bootstrapping** of initial specimen states

---

## <img src="https://hackmd.io/_uploads/SkElv3R6R.png" alt="drawing" width="150"/> model

- Arena is a graphical discrete event simulation (DES) tool
- DES:
    - Single simulation thread with a **clock**
    - Ordered list of pending **events**, which are generated by **processes**
    - The simulation clock jumps directly from event to event, processing each event in turn
    - Events may spawn new events, e.g. processing a `load machine` event generates an `unload machine` event
- Arena arranges processes into a **flowchart-like** visualization with blocks such as Create, Seize, Delay, Release, Batch, Split, etc.

---

## Arrival processes

<div class="two-column-layout">

- Two arrival processes:
    - Cancer pathway
    - Non-cancer pathway

![image](https://hackmd.io/_uploads/SyE1KnCpC.png)

</div>

- Arrivals follow a **time-varying** Poisson process with rates defined per hour
    - Generated by **rejection** sampling (thinning): a homogeneous process at the highest hourly rate serves as the base process, and each candidate arrival at time $t$ is kept with probability $\lambda(t)/\lambda_{\max}$
- Rates (like most other parameters) are loaded from an Excel file

---

## Staffing

- Arena has the concept of **schedules**
    - A schedule designates the total number of units of a resource over time
- We use schedules to set staff levels
    - Schedules in our model are cyclic (one week) with half-hourly resolution
    - In contrast, machine resources are assumed to have fixed levels
- **End-of-shift policy**: when the number of staff is reduced, staff who are not replaced leave only after completing their current task

---

## Task duration distributions

<div class="two-column-layout">

![image](https://hackmd.io/_uploads/rJSEjhT6R.png)

- We typically used **triangular** distributions for task durations
    - Defined using `min`, `mode`, `max`
    - Mean = 1/3 × (`min` + `mode` + `max`)
    - **Exception**: machine tasks are assumed to have fixed durations
- Parameters estimated from staff interviews and standard operating procedure documents

</div>

---

## Example: Machine batch jobs (1/2)

- Processing machines in our model take a long time and are typically run overnight (except for urgent specimens)
- The processing step works at the **block level**
    - A machine has a capacity of 300 regular or 36 mega-sized blocks
- **Batching policy**: do not separate blocks from the same specimen
- **Hold policy for batches**: non-urgent batches are started only at the end of the day and collected by staff the next morning

---

## Example: Machine batch jobs (2/2)

![image](https://hackmd.io/_uploads/BJaGZhpaC.png)

---

## <img src="https://hackmd.io/_uploads/SkElv3R6R.png" alt="drawing" width="150"/> simulation outputs

- Excel .xlsm file (default)
    - Outputs a large number of statistics by default (queue statistics, resource statistics, counter statistics, etc.)
    - Requires macros 🤨
- Simple text file
    - Outputs a smaller set of statistics in text form
- I/O blocks
    - Stream custom output to a file **during** the simulation run itself
    - CSV and free formats supported
    - Most versatile and portable output format

---

## **P**rocess **AN**alyzer (PAN)

- Auxiliary program bundled with Arena
- Can run a series of related simulations (changing the model file, input values, etc.)
- We use this to observe the effect of changing a single variable on system performance

![image](https://hackmd.io/_uploads/B1hARhApR.png)

---

### Example: Effect of adding a single staff member in different roles

<div class="small">

Only adding staff to microtomy leads to a statistically significant change (a decrease) in turnaround time

</div>

<img src="https://hackmd.io/_uploads/ryFenyRa0.png" alt="drawing" width="700"/>

---

## From <img src="https://hackmd.io/_uploads/SkElv3R6R.png" alt="drawing" width="150"/> to Python <img src="https://hackmd.io/_uploads/H1B2mpAp0.png" alt="drawing" width="40"/>

- Arena is hard to integrate into a workflow
    - Models (`.doe`) are compiled into binary (`.p`) via an intermediate text format (`.mod`, `.exp`), but values read from Excel are baked into the compiled model
    - Command-line tools (compiler/runtime) are poorly documented, if at all
- Alternative: open-source simulation libraries
    - We chose <img src="https://hackmd.io/_uploads/rka1qaaT0.png" alt="drawing" width="100"/>, which is written in Python
    - Based on **coroutines**, which interact with the **event loop** via the `greenlet` library
    - Each coroutine corresponds to a process

---

## An example process in Python (`salabim`)

```python
import salabim as sim


class Customer(sim.Component):
    def process(self):
        self.request(clerks)  # wait until a unit of `clerks` is available
        self.hold(30)         # service time
        self.release()        # not really required: claims are released when the process ends


env = sim.Environment()
clerks = sim.Resource("clerks", capacity=2)  # illustrative capacity
```

- By default, every new `Customer` instance enters its `process` method automatically
- `clerks` is a `Resource` instance; units of it can be requested and released
- Delays (e.g. job processing times) are represented using `hold()`, which accepts both constants and `Distribution` instances

---

## Defining some process building blocks in Python

![image](https://hackmd.io/_uploads/S1JgMRjAC.png)

---

## Defining tasks using method injection

- The `Process` class represents tasks that do actual work on the specimens/blocks/slides
- Tasks are diverse, so we need the ability to define custom processes
- We use **method injection** to attach each task's Python function to the matching `Specimen`/`Block`/`Slide`/`Batch` class

```python
from typing import Callable, Type

from salabim import Component


class Process(BaseProcess):  # BaseProcess is defined elsewhere in the model
    def setup(self, in_type: Type, fn: Callable[[Component], None]) -> None:
        """Set up the component, called immediately after initialisation."""
        super().setup()
        self.in_type = in_type
        # Attach `fn` to the entity class as a method named after this process,
        # so each Specimen/Block/... instance can call it as a bound method
        setattr(self.in_type, self.name(), fn)
```

---

## Revisiting task duration distributions

- In the Arena model, we used the triangular distribution in most cases
- Python makes it easier to define new distributions not found in existing libraries
- We implemented the **PERT** distribution, which concentrates more probability mass around the mode than the triangular distribution does

<img src="https://hackmd.io/_uploads/SyktYAjRA.png" alt="drawing" width="800"/>

---

## Automatic statistics collection with `salabim` Monitors

- Some `salabim` elements such as `Resource` have built-in `Monitor` objects
    - We can also add our own
- For `Resource`, automatic monitors include the current total/in-use units and the queue length
- Monitors can return the mean, variance, etc., or the full table of values over time (`pandas` export)
- Two types of `Monitor`:
    - **Level:** tracks a value over time: $x(t_0)$, $x(t_1)$, $x(t_2)$, ...
    - **Non-level:** tracks a series of values: $x_0$, $x_1$, $x_2$, ...
    - The type determines how averages are calculated (time-weighted vs. regular mean)

---

## Integration with Building Information Modeling

- We model the time taken to move specimens (or batches of them) between rooms in the histopathology lab
- Distances are extracted from a geometric model of the lab
    - `ifcopenshell` library (**IFC** = Industry Foundation Classes)
    - Walls and doors converted into shapes and overlaid on a grid
    - Travel on the grid permitted in 8 directions (like a chess king)
    - Additional path definitions for inter-floor travel (lift, stairs)
- Moretti *et al.*, <https://ssrn.com/abstract=4827727>

---

## BIM model of histopathology lab building

<span class=small>The digital scanning (Stage 8) room is on a different floor from the main lab</span>

<img src="https://hackmd.io/_uploads/ByuqJy0a0.png" alt="drawing" width="50%"/>

---

## Grid overlay in Python `shapely` library

<span class=small>Red grid squares show doors `d1` to `d16`</span>

<img src="https://hackmd.io/_uploads/HJ-_x1RT0.png" alt="drawing" width="50%"/>

---

### Graph and heatmap showing path segments between doors in the lab

<span class=small>Highlighted squares in the right figure denote lift travel</span>

<img src="https://hackmd.io/_uploads/ryYX-10aA.png" alt="drawing" width="50%"/> <img src="https://hackmd.io/_uploads/Sk7U-106R.png" alt="drawing" width="40%"/>

Final step: map doors to process stages and compute shortest paths between stages

---

## Scenario comparison

<img src="https://hackmd.io/_uploads/ryt0MyCa0.png" alt="drawing" width="40%"/>

<span class=small>A small change in total travel time (avg. 70 additional seconds, caused by a lift breakdown) causes a significant change in lab performance (proportion of specimens completed within 10 days)</span>

---

## Advantages of Python model

- Only open-source software was used, reducing potential costs for healthcare administrators/analysts
- "[Shoestring](https://www.digitalshoestring.net/shoestrings-first-hospital-pilot-begins/)" paradigm: integration of existing free/low-cost technologies to deliver new tech solutions

<hr/>

## Drawbacks of Python model

- The Python simulation model is missing some features of the Arena model (timed gates, bootstrapping)
    - Implementable, but much more work than in Arena
- Harder to iterate on/improve the model than with a visual tool
    - (Could potentially try existing open-source visual simulation tools, e.g. [JaamSim](https://jaamsim.com/) and [Warteschlangensimulator](https://a-herzog.github.io/Warteschlangensimulator/), both Java-based)

---

## Some Python libraries used

<div class="two-column-layout">
<p>

- `ifcopenshell`: building information modelling
- `shapely`: geometric representation of histopathology lab
- `networkx`: path-segment representation of histopathology lab, shortest-path computation
- `openpyxl`: parsing Excel configuration files
- `pandas`: dataframes

</p>
<p>

- `pydantic`: input validation
- `salabim`: discrete event simulation
- `matplotlib/plotly`: plotting
- `jupyter`: Python notebooks
- `dash`: web UI
- `pymongo`: database
- `rq`: job queue

</p>
</div>

---

## Future work

- Data integration
    - Bootstrap the initial simulation state
    - Find data on planned disruptions (e.g. lift maintenance schedule)
    - Obtain staff rota
    - Asset management (e.g. a processing machine cannot run if chemical stores are empty)
- Model iteration/refinements and validation

---

# Thank you!

### Any questions?
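---

## Appendix: time-varying arrivals by thinning

The *Arrival processes* slide generates arrivals from a time-varying Poisson process by rejection sampling against the highest hourly rate. Below is a minimal sketch of that idea (thinning) in plain Python. It assumes rates that are piecewise constant per hour and repeat daily. The function `thinned_arrivals` and the example rates are hypothetical, not taken from the model or its Excel inputs.

```python
import random
from typing import List, Sequence


def thinned_arrivals(hourly_rates: Sequence[float], horizon_h: float,
                     rng: random.Random) -> List[float]:
    """Arrival times (in hours) of a Poisson process whose rate is constant
    within each hour, generated by thinning (rejection sampling).

    hourly_rates[k] is the rate (arrivals/hour) during hour k of the cycle;
    the pattern repeats every len(hourly_rates) hours.
    """
    lam_max = max(hourly_rates)              # rate of the homogeneous base process
    if lam_max <= 0:
        return []
    arrivals: List[float] = []
    t = 0.0
    while True:
        t += rng.expovariate(lam_max)        # next candidate from the base process
        if t >= horizon_h:
            return arrivals
        lam_t = hourly_rates[int(t) % len(hourly_rates)]
        if rng.random() < lam_t / lam_max:   # keep with probability lambda(t) / lambda_max
            arrivals.append(t)


# Hypothetical daily profile: nothing overnight, busiest between 09:00 and 17:00
rates = [0.0] * 8 + [2.0] + [6.0] * 8 + [1.0] * 7   # 24 hourly rates
times = thinned_arrivals(rates, horizon_h=24 * 7, rng=random.Random(42))
print(len(times), "arrivals in one simulated week")
```

Every candidate comes from the faster base process, so hours with a zero rate reject all candidates. This is why the example produces no overnight arrivals.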
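---

## Appendix: sampling the PERT distribution

The PERT distribution from *Revisiting task duration distributions* is commonly defined as a beta distribution rescaled to [`min`, `max`]. Its shape parameters are $\alpha = 1 + 4\,\frac{\text{mode} - \text{min}}{\text{max} - \text{min}}$ and $\beta = 1 + 4\,\frac{\text{max} - \text{mode}}{\text{max} - \text{min}}$, which gives a mean of $(\text{min} + 4\,\text{mode} + \text{max})/6$. The sketch below samples it with the standard library and compares it with a triangular distribution built from the same three points. The weight of 4, the function names and the 3-point estimate are assumptions for illustration; the model's own implementation may differ.

```python
import random
import statistics
from typing import Tuple


def pert_params(lo: float, mode: float, hi: float, lam: float = 4.0) -> Tuple[float, float]:
    """Beta shape parameters (alpha, beta) of a PERT distribution on [lo, hi]."""
    if not lo <= mode <= hi or lo == hi:
        raise ValueError("need lo <= mode <= hi and lo < hi")
    alpha = 1 + lam * (mode - lo) / (hi - lo)
    beta = 1 + lam * (hi - mode) / (hi - lo)
    return alpha, beta


def sample_pert(lo: float, mode: float, hi: float, rng: random.Random) -> float:
    """Draw one PERT sample by rescaling a beta variate to [lo, hi]."""
    alpha, beta = pert_params(lo, mode, hi)
    return lo + (hi - lo) * rng.betavariate(alpha, beta)


def pert_mean(lo: float, mode: float, hi: float) -> float:
    return (lo + 4 * mode + hi) / 6          # closed form for the default lam = 4


# Hypothetical 3-point estimate for a manual task, in minutes
lo, mode, hi = 2.0, 3.0, 10.0
rng = random.Random(1)
pert = [sample_pert(lo, mode, hi, rng) for _ in range(100_000)]
tri = [rng.triangular(lo, hi, mode) for _ in range(100_000)]   # note: (low, high, mode)
for name, xs, theory in [("PERT", pert, pert_mean(lo, mode, hi)),
                         ("triangular", tri, (lo + mode + hi) / 3)]:
    print(f"{name}: mean {statistics.fmean(xs):.2f} (theory {theory:.2f}), "
          f"stdev {statistics.stdev(xs):.2f}")
```

With these inputs, the PERT mean (4.0 minutes) sits closer to the mode than the triangular mean (5.0 minutes), and the PERT samples have a smaller spread.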
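---

## Appendix: forming machine batches without splitting specimens

The *Machine batch jobs* slides state two rules: a processing machine takes up to 300 regular blocks, and blocks from the same specimen are never separated. One simple way to honour both rules is first-fit packing of whole specimens, sketched below. The `(specimen_id, n_blocks)` representation, the first-fit strategy and the example queue are all assumptions, because the slides do not say how the model fills a batch. Mega-sized blocks (capacity 36) are ignored here.

```python
from typing import List, Tuple

Specimen = Tuple[str, int]   # (specimen id, number of blocks): hypothetical layout


def form_batches(queue: List[Specimen], capacity: int = 300) -> List[List[Specimen]]:
    """Group queued specimens into machine batches without splitting any
    specimen's blocks across batches (first-fit, in queue order)."""
    batches: List[List[Specimen]] = []
    loads: List[int] = []                     # blocks already assigned to each batch
    for spec_id, n_blocks in queue:
        if n_blocks > capacity:
            raise ValueError(f"{spec_id} has {n_blocks} blocks > capacity {capacity}")
        for i, load in enumerate(loads):
            if load + n_blocks <= capacity:   # the whole specimen fits in batch i
                batches[i].append((spec_id, n_blocks))
                loads[i] += n_blocks
                break
        else:                                 # no existing batch fits: open a new one
            batches.append([(spec_id, n_blocks)])
            loads.append(n_blocks)
    return batches


queue = [("S1", 120), ("S2", 150), ("S3", 60), ("S4", 25), ("S5", 200)]
for batch in form_batches(queue):
    print([s for s, _ in batch], sum(n for _, n in batch), "blocks")
```

First-fit can let a small specimen (here `S4`) join an earlier batch ahead of a larger one that did not fit. Whether that reordering is acceptable is a policy question for the lab.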
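---

## Appendix: level vs. non-level averages

The *Monitors* slide notes that the monitor type changes how averages are calculated. For a level monitor, each recorded value $x(t_i)$ holds until the next record at $t_{i+1}$ (or the end time $T$), so the mean is time-weighted: $\bar{x} = \frac{1}{T - t_0}\sum_i x(t_i)\,(t_{i+1} - t_i)$. A non-level monitor simply averages its values, $\bar{x} = \frac{1}{n}\sum_i x_i$. The sketch below computes both with the standard library. The function names and the queue-length data are hypothetical, and `salabim` Monitors compute these statistics themselves.

```python
from typing import Sequence


def level_mean(times: Sequence[float], values: Sequence[float], t_end: float) -> float:
    """Time-weighted mean of a step function: values[i] holds from times[i]
    until times[i + 1] (or t_end for the last value)."""
    if len(times) != len(values) or not times:
        raise ValueError("need equal-length, non-empty sequences")
    bounds = list(times[1:]) + [t_end]
    area = sum(v * (t_next - t) for t, v, t_next in zip(times, values, bounds))
    return area / (t_end - times[0])


def non_level_mean(values: Sequence[float]) -> float:
    """Plain mean of a series of observations (e.g. one value per specimen)."""
    return sum(values) / len(values)


# Hypothetical queue length: 0 for 8 h, 10 for 1 h, then 2 for the remaining 15 h
times, values = [0.0, 8.0, 9.0], [0, 10, 2]
print(level_mean(times, values, t_end=24.0))   # (0*8 + 10*1 + 2*15) / 24
print(non_level_mean(values))                  # (0 + 10 + 2) / 3
```

Treating the queue length as non-level would overstate it here. The short one-hour peak counts as much as the long quiet periods.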
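---

## Appendix: shortest paths on an 8-direction grid

The *Integration with Building Information Modeling* slide describes travel on a grid in 8 directions, like a chess king, with walls blocking movement. Below is a minimal Dijkstra sketch on a text grid using only the standard library. Several details are assumptions: the step costs (1 orthogonal, $\sqrt{2}$ diagonal), the rule forbidding diagonal moves past a wall corner, and the toy floor plan. The actual model derives its grid from the IFC geometry with `shapely` and computes shortest paths with `networkx`.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]
# 8 king moves; assumed costs: 1 for orthogonal steps, sqrt(2) for diagonal steps
MOVES = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def shortest_distance(grid: List[str], start: Cell, goal: Cell) -> Optional[float]:
    """Dijkstra on a grid where '#' is a wall; returns path length in cells."""
    rows, cols = len(grid), len(grid[0])
    dist: Dict[Cell, float] = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return d
        if d > dist[(r, c)]:
            continue                                    # stale heap entry
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                continue
            # Assumed rule: no cutting diagonally past a wall corner
            if dr and dc and (grid[r][nc] == "#" or grid[nr][c] == "#"):
                continue
            nd = d + (math.sqrt(2) if dr and dc else 1.0)
            if nd < dist.get((nr, nc), math.inf):
                dist[(nr, nc)] = nd
                heapq.heappush(heap, (nd, (nr, nc)))
    return None                                         # goal unreachable


# Hypothetical floor plan: a wall with a single door gap in the last column
plan = [
    ".....",
    ".....",
    "####.",
    ".....",
]
print(shortest_distance(plan, (0, 0), (3, 0)))
```

Because the door is the only gap, every path must pass through it. The corner rule also forces straight steps into and out of the doorway.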