# EDA for QC

*Trust requires traceability*

James Joseph

<!--- --->

---

`EDA`

[Exploratory data analysis (EDA)](https://www.ibm.com/think/topics/exploratory-data-analysis) is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods.

<!--- “Don’t expect standard summaries to reveal the unusual.” --->

---

`QC Levels`

* L1 = self-review
* L2 = peer-reviewed
* L3 = double-coded, compared

*QC level is not enforced, only planned.*

<!-- -->

---

`The Problem We Solve`

> “We're finding the same issues, again?!”

* QC logs scattered, usually Excel
* Task tracking is inconsistent
* Verification is hard to trace
* Issues repeat across cycles

<!--- -->

---

`What We've Learned`

> Actually Quadruple Programming!

* Excel doesn't scale
* Traceability is non-negotiable
* Double-Independent QC = Biased
* Roles must be enforced

<!-- -->

---

`EDA Solution`

[Event-driven architecture](https://www.ibm.com/think/topics/event-driven-architecture) is a software design model built around the publication, capture, processing and storage of events.
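
A minimal sketch of that definition, not the system's actual code: an event is published once, stored in an append-only log, and captured by subscribers for processing. The `Event` fields, the `EventBus` class, and the file name `adsl.R` are illustrative assumptions.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Event:
    # Field names are hypothetical; the deck only names event types.
    type: str
    file: str
    actor: str


class EventBus:
    def __init__(self):
        self.log = []       # storage: append-only list of JSON lines
        self.handlers = {}  # capture: event type -> list of callbacks

    def subscribe(self, event_type, handler):
        self.handlers.setdefault(event_type, []).append(handler)

    def publish(self, event):
        # Store the event first, then hand it to subscribers for processing.
        self.log.append(json.dumps(asdict(event)))
        for handler in self.handlers.get(event.type, []):
            handler(event)


bus = EventBus()
notified = []
bus.subscribe("FileSubmitted", notified.append)
bus.publish(Event("FileSubmitted", "adsl.R", "dev1"))
bus.publish(Event("DevCommented", "adsl.R", "dev1"))
```

Both events end up in `bus.log`, but only the `FileSubmitted` event reaches the subscriber.
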
<!-- remember this definition -->

---

`Where EDA Fits In`

Extract → **Transform** → Load

*Between raw input and formal output.*

---

`Where EDA Fits In`

**QC Plan** → QC Log + Deliverables → Live Tracker → **Archive**

<!-- -->

---

`Who Is EDA For`

**Reviewers**

* Biostatisticians verifying deliverables
* Programming managers
* QC leads and reviewers

<!-- -->

---

`Our Terminology`

* **Study** = container
* **QC Plan** = QC levels + assignments
* **Deliverable** = grouped files
* **File** = assigned unit
* **Artifact** = required file

---

![image](https://hackmd.io/_uploads/HJF23fvQge.png)

<!-- -->

---

`Enforcing QC Level`

* **L1** = self-review
* **L2** = peer-reviewed
* **L3** = double-coded, compared

**NOTE: not optional**

---

`Our Roles`

* **Manager**: assign, compare, approve
* **Developer**: implement
* **Lead** = Developer + Manager
* **Reviewer**: approve
* **Admin**: assign roles in system

<!--- -->

---

`Developer View of Tracker`

![image](https://hackmd.io/_uploads/ByAOafPmex.png)

<!-- -->

---

`How QC Levels Are Enforced`

Role boundaries protect Level 3 QC integrity:

* Developers can’t view other developers’ files.
* Developers cannot run compares on their own outputs.
* Developers do not share the same input files.

<!-- -->

---

`Triggers`

* File assigned by manager, e.g. upload/update QC plan →
* Event = logged activity

---

`IssueLogged`

![image](https://hackmd.io/_uploads/SJaPdMw7ee.png)

---

`StatusChanged`

![image](https://hackmd.io/_uploads/HJLj17Pmxx.png)

<!--- --->

---

`Notification of Event`

![image](https://hackmd.io/_uploads/ryKlQlDXee.png)

---

`Tracker`

QC Plan in Action

* NDJSON = full event log
* Reconciler builds tracker from log
* Every change is derived
* No manual edits allowed

<!-- -->

---

`Types of events`

* Setup & Submission Events
* Issue Management Events
* System-Derived Events

---

`Setup & Submission Events`

| Event          | Purpose                | Tracker         | Notify? |
|----------------|------------------------|-----------------|---------|
| QCPlanUploaded | Start plan             | File view shown | No      |
| FileSubmitted  | Submit code or compare | Artifact marked | Yes     |
| DevCommented   | Add dev note           | Note shown      | No      |

---

`FileSubmitted`

![image](https://hackmd.io/_uploads/HkGHfQwQlg.png)

<!-- -->

---

`FileSubmitted`

![image](https://hackmd.io/_uploads/HkoP_7D7xx.png)

---

`Issue Management Events`

| Event         | Purpose         | Tracker                 | Notify? |
|---------------|-----------------|-------------------------|---------|
| IssueLogged   | Block a file    | File = blocked          | Yes     |
| IssueResolved | Reopen workflow | File status reevaluated | Yes     |

---

`IssueLogged`

![image](https://hackmd.io/_uploads/SJ04ZgwQgx.png)

<!-- -->

---

`System-Derived Events`

| Event               | Trigger                  | Tracker        | Notify? |
|---------------------|--------------------------|----------------|---------|
| StatusChanged       | Artifact state check     | Status changes | Yes     |
| StageCompleted      | All stage files complete | Status changes | Yes     |
| DeliverableComplete | All stages passed        | Status changes | Yes     |

<!-- -->

---

`UAT`

The CAMIS Project

* Assigned devs to build R + Py tests
* Same input, independent code
* Compare after both uploaded
* Compare result logs QC pass/fail

<!--- --->

---

`Execute`

![image](https://hackmd.io/_uploads/HJ9YGQPXxg.png)

---

`Submit`

![image](https://hackmd.io/_uploads/HkXTHlDmxl.png)

---

`Seed`

![image](https://hackmd.io/_uploads/HJzal7vXll.png)

---

`Compare`

![image](https://hackmd.io/_uploads/H1CREBPQel.png)

---

`Roadmap: Validation`

- [x] SOPs + user docs

* Verify test modules (t-test, ANOVA)
* Output audit trail across tools
* Regulatory-ready (SOC2, Part 11)

<!-- -->

---

`Integration Strategy`

* Microsoft Enterprise ID
* Azure container
* API-first (integrate with other tools)
* MS Graph API for SharePoint, Teams, GitHub

<!-- -->

---

`Our Differentiators`

Real double programming

* Append-only event log automates tracker
* Role-based views to maintain independence
* Compare engine across R, Py, SAS
* Cloud-native, Microsoft-integrated

<!-- -->

---

`Thank You`

* TP – SOP & Work Instructions
* JJ – Product & Architecture
* FR – Frontend + UI
* XH – System + Reconciliation

---

*`Let’s talk QC`*

cal.com/eda

---

`Demo`

- [x] [Viewer Demo](cal.com/eda)
- [x] [API Demo](cal.com/eda)
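
---

`Appendix: Reconciler Sketch`

The Tracker slide says the reconciler builds the tracker from the NDJSON event log. The sketch below shows one way that replay could look; it is an illustration under assumptions, not the product's code. Event type names come from the deck. The JSON field names, the file name `adsl.R`, and the status labels `in progress` and `complete` are hypothetical.

```python
import json

# Hypothetical NDJSON log: one JSON event per line.
NDJSON_LOG = """\
{"type": "QCPlanUploaded", "files": {"adsl.R": ["code", "compare"]}}
{"type": "FileSubmitted", "file": "adsl.R", "artifact": "code"}
{"type": "IssueLogged", "file": "adsl.R"}
{"type": "FileSubmitted", "file": "adsl.R", "artifact": "compare"}
{"type": "IssueResolved", "file": "adsl.R"}
"""


def reconcile(ndjson_text):
    """Replay the log from the top and derive the tracker; nothing is edited by hand."""
    required, submitted, open_issues = {}, {}, {}
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        kind = event["type"]
        if kind == "QCPlanUploaded":
            # The plan defines which artifacts each file must provide.
            for name, artifacts in event["files"].items():
                required[name] = set(artifacts)
        elif kind == "FileSubmitted":
            submitted.setdefault(event["file"], set()).add(event["artifact"])
        elif kind == "IssueLogged":
            open_issues[event["file"]] = open_issues.get(event["file"], 0) + 1
        elif kind == "IssueResolved":
            open_issues[event["file"]] = open_issues.get(event["file"], 0) - 1

    tracker = {}
    for name, needed in required.items():
        if open_issues.get(name, 0) > 0:
            tracker[name] = "blocked"          # an open issue blocks the file
        elif needed <= submitted.get(name, set()):
            tracker[name] = "complete"         # every required artifact is in
        else:
            tracker[name] = "in progress"
    return tracker


tracker = reconcile(NDJSON_LOG)
print(tracker)
```

Because status is always recomputed from the log, replaying only the first three lines yields `blocked` for the same file, and replaying the full log clears it once the issue is resolved.
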