# EDA for QC
*Trust requires traceability*
James Joseph
<!---
--->
---
`EDA`
[Exploratory data analysis (EDA)](https://www.ibm.com/think/topics/exploratory-data-analysis) is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods.
<!---
“Don’t expect standard summaries to reveal the unusual.”
--->
---
`QC Levels`
* L1 = self-review
* L2 = peer-reviewed
* L3 = double-coded, compared
*QC level is not enforced, only planned.*
<!--
-->
---
`The Problem We Solve`
> “We're finding the same issues, again?!”
* QC logs scattered, usually Excel
* Task tracking is inconsistent
* Verification is hard to trace
* Issues repeat across cycles
<!---
-->
---
`What We've Learned`
> Actually Quadruple Programming!
* Excel doesn't scale
* Traceability is non-negotiable
* Double-Independent QC = Biased
* Roles must be enforced
<!--
-->
---
`EDA Solution`
[Event-driven architecture](https://www.ibm.com/think/topics/event-driven-architecture) is a software design model built around the publication, capture, processing and storage of events.
<!--
remember this definition
-->
---
`Where EDA Fits In`
Extract → **Transform**
*Between raw input and formal output.*
→ Load
---
`Where EDA Fits In`
**QC Plan** → QC Log + Deliverables
→ Live Tracker → **Archive**
<!--
-->
---
`Who Is EDA For`
**Reviewers**
* Biostatisticians verifying deliverables
* Programming managers
* QC leads and reviewers
<!--
-->
---
`Our Terminology`
* **Study** = container
* **QC Plan** = QC levels + assignments
* **Deliverable** = grouped files
* **File** = assigned unit
* **Artifact** = required file
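
A minimal sketch of how these terms could nest, assuming plain Python dataclasses. Every field name here (`qc_level`, `assignee`, `submitted`) and the sample values are hypothetical, not the system's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Artifact:
    name: str                 # a required file, e.g. "program" (hypothetical label)
    submitted: bool = False

@dataclass
class File:
    path: str                 # the assigned unit
    qc_level: int             # 1, 2, or 3
    assignee: str
    artifacts: List[Artifact] = field(default_factory=list)

@dataclass
class Deliverable:
    name: str                 # a group of files
    files: List[File] = field(default_factory=list)

@dataclass
class QCPlan:
    deliverables: List[Deliverable] = field(default_factory=list)

@dataclass
class Study:
    study_id: str             # the container
    plan: QCPlan = field(default_factory=QCPlan)

# Illustrative values only.
study = Study("STUDY-001")
demog = File("t_demog.R", qc_level=3, assignee="dev_a",
             artifacts=[Artifact("program"), Artifact("compare")])
study.plan.deliverables.append(Deliverable("Tables", [demog]))
```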
---
`Enforcing QC Level`
* **L1** = self-review
* **L2** = peer-reviewed
* **L3** = double-coded, compared
**NOTE: not optional**
---
`Our Roles`
* **Manager**: assign, compare, approve
* **Developer**: implement
* **Lead** = Developer + Manager
* **Reviewer**: approve
* **Admin**: assign roles in system
<!---
-->
---
`Developer View of Tracker`

<!-- -->
---
`How QC Levels Are Enforced`
Role boundaries protect Level 3 QC integrity:
* Developers can’t view other developers’ files.
* Developers cannot run compares on their own outputs.
* Developers do not share the same input files.
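
The first two boundaries could be checked roughly as below. This is a sketch, not the system's access-control code: the `User` type, the role strings, and the rule that a Lead may not compare outputs they authored are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    name: str
    role: str  # assumed values: "developer", "manager", "lead", "reviewer", "admin"

def can_view_file(user: User, file_assignee: str) -> bool:
    """Developers see only their own files; other roles are assumed to see all."""
    if user.role == "developer":
        return user.name == file_assignee
    return True

def can_run_compare(user: User, output_authors: set) -> bool:
    """Only managers or leads compare, and never on outputs they authored."""
    if user.role not in {"manager", "lead"}:
        return False
    return user.name not in output_authors
```

For example, `can_run_compare(User("ana", "developer"), {"ana", "ben"})` is `False`, because developers never run compares in this sketch.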
<!--
-->
---
`Triggers`
* File assigned by manager, e.g. QC plan uploaded or updated →
* Event = logged activity
---
`IssueLogged`

---
`StatusChanged`

<!---
--->
---
`Notification of Event`

---
`Tracker`
QC Plan in Action
* NDJSON = full event log
* Reconciler builds tracker from log
* Every change is derived
* No manual edits allowed
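
A minimal sketch of the reconciler idea: replay the NDJSON log line by line and derive the tracker from it. The field names (`type`, `files`, `file`, `by`) and the status strings are assumptions for illustration.

```python
import json

# Hypothetical log contents: one JSON event per line.
LOG = """\
{"type": "QCPlanUploaded", "files": ["t_demog.R", "t_ae.R"]}
{"type": "FileSubmitted", "file": "t_demog.R", "by": "dev_a"}
"""

def reconcile(ndjson_text):
    """Rebuild the tracker ({file: status}) by replaying events in order."""
    tracker = {}
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)
        if event["type"] == "QCPlanUploaded":
            for name in event["files"]:
                tracker.setdefault(name, "assigned")
        elif event["type"] == "FileSubmitted":
            tracker[event["file"]] = "submitted"
    return tracker

tracker = reconcile(LOG)
```

Because the tracker is a pure function of the log, replaying the same log always yields the same tracker, which is what makes manual edits unnecessary.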
<!--
-->
---
`Types of events`
Setup & Submission Events
Issue Management Events
System-Derived Events
---
`Setup & Submission Events`
| Event | Purpose | Tracker | Notify? |
|---------------|--------------------------|------------------|---------|
| QCPlanUploaded| Start plan | File view shown | No |
| FileSubmitted | Submit code or compare | Artifact marked | Yes |
| DevCommented | Add dev note | Note shown | No |
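
One way the table above might translate into code: append each event to the log as a single JSON line and look up whether it notifies. The `append_event` helper, the `at` timestamp field, and the payload keys are hypothetical.

```python
import io
import json
from datetime import datetime, timezone

# Notify? column from the table above.
NOTIFY = {"QCPlanUploaded": False, "FileSubmitted": True, "DevCommented": False}

def append_event(log, event_type, **payload):
    """Append one event as a JSON line; return True if it should notify."""
    if event_type not in NOTIFY:
        raise ValueError(f"unknown event type: {event_type}")
    event = {"type": event_type,
             "at": datetime.now(timezone.utc).isoformat(),
             **payload}
    log.write(json.dumps(event) + "\n")
    return NOTIFY[event_type]

log = io.StringIO()  # stands in for a file opened in append mode
notified = append_event(log, "FileSubmitted", file="t_demog.R", by="dev_a")
```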
---
`FileSubmitted`

<!--
-->
---
`FileSubmitted`

---
`Issue Management Events`
| Event | Purpose | Tracker | Notify? |
|--------------|---------------------|------------------|---------|
| IssueLogged | Block a file | File = blocked | Yes |
| IssueResolved| Reopen workflow | File status reevaluated | Yes |
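
A sketch of how blocking could be derived from these two events, assuming each issue carries an `issue` id and a `file` name (both hypothetical fields). A file stays blocked until every issue logged against it is resolved.

```python
def blocked_files(events):
    """Return the set of files with at least one unresolved issue."""
    open_issues = {}  # issue id -> file name
    for e in events:
        if e["type"] == "IssueLogged":
            open_issues[e["issue"]] = e["file"]
        elif e["type"] == "IssueResolved":
            open_issues.pop(e["issue"], None)
    return set(open_issues.values())

# Illustrative events: two issues on one file, only one resolved.
events = [
    {"type": "IssueLogged", "issue": 1, "file": "t_ae.R"},
    {"type": "IssueLogged", "issue": 2, "file": "t_ae.R"},
    {"type": "IssueResolved", "issue": 1},
]
still_blocked = blocked_files(events)
```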
---
`IssueLogged`

<!--
-->
---
`System-Derived Events`
| Event | Trigger | Tracker | Notify? |
|--------------------|--------------------------|------------------|---------|
| StatusChanged | Artifact state check | Status changes | Yes |
| StageCompleted | All stage files complete | Status changes | Yes |
| DeliverableComplete| All stages passed | Status changes | Yes |
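
The last two rows could be derived along these lines. This is a sketch under assumptions: the nested `{stage: {file: status}}` shape, the `"complete"` status string, and the stage names are all hypothetical.

```python
def derive_events(stages):
    """stages maps stage name -> {file: status}; return derived events."""
    derived = []
    for stage, files in stages.items():
        # A stage completes only when it has files and all are complete.
        if files and all(s == "complete" for s in files.values()):
            derived.append({"type": "StageCompleted", "stage": stage})
    # The deliverable completes only when every stage has completed.
    if stages and len(derived) == len(stages):
        derived.append({"type": "DeliverableComplete"})
    return derived

stages = {"dev": {"a.R": "complete"}, "qc": {"a.R": "in_review"}}
partial = derive_events(stages)
```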
<!--
-->
---
`UAT`
The CAMIS Project
* Assigned devs to build R + Py tests
* Same input, independent code
* Compare after both uploaded
* Compare result logs QC pass/fail
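
A minimal sketch of the compare step, assuming each independent program reports its statistics as a `{name: value}` dict and that a relative tolerance is acceptable. The function name, tolerance, and numbers are illustrative, not the actual compare engine.

```python
import math

def compare_outputs(primary, qc, rel_tol=1e-9):
    """Compare two {statistic: value} dicts; return (passed, mismatches)."""
    mismatches = []
    for key in sorted(set(primary) | set(qc)):
        if key not in primary or key not in qc:
            mismatches.append(key)  # statistic missing on one side
        elif not math.isclose(primary[key], qc[key], rel_tol=rel_tol):
            mismatches.append(key)  # values disagree beyond tolerance
    return (not mismatches, mismatches)

# Placeholder values standing in for R and Python results.
r_result = {"t_stat": 1.5, "p_value": 0.25}
py_result = {"t_stat": 1.5, "p_value": 0.25}
passed, diffs = compare_outputs(r_result, py_result)
```

The `passed` flag is what would be written to the log as the QC pass/fail result.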
<!---
--->
---
`Execute`

---
`Submit`

---
`Seed`

---
`Compare`

---
`Roadmap: Validation`
- [x] SOPs + user docs
- [ ] Verify test modules (t-test, ANOVA)
- [ ] Output audit trail across tools
- [ ] Regulatory-ready (SOC 2, 21 CFR Part 11)
<!--
-->
---
`Integration Strategy`
* Microsoft Enterprise ID
* Azure container
* API-first (integrate with other tools)
* MS Graph API for SharePoint, Teams, GitHub
<!--
-->
---
`Our Differentiators`
Real double programming
* Append-only event log automates tracker
* Role-based views to maintain independence
* Compare engine across R, Py, SAS
* Cloud-native, Microsoft-integrated
<!--
-->
---
`Thank You`
TP – SOP & Work Instructions
JJ – Product & Architecture
FR – Frontend + UI
XH – System + Reconciliation
---
*`Let’s talk QC`*
cal.com/eda
---
`Demo`
- [x] [Viewer Demo](https://cal.com/eda)
- [x] [API Demo](https://cal.com/eda)