# Why Focal Loss Replaced Hard Negative Mining (HNM)
## 🔹 1. HNM is Discrete & Heuristic
* **SSD approach**: after computing losses, sort the negatives and keep only the top *k* (usually 3× the number of positives); see the sketch after this list.
* **Hard cutoff**: some negatives are kept, the rest are dropped, purely by rank.
* Issues:
* ❌ Non-differentiable (the sorting/selection step).
* ❌ Requires a hand-set hyperparameter (the 3:1 negative-to-positive ratio).
* ❌ Potentially unstable if the dataset distribution shifts.
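A minimal PyTorch-style sketch of that selection step, assuming flat per-anchor tensors; the names and the 3:1 default mirror the description above rather than SSD's actual implementation:

```python
import torch

def hard_negative_mining(cls_loss, pos_mask, neg_pos_ratio=3):
    """Keep all positives plus only the top-k hardest negatives.

    cls_loss : (N,) per-anchor classification loss (non-negative)
    pos_mask : (N,) bool, True for anchors matched to a ground-truth box
    Returns a bool mask of the anchors that survive the hard cutoff.
    """
    num_pos = int(pos_mask.sum())
    num_neg = min(neg_pos_ratio * num_pos, int((~pos_mask).sum()))

    # Rank the negatives by loss; this sort/top-k is the non-differentiable step.
    neg_loss = cls_loss.masked_fill(pos_mask, -1.0)  # exclude positives from the ranking
    _, hard_neg_idx = neg_loss.topk(num_neg)

    keep = pos_mask.clone()
    keep[hard_neg_idx] = True
    return keep
```

The caller would then sum `cls_loss[keep]` and normalize by the number of positives; every anchor outside the mask is simply discarded.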
## 🔹 2. Focal Loss is Continuous & Adaptive
* Formula:
$$
FL(p_t) = -(1 - p_t)^\gamma \log(p_t),
\qquad
p_t =
\begin{cases}
p & \text{if } y = 1 \\
1 - p & \text{otherwise}
\end{cases}
$$
where *p* is the predicted probability of the foreground class and $\gamma \ge 0$ is the focusing parameter.
* Behavior:
* ✅ Easy negatives (*p* ≈ 0) → weight ≈ 0.
* ✅ Hard negatives (*p* ≈ 0.5 or higher) → the modulating weight stays large.
* Works per example, **no sorting required** (see the sketch after this list).
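A minimal PyTorch-style sketch of that per-anchor weighting (tensor shapes and the γ = 2 default are assumptions; RetinaNet additionally multiplies in an α-balancing factor that the formula above omits):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Per-anchor focal loss: every anchor contributes, none are discarded.

    logits  : raw foreground scores
    targets : same shape, 1.0 for positive anchors, 0.0 for negatives
    """
    # Ordinary cross-entropy, kept elementwise instead of reduced.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")

    # p_t: probability the model assigns to each anchor's true class.
    p = torch.sigmoid(logits)
    p_t = p * targets + (1.0 - p) * (1.0 - targets)

    # (1 - p_t)^gamma smoothly down-weights easy examples instead of dropping them.
    return (1.0 - p_t) ** gamma * ce
```

With γ = 0 this reduces to plain cross-entropy; in a detector the result is typically summed and normalized by the number of positive anchors.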
## 🔹 3. Scalability to Many Classes
* **HNM**: a fixed ratio per image → tricky to tune with 80+ classes (e.g., COCO).
* **Focal Loss**: adjusts each example's contribution **automatically per sample**; no hand-tuned ratios needed (illustrated below).
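A quick illustration of that point, reusing the hypothetical `focal_loss` sketch above on a many-class label space (all sizes below are made up):

```python
import torch

num_anchors, num_classes = 1000, 80             # e.g., a COCO-sized label space
logits = torch.randn(num_anchors, num_classes)
targets = torch.zeros(num_anchors, num_classes)
targets[:5, 3] = 1.0                            # a handful of positives in one class

# One sigmoid/focal term per (anchor, class) pair; no per-class negative ratio to tune.
loss = focal_loss(logits, targets).sum() / targets.sum().clamp(min=1)
```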
## 🔹 4. Efficiency
* **HNM**:
* Compute all per-anchor losses.
* Sort the negatives.
* Select the top *k*.
* ⚠️ The sort/select step adds overhead and can become a training bottleneck.
* **Focal Loss**:
* Just apply $(1-p_t)^\gamma$.
* ✅ Lightweight, element-wise, GPU-friendly.
## 🔹 5. Stability & Performance
* Fully differentiable → smoother optimization.
* Empirical results:
* RetinaNet + focal loss **beats SSD + HNM** on AP, recall, and imbalance robustness.
## ✅ Bottom Line
* **HNM (SSD)** = *manual*: "throw away most negatives, keep a few hard ones."
* **Focal Loss (RetinaNet)** = *automatic*: "keep all, but weight smartly & smoothly."
👉 **Focal Loss is now preferred** because it's elegant, efficient, differentiable, and scales better to complex datasets.