ParaEval
Regression Tests
Each synthetic case defines expected outputs. At build time, the decision engine runs live against each case and the results are compared to expectations.
All 3 golden cases pass
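The build-time comparison described above could be sketched roughly as follows. This is a minimal illustration, not ParaEval's actual harness: the names `Decision`, `GoldenCase`, `check_golden_cases`, and `stub_engine` are hypothetical, the status string `"not_met"` is an assumed encoding of "Not Met", and the stub simply returns the values shown in the Golden Cases table in place of a live engine run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    status: str        # "met" or "not_met" (assumed encoding of the table's labels)
    confidence: float  # fraction between 0.0 and 1.0


@dataclass(frozen=True)
class GoldenCase:
    case_id: str
    expected: Decision


def check_golden_cases(cases, run_engine, tolerance=1e-9):
    """Run the engine on every case and return one message per mismatch."""
    failures = []
    for case in cases:
        actual = run_engine(case.case_id)
        same_status = actual.status == case.expected.status
        same_confidence = abs(actual.confidence - case.expected.confidence) <= tolerance
        if not (same_status and same_confidence):
            failures.append(f"{case.case_id}: expected {case.expected}, got {actual}")
    return failures


# Expectations copied from the Golden Cases table.
GOLDEN_CASES = [
    GoldenCase("hong-kong-cold-storage-saola", Decision("met", 0.70)),
    GoldenCase("manila-coastal-logistics", Decision("met", 0.83)),
    GoldenCase("jakarta-residential-asset", Decision("not_met", 0.17)),
]


def stub_engine(case_id):
    """Hypothetical stand-in for the live decision engine."""
    results = {case.case_id: case.expected for case in GOLDEN_CASES}
    return results[case_id]


failures = check_golden_cases(GOLDEN_CASES, stub_engine)
print(f"{len(GOLDEN_CASES) - len(failures)} of {len(GOLDEN_CASES)} golden cases pass")
```

In a real build step, a non-empty `failures` list would typically fail the build, so a change in engine behavior cannot ship without the expectations being reviewed.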
Golden Cases
| Case | Expected | Actual | Result |
|---|---|---|---|
| Hong Kong Cold-Chain Warehouse — Typhoon Saola (`hong-kong-cold-storage-saola`) | Trigger Met 70% | Trigger Met 70% | Pass |
| Manila Coastal Logistics Facility — Typhoon Karding (`manila-coastal-logistics`) | Trigger Met 83% | Trigger Met 83% | Pass |
| Jakarta Residential Asset — January 2020 Floods (`jakarta-residential-asset`) | Not Met 17% | Not Met 17% | Pass |
Illustrative Failure Example
Algorithm drift: confidence threshold change (fixture)
If the "met" threshold were raised from 0.70 to 0.75, the Hong Kong flagship case (confidence 0.700) would shift from "met" to "borderline" with no change to its evidence. This is why the algorithm is pinned and golden-case expectations are checked at build time.
-status: "met" (expected)
-confidence: 0.700
+status: "borderline" (actual — threshold drift)
+confidence: 0.700
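The drift shown in the diff can be reproduced with a small sketch. This assumes the status is derived by comparing confidence against an inclusive "met" threshold, which is consistent with the Hong Kong case being "met" at exactly 0.700. The function name `classify`, the `"not_met"` label, and the lower cutoff `borderline_floor=0.40` are hypothetical; the source does not state where "borderline" ends.

```python
def classify(confidence, met_threshold=0.70, borderline_floor=0.40):
    """Map a confidence score to a status label.

    met_threshold is inclusive. borderline_floor is a made-up value
    used only so that the sketch covers the full range of scores.
    """
    if confidence >= met_threshold:
        return "met"
    if confidence >= borderline_floor:
        return "borderline"
    return "not_met"


hong_kong_confidence = 0.700  # unchanged evidence, unchanged score

pinned = classify(hong_kong_confidence, met_threshold=0.70)
drifted = classify(hong_kong_confidence, met_threshold=0.75)

print(pinned)   # met
print(drifted)  # borderline
```

The confidence value is identical in both calls; only the threshold differs, which is the kind of silent change a golden-case check is meant to catch.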