Inputs Required
Golden set; metric definitions; baseline approach; acceptance thresholds; test harness.
Output Artifact
Eval plan; scorecards; ship/no-ship thresholds; comparison results.
When to use this
When you must choose a model/prompt approach and need a repeatable, defensible way to compare options.
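As a rough illustration of how the inputs and outputs above fit together, here is a minimal sketch. It assumes exact-match accuracy as the only metric and a tiny made-up golden set; the names `Scorecard`, `score`, `ship_decision`, `min_accuracy`, and `min_gain` are hypothetical and chosen for this example only, and real plans would typically track several metrics.

```python
from dataclasses import dataclass


@dataclass
class Scorecard:
    name: str
    accuracy: float  # fraction of golden-set cases answered correctly


def score(name, predict, golden_set):
    """Run one approach over the golden set and compute exact-match accuracy."""
    correct = sum(1 for prompt, expected in golden_set if predict(prompt) == expected)
    return Scorecard(name, correct / len(golden_set))


def ship_decision(candidate, baseline, min_accuracy, min_gain):
    """Ship only if the candidate clears the absolute threshold
    and beats the baseline by at least min_gain."""
    gain = candidate.accuracy - baseline.accuracy
    return candidate.accuracy >= min_accuracy and gain >= min_gain


# Hypothetical golden set: (input, expected output) pairs.
golden_set = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("3*3", "9"),
    ("opposite of hot", "cold"),
]

# Stand-ins for two approaches (assumed for illustration; a real harness
# would call a model or prompt pipeline here).
baseline_answers = {"2+2": "4", "capital of France": "Paris",
                    "3*3": "6", "opposite of hot": "warm"}
candidate_answers = {"2+2": "4", "capital of France": "Paris",
                     "3*3": "9", "opposite of hot": "cold"}

baseline = score("baseline", lambda p: baseline_answers.get(p, ""), golden_set)
candidate = score("candidate", lambda p: candidate_answers.get(p, ""), golden_set)

# Acceptance thresholds are assumptions; set them before running the comparison.
decision = ship_decision(candidate, baseline, min_accuracy=0.9, min_gain=0.05)
print(baseline, candidate, "ship" if decision else "no-ship")
```

Fixing the thresholds before any approach is scored is what makes the comparison defensible: the ship/no-ship call follows from the scorecards rather than from judgment applied after the fact.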