Inputs Required
- Clear task definition and "correct" criteria
- Golden set and scoring rubric
- Baseline model/prompt
- Metrics (quality, latency, cost)
- Acceptance thresholds
- Segmentation (easy/hard cases, user types, edge cases)
- Online instrumentation plan
Output Artifact
Evaluation plan + scorecards: offline results by segment, online experiment plan (A/B or shadow), ship/no-ship thresholds, and regression cadence.
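The offline half of this artifact can be sketched in a few lines: score a candidate against a golden set, break results out by segment, and apply per-segment acceptance thresholds to reach a ship/no-ship call. This is a minimal illustration, not a prescribed implementation; the names (`score_by_segment`, `ship_decision`, the toy `candidate` function) and the exact-match "correct" criterion are assumptions, and a real harness would call your model and use your rubric.

```python
# Minimal offline eval scorecard sketch. Assumes each golden-set item
# carries a segment label and an exact-match "expected" answer; swap in
# your own scoring rubric and model call as needed.
from collections import defaultdict

def score_by_segment(golden_set, predict):
    """Return accuracy per segment for a candidate predict() function."""
    hits, totals = defaultdict(int), defaultdict(int)
    for item in golden_set:
        seg = item["segment"]
        totals[seg] += 1
        if predict(item["input"]) == item["expected"]:
            hits[seg] += 1
    return {seg: hits[seg] / totals[seg] for seg in totals}

def ship_decision(scorecard, thresholds):
    """Ship only if every segment meets its acceptance threshold."""
    failing = {s: a for s, a in scorecard.items()
               if a < thresholds.get(s, 0.0)}
    return (len(failing) == 0, failing)

# Toy golden set segmented into easy / hard / edge cases.
golden_set = [
    {"input": "2+2", "expected": "4", "segment": "easy"},
    {"input": "17*23", "expected": "391", "segment": "hard"},
    {"input": "0/0", "expected": "undefined", "segment": "edge"},
]

def candidate(prompt):
    # Stand-in for a model call; replace with real inference.
    return {"2+2": "4", "17*23": "391", "0/0": "0"}.get(prompt, "")

scorecard = score_by_segment(golden_set, candidate)
ok, failing = ship_decision(scorecard,
                            {"easy": 0.9, "hard": 0.8, "edge": 0.9})
print(scorecard)
print("ship" if ok else f"no-ship, failing segments: {sorted(failing)}")
```

Scoring by segment (rather than one aggregate number) is what makes the decision defensible: an overall win can hide a regression on edge cases, and the per-segment thresholds surface it before the online experiment.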
When to use this
When you must compare models or prompt approaches and need a defensible way to decide what ships.