The problem is almost never the model
When a computer-vision system underperforms, the fix is usually in the data — not the architecture.
A detector I worked on kept flagging garage doors as radiators.
We could have spent a week on the model — tuning thresholds, swapping architectures, arguing about hyperparameters. It's the instinct most teams reach for first, because it feels like the "real" machine-learning work.
The actual fix took an afternoon: the training set contained almost no garage doors. We added them as hard negatives (examples that look like the target but aren't), and the false positives dropped.
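To make "hard negatives" concrete, here is a minimal sketch of how one might mine them. It is not the pipeline we actually used: it assumes image-level labels rather than bounding boxes, and `Example`, `mine_hard_negatives`, `add_hard_negatives`, the `"background"` label, and the 0.5 score cutoff are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    image_id: str
    label: str  # e.g. "radiator" or "background" (hypothetical label names)


def mine_hard_negatives(predictions, ground_truth, target="radiator", min_score=0.5):
    """Return ids of images confidently predicted as `target` whose true label differs.

    predictions: dict of image_id -> (predicted_label, score)
    ground_truth: dict of image_id -> true_label
    """
    hard = []
    for image_id, (pred_label, score) in predictions.items():
        is_confident_hit = pred_label == target and score >= min_score
        if is_confident_hit and ground_truth[image_id] != target:
            hard.append(image_id)
    return sorted(hard)


def add_hard_negatives(train_set, hard_ids, negative_label="background"):
    """Return a new training list with the mined images appended as negatives."""
    already_present = {ex.image_id for ex in train_set}
    extra = [Example(i, negative_label) for i in hard_ids if i not in already_present]
    return list(train_set) + extra
```

The design choice worth noting is that the negatives come from the model's own confident mistakes, so the added examples are exactly the ones it currently confuses with the target.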
Why this keeps happening
A model only knows what it has seen. When it fails, it's usually telling you one of three things:
- The data is wrong or thin — missing the cases that matter, or labeled inconsistently.
- The evaluation is lying to you — your metric doesn't measure what you actually care about.
- The task was never clearly defined — "find radiators" hides a dozen edge cases nobody agreed on.
In my experience, roughly 90% of "the model is bad" complaints turn out to be one of those three. The model is the most visible part of the system, so it gets blamed first, but it's rarely where the leverage is.
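One way an evaluation can "lie" is by averaging a rare failure away. The sketch below compares overall accuracy with the false-positive rate per slice. The record format, the slice names, and the counts are invented purely for illustration and are not measurements from the project above.

```python
from collections import defaultdict


def accuracy(records):
    """records: list of (slice_name, is_target, predicted_target) tuples."""
    correct = sum(is_target == predicted for _, is_target, predicted in records)
    return correct / len(records)


def false_positive_rate_by_slice(records):
    """Per slice: share of non-target images that were predicted as the target."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for slice_name, is_target, predicted in records:
        if not is_target:
            negatives[slice_name] += 1
            if predicted:
                false_positives[slice_name] += 1
    return {name: false_positives[name] / count for name, count in negatives.items()}


# Toy eval set (made-up numbers): 95 interior images, all handled correctly,
# plus 5 garage doors, 4 of which are wrongly flagged as radiators.
toy_records = (
    [("interior", True, True)] * 50
    + [("interior", False, False)] * 45
    + [("garage_door", False, True)] * 4
    + [("garage_door", False, False)] * 1
)

print(accuracy(toy_records))                      # 0.96
print(false_positive_rate_by_slice(toy_records))  # {'interior': 0.0, 'garage_door': 0.8}
```

With these toy numbers the headline metric looks healthy while one slice fails four times out of five, which is the kind of gap a single aggregate score hides.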
A more useful order of operations
When something underperforms, I look in this order:
- Are the failing cases even in the training data?
- Does the eval set contain them, and does the metric punish the failure?
- Is the task definition specific enough that two people would label it the same way?
- Then — and only then — touch the model.
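The first and third checks in the list above can be turned into quick measurements. This is a rough sketch under assumptions: `coverage` counts how often each failing category appears among the training labels, and `cohens_kappa` scores agreement between two labelers on the same items using Cohen's kappa, which is my choice of agreement statistic rather than anything the checklist prescribes. Both function names and the label strings are hypothetical.

```python
from collections import Counter


def coverage(train_labels, failing_categories):
    """Count how many training examples exist for each failing category."""
    counts = Counter(train_labels)
    return {category: counts.get(category, 0) for category in failing_categories}


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labelers on the same items.

    1.0 means perfect agreement; values near 0 mean agreement is no better
    than chance, which suggests the task definition is too loose.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if both labelers assigned labels independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

A zero from `coverage` answers the first question immediately, and a low kappa on a small double-labeled sample is a cheap signal that the task definition needs work before anyone touches the model.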
What I learned
Treat data, evaluation, and task definition as first-class engineering, not as setup work you rush through to get to the "real" model. The model is downstream of all three. Fix the upstream, and the model usually takes care of itself.