To keep this easy to reason about, think in four steps. We’ll use one running example: churn prediction for a subscription product.

1. Set the goal (metric + constraints)
Key idea: pick a metric that matches business cost.
Churn example: missing churners might be expensive, so you may value recall more than raw accuracy.
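To make the accuracy-vs-recall trade-off concrete, here is a minimal sketch with made-up confusion-matrix counts (the numbers are illustrative, not real churn data):

```python
# Toy comparison: why raw accuracy can hide missed churners.
# tp = churners caught, fn = churners missed,
# fp = false alarms, tn = correctly ignored non-churners.
def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def recall(tp, fn):
    return tp / (tp + fn)

# Model A: predicts "no churn" almost always -> high accuracy, terrible recall.
a_acc = accuracy(tp=5, fp=5, tn=890, fn=100)    # 0.895
a_rec = recall(tp=5, fn=100)                    # ~0.048

# Model B: catches most churners at the cost of some false alarms.
b_acc = accuracy(tp=80, fp=120, tn=775, fn=25)  # 0.855
b_rec = recall(tp=80, fn=25)                    # ~0.762
```

Model B looks "worse" on accuracy but catches 16× more churners, which is the point of matching the metric to business cost.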
2. Prepare data (splits + leakage checks)
Key idea: evaluation must mirror reality.
Churn example: use time-based splits so the model predicts the future, not memorizes the past.
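A time-based split can be sketched in a few lines; the rows below are hypothetical events, but the rule is the general one: train strictly before a cutoff, evaluate strictly after.

```python
from datetime import date

# Illustrative events: (event_date, user_id, churned_next_month).
rows = [
    (date(2024, 1, 15), "user_a", 0),
    (date(2024, 2, 3),  "user_b", 1),
    (date(2024, 3, 20), "user_c", 0),
    (date(2024, 4, 8),  "user_d", 1),
    (date(2024, 5, 1),  "user_e", 0),
]

def time_split(rows, cutoff):
    """Train on everything before the cutoff, evaluate on everything after.
    A random split would let the model peek at the future (leakage)."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test

train, test = time_split(rows, cutoff=date(2024, 4, 1))
# train holds the Jan-Mar events; test holds the Apr-May events.
```

The leakage check falls out of the same structure: assert that the latest training date precedes the earliest test date before you trust any score.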
3. Search (models + hyperparameters + pipeline variants)
Key idea: AutoML explores combinations you won’t manually test.
Churn example: it may find a tree-based model and hyperparameter settings that outperform your default baseline.
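The search step can be sketched as exhaustive evaluation over a small space. Everything here is a stand-in: real AutoML tools explore far larger spaces, and `evaluate` would actually fit and score a model on the validation split rather than use the toy rule below.

```python
import itertools

# Hypothetical search space (models x hyperparameters).
space = {
    "model": ["logistic", "tree", "boosted_trees"],
    "max_depth": [3, 5, 8],
    "learning_rate": [0.05, 0.1],
}

def evaluate(config):
    """Stand-in for train-and-score on the validation split.
    A fixed toy scoring rule here; in practice this trains a real model."""
    score = 0.70
    if config["model"] == "boosted_trees":
        score += 0.08
    if config["max_depth"] == 5:
        score += 0.01
    return score

# Enumerate every combination and keep the best-scoring one.
configs = [dict(zip(space, vals)) for vals in itertools.product(*space.values())]
best = max(configs, key=evaluate)
```

Under this toy rule the winner is a boosted-trees configuration, i.e. a combination you might never have tried against your default baseline by hand.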
4. Validate and pick (performance + cost + latency)
Key idea: “best” is multi-dimensional.
Churn example: a model that’s 1% better but 10× slower might not fit your product latency budget.
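Selection under a latency budget is naturally a filter-then-maximize step. The candidates and the 50 ms budget below are hypothetical:

```python
# Hypothetical outputs from the search: metric plus serving latency.
candidates = [
    {"name": "big_ensemble",  "recall": 0.81, "p95_ms": 240},
    {"name": "boosted_trees", "recall": 0.80, "p95_ms": 22},
    {"name": "logistic",      "recall": 0.71, "p95_ms": 3},
]

LATENCY_BUDGET_MS = 50  # a product constraint, not a model metric

def pick(candidates, budget_ms):
    """Apply the hard constraint first, then maximize the metric."""
    viable = [c for c in candidates if c["p95_ms"] <= budget_ms]
    return max(viable, key=lambda c: c["recall"])

chosen = pick(candidates, LATENCY_BUDGET_MS)
# big_ensemble scores one point higher but blows the latency budget.
```

Treating latency (or cost) as a filter rather than folding it into one blended score keeps the decision legible: you can state exactly why the highest-scoring model was rejected.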
This is where human-in-the-loop review matters most. You’re not just picking a score. You’re picking a model you can ship and maintain.