This is the part most teams feel too late.
1. Cost
AI usage is commonly billed by token count.
OpenAI tracks token usage in categories such as input, output, cached, and reasoning tokens, and pricing varies by model and by token category.
Anthropic's current API pricing likewise charges per million input and output tokens, so the same pattern holds across major providers.
So if your product sends:
- a long system prompt
- repeated instructions
- full chat history every turn
- large retrieved chunks no one actually needs
You are not just sending text. You are sending cost.
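A back-of-envelope sketch makes this concrete. The per-million-token prices, token counts, and turn counts below are hypothetical placeholders for illustration, not any provider's actual rates:

```python
# Hypothetical per-million-token prices -- check your provider's real pricing.
INPUT_PRICE_PER_MTOK = 3.00    # USD per 1M input tokens (assumed)
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per 1M output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single API call in USD."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

def conversation_cost(turns: int, system_tokens: int,
                      turn_input: int, turn_output: int) -> float:
    """Total cost when every turn resends the system prompt plus all history."""
    total = 0.0
    history = 0
    for _ in range(turns):
        prompt = system_tokens + history + turn_input
        total += request_cost(prompt, turn_output)
        history += turn_input + turn_output  # history grows every turn
    return total

# A bloated system prompt is paid for on every single turn.
lean = conversation_cost(turns=20, system_tokens=200, turn_input=100, turn_output=150)
bloated = conversation_cost(turns=20, system_tokens=2000, turn_input=100, turn_output=150)
print(f"lean: ${lean:.4f}  bloated: ${bloated:.4f}")
```

Note how the system prompt's cost is multiplied by the number of turns: trimming it once saves tokens on every request for the life of the conversation.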
2. Speed
OpenAI says completion latency is mostly influenced by two factors: the model and the number of tokens generated. It also notes that shorter responses are returned faster.
That means token waste does not only inflate the bill.
It also slows the user experience.
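A crude latency model shows why output length dominates perceived speed. The tokens-per-second throughput below is an assumed illustrative figure, not a measured or guaranteed number for any model:

```python
def generation_seconds(output_tokens: int, tokens_per_second: float = 50.0) -> float:
    """Rough time to generate a response, assuming a fixed decode throughput.

    tokens_per_second is a hypothetical figure; real throughput varies by
    model, load, and provider.
    """
    return output_tokens / tokens_per_second

# A response ten times longer takes roughly ten times longer to stream.
print(generation_seconds(100))    # short answer
print(generation_seconds(1000))   # long answer
```

Under this simple model, asking the model to be concise (or capping max output tokens) is one of the cheapest latency wins available.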
3. Output quality
Context windows are measured in tokens, not pages or messages.
OpenAI notes that each model has a maximum combined token limit for input and output.
Anthropic now offers a 1M-token context window in beta for Claude Sonnet 4.6, which shows how far context limits have grown.
But larger windows do not remove the need for discipline. They just raise the ceiling.
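One practical discipline is a pre-flight budget check before each request. The sketch below uses the common rough heuristic of about four characters per token as an approximation; a real tokenizer (such as OpenAI's tiktoken) gives exact counts, and the window and reserve sizes shown are illustrative:

```python
def approx_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token, English text)."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, context_window: int,
                    reserve_for_output: int) -> bool:
    """Check that the prompt leaves room for the reply within the window.

    The limit covers input and output combined, so we reserve output space
    up front instead of discovering a truncated reply later.
    """
    return approx_tokens(prompt) + reserve_for_output <= context_window

# Illustrative numbers: an 8K window with 1K reserved for the response.
prompt = "hello " * 1000
print(fits_in_context(prompt, context_window=8192, reserve_for_output=1024))
```

Even with a 1M-token window, a check like this is where trimming decisions get made: when the budget is exceeded, you drop stale history or oversized retrieved chunks deliberately instead of letting the provider truncate for you.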