This is the story of one team that had a length problem bad enough to threaten a product, and how they fixed it. The details are composited from the kinds of situations that recur in this work, but the arc is faithful to how these problems actually get solved: a frustrating status quo, a decision to treat length as an engineering concern, a sequence of changes, and a measurable result.
The team ran a daily digest product. Every morning it sent subscribers a set of AI-generated summaries of articles in their field. The summaries were the product. And the summaries kept coming out the wrong length, sometimes a single thin sentence, sometimes a sprawling three paragraphs that defeated the whole point of a digest.
What follows is the situation they faced, the decision they made, how they executed it, what it produced, and the lessons that generalize. Read it as a worked example of turning length from a daily annoyance into a controlled property.
The Situation
The digest had grown popular, and the length problem had grown with it.
Symptoms in Production
Subscribers complained that some summaries were too long to skim and others were too thin to be useful. The team's prompt asked for "a concise summary," which the model interpreted differently for every article. Length variance was high and unpredictable, and it was the top theme in support tickets.
The Cost
Inconsistent length was eroding the digest's core promise: a fast, scannable read. Churn was creeping up, and the team traced a meaningful share of it to the reading experience rather than the content quality. Length had become a retention problem, not just an aesthetic one.
The Decision
The turning point was reframing length as something to engineer, not request.
Treating Length as a System Property
Instead of tweaking the word "concise" yet again, the team decided to control length structurally and verify it. This meant defining a target, encoding it where they could, and checking every summary before it shipped. The reframing is the same one at the heart of Opinionated Rules for Keeping AI Output the Right Size.
Setting a Concrete Target
They replaced "concise" with a concrete contract: each summary should be two to three sentences, never more than sixty words, and must include the single most important fact from the article. A range and a ceiling, not an exact count, matched how the model actually behaved.
The Execution
They rolled the changes out in stages rather than all at once.
Rewriting the Prompt
First they rewrote the prompt around the contract: "Summarize in two to three sentences, under sixty words, leading with the single most important fact." The structural limit and the must-keep element replaced the vague "concise," and variance dropped immediately on the easy articles.
Adding the Backstop
For the harder articles that still overshot, they added a length check. Anything over the ceiling went back to the model with "compress this to under sixty words while keeping the lead fact." Regenerating to compress, rather than truncating, preserved the lead fact that truncation would have risked cutting. This enforcement step mirrors the process in Dialing In AI Response Length, One Step at a Time.
Handling the Long Source Problem
Very long source articles had been producing the most bloated summaries, because the model tried to cover too much. The team added a preparation step that extracted the key points first, then summarized those, which kept the final output focused regardless of source length.
The Outcome
The changes produced results the team could measure.
Measurable Improvement
Length variance collapsed. The share of summaries falling inside the target band went from inconsistent to nearly all of them, and support tickets about summary length dropped to a trickle. The digest felt consistent in a way it never had, and the reading experience stopped being a source of complaints.
Knock-On Benefits
Consistent length made the digest layout cleaner and the product easier to skim, which the team believed contributed to the churn trend flattening. Engineering length had paid off well beyond the original complaint, touching retention and design at once.
The Lessons
A few takeaways generalized beyond this one product.
Vague Words Produce Vague Lengths
"Concise" was the root cause. It felt like an instruction but carried no enforceable meaning. Replacing it with a concrete contract, a range, a ceiling, and a must-keep element, was the change that mattered most. The dangers of vague instructions are catalogued in Seven Ways People Lose Control of AI Output Length.
Verification Is What Made It Reliable
The prompt rewrite helped, but it was the length check and the compress-on-overshoot step that made length dependable rather than merely improved. Treating length as a property to verify, not just request, was the durable lesson the team carried into other features.
What the Team Would Do Differently
Hindsight surfaced a few things the team wished they had done sooner, and these are often the most useful part of any case.
Define the Contract on Day One
The team spent months tweaking the word "concise" before reframing length as an engineered property. In retrospect, writing a concrete contract, a range, a ceiling, a must-keep element, should have been the very first move, not the eventual breakthrough. The lesson is that vagueness is not a starting point to refine but a problem to replace outright.
Instrument Before Optimizing
They had no clean measure of length variance until late, which meant early changes were judged by gut feel. Had they tracked the share of summaries inside the target band from the start, they would have known immediately which changes helped and which were noise. Measurement should have preceded the fixes, not trailed them.
Treat the Source Length Problem Earlier
The bloated summaries from very long articles were a known pain that the team tolerated for too long before adding the extraction step. Recognizing that the worst failures clustered around long sources, and addressing that root cause sooner, would have saved a stretch of inconsistent output that subscribers noticed.
Frequently Asked Questions
What was the team's core mistake before the fix?
Relying on the vague word "concise," which the model interpreted differently for every article. It felt like a length instruction but carried no enforceable meaning, so length variance stayed high until they replaced it with a concrete contract.
Why did they regenerate instead of truncate over-length summaries?
Because truncation risked cutting the lead fact that each summary was required to include. Regenerating with a compress instruction shortened the text while preserving the must-keep element, which a blind truncation could have dropped.
How did they handle very long source articles?
They added a preparation step that extracted the key points first, then summarized those. Long sources had produced the most bloated summaries because the model tried to cover everything, and extracting first kept the final output focused.
What result did the changes produce?
Length variance collapsed, nearly all summaries landed inside the target band, and support tickets about length dropped to a trickle. The team also saw cleaner layout and a flattening churn trend, suggesting the fix touched retention as well.
What is the most transferable lesson?
Treat length as a property to engineer and verify, not a word to request. The concrete contract plus a verification backstop made length reliable, and that reframing transferred to other features the team built afterward.
Key Takeaways
- Vague length words like concise produce high, unpredictable variance because the model interprets them differently each time.
- Replacing vague instructions with a concrete contract, a range, a ceiling, and a must-keep element, was the highest-impact change.
- A verification backstop that regenerates over-length output to compress made length dependable rather than just improved.
- Regenerating to compress preserved the must-keep fact that truncation would have risked cutting.
- Extracting key points from very long sources before summarizing kept the final output focused.
- Treating length as an engineered, verified property transferred well beyond the original feature.