How a Digest Team Tamed Runaway AI Summaries

Agency Script Editorial

Editorial Team

December 14, 2021 · 6 min read
Tags: output length control strategies, output length control strategies case study, output length control strategies guide, prompt engineering

This is the story of one team that had a length problem bad enough to threaten a product, and how they fixed it. The details are composited from the kinds of situations that recur in this work, but the arc is faithful to how these problems actually get solved: a frustrating status quo, a decision to treat length as an engineering concern, a sequence of changes, and a measurable result.

The team ran a daily digest product. Every morning it sent subscribers a set of AI-generated summaries of articles in their field. The summaries were the product. And the summaries kept coming out the wrong length: sometimes a single thin sentence, sometimes three sprawling paragraphs that defeated the whole point of a digest.

What follows is the situation they faced, the decision they made, how they executed it, what it produced, and the lessons that generalize. Read it as a worked example of turning length from a daily annoyance into a controlled property.

The Situation

The digest had grown popular, and the length problem had grown with it.

Symptoms in Production

Subscribers complained that some summaries were too long to skim and others were too thin to be useful. The team's prompt asked for "a concise summary," which the model interpreted differently for every article. Length variance was high and unpredictable, and it was the top theme in support tickets.

The Cost

Inconsistent length was eroding the digest's core promise: a fast, scannable read. Churn was creeping up, and the team traced a meaningful share of it to the reading experience rather than the content quality. Length had become a retention problem, not just an aesthetic one.

The Decision

The turning point was reframing length as something to engineer, not request.

Treating Length as a System Property

Instead of tweaking the word "concise" yet again, the team decided to control length structurally and verify it. This meant defining a target, encoding it where they could, and checking every summary before it shipped. The reframing is the same one at the heart of "Opinionated Rules for Keeping AI Output the Right Size."

Setting a Concrete Target

They replaced "concise" with a concrete contract: each summary should be two to three sentences, never more than sixty words, and must include the single most important fact from the article. A range and a ceiling, not an exact count, matched how the model actually behaved.
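A contract this concrete can be checked mechanically. The following is a minimal sketch in Python, not the team's actual code: the sentence splitter is deliberately naive, and `lead_fact` is a hypothetical key phrase supplied by the caller to stand in for "the single most important fact."

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LengthContract:
    """The digest contract: two to three sentences, at most sixty words."""
    min_sentences: int = 2
    max_sentences: int = 3
    max_words: int = 60


def count_sentences(text: str) -> int:
    # Naive split on ., ! or ? followed by whitespace. Abbreviations
    # such as "U.S. markets" will be over-counted.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def contract_violations(summary: str, lead_fact: str,
                        contract: LengthContract = LengthContract()) -> List[str]:
    """Return human-readable violations; an empty list means the summary passes."""
    problems = []
    words = len(summary.split())
    sentences = count_sentences(summary)
    if words > contract.max_words:
        problems.append(f"{words} words exceeds ceiling of {contract.max_words}")
    if not contract.min_sentences <= sentences <= contract.max_sentences:
        problems.append(
            f"{sentences} sentences is outside "
            f"{contract.min_sentences}-{contract.max_sentences}"
        )
    # Assumption: the lead fact is approximated by a case-insensitive phrase match.
    if lead_fact.lower() not in summary.lower():
        problems.append("lead fact missing")
    return problems
```

Returning a list of violations rather than a single boolean makes it easier to see which part of the contract a given summary broke.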

The Execution

They rolled the changes out in stages rather than all at once.

Rewriting the Prompt

First they rewrote the prompt around the contract: "Summarize in two to three sentences, under sixty words, leading with the single most important fact." The structural limit and the must-keep element replaced the vague "concise," and variance dropped immediately on the easy articles.
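One way to keep the prompt and the contract from drifting apart is to build the instruction from the same numbers the checks use. This is an illustrative sketch: the function name, parameters, and the "Article:" layout are assumptions, and only the instruction wording follows the prompt quoted above.

```python
def build_summary_prompt(article_text: str,
                         min_sentences: int = 2,
                         max_sentences: int = 3,
                         max_words: int = 60) -> str:
    """Assemble a summarization prompt from the contract's numbers."""
    instruction = (
        f"Summarize in {min_sentences} to {max_sentences} sentences, "
        f"under {max_words} words, leading with the single most important fact."
    )
    # Assumption: the article is appended after a blank line and a label.
    return f"{instruction}\n\nArticle:\n{article_text}"
```

If the ceiling ever changes, the prompt and any downstream check can then be updated from one place.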

Adding the Backstop

For the harder articles that still overshot, they added a length check. Anything over the ceiling went back to the model with the instruction "compress this to under sixty words while keeping the lead fact." Regenerating to compress, rather than truncating, preserved the lead fact that truncation risked cutting. This enforcement step mirrors the process in "Dialing In AI Response Length, One Step at a Time."
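The check-and-compress backstop can be sketched as a small loop. This is a hedged illustration, not the team's implementation: `generate` is a hypothetical callable that wraps whatever model client is in use, and the retry cap is an assumption added so the loop always terminates.

```python
from typing import Callable, Tuple

COMPRESS_INSTRUCTION = (
    "Compress this to under sixty words while keeping the lead fact."
)


def enforce_ceiling(summary: str,
                    generate: Callable[[str], str],
                    max_words: int = 60,
                    max_attempts: int = 2) -> Tuple[str, bool]:
    """Ask the model to compress an over-length summary.

    Returns the final summary and whether it fits under the ceiling.
    """
    attempts = 0
    while len(summary.split()) > max_words and attempts < max_attempts:
        # Regenerate rather than truncate, so the model decides what to cut.
        summary = generate(f"{COMPRESS_INSTRUCTION}\n\n{summary}")
        attempts += 1
    return summary, len(summary.split()) <= max_words
```

The boolean lets the caller decide what to do with a summary that is still too long after the retries, for example holding it back for review. The source does not say how that case was handled.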

Handling the Long Source Problem

Very long source articles had been producing the most bloated summaries, because the model tried to cover too much. The team added a preparation step that extracted the key points first, then summarized those, which kept the final output focused regardless of source length.
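The extract-then-summarize step amounts to two model calls in sequence. The sketch below assumes a hypothetical `generate` callable, an illustrative extraction prompt, and a `max_points` cap. None of these details come from the team's pipeline.

```python
from typing import Callable

# Assumption: both prompt templates are illustrative wording.
EXTRACT_PROMPT = (
    "List the key points of this article, one per line, "
    "most important first.\n\nArticle:\n{article}"
)
SUMMARY_PROMPT = (
    "Summarize in two to three sentences, under sixty words, "
    "leading with the single most important fact.\n\nKey points:\n{points}"
)


def summarize_long_article(article: str,
                           generate: Callable[[str], str],
                           max_points: int = 5) -> str:
    """Extract key points first, then summarize only those points."""
    raw_points = generate(EXTRACT_PROMPT.format(article=article))
    # Strip list markers and blank lines from the model's reply.
    points = [line.strip("-• \t") for line in raw_points.splitlines() if line.strip()]
    points = points[:max_points]
    bullet_list = "\n".join(f"- {p}" for p in points)
    # The second call never sees the full article, only the extracted points.
    return generate(SUMMARY_PROMPT.format(points=bullet_list))
```

Because the summarizing call receives only a short list of points, its input size stays roughly constant no matter how long the source article is.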

The Outcome

The changes produced results the team could measure.

Measurable Improvement

Length variance collapsed. Where the share of summaries inside the target band had been erratic, nearly all of them now landed inside it, and support tickets about summary length dropped to a trickle. The digest felt consistent in a way it never had, and the reading experience stopped being a source of complaints.

Knock-On Benefits

Consistent length made the digest layout cleaner and the product easier to skim, which the team believed contributed to the churn trend flattening. Engineering length had paid off well beyond the original complaint, touching retention and design at once.

The Lessons

A few takeaways generalized beyond this one product.

Vague Words Produce Vague Lengths

"Concise" was the root cause. It felt like an instruction but carried no enforceable meaning. Replacing it with a concrete contract (a range, a ceiling, and a must-keep element) was the change that mattered most. The dangers of vague instructions are catalogued in "Seven Ways People Lose Control of AI Output Length."

Verification Is What Made It Reliable

The prompt rewrite helped, but it was the length check and the compress-on-overshoot step that made length dependable rather than merely improved. Treating length as a property to verify, not just request, was the durable lesson the team carried into other features.

What the Team Would Do Differently

Hindsight surfaced a few things the team wished they had done sooner, and these are often the most useful part of any case study.

Define the Contract on Day One

The team spent months tweaking the word "concise" before reframing length as an engineered property. In retrospect, writing a concrete contract (a range, a ceiling, and a must-keep element) should have been the very first move, not the eventual breakthrough. The lesson is that vagueness is not a starting point to refine but a problem to replace outright.

Instrument Before Optimizing

They had no clean measure of length variance until late, which meant early changes were judged by gut feel. Had they tracked the share of summaries inside the target band from the start, they would have known immediately which changes helped and which were noise. Measurement should have preceded the fixes, not trailed them.
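The metric described here takes only a few lines to compute over a batch of summaries. This is a minimal sketch under stated assumptions: the band is the two-to-three-sentence, sixty-word contract, sentence counting is naive, and the function and key names are illustrative.

```python
import re
from statistics import pstdev
from typing import Dict, List


def _sentence_count(text: str) -> int:
    # Naive split on ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def length_report(summaries: List[str],
                  min_sentences: int = 2,
                  max_sentences: int = 3,
                  max_words: int = 60) -> Dict[str, float]:
    """Share of summaries inside the target band, plus word-count spread."""
    if not summaries:
        return {"share_in_band": 0.0, "word_count_stdev": 0.0}
    word_counts = [len(s.split()) for s in summaries]
    in_band = sum(
        1 for s, w in zip(summaries, word_counts)
        if w <= max_words and min_sentences <= _sentence_count(s) <= max_sentences
    )
    return {
        "share_in_band": in_band / len(summaries),
        "word_count_stdev": pstdev(word_counts),
    }
```

Running a report like this on each day's batch before and after a prompt change gives a direct comparison in place of gut feel.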

Treat the Source Length Problem Earlier

The bloated summaries from very long articles were a known pain that the team tolerated for too long before adding the extraction step. Recognizing that the worst failures clustered around long sources, and addressing that root cause sooner, would have saved a stretch of inconsistent output that subscribers noticed.

Frequently Asked Questions

What was the team's core mistake before the fix?

Relying on the vague word "concise," which the model interpreted differently for every article. It felt like a length instruction but carried no enforceable meaning, so length variance stayed high until they replaced it with a concrete contract.

Why did they regenerate instead of truncate over-length summaries?

Because truncation risked cutting the lead fact that each summary was required to include. Regenerating with a compress instruction shortened the text while preserving the must-keep element, which a blind truncation could have dropped.

How did they handle very long source articles?

They added a preparation step that extracted the key points first, then summarized those. Long sources had produced the most bloated summaries because the model tried to cover everything, and extracting first kept the final output focused.

What result did the changes produce?

Length variance collapsed, nearly all summaries landed inside the target band, and support tickets about length dropped to a trickle. The team also saw cleaner layout and a flattening churn trend, suggesting the fix touched retention as well.

What is the most transferable lesson?

Treat length as a property to engineer and verify, not a word to request. The concrete contract plus a verification backstop made length reliable, and that reframing transferred to other features the team built afterward.

Key Takeaways

  • Vague length words like "concise" produce high, unpredictable variance because the model interprets them differently each time.
  • Replacing vague instructions with a concrete contract (a range, a ceiling, and a must-keep element) was the highest-impact change.
  • A verification backstop that regenerates over-length output to compress made length dependable rather than just improved.
  • Regenerating to compress preserved the must-keep fact that truncation would have risked cutting.
  • Extracting key points from very long sources before summarizing kept the final output focused.
  • Treating length as an engineered, verified property transferred well beyond the original feature.
