© 2026 Agency Script, Inc.

One Team's Path Through an AI Stack Decision


Agency Script Editorial

Editorial Team

November 12, 2017 · 8 min read

A case study is more useful than a checklist because it shows the decisions in motion, including the wrong turns. Real stack choices are not made in the clean order a guide implies; they are made under pressure, with incomplete information, and revised when reality pushes back. Watching a team work through that mess teaches things a tidy framework cannot.

This is the story of one team, a mid-sized operations group, choosing an AI tech stack for a document-processing tool. The team and specifics are an illustrative composite rather than a named account, but the arc is the real shape these projects take: a stalled situation, a forced decision, a rocky execution, a measurable result, and lessons that only became clear in hindsight. Follow the arc and the abstract advice from other articles becomes concrete.

The Situation: Stalled and Drowning in Documents

The operations team processed thousands of supplier documents by hand each month, extracting a handful of fields from each. The work was slow and error-prone, and the volume was growing faster than the team could keep up with. Someone proposed an AI tool, and the project promptly stalled because nobody knew how to choose the stack.

Why it stalled

  • Every team member had a different favorite model from something they had read.
  • Nobody had written down what the tool actually needed to do.
  • The conversation kept circling tools instead of the problem.

The stall was not a tooling problem. It was the absence of a starting point, the exact gap that a clear problem statement closes.

The Decision: Writing the Problem Down

The breakthrough came from refusing to discuss tools for one meeting and instead writing the problem in a single paragraph. The task was field extraction from semi-structured documents. The accuracy bar was high because errors flowed into payments. The volume was a few thousand documents a day. The budget was modest.

How the paragraph decided things

Once written, the paragraph did most of the selecting. The accuracy bar and moderate volume pointed to a capable hosted model rather than the cheapest option or a self-hosted build. Because the fields had to come from each document's own content, the document text needed to go directly into the prompt rather than leaving the model to rely on its general knowledge. The discipline mirrored the approach described in Step by Step Through an AI Tech Stack Decision.

The Execution: Rockier Than Planned

The first build combined three pieces: a hosted model, the document text fed directly into the prompt, and simple glue code. It worked on clean documents and fell apart on messy ones, which were the majority. The team's instinct was to switch to a more powerful model.
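The following is a minimal Python sketch of what that first build's glue code might look like. It assumes a hypothetical `call_model` function that stands in for whatever hosted model API the team used. It also assumes hypothetical field names (`supplier_name`, `invoice_number`, `total_amount`), because the case study names neither. The document text goes straight into the prompt, and the model is asked to reply with JSON.

```python
import json
from typing import Callable

# Hypothetical field names; the case study does not list the extracted fields.
FIELDS = ["supplier_name", "invoice_number", "total_amount"]


def build_prompt(document_text: str, fields: list[str]) -> str:
    """Place the document's own text directly in the prompt (no preprocessing yet)."""
    field_list = ", ".join(fields)
    return (
        f"Extract the following fields from the document below: {field_list}.\n"
        "Respond with a single JSON object using exactly those keys. "
        "Use null for any field that is not present.\n\n"
        f"DOCUMENT:\n{document_text}"
    )


def extract_fields(
    document_text: str,
    call_model: Callable[[str], str],  # hypothetical wrapper around a hosted model API
    fields: list[str] = FIELDS,
) -> dict:
    """Send one document to the model and parse its JSON reply."""
    reply = call_model(build_prompt(document_text, fields))
    data = json.loads(reply)
    missing = [f for f in fields if f not in data]
    if missing:
        raise ValueError(f"model reply is missing fields: {missing}")
    # Keep only the requested keys so stray output never reaches payment systems.
    return {f: data[f] for f in fields}
```

Passing the model call in as a parameter keeps the glue testable with a stub reply. It also turns swapping models into a one-line change.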

The wrong turn and the recovery

Switching models barely helped, because the problem was not model capability. It was that messy documents needed better input handling before the model ever saw them. The team added a preprocessing step to normalize the document text, and accuracy jumped. The lesson, that input quality often beats model capability, echoes the data-layer priority in Everything That Goes Into an AI Tech Stack Decision.
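The case study does not say what the preprocessing step actually did. As an illustration only, here is a minimal Python sketch of common normalization rules for messy extracted text:

- Unicode compatibility folding.
- Rejoining words hyphenated across line breaks.
- Dropping stray control characters.
- Collapsing whitespace.

The function name and every rule in it are assumptions to tune against your own documents.

```python
import re
import unicodedata


def normalize_document_text(raw: str) -> str:
    """Clean messy extracted text before it reaches the model (illustrative rules)."""
    # Fold compatibility characters: ligatures, full-width digits, non-breaking spaces.
    text = unicodedata.normalize("NFKC", raw)
    # Unify Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Heuristic: rejoin a word split by a hyphen at a line break ("invoi-\nce" -> "invoice").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop control characters (category "Cc") except newline and tab.
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Collapse runs of spaces and tabs inside each line, then trim the line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    # Collapse three or more consecutive newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))
    return text.strip()
```

What matters is where this runs, not the specific rules. Because it runs before the prompt is built, every model sees the same cleaned input, and any later model comparison happens on equal terms.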

Why the model reflex is so common

The team's first instinct to reach for a bigger model is worth examining, because almost everyone has it. When an AI system underperforms, the model is the most visible component and the easiest to swap, so it becomes the default suspect. But the model is rarely where the problem lives. More often the issue is upstream, in what the model is being fed, or downstream, in how the output is being checked. The team here lost time chasing the visible suspect before examining the actual culprit. The habit worth building is to ask, before touching the model, whether the input was clean and the question well-posed, because fixing those is usually cheaper and more effective than upgrading the engine.

Adding the Missing Piece: Observability

The early version had no way to see why a given extraction failed. When an error reached a payment, the team could not trace it. They added logging of each document, the prompt, and the extracted result, plus a daily sample review.
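Here is a minimal Python sketch of that logging. It assumes an append-only JSON Lines file and uses hypothetical names (`log_extraction`, `daily_review_sample`, `extractions.jsonl`). Each record stores four things: the document ID, the document text, the exact prompt, and the extracted result. That is enough to trace a later failure back to what the model saw. The sampler draws a small random set of one day's records for human review.

```python
import json
import random
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

LOG_PATH = Path("extractions.jsonl")  # hypothetical log location


def log_extraction(doc_id: str, document_text: str, prompt: str, result: dict,
                   log_path: Path = LOG_PATH) -> None:
    """Append one record per processed document (JSON Lines: one object per line)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "doc_id": doc_id,
        "document_text": document_text,
        "prompt": prompt,
        "result": result,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def daily_review_sample(day: str, k: int = 20, log_path: Path = LOG_PATH,
                        seed: Optional[int] = None) -> list[dict]:
    """Return up to k random records whose UTC timestamp falls on `day` (YYYY-MM-DD)."""
    with log_path.open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    # ISO 8601 timestamps start with the date, so a prefix match selects the day.
    todays = [r for r in records if r["timestamp"].startswith(day)]
    rng = random.Random(seed)
    return rng.sample(todays, min(k, len(todays)))
```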

What observability changed

  • Failures became diagnosable from evidence instead of guesswork.
  • The sample review caught a category of silent errors the team had not known existed.
  • Trust in the tool grew because problems were visible and fixable.

The team later said skipping observability at the start was their biggest regret, because the early blind period cost them weeks of confusion.

How the blind period actually hurt

It is worth being concrete about what the missing observability cost. During the early weeks, a supplier complained that a payment was wrong. The team wanted to know which document produced the bad extraction and why, but they had logged nothing, so they could not reconstruct what the system had seen. They ended up re-running documents by hand to reproduce the error, a process that took days and shook confidence in the tool just as it was gaining adoption. Once logging existed, the same class of complaint became a five-minute lookup: pull the document, see the prompt and the extracted result, and identify the cause. The contrast between days of guesswork and a five-minute lookup is the entire argument for building observability before you need it.
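Once each document is logged, the five-minute lookup is little more than a filter over the log. Below is a minimal Python sketch. It assumes a JSON Lines log in which every record carries `doc_id`, `prompt`, and `result` keys. That schema and the function name `find_extractions` are hypothetical, not taken from the case study.

```python
import json
from pathlib import Path


def find_extractions(doc_id: str, log_path: Path) -> list[dict]:
    """Return every logged record for one document, in the order it was logged.

    Assumes a JSON Lines file where each line is an object with at least
    "doc_id", "prompt", and "result" keys (a hypothetical schema).
    """
    matches = []
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if record.get("doc_id") == doc_id:
                matches.append(record)
    return matches
```

Given the document ID behind a disputed payment, the returned records show the exact prompt and extracted result. That replaces re-running documents by hand to reproduce the error.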

The Outcome: Measurable and Real

After the preprocessing fix and observability addition, the tool processed the daily volume with an error rate below the manual baseline, and the team that had been drowning was reassigned to higher-value work. The cost per document, tracked from soon after launch, stayed within the modest budget.

The numbers that mattered

  • Processing time per document dropped dramatically against the manual baseline.
  • The error rate fell below what humans had achieved by hand.
  • Cost per document stayed within budget at full volume.

These were not hypothetical projections. They were measured, which was only possible because observability had finally been built in. A claim of success you can put a number behind is worth far more than a vague sense that the tool is helping, both for the team's own confidence and for justifying the project to the people who funded it.
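The case study reports no specific figures, so none are reproduced here. As a hedged sketch of how such numbers can be derived from logged and reviewed data, the Python below computes two things. The first is a field-level error rate from the human sample review. The second is an average cost per document from logged token counts. The record shapes and the per-1,000-token prices are assumptions; substitute your provider's actual pricing.

```python
from dataclasses import dataclass


@dataclass
class ReviewResult:
    """Outcome of checking one sampled document by hand (hypothetical schema)."""
    fields_checked: int
    fields_wrong: int


@dataclass
class TokenUsage:
    """Token counts logged for one processed document (hypothetical schema)."""
    input_tokens: int
    output_tokens: int


def field_error_rate(reviews: list[ReviewResult]) -> float:
    """Share of checked fields that were wrong across the reviewed sample."""
    checked = sum(r.fields_checked for r in reviews)
    if checked == 0:
        raise ValueError("no reviewed fields; error rate is undefined")
    return sum(r.fields_wrong for r in reviews) / checked


def cost_per_document(usage: list[TokenUsage],
                      price_in_per_1k: float,
                      price_out_per_1k: float) -> float:
    """Average model spend per document, given assumed per-1,000-token prices."""
    if not usage:
        raise ValueError("no usage records; cost per document is undefined")
    total = sum(
        u.input_tokens / 1000 * price_in_per_1k
        + u.output_tokens / 1000 * price_out_per_1k
        for u in usage
    )
    return total / len(usage)


def beats_manual_baseline(tool_error_rate: float, manual_error_rate: float) -> bool:
    """True when the tool's error rate is strictly below the manual baseline."""
    return tool_error_rate < manual_error_rate
```

The comparison with the manual baseline is only fair if human errors were counted the same way: per field, on a comparable sample of documents.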

The Lessons, in Hindsight

The team distilled their experience into a few hard-won lessons. Write the problem before discussing tools. When output is bad, suspect the input before the model. Build observability from the start, not after the first painful incident. And let measured results, not intuition, tell you whether the stack is working.

What they would do differently

  • Start with observability rather than adding it under duress.
  • Resist the reflex to fix quality by upgrading the model.
  • Trust the written problem statement to settle tool debates faster.

The arc from stalled to shipped came down to discipline, not cleverness. The tools were never the hard part. The hard part was the order of decisions and the willingness to suspect the right layer when things broke.

Frequently Asked Questions

Why did writing the problem down break the stall?

Because the stall came from debating tools with no shared target. A written problem statement gave the team a fixed specification to measure options against, which settled debates that intuition kept reopening.

Why did switching to a more powerful model not help?

Because the failures came from messy input the model never had a fair chance with, not from a lack of model capability. Fixing the input through preprocessing solved what a bigger model could not.

What did observability actually change?

It turned untraceable failures into diagnosable ones and surfaced a category of silent errors the team had not known existed. It also made the measurable outcomes possible, since you cannot measure what you do not log.

Was the outcome really measurable?

Yes, because logging and cost tracking were in place. Processing time, error rate against the manual baseline, and cost per document were all measured rather than estimated.

Is this a real company?

It is an illustrative composite drawn from common patterns, not a named account. The arc and the lessons reflect what these projects genuinely look like.

What was the team's single biggest regret?

Skipping observability at the start. The early blind period cost them weeks of confusion that thorough logging would have prevented, which is why they would build it in first next time.

Key Takeaways

  • The project stalled on the absence of a problem statement, not on a tooling gap.
  • Writing the problem in one paragraph let it settle tool debates and select the stack.
  • When output was poor, the fix was better input handling, not a more powerful model.
  • Skipping observability created a blind period the team called their biggest regret.
  • Measurable outcomes in time, error rate, and cost were only possible because logging existed.
  • The decisive factor was the order of decisions and suspecting the right layer, not cleverness.
