AGENCYSCRIPT

Sequencing Emotion Detection From First Prompt to Production

Agency Script Editorial

Editorial Team

August 8, 2021 · 7 min read

A playbook is different from a tutorial. A tutorial teaches you how a thing works; a playbook tells you what to do, when to do it, and who owns it. This is the operating manual for building a sentiment and emotion detection capability — a sequence of named plays you can run, each with a clear trigger and an owner, so the work moves from idea to reliable production system without stalling in the middle.

The plays are sequenced deliberately. Skipping ahead — building a fancy multi-label classifier before you have agreed on what the labels mean — is the most common way these projects collapse. Run them in order the first time, then return to specific plays as triggers fire.

Each play below names what sets it off, the steps, and who should hold it.

Play 1: Scope the Decision

Trigger: someone proposes using emotion detection to drive a decision or workflow.

Steps

Before any prompting, define what decision the output will drive and how wrong the model is allowed to be. An aggregate trend dashboard tolerates far more error than a system that escalates distressed customers. The stakes determine every later choice.

Owner

The person accountable for the decision the output feeds — usually a product or CX lead, not the prompt author. This prevents building a precise classifier for a problem that did not need one, or a sloppy one for a problem that did.

Play 2: Define the Taxonomy

Trigger: scope is agreed and the project is greenlit.

Steps

Write the exact set of labels with one or two example messages per label that define the boundary. Keep it small. Decide whether you need polarity, discrete emotions, dimensional scores, or aspect-level output based on the decision from Play 1.
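
One way to make the taxonomy a concrete artifact is to keep it as data and check it mechanically. The sketch below is illustrative only: the three labels, their definitions, the example messages, and the `max_labels` limit are all assumptions made for the example, not a recommended standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    name: str
    definition: str
    boundary_examples: tuple  # one or two messages that mark the edge of the label


# Hypothetical taxonomy, invented for illustration.
TAXONOMY = (
    Label("anger", "Hostility directed at the company or agent.",
          ("This is the third time you have lied to me.",)),
    Label("frustration", "Annoyance at a problem, without hostility.",
          ("I have tried resetting it twice and it still fails.",)),
    Label("neutral", "No clear emotional signal.",
          ("What are your opening hours?",)),
)


def validate_taxonomy(labels, max_labels=6):
    """Return a list of problems; an empty list means the taxonomy passes."""
    problems = []
    names = [label.name for label in labels]
    if len(names) != len(set(names)):
        problems.append("duplicate label names")
    if len(labels) > max_labels:
        problems.append(f"too many labels: {len(labels)} > {max_labels}")
    for label in labels:
        if not 1 <= len(label.boundary_examples) <= 2:
            problems.append(f"{label.name}: needs one or two boundary examples")
    return problems


print(validate_taxonomy(TAXONOMY))  # an empty list means no problems found
```

Keeping the check in code means the taxonomy owner can run it every time an edge case adds or changes a label.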

Owner

A single taxonomy owner who will maintain it as edge cases surface. This artifact becomes the contract for everything downstream, exactly as described in Rolling Out Prompting for Sentiment and Emotion Detection Across a Team.

Play 3: Build the Gold Set

Trigger: taxonomy is defined.

Steps

Hand-label a few hundred representative examples against the taxonomy, deliberately including hard cases — sarcasm, mixed sentiment, domain idioms. Record where annotators disagree. This set is how you will measure everything; build it before the prompt, not after.
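
Recording annotator disagreement can be as simple as flagging every item where the labels are not unanimous. This is a minimal sketch that assumes each gold-set item has been labeled by two or more annotators; the item ids and labels are made up for the example.

```python
from collections import Counter


def summarize_item(labels):
    """Return (most_common_label, unanimous) for one item's annotator labels."""
    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    return top_label, top_count == len(labels)


def disagreement_report(annotations):
    """annotations maps item id -> list of labels, one per annotator.

    Returns the disputed items and the share of items that are disputed.
    """
    disputed = {
        item: labels
        for item, labels in annotations.items()
        if not summarize_item(labels)[1]
    }
    rate = len(disputed) / len(annotations) if annotations else 0.0
    return disputed, rate


# Hypothetical annotations from two annotators.
annotations = {
    "t1": ["anger", "anger"],
    "t2": ["anger", "frustration"],
    "t3": ["neutral", "neutral"],
    "t4": ["frustration", "anger"],
}
disputed, rate = disagreement_report(annotations)
print(sorted(disputed), rate)
```

The disputed items are the ones worth discussing with the taxonomy owner, since they usually point at a boundary the definitions have not settled.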

Owner

Whoever owns quality. The gold set is the project's source of truth and should not be owned by the same person racing to ship the prompt.

Play 4: Draft and Iterate the Prompt

Trigger: gold set exists.

Steps

Write a constrained prompt with a fixed output format, two or three domain few-shot examples, and an explicit "uncertain" option the model can return when a message does not clearly fit a label. Measure against the gold set, read the errors, and iterate. Use a reasoning step for hard cases. The advanced techniques you reach for here are in When Sarcasm Breaks Your Emotion Classifier, Try This.
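
A constrained prompt of this kind might look like the sketch below. It covers only prompt assembly and reply parsing; the call to the model is deliberately left out because it depends on your provider. The label set, the few-shot messages, and the JSON reply shape are assumptions for illustration.

```python
import json

# Hypothetical label set; "uncertain" is the explicit escape hatch.
ALLOWED = ("anger", "frustration", "neutral", "uncertain")

# Hypothetical domain few-shot examples.
FEW_SHOT = [
    ("This is the third time you have lied to me.", "anger"),
    ("I have tried resetting it twice and it still fails.", "frustration"),
]


def build_prompt(message):
    """Assemble instructions, few-shot examples, and the message to classify."""
    shots = [
        "Message: " + text + "\nAnswer: " + json.dumps({"label": label})
        for text, label in FEW_SHOT
    ]
    return (
        "Classify the emotion in the customer message.\n"
        "Allowed labels: " + ", ".join(ALLOWED) + ".\n"
        'Reply with JSON only, in the form {"label": "<label>"}.\n'
        'Use "uncertain" when the message does not clearly fit one label.\n\n'
        + "\n\n".join(shots)
        + "\n\nMessage: " + message + "\nAnswer:"
    )


def parse_reply(raw):
    """Map any malformed or out-of-taxonomy reply to 'uncertain'."""
    try:
        label = json.loads(raw).get("label")
    except (json.JSONDecodeError, AttributeError, TypeError):
        return "uncertain"
    return label if label in ALLOWED else "uncertain"


print(build_prompt("Where is my refund?"))
print(parse_reply('{"label": "anger"}'))
```

Treating unparseable replies as "uncertain" keeps format failures from silently becoming confident labels.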

Owner

The prompt author, working against the gold set rather than vibes.

Play 5: Validate Per Class and Per Group

Trigger: the prompt produces stable output.

Steps

Compute precision and recall per emotion class, not just overall accuracy, and where possible disaggregate by the populations your text represents. Fix systematic weaknesses before shipping. This guards against the fairness failures detailed in The Hidden Risks of Prompting for Sentiment and Emotion Detection (and How to Manage Them).
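
Per-class precision and recall need nothing beyond counting. The sketch below uses plain Python and assumes single-label predictions; the `groups` argument is a hypothetical per-item attribute (for example a customer segment) used to disaggregate, and the sample data is invented.

```python
from collections import Counter


def per_class_metrics(gold, predicted):
    """Return precision, recall, and support for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # missed a true g
    metrics = {}
    for label in sorted(set(gold) | set(predicted)):
        pred_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        metrics[label] = {
            "precision": tp[label] / pred_total if pred_total else None,
            "recall": tp[label] / gold_total if gold_total else None,
            "support": gold_total,
        }
    return metrics


def metrics_by_group(gold, predicted, groups):
    """Compute the same metrics separately for each group value."""
    result = {}
    for name in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == name]
        result[name] = per_class_metrics(
            [gold[i] for i in idx], [predicted[i] for i in idx]
        )
    return result


gold = ["anger", "anger", "anger", "frustration", "neutral", "neutral"]
pred = ["anger", "frustration", "anger", "frustration", "neutral", "anger"]
for label, m in per_class_metrics(gold, pred).items():
    print(label, m)
```

In this invented sample, overall accuracy is 4 of 6, yet anger recall is only 2 of 3, which is the kind of gap a single accuracy number hides.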

Owner

The quality owner, who signs off that the classifier meets the bar set in Play 1.

Play 6: Ship With the Right Automation Level

Trigger: validation passes.

Steps

Match automation to stakes. Automate aggregate analytics and the clearest individual cases; route uncertain and high-stakes calls to humans. Wrap the classifier in the team's existing workflow rather than a new ceremony. The repeatable process scaffolding is in Building a Repeatable Workflow for Prompting for Sentiment and Emotion Detection.
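
Matching automation to stakes can be expressed as a small routing function. This sketch assumes the classifier returns a label plus some confidence score between 0 and 1; how that score is obtained is not specified here, and the `0.9` threshold, the route names, and the choice of "anger" as the escalation label are placeholders to be set from the Play 1 decision.

```python
def route(label, confidence, escalate_labels=frozenset({"anger"}), threshold=0.9):
    """Decide what happens to one classified message.

    - "uncertain" always goes to a human.
    - Labels that trigger escalation are automated only when confidence is high.
    - Everything else feeds aggregate analytics with no individual action.
    """
    if label == "uncertain":
        return "human_review"
    if label in escalate_labels:
        return "auto_escalate" if confidence >= threshold else "human_review"
    return "analytics_only"


print(route("anger", 0.95))
print(route("anger", 0.60))
print(route("uncertain", 0.99))
print(route("neutral", 0.50))
```

Keeping the rule in one function makes the automation level easy to audit and easy to tighten when the stakes change.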

Owner

The product or operations owner who controls the workflow it plugs into.

Play 7: Monitor and Re-Run

Trigger: the system goes live, and then a recurring schedule.

Steps

Re-run against a fresh labeled sample on a cadence, watch label distributions for sudden shifts, and run a calibration session if the team grows or definitions drift. Feed resolved edge cases back into the taxonomy and gold set.
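
Watching label distributions for sudden shifts can start with a single number. The sketch below compares a baseline window against the current window using total variation distance, which is one reasonable choice among several; the alert threshold of `0.15` and the sample counts are assumptions for illustration.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the sample."""
    if not labels:
        raise ValueError("cannot compute shares of an empty sample")
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


def total_variation(baseline, current):
    """Half the sum of absolute share differences; 0 is identical, 1 is disjoint."""
    p, q = label_shares(baseline), label_shares(current)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


# Hypothetical predicted labels from two time windows.
baseline = ["neutral"] * 70 + ["frustration"] * 20 + ["anger"] * 10
current = ["neutral"] * 40 + ["frustration"] * 30 + ["anger"] * 30

shift = total_variation(baseline, current)
ALERT_THRESHOLD = 0.15  # assumed value; tune it against your own history
print(round(shift, 3), shift > ALERT_THRESHOLD)
```

A shift alone does not say the classifier is wrong, since the inputs may genuinely have changed, so an alert is a prompt to re-run against a fresh labeled sample rather than a verdict.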

Owner

The taxonomy and quality owners jointly. Without a scheduled trigger, monitoring quietly stops happening.

Play 8: Handle the Failure Drills

Trigger: the quality gate fails, accuracy decays, or a stakeholder disputes a result.

Steps

Have a predefined response rather than improvising under pressure. When the gate fails, pause the batch, pull the misclassified examples, and determine whether the cause is input drift, a model change, or a prompt regression. Roll back to the last known-good prompt version while you diagnose. When a stakeholder disputes a label, trace it through the structured output and evidence span rather than relitigating from memory.
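
A predefined response is easier to follow when the gate and the rollback are code rather than memory. This is a sketch under stated assumptions: prompt versions are kept in an ordered history with a recorded gate result, and the gate is a per-class recall floor. The version names, prompt text, and floor values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PromptVersion:
    version: str
    text: str
    passed_gate: bool


def failing_classes(recall_by_class, floors):
    """Return the labels whose measured recall is below the agreed floor."""
    return sorted(
        label for label, floor in floors.items()
        if recall_by_class.get(label, 0.0) < floor
    )


def last_known_good(history):
    """history is ordered oldest to newest; return the newest passing version."""
    for candidate in reversed(history):
        if candidate.passed_gate:
            return candidate
    return None


history = [
    PromptVersion("v1", "first prompt text", True),
    PromptVersion("v2", "second prompt text", True),
    PromptVersion("v3", "third prompt text", False),
]
floors = {"anger": 0.85, "neutral": 0.90}      # assumed quality bar from Play 1
measured = {"anger": 0.72, "neutral": 0.95}    # assumed results from a fresh sample

failed = failing_classes(measured, floors)
if failed:
    fallback = last_known_good(history)
    print("gate failed on", failed, "-> roll back to", fallback.version)
```

The rollback buys time; the diagnosis of input drift, model change, or prompt regression still has to happen on the misclassified examples.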

Owner

The quality owner runs the drill; the decision owner is informed if the failure affects live decisions. A rehearsed failure path is what separates a mature capability from one that panics when something breaks.

Sequencing the Plays Together

The plays are not independent — they form a chain where each one's output is the next one's input.

The first-build path

For an initial build, run plays one through six in order, then stand up seven and eight as standing capabilities. Skipping ahead — most commonly jumping to play four before plays two and three exist — is the single most reliable way to end up with a classifier nobody can measure or trust.

Returning to plays as triggers fire

Once live, you re-enter specific plays when their triggers fire: a new use case sends you back to play one, a model upgrade sends you to play five, and a disputed result sends you to play eight. Treating the playbook as a set of triggered routines rather than a one-time checklist is what keeps the capability healthy as it ages.

A Worked Example of the Sequence

To make the sequence concrete, consider a support team that wants to flag angry tickets for faster handling.

How the plays unfold

Play one scopes the decision: angry tickets jump the queue, so a false negative — missing real anger — is worse than a false positive. That stakes assessment says favor recall on the anger class. Play two defines a small taxonomy with anger sharply distinguished from mere frustration. Play three builds a gold set heavy on the boundary cases between the two. Play four writes a prompt that reasons about tone before labeling, and play five validates recall on anger specifically rather than overall accuracy.

Where it pays off

By play six, the team automates escalation only for high-confidence anger and routes uncertain cases to a human, matching automation to the stakes set in play one. Play seven catches the day a product launch floods the queue with a new vocabulary the prompt has not seen, and play eight rolls back cleanly when the gate flags the drop. The sequence is what made each of those steps a deliberate choice rather than a scramble.

Frequently Asked Questions

What is the most common play teams skip?

Defining the taxonomy and building the gold set before writing the prompt. Teams rush to a clever prompt and then have no way to measure whether it works, so they ship on intuition and discover the problems in production.

Who should own the overall capability?

Ownership is split deliberately: the decision owner sets stakes, the taxonomy owner maintains definitions, and the quality owner guards the gold set and validation. One person wearing all three hats tends to cut corners on whichever responsibility conflicts with shipping.

How small should the taxonomy be?

Small enough that the team and the model apply it consistently — usually a handful of well-defined categories. Granularity beyond that erodes agreement between annotators and the model, which defeats the purpose.

When do I re-run the validation plays?

On a fixed schedule, and whenever you change the prompt, swap the model, or notice a shift in label distributions. Treat any of those as a trigger to re-run Play 5 against a fresh sample.

Can I run these plays out of order?

For a first build, no — each play depends on the previous one's output. Once the capability is live, you revisit individual plays as their triggers fire, but the initial sequence should run in order.

Key Takeaways

  • A playbook assigns triggers and owners to each step so the build does not stall midway.
  • Scope the decision and its stakes first; they determine every later choice.
  • Define the taxonomy and build the gold set before writing the prompt, not after.
  • Validate per class and per group, then match automation level to the stakes.
  • Monitoring and re-running need a scheduled trigger and a named owner or they silently lapse.
