The Routing Drift

The routing drift failure hiding in plain sight ... and how to fix it.

When Your AI Can't Tell Your Director Coach from Your VP Coach

You're building an AI-powered product coaching platform. You've got deep, thoughtfully designed skills — one that guides PMs through the Director transition, another that takes Directors through the VP and CPO leap. Each one is detailed, nuanced, and battle-tested with real users.

But you've got a problem you probably can't see yet.

Here are the actual routing descriptions from two real skills in Dean Peters' Product Manager Skills library:

director-readiness-advisor

Guide the PM-to-Director transition across preparing, interviewing,
landing, and recalibrating. Use when leadership scope is changing
and you need practical coaching.

vp-cpo-readiness-advisor

Guide the transition to VP or CPO across preparing, interviewing,
landing, and recalibrating. Use when executive product scope is
changing fast.

Now ask yourself: when a user types "Help me recalibrate in my product leadership role" — which skill should fire?

The model doesn't know. Neither do you, until you measure it.

The Anatomy of Routing Drift

These descriptions didn't end up similar by accident. They ended up similar because they were written the way a thoughtful engineer writes code: with consistency, shared vocabulary, and reused structure. The four phases (preparing, interviewing, landing, recalibrating) appear verbatim in both — a strong signal that one was templated from the other.

That's good documentation practice. It's terrible routing practice.

Routing descriptions are not documentation. They're instructions to the model about when to call a skill. Every word that appears in both descriptions weakens the signal on both. This pair doesn't just share words — it shares sentence structure, trigger conditions, and the four-phase framing that makes both descriptions feel interchangeable to a model that's never met your users.
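To get a rough feel for how much vocabulary the two descriptions share, here is a minimal Python sketch that computes word-level Jaccard overlap on the two descriptions quoted above. This is only a crude illustration: token overlap is an assumption chosen for simplicity, not how Homingo or a routing model actually scores similarity, and the helper names `tokens` and `jaccard` are hypothetical.

```python
import re

# The two routing descriptions, copied from the skills quoted above.
DIRECTOR = (
    "Guide the PM-to-Director transition across preparing, interviewing, "
    "landing, and recalibrating. Use when leadership scope is changing "
    "and you need practical coaching."
)
VP_CPO = (
    "Guide the transition to VP or CPO across preparing, interviewing, "
    "landing, and recalibrating. Use when executive product scope is "
    "changing fast."
)


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared words divided by all distinct words across both texts."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


shared = tokens(DIRECTOR) & tokens(VP_CPO)
union = tokens(DIRECTOR) | tokens(VP_CPO)
score = jaccard(DIRECTOR, VP_CPO)

print(f"{len(shared)} shared of {len(union)} total, Jaccard {score:.2f}")
# 15 shared of 28 total, Jaccard 0.54
```

More than half of the distinct words appear in both descriptions, and the words that differ are mostly the titles themselves, which is exactly the signal users tend to leave out of their prompts.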

The compounding problem: many real prompts contain only partial level signals rather than explicit titles. Nobody types "I am a Director seeking VP coaching." They type "I have a VP offer I'm evaluating" or "my exec peer relationships aren't landing." With descriptions this similar, even those partial signals get lost — the model has no meaningful basis to prefer one description over the other. Sharpen the descriptions, and the partial signals can do their job. For prompts with no level signal at all, a well-designed skill asks a single clarifying question — which is exactly what both of these skills already do.

This is routing drift: overlap that accumulates quietly, one reasonably-written skill at a time, until a meaningful fraction of your routing decisions are wrong.

Measuring It

Homingo is a CLI tool built specifically for this problem. Point it at your skills directory and run:

homingo lint --pair director-readiness-advisor,vp-cpo-readiness-advisor

Here's what it finds:

Linting pair: director-readiness-advisor  vp-cpo-readiness-advisor
Threshold: 90% | Model: claude-sonnet-4-20250514 | Sim: claude-haiku-4-5-20251001 (auto) | Prompts/pair: 10

 director-readiness-advisor  vp-cpo-readiness-advisor  FAIL (50%, need 90%)

Results: 0 passed, 1 failed

Figure: Homingo lint output showing the pair failing at 50% accuracy with HIGH severity.

50% accuracy is a coin flip on every ambiguous prompt. The HIGH severity badge isn't cosmetic: at this accuracy level, about half of naturally worded prompts route to the wrong skill.

Homingo then generates coordinated rewrite suggestions for both failing skills simultaneously, identifying the missing boundary as scope of responsibility — Director means single business unit, team leadership, first-time management; VP/CPO means multi-unit portfolio, P&L, board.

Figure: Rewrite suggestion cards showing the before and suggested descriptions for both skills.

The suggested rewrites introduce something the originals lacked entirely: explicit Does NOT handle statements. Negative bounds are the most reliable disambiguation tool available to skill authors, and they're almost never written in first-draft descriptions.

These are suggestions only — no files have been modified. That's intentional: Homingo surfaces the problem and proposes a fix, but you review before anything changes.

Fixing It

--fix applies the rewrites to your SKILL.md files, re-tests routing accuracy against the same adversarial prompt set, and iterates until the pair clears the threshold:

homingo lint --pair director-readiness-advisor,vp-cpo-readiness-advisor --fix

Figure: Applied Rewrites report showing the evolution of both skill descriptions across two iterations, from Original through Iteration 1 to Final Applied. It took two iterations to reach 100% accuracy, and each step shows the reasoning that drove the rewrite, from "organizational scope" to the sharper "management scope" framing.

Two things worth noticing in that output:

Iteration 1 landed at 70% — better, but not enough. Homingo escalated, sharpening its framing from organizational scope to management scope. The second pass draws a crisper line: Director means managing people for the first time; VP/CPO means managing other managers. That single reframe pushed accuracy from 70% to 100%.

Both final descriptions contain explicit Does NOT handle statements. The originals had none. That addition is consistently the highest-leverage move in disambiguating overlapping skills — one sentence that permanently removes an entire class of misrouting.
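The rewrite, re-test, repeat behavior described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not Homingo's implementation: `fix_until_threshold`, `fake_evaluate`, `fake_rewrite`, and the `max_iters` cap are all hypothetical names, and the stand-in evaluator simply replays the accuracies reported in this article (50%, 70%, 100%).

```python
from typing import Callable, List, Tuple


def fix_until_threshold(
    descriptions,
    evaluate: Callable,
    rewrite: Callable,
    threshold: float = 0.90,
    max_iters: int = 5,
) -> Tuple[object, List[float]]:
    """Rewrite and re-measure until accuracy clears the threshold.

    Returns the final descriptions and the accuracy after each measurement,
    starting with the unmodified originals.
    """
    accuracy = evaluate(descriptions)
    history = [accuracy]
    # len(history) - 1 is the number of rewrite iterations done so far.
    while accuracy < threshold and len(history) - 1 < max_iters:
        descriptions = rewrite(descriptions)
        accuracy = evaluate(descriptions)
        history.append(accuracy)
    return descriptions, history


# Hypothetical stand-ins: replay the accuracies reported in the article
# instead of calling a model, and count revisions instead of editing text.
scripted = iter([0.50, 0.70, 1.00])


def fake_evaluate(_descriptions) -> float:
    return next(scripted)


def fake_rewrite(revision: int) -> int:
    return revision + 1


revision, history = fix_until_threshold(0, fake_evaluate, fake_rewrite)
print(revision, history)
# 2 [0.5, 0.7, 1.0]
```

The key design point is that every iteration is measured against the same threshold, so a partial improvement such as 70% triggers another pass rather than being accepted.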

What This Means If You're Building with Skills

The root cause here wasn't carelessness. The original descriptions were well-written. They communicated clearly what each skill does. The problem is that routing descriptions aren't evaluated in isolation — they're evaluated against every other skill in your fleet simultaneously. A description that reads as perfectly clear on its own can create genuine ambiguity when a sibling skill uses the same vocabulary.

A few principles that follow:

  • Template reuse is a drift accelerator. If you scaffold new skills from existing ones without deliberately differentiating the routing descriptions, you're systematically accumulating overlap with every skill you ship.
  • Say what you don't do. Negative bounds are underused and high-value. "Does NOT handle: VP/CPO transitions" removes an entire prompt class from the wrong skill's candidate set. One line, permanent effect.
  • Drift is a fleet property, not a skill property. You cannot catch this by reviewing individual descriptions. You need to evaluate all pairs and measure accuracy — which means you need tooling, not a code review.
  • Drift is also continuous. Every new skill potentially conflicts with existing ones. homingo lint belongs in your CI pipeline, not just as a one-time audit.
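To see why pairwise drift outgrows manual review, here is a small Python sketch of how the number of skill pairs scales with fleet size. The third skill name is a hypothetical placeholder, and `pair_count` is an illustrative helper, not part of any tool.

```python
from itertools import combinations


def pair_count(n_skills: int) -> int:
    """Number of unordered skill pairs in a fleet of n skills: n(n-1)/2."""
    return n_skills * (n_skills - 1) // 2


skills = [
    "director-readiness-advisor",
    "vp-cpo-readiness-advisor",
    "hypothetical-third-skill",  # placeholder, not a real skill
]

# Every unordered pair is a potential routing conflict to measure.
pairs = list(combinations(skills, 2))
for a, b in pairs:
    print(f"{a} vs {b}")

for n in (3, 10, 50):
    print(n, "skills ->", pair_count(n), "pairs")
# 3 skills -> 3 pairs
# 10 skills -> 45 pairs
# 50 skills -> 1225 pairs
```

Adding one skill to a fleet of n introduces n new pairs, which is why each new skill is a fresh opportunity for conflict.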

What Comes Next: Scope Overload

Fixing the routing conflict exposed a second finding. After the rewrites, Homingo flagged both updated descriptions as scope-overloaded — each now covers too many distinct intents to be reliably called for the right sub-task.

Figure: Scope Overload analysis showing both skills flagged with 4 distinct clauses each, along with proposed sub-skill breakdowns. Each skill is proposed to shard into 3 focused sub-skills plus an orchestrator.

For director-readiness-advisor, the proposed split is:

  • director-readiness-advisor-team-management — hiring, 1:1s, performance reviews
  • director-readiness-advisor-stakeholder-relations — cross-functional trust, senior presentations
  • director-readiness-advisor-leadership-development — imposter syndrome, delegation, mindset

The routing conflict and the scope overload are related problems with the same root cause: descriptions that were written for human readers, not for models making routing decisions. Fixing one reveals the other.


homingo scan is free — no API calls, runs locally in seconds — and will show you where your highest-overlap pairs are. From there, homingo lint gives you LLM-measured accuracy, and --fix handles the rewrites.

npm install -g homingo
homingo init
homingo scan