MANDATORY READING for every agent (implementer, evaluator, designer):
/Users/sebastian/dev/deepspace/docs/design-guidelines.md
That document is the authoritative aesthetic contract: what the site looks like, what's forbidden, how the typography scale works, where cards are allowed, the CTA budget, the subtraction principle. Every fix dispatched must trace to a rule in either design-guidelines.md or this file. If a brief seems to contradict design-guidelines.md, the guidelines win.
This document defines how Claude runs the autonomous UI/UX improvement loop for the TracePlot site. It is the single source of truth for the rubric, the roles, the workflow, and the stop conditions. If you are an agent reading this: your task definition in the orchestrator prompt is authoritative for what to do, but how you report must follow the contract here so the orchestrator can fold your output into the rubric without reinterpretation.
1. Orchestration principles
- Claude orchestrates, agents execute. The orchestrator does not read source files directly, does not run the dev server or build scripts directly, does not edit code directly, does not run the browser or score pages directly. Every concrete action is delegated to a subagent with a self-contained prompt. The orchestrator's job is to decide what to do next, brief the right agent, and merge the result into the rubric.
- Evaluation is continuous, not one-shot. Every iteration begins with a fresh evaluation pass. The rubric is never assumed stable between iterations — implementing one fix often moves another score, and regressions must surface the same turn they appear.
- Full surface, not just the landing page. Every page the site ships — landing, `/start`, `/reserved`, `/methodology`, `/docs`, every `/docs/[slug]`, `/legal/privacy`, `/legal/terms`, `/legal/imprint` — is scored. Launch readiness is a property of the whole site, not one hero.
- Every fix must be justified by an evaluator finding. The orchestrator never dispatches a fix it cannot point to a rubric finding for. Taste-based changes that aren't traceable to a scored dimension are out of scope.
- Images carry no text. All textual UI — labels, badges, headlines, sidebars, pins, controls — is rendered in HTML/JSX as an overlay. Images are the photographic or illustrative layer only.
- The dev server is assumed to be running on `http://localhost:3000` and hot-reloads on file writes. Agents do not start, stop, or restart it.
- Stop when the rubric is satisfied AND no obvious wins remain. Do not pad the loop. Do not invent issues to keep going.
2. The rubric
Ten dimensions, each scored 0–5. The target is every dimension ≥ 4, at every breakpoint, on every page. Stretch goal is ≥ 4.5 average.
| # | Dimension | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|---|
| 1 | Typography & hierarchy | broken, unreadable | readable, no hierarchy | some hierarchy, inconsistent scale | clear hierarchy, consistent scale | refined pairing, rhythm, tracking, line-height intentional | editorial-grade — typography itself is part of the brand |
| 2 | Layout & spacing | broken layout | content visible but misaligned | aligned, inconsistent spacing | consistent spacing scale, no dead zones | intentional whitespace, visual balance, asymmetry used purposefully | nothing to add or remove |
| 3 | Responsive (375 / 768 / 1024 / 1440) | breaks on mobile | works but looks bad on one size | works on all sizes, one feels like an afterthought | all sizes work well | each breakpoint feels native, not squished | flawless across the entire range |
| 4 | Header / nav / footer chrome | broken | aligned, unpolished | symmetric, plain | deliberate styling, consistent | hover/focus states, subtle details | memorable without being distracting |
| 5 | Hero & section framing | missing | text + image, awkward | clear headline, decent visual | headline + visual + CTA in conversation | all three pull together, no wasted space | can't imagine it better |
| 6 | Imagery (simple, text-free, HTML overlay on top) | none or stock | generic | on-brand but flat | intentional, matches palette, no baked-in text | distinctive, supports narrative | editorial-grade brand work |
| 7 | Component polish (buttons, cards, pills, badges, tables, forms, accordions) | default browser | styled but inconsistent | consistent styling | consistent + refined (shadows, radii, hover) | micro-details (focus rings, transitions, active states) | design-system feel |
| 8 | Copy integrity (polish only, no meaning rewrites) | broken, typos | clear but bland | clear and on-brand | scannable, each section earns its space | voice consistent, CTAs compelling | every word pulls weight |
| 9 | Accessibility (contrast, focus, semantics, alt, keyboard, ARIA) | broken | several issues | roughly AA | WCAG AA clean | AA+ with keyboard and screen-reader care | AAA where reasonable |
| 10 | Technical polish (console clean, no CLS, images optimized, meta tags, OG image, favicon) | broken | works | clean console, loads fast | images optimized, LCP good | LCP < 2s, CLS < 0.1, no layout shifts, full meta | Lighthouse 95+ across the board |
Dimension-specific anti-patterns (automatic score caps)
Dim 2 — Card-everywhere anti-pattern. A region wrapped in `border` + `bg-surface` + `rounded-[22px]` (or any equivalent "white box on the page background" treatment) must earn its container role. A box is only justified if the content inside is a genuine container:
- A pricing plan (one of several compared offers) → card OK.
- A comparison row stacked on mobile → card OK.
- A diagnostic "Is this you?" sub-question → card OK.
- Everything else — page heroes, legal/docs hero headers, pill-row groups, TOC asides, article bodies, section titles, publisher info, contact info, "what happens next" lists — does NOT earn a card. Use flat type on the page background + whitespace + optional thin rules for separation.
If a page has more than 2 unrelated card layers stacked on top of each other (excluding legitimate container cards above), dim 2 is capped at 2 for that page until the nesting is removed. Editorial design pattern: page background does the work; type and whitespace create hierarchy; boxes are the exception, not the rule.
Dim 5 — Editorial hero rule. Pages whose hero is just eyebrow + h1 + subhead + (optional) small CTA (no dashboard, no form, no mock) render the hero as flat type directly on the page. No card wrapper. Applies to: /start, /reserved, /methodology, /legal/*, /docs, /docs/[slug]. Dim 5 is capped at 3 on any such page whose hero is card-wrapped.
Dim 7 — Approved library primitives. Only these libraries are in use for interactive/a11y-critical UI:
- `@radix-ui/react-accordion` — FAQ accordion.
- `@radix-ui/react-dialog` — mobile nav drawer, any modal.
- Others may be added only via user approval.
Everything else (buttons, pills, cards-that-survive, badges, tables, form layouts) remains hand-rolled Tailwind. No shadcn/ui adoption, no Headless UI, no Aceternity. Implementers that introduce other libraries must report it and will be rejected by default.
Implementer brief contract
Every implementer dispatch must carry:
- The rubric finding(s) being addressed, with file:line citations from an evaluator report.
- A numbered list of atomic fixes — each fix is one discrete edit with an expected before/after.
- A "do not touch" guardrail list — files the implementer may read but not edit.
- Out-of-scope rule: any issue the implementer notices outside its brief is reported, not fixed.
- Typecheck + lint requirement: implementer runs `pnpm typecheck` and `pnpm lint` on completion and reports pass/fail.
- Tab hygiene: implementers that open a browser tab follow §3a.
Document mode vs Marketing mode
Every page belongs to one of two stylistic modes. The mode determines how aggressive the editorial framing can be. Mixing modes (applying marketing chrome to a document page) is an anti-pattern and caps several dimensions.
Marketing mode pages:
- `/` (landing)
- `/start`
- `/reserved`
- `/docs` (index)
Marketing mode allows: display serif headlines at large scales, small-caps eyebrow labels, hero composition with visual + CTA, crafted section openers, distinctive treatments per section, imagery, and all the usual brand vocabulary. These pages are selling — they should feel distinctive and intentional.
Document mode pages:
- `/legal/privacy`
- `/legal/terms`
- `/legal/imprint`
- `/docs/[slug]` (any)
- `/methodology`
Document mode is deliberately plain. The reader is there for information, and decoration competes with content. Document mode pages must:
- Use ONE font family for all text. Body sans-serif throughout. A modest serif h1 is acceptable (at most 1.8–2rem), but h2/h3/h4 must match the body font family — no display serif at huge sizes.
- Have a narrow heading scale: h1 ≈ 1.75–2rem, h2 ≈ 1.15–1.3rem (bolder weight), h3 ≈ 1rem (bolder), h4 ≈ body size (italic or semibold).
- NOT use eyebrow pills / small-caps section labels above the title.
- NOT use small-caps metadata treatment for dates, authors, or "last updated" lines. Plain body text.
- NOT have a sidebar / aside TOC. No "On this page" navigation panel. Readers scroll.
- NOT have a mobile `<details>` TOC accordion. Same reason.
- NOT have a hero composition (eyebrow + display headline + muted subhead + thin rule). Just `<h1>Title</h1>`, optional date line, first body paragraph.
- NOT wrap tables in surface cards or scroll-hint gradients. Tables get plain borders and cell padding, full stop.
- NOT wrap the article body in any card — just a centered single column with a comfortable `max-w-[68ch]` measure and sufficient vertical padding.
- NOT insert decorative separators between sections. Heading size and vertical rhythm do the separation.
The test: if the page looks like something you'd print and read, it's document mode. If it looks like something designed for a website, it's leaked marketing-mode styling.
A user quote that anchors this rule: "It does not look like a clean document. Full of unnecessary stuff to make it 'beautiful' but just adds noise... The 'On this page' totally unnecessary for terms, I never see that in sites. This is the type of overkill that might feel 'cool' but it's unnecessary."
Dim 1 (typography) and dim 2 (layout) are capped at 3 on any document-mode page that exhibits marketing-mode chrome — eyebrow, display-sized h1, sidebar TOC, mobile details TOC, small-caps metadata, decorated tables, card-wrapped hero. All of them together drag the score to 2.
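The automatic caps above are mechanical enough to sketch in code. A minimal illustration in Python — the data shape (a dict mapping dimension number to score, which must include dims 1, 2, and 5) and the flag names are assumptions for this sketch, not part of any real API:

```python
def apply_caps(scores, *, card_layers=0, hero_card_wrapped=False,
               document_mode_with_marketing_chrome=False):
    """Return a copy of `scores` (dim -> 0..5) with the anti-pattern caps applied."""
    capped = dict(scores)
    if card_layers > 2:                      # dim 2: card-everywhere anti-pattern
        capped[2] = min(capped[2], 2)
    if hero_card_wrapped:                    # dim 5: editorial hero rule
        capped[5] = min(capped[5], 3)
    if document_mode_with_marketing_chrome:  # dims 1 + 2: document-mode leak
        capped[1] = min(capped[1], 3)
        capped[2] = min(capped[2], 3)
    return capped
```

Note the caps only ever lower a score; a page that already scores below a cap is unaffected.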
Scoring rules
- Each page gets its own 10-dimension score at each of four breakpoints: 375px (mobile), 768px (tablet), 1024px (small desktop), 1440px (large desktop).
- Dimension 3 (responsive) is scored once per page — it is about the relationship between breakpoints, not a per-breakpoint property.
- For dimensions that are global (e.g. brand consistency, component polish on shared components), regressions on one page drag the score for all pages that share the component.
- A dimension is blocking if any breakpoint on any page scores < 4. Launch is not ready until no blockers remain.
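The blocking rule is a pure scan over the score table. A sketch, assuming scores live in nested dicts (page → breakpoint → dimension → score); the shape is illustrative:

```python
def find_blockers(pages):
    """pages: {path: {breakpoint: {dim: score}}} -> list of blocking cells.
    A dimension blocks launch if any breakpoint on any page scores < 4."""
    blockers = []
    for path, breakpoints in pages.items():
        for bp, dims in breakpoints.items():
            for dim, score in dims.items():
                if score < 4:
                    blockers.append((path, bp, dim, score))
    return blockers
```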
What each score means in practice
- 0 — user would bounce immediately.
- 1 — visibly wrong, harms credibility.
- 2 — "fine" but nothing earns its place. Amateur.
- 3 — clean and professional, doesn't stand out.
- 4 — launch quality. Noticeable care. This is the minimum floor.
- 5 — best-in-class. Each dimension at 5 is a stretch, not required.
3. Pages in scope
| Path | Purpose | Notes |
|---|---|---|
| `/` | Landing / marketing | Primary surface. Hero, steps, comparison, pricing, FAQ, final CTA. |
| `/start` | Onboarding entry | Post-click destination from most CTAs. |
| `/reserved` | Confirmation page | Post-reservation success state. |
| `/methodology` | Product-depth content | Technical trust page. |
| `/docs` | Docs index | Lists doc entries from `/docs/*.md`. |
| `/docs/[slug]` | Individual doc | Markdown render. At least one representative slug is evaluated per round. |
| `/legal/privacy` | Privacy policy | Legal chrome. |
| `/legal/terms` | Terms | Legal chrome. |
| `/legal/imprint` | Imprint | Legal chrome. |
Every round evaluates all of these. Subpages are not second-class citizens.
3a. Browser tab hygiene
Agents open Chrome tabs to evaluate pages. Left unchecked, the browser fills with dozens of stale tabs over a few rounds. Hygiene rules:
- Every agent opens at most one tab. Navigate that single tab between URLs and breakpoints — do not create a fresh tab per page. Use `mcp__claude-in-chrome__tabs_create_mcp` once at the start, then reuse via `navigate`.
- Agents record their tab ID in their report. Last line of every browser-using agent's report: `Tab opened by this agent: <tabId>` (or `none` if no tab was opened).
- Agents close their own tab before returning whenever possible — via `mcp__claude-in-chrome__tabs_close_mcp` or a `window.close()` through `javascript_tool` (for tabs they created). If they can't close, they still report the tab ID so the orchestrator can.
- Orchestrator sweeps at round boundaries. After every round (not between agents within a round), the orchestrator calls `tabs_context_mcp`, compares against the tab IDs reported by agents, and closes any tab that:
  - was reported by a returned agent, or
  - points to a `localhost:3000` page and is not being used by any still-running agent, or
  - is leftover from a previous round.
  One working tab may be kept on `about:blank` for the next round's first dispatch.
- Never close a tab while a background agent that may own it is still running. If in doubt, wait for the agent to return.
- Orchestrator does not share tab IDs between agents. Each agent creates its own, reports it, and the orchestrator tracks ownership in a simple map `{agentId: tabId}`.
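The round-boundary sweep reduces to a set computation. A sketch of the decision rule — the input shapes are assumptions, and the real `tabs_context_mcp` payload will differ; the "keep one `about:blank` working tab" exception is left out for brevity:

```python
def tabs_to_close(open_tabs, reported, running_owned):
    """
    open_tabs:     {tab_id: url} as observed in the browser
    reported:      tab IDs reported by agents that have already returned
    running_owned: tab IDs owned by still-running agents (never closed)
    """
    close = set()
    for tab_id, url in open_tabs.items():
        if tab_id in running_owned:
            continue  # never close a tab a running agent may own
        if tab_id in reported or "localhost:3000" in url:
            close.add(tab_id)
    return close
```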
4. Breakpoints
| Width | Label | Device class |
|---|---|---|
| 375px | mobile | iPhone 13 / 14 / 15 mini baseline |
| 768px | tablet | iPad portrait |
| 1024px | laptop | small desktop, common content-max |
| 1440px | desktop | standard MBP 14" and above |
Each evaluator call screenshots all four per page. The Chrome automation tool supports `resize_window` for this.
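In loop form, an evaluator's capture pass looks roughly like this. `navigate`, `resize_window`, and `screenshot` stand in for the Chrome MCP tools and are passed in as callables because their real signatures are not specified here; the window height is an arbitrary illustrative value:

```python
BREAKPOINTS = [(375, "mobile"), (768, "tablet"), (1024, "laptop"), (1440, "desktop")]

def capture_page(navigate, resize_window, screenshot, url):
    """Capture one screenshot per breakpoint for `url`, reusing a single tab."""
    navigate(url)
    shots = {}
    for width, label in BREAKPOINTS:
        resize_window(width, 900)  # height is illustrative, not prescribed
        shots[label] = screenshot()
    return shots
```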
5. Agent roster
Each role is a prompt template, not a long-lived process. Every call is a fresh subagent with a self-contained brief.
5.1 Evaluator agent (subagent_type: general-purpose)
Purpose: score a page against the rubric at all four breakpoints, return a structured report.
Inputs from orchestrator:
- Target URL(s) on `http://localhost:3000`.
- The rubric (by reference to this document).
- Tab ID if known; otherwise instructions to create one.
- What changed since last eval (so the agent can focus on regressions).
What the agent does:
- Loads the URL in Chrome via MCP tools.
- Resizes to each of 375 / 768 / 1024 / 1440 and takes a screenshot at each.
- Zooms into header, hero, any card regions that look off, using the `zoom` action.
- Scores each dimension per breakpoint.
- Writes findings as a structured report.
Required output format (the orchestrator merges this into the running rubric, so the shape matters):
## Page: <path>
### Scores
| Dim | Mobile | Tablet | Laptop | Desktop | Notes |
|---|---|---|---|---|---|
| 1 Typography | 3 | 4 | 4 | 4 | headline clamp too aggressive on 375 |
| 2 Layout | ... | | | | |
| ... | | | | | |
### Blocking findings (score < 4)
1. **[dim 2, mobile, hero]** Launching-autumn pill renders as empty oval next to mock — cause likely `flex justify-end` + pill-as-sibling. File: `app/_components/landing-page.tsx:54-57`.
2. ...
### Non-blocking observations
- ...
### Regression check
- Fix X from previous round: resolved ✓ / partially / regressed.
Constraints:
- The agent never edits files.
- Findings must cite a file and line when possible. Vague findings ("the hero feels cramped") are rejected — the orchestrator will re-dispatch asking for specifics.
- Each finding names the dimension, breakpoint, and suspected cause.
5.2 Explorer agent (subagent_type: Explore)
Purpose: read-only code discovery when the orchestrator needs to understand where something lives before dispatching a fix.
Typical questions:
- "Where is the header component defined and what components does it render?"
- "How is the hero mock composed in `landing-page.tsx`?"
- "Which components import `ProductMock`?"
Output format: compact — file paths, line numbers, and a two-sentence summary. No code dumps unless the orchestrator asks for a snippet.
5.3 Implementer agent (subagent_type: general-purpose)
Purpose: take a specific, bounded fix list and apply it to the codebase.
Inputs from orchestrator:
- A numbered list of atomic fixes, each with the file, line, rubric dimension, finding, and intended outcome.
- Guardrails (don't touch X, don't rename Y, keep behavior of Z).
- "Do not invent new fixes. If you see something broken that's not on the list, report it but do not fix it."
What the agent does:
- Reads the affected files.
- Applies the fixes.
- Runs `pnpm typecheck` and `pnpm lint` on what it changed.
- Returns a diff summary, not the diff itself.
Output format:
## Fixes applied
1. [dim 4] Header nav spacing — `components/header.tsx:23` — changed gap-4 to gap-6 and wrapped "How it works" in whitespace-nowrap. Typecheck: pass. Lint: pass.
2. ...
## Fixes skipped
- [dim 7] Card hover state — the card already has hover:shadow-card-hover but the token is defined as the same value as shadow-card. Needs token change, not component change. Reporting back.
## New issues observed (not fixed)
- ...
Constraints:
- Never changes files outside the stated fix list.
- Never commits. The orchestrator decides if/when to commit.
- If a fix is impossible as stated, returns that instead of inventing an alternative.
5.4 Image-prompt agent (subagent_type: general-purpose)
Purpose: rewrite a prompt JSON file, run the image generation script, pick the best variant, report back.
Inputs from orchestrator:
- Which prompt file to rewrite (`scripts/prompts/*.json`).
- The rubric finding that motivated the change.
- Explicit "no text, no labels, no brand names" constraint.
- Number of variants to generate (default 2; higher if prompt is risky).
What the agent does:
- Reads the current prompt JSON and the finding.
- Proposes a new `prompt` string that strips all text, labels, UI chrome, and brand names.
- Writes the new JSON to the prompt file.
- Runs `python3 scripts/generate-image.py --variants N <prompt-file>` (requires deleting or renaming existing variants first, since the script skips existing files).
- Reads each generated image.
- Reports which variant to use and why, or reports failure and proposes a next prompt.
Output format:
## Prompt update: <name>
- **Finding addressed:** [dim 6] <finding>
- **New prompt:** <text>
- **Variants generated:** N at <path>
- **Recommended variant:** vX — <one-line rationale>
- **Fallback:** vY if vX has <specific issue>
- **Concerns:** <anything the orchestrator should know>
5.5 Wire-image agent (subagent_type: general-purpose)
Purpose: replace a placeholder component with an <Image /> that uses the final generated image, plus an HTML overlay layer for any textual UI.
Inputs:
- Placeholder component path (e.g. `components/graphics.tsx:21 ProductMock`).
- Image path under `public/images/`.
- The HTML overlay specification — the orchestrator designs the overlay, the agent builds it.
- Alt text.
What the agent does:
- Moves the chosen image from `public/images/generated/` to `public/images/` with a final name.
- Replaces the component's placeholder markup with `<Image />` + absolutely-positioned overlay children.
- Ensures the overlay is responsive, not a fixed-pixel layer.
- Runs typecheck + lint.
5.6 Accessibility agent (subagent_type: general-purpose)
Purpose: dedicated a11y sweep — contrast, focus order, alt text, ARIA, keyboard traps, semantic HTML.
Dispatched: once per round, after visual evaluator, before implementer. Also once at the end before declaring the rubric satisfied.
Output format:
## A11y findings — <page>
| Severity | Dim 9 impact | Finding | File |
|---|---|---|---|
| blocker | -1 | input missing label | app/start/page.tsx:44 |
| ... |
5.7 Technical-polish agent (subagent_type: general-purpose)
Purpose: console errors, network waterfall, image sizes, meta tags, OG image wiring, favicon wiring, CLS check, LCP check.
Dispatched: once per round near the end. This is the dimension-10 specialist.
6. Workflow
6.1 Round anatomy
ROUND N
├── (a) Evaluator × 9 pages (parallel) ← visual scoring pass
├── (b) A11y agent × 9 pages (parallel) ← dimension 9 pass
├── (c) Technical-polish agent × 1 (site) ← dimension 10 pass
├── (d) Orchestrator merges findings → rubric.json, updated scores
├── (e) Orchestrator ranks blockers by (severity × leverage)
├── (f) Orchestrator plans next batch of fixes
│ ├── group 1: independent implementer fixes → parallel dispatch
│ ├── group 2: image prompt updates (if any) → parallel dispatch
│ └── group 3: wire-image fixes (depend on image agent output)
├── (g) Wait for all fixes to return
├── (h) Sanity check: typecheck + lint (via implementer or technical-polish)
└── GOTO ROUND N+1
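Step (e) — ranking blockers by severity × leverage — might be sketched as a single sort; the field names are invented for illustration:

```python
def rank_blockers(findings):
    """Each finding carries 'severity' (how far below the 4-point floor it
    scores) and 'leverage' (how many page x breakpoint cells a fix would move).
    Highest severity-times-leverage product goes first."""
    return sorted(findings, key=lambda f: f["severity"] * f["leverage"], reverse=True)
```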
6.2 Round 0 (baseline)
Round 0 is the only round with no fix dispatch. The orchestrator dispatches:
- Evaluator on all 9 pages in parallel.
- A11y on all 9 pages in parallel.
- Technical-polish once.
Then it merges into docs/launch-rubric-state.md and writes a prioritized issue list. The user can review this if they come back.
6.3 Parallelism rules
- Evaluators for independent pages are always parallel.
- Implementers can be parallel only if their file sets do not overlap. The orchestrator is responsible for tracking file ownership within a round.
- Image agents are always parallel with each other (they touch different prompt files).
- Wire-image agents must run after their corresponding image agent returns.
- If two fixes touch the same file, they are batched into one implementer call with both fixes in its list.
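The overlap rule can be enforced with a small grouping pass. A sketch — fixes whose file sets intersect, even transitively, end up in the same implementer batch, and the resulting batches are safe to dispatch in parallel:

```python
def plan_batches(fixes):
    """fixes: iterable of (fix_id, file_set). Returns lists of fix IDs;
    each list is one implementer dispatch, and lists never share files."""
    batches = []  # each entry: [file_set, fix_id_list]
    for fix_id, files in fixes:
        overlapping = [b for b in batches if b[0] & files]
        merged = [set(files), [fix_id]]
        for b in overlapping:          # fold every overlapping batch into one
            merged[0] |= b[0]
            merged[1] = b[1] + merged[1]
            batches.remove(b)
        batches.append(merged)
    return [ids for _, ids in batches]
```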
6.4 Regression handling
Every evaluator call includes a "regression check" section that verifies the previous round's fixes. If a fix regressed:
- The finding is promoted to top priority for round N+1.
- The orchestrator dispatches an explorer agent to understand why, before dispatching a re-fix.
6.5 Loop budget
- Iterate until excellent. The stop conditions in §8 are the gate; the orchestrator does not stop because of an arbitrary round count. If the rubric isn't satisfied after a round, dispatch the next round. Repeat until the stop conditions are all true.
- Soft warning at 8 rounds: if blockers remain after 8 rounds, the orchestrator writes a "stuck report" summarizing what keeps failing and dispatches differently (different agent, different approach, different model) rather than re-running the same brief.
- No hard monetary cap on image generation — the user has lifted it — but the orchestrator will not regenerate the same prompt more than 3 times without changing the prompt strategy. Three failures of the same prompt is a prompt-design problem, not a generator problem.
- Between rounds, the orchestrator re-reads `design-guidelines.md` and `launch-rubric-state.md` to re-ground its judgment.
7. Rubric state file
The orchestrator maintains docs/launch-rubric-state.md as the running record. Format:
# Launch rubric — running state
**Last round:** 3
**Last updated:** 2026-04-11T14:32:00Z
**Status:** blocking findings remaining
## Summary (latest round)
| Page | Avg score | Worst dim | Worst score |
|---|---|---|---|
| / | 3.8 | 6 Imagery | 2 |
| /start | 4.1 | 4 Header | 3 |
| ... |
## Blocking findings
1. ...
## History
- Round 0: baseline avg 2.9
- Round 1: avg 3.3 (+0.4), hero regressed dim 3 mobile
- Round 2: avg 3.8 (+0.5)
- ...
This file is what the orchestrator reads at the start of each round to know where it is. It is also what the user reads to understand progress.
8. Stop conditions
The loop stops and the orchestrator declares launch-ready when all of the following hold:
- Every page × every breakpoint × every dimension ≥ 4.
- No open regressions from the previous round.
- `pnpm typecheck` clean.
- `pnpm lint` clean.
- No console errors in the Chrome tab on any of the 9 pages.
- No blockers from the accessibility agent.
- Dimension 10 checklist complete: `og:image` wired, favicon wired, meta tags present, images moved out of `/public/images/generated/` into `/public/images/` with final names, Next `<Image>` used where applicable.
- The orchestrator has exhausted the "non-blocking observations" list down to items it considers post-launch polish.
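Taken together, the stop conditions form a single conjunction the orchestrator can evaluate at the end of a round. A sketch; the argument names and score shape are illustrative:

```python
def launch_ready(pages, *, open_regressions=0, typecheck_ok=False, lint_ok=False,
                 console_clean=False, a11y_blockers=0, dim10_done=False,
                 nonblocking_exhausted=False):
    """True only when every stop condition holds simultaneously."""
    scores_pass = all(
        score >= 4
        for breakpoints in pages.values()
        for dims in breakpoints.values()
        for score in dims.values()
    )
    return (scores_pass and open_regressions == 0 and typecheck_ok and lint_ok
            and console_clean and a11y_blockers == 0 and dim10_done
            and nonblocking_exhausted)
```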
On stop, the orchestrator writes a final summary to docs/launch-rubric-state.md and posts a concise report to the user.
9. Failure modes the orchestrator must handle
- Evaluator disagreement round-to-round. If two consecutive evaluator calls disagree by 3 points on the same unchanged page, the rubric is noisy. The orchestrator averages and logs the drift, and tightens the rubric prompt.
- Image regeneration flakiness. If the Gemini call fails, retry once. If it fails twice, log it and move to the next fix.
- Implementer over-reach. If an implementer returns changes to files outside the brief, the orchestrator rejects the round and re-dispatches with tighter guardrails.
- Hot-reload desync. If the evaluator sees old output after a fix, the orchestrator navigates the tab to the URL fresh rather than assuming reload.
- Stuck round. If the same blocker appears in 3 consecutive rounds, promote it to user attention rather than retrying.
8a. Model selection for agent dispatch
The orchestrator picks the cheapest model that can do the job well:
- Sonnet — structured instruction-following work with a clear checklist:
- Evaluator agents (scoring against the rubric)
- Implementer agents with atomic fix lists
- Verifier agents (running typecheck/lint, reading files, reporting back)
- Any agent whose brief reads as "do X, then Y, then Z, then report in this format"
- Opus — work that requires architectural judgment, design taste, or merging conflicting signals:
- Writing the rubric state merge
- Writing prompt rewrites for image generation
- Deciding whether to strip a card vs. restyle it (genuine design calls)
- Resolving cross-agent conflicts
- Any dispatch where "use your judgment" is a significant part of the brief
Dispatches pass `model: "sonnet"` explicitly when Sonnet is right; otherwise the agent definition's default is used.
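The selection rule reduces to a small lookup. A sketch — the role names are invented for illustration; only the checklist-versus-judgment split comes from this section:

```python
CHECKLIST_ROLES = {"evaluator", "implementer", "verifier"}  # illustrative names

def pick_model(role, judgment_heavy=False):
    """Return an explicit model override, or None to use the agent's default."""
    if role in CHECKLIST_ROLES and not judgment_heavy:
        return "sonnet"   # structured, checklist-driven work
    return None           # judgment work falls through to the default (Opus)
```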
9a. Task state discipline
Task status must always reflect reality, never intent. Rules:
- `pending` → `in_progress` is set at the moment the actual work starts — i.e. the same tool call that dispatches the agent / runs the command / begins the edit. Not before. Never mark `in_progress` as a way of "queueing" future work.
- `in_progress` → `completed` is set only after the work has returned a verifiable result — agent returned its report, command exited, edit was saved. Not when I think it's done, not when I've just dispatched, not optimistically.
- Dispatch ≠ done. Dispatching an agent is when the task enters `in_progress`. The task does not become `completed` until the agent returns and I have merged its output.
- If I am about to mark multiple tasks `in_progress` in the same turn, that is a signal I am batching intent, not tracking work. Only the tasks whose work I am starting in that turn get `in_progress`. The rest stay `pending`.
- No premature `completed`. A task that I consider "trivially satisfied" (e.g. a tab sweep that turned out to be empty) still gets the two-step: verify → then update.
- When a task is abandoned or superseded, use `deleted` with a note — never falsely mark it `completed`.
This discipline exists so the user can trust the task list as a source of truth about what is actually happening in the repo at any moment.
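These rules amount to a small legal-transition table. A sketch of a guard that could run before any status write; the function and table are hypothetical, not an existing API:

```python
VALID_TRANSITIONS = {
    ("pending", "in_progress"),    # the work is actually starting this turn
    ("in_progress", "completed"),  # a verifiable result is in hand
    ("pending", "deleted"),        # abandoned or superseded (with a note)
    ("in_progress", "deleted"),
}

def transition(current, new):
    """Reject status writes that would record intent instead of reality."""
    if (current, new) not in VALID_TRANSITIONS:
        raise ValueError(f"illegal task transition: {current} -> {new}")
    return new
```

In particular, `("pending", "completed")` is absent: a task cannot skip the verify step.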
10. What the orchestrator never does
- Read or edit source files directly.
- Run the dev server, build, typecheck, lint, test, or image-generation scripts directly.
- Take browser screenshots or score pages directly.
- Dispatch a fix that isn't traceable to a rubric finding.
- Commit or push.
- Rewrite product copy for meaning (only for polish).
- Change brand tokens (palette, radii, font stack) without explicit user input.
- Touch pricing numbers, deposit amount, or legal body text.