Clean Code Pipeline Technical Specification

Contents

    1. Specification Overview
    2. System Context
    3. Functional Requirements
    4. System Architecture
    5. Repo Topology
    6. Gate Design
    7. Tool Stack
    8. Cold-Migration Methodology
    9. Governance Integration
    10. Security Posture
    11. Observability
    12. Non-Functional Requirements
    13. Acceptance Criteria
    14. Risk Register
    15. Dependencies and Assumptions
    16. Open Questions
    17. Architectural Decision Records
    18. Propagation Targets

01. SPECIFICATION OVERVIEW

System: Clean Code Pipeline (CCP)

Description: A four-gate CI/CD pipeline that enforces scoping rigour, mechanical quality, security, and test-coverage discipline on every line of code committed to Cipher Shinobi's five private repositories. Gate 0 (Fuda) enforces upfront change scoping before code is written, Gate 1 runs at write time in Claude Code, Gate 2 runs pre-push on the operator workstation, and Gate 3 runs server-side on GitHub Actions. LISA acts as sole human CODEOWNER via a 2FA-enabled GitHub identity. Governance is enforced executably: CLAUDE.md rules become Semgrep custom rules rather than promises of human diligence. Fuda scoping contracts are tracked as Linear issues; dispatches are logged as automatic comments via gateway webhook.

Parent Plan: ~/… (slug: clean-code-pipeline-migration)

Source Intel: CS.AK.LISA.Intel.TestingGapPluginResearch (Raiden, Dispatch 46)


Specification Scope

In Scope:

  • Four-gate pipeline architecture (pre-code Fuda scoping, write-time, pre-push, server-side) and its deterministic enforcement contract.
  • Five-repo topology: Lisa-OS (first-to-migrate, mandated by D5) plus four application repositories, governed identically and described by function (not enumerated by name) in §05.
  • Repo boundary definition for Lisa-OS — resolved in ADR-1 — including which files live inside the repo and which stay in the vault.
  • (v1.1.0) Gate 0 (Fuda scoping): upfront change contracts, dependency impact analysis via depmap.yaml, multi-agent review (yoshimitsu/raiden/gray-fox), fast-track and risk thresholds, fuda skill orchestration.
  • (v1.1.0) Linear integration: Fuda = Linear issues, automatic dispatch comments via gateway webhook, fuda_id on report_dispatch.
  • (v1.1.0) Dependency maps: depmap.yaml (YAML, source of truth) + DEPMAP.md (derived markdown) per repo, auto-refreshed on merge to main via CI.
  • Tool stack: Semgrep MCP + Semgrep CI, analyze-coverage-mcp, Stryker mutation testing (@stryker-mutator/core + @stryker-mutator/vitest-runner), CodeRabbit CLI + CodeRabbit GitHub App, GitHub Actions, pre-push hook orchestrator, dependency map generator (madge/pydeps), fuda skill.
  • Cold-migration methodology: Sprint 0 foundation, five-phase serial per-repo sprints, Sprint 1 pilot gate, broken-baseline decision rule, Sprint Final go-live.
  • CODEOWNERS model (LISA as sole reviewer via 2FA account the OS core repo) and branch-protection posture.
  • Executable governance: Semgrep custom rules authored from CLAUDE.md + CodeDisciplineProtocol + RawTwinDiscipline.
  • .clean-code-exceptions.yaml schema and its role in the authoritative no-new-code-without-tests CI gate (Q4).
  • Integration with the dispatch lifecycle, Psychic Cache discipline, and the CyberShinobi agent roster.

Out of Scope:

  • E2E test generation (Playwright, Cypress) — deferred to a separate follow-up research dispatch per D46 §7.
  • Mutation testing in steady-state — scoped post-go-live pilot only (Q1, laurigates claude-plugins skill).
  • Flaky-test detection, test-data-factory management, refactoring architecturally rotten code — out of band, surfaced as follow-up missions if baseline analysis flags them.
  • Repos outside the cipher-shinobi GitHub org.
  • Vault markdown content (Google-Drive-hosted personal and mission notes) — stays out of git per operator preference; only the Lisa-OS governance subset defined in §05 migrates into git.
  • Dropping existing tools for vendor consolidation — revisit 6 months post-go-live.
  • Replacement of the CodeRabbit GitHub App with bot-only gating — explicitly rejected by D4 (LISA as CODEOWNER is strictly better than bot-only).

Version History

Version | Date | Author | Summary of Changes
1.0.0 | 2026-04-10 | Genji (Dispatch 48) | Initial specification. Locks D1-D5 and Q1-Q5. Resolves Lisa-OS scope via ADR-1.
1.1.0 | 2026-04-10 | Genji (Dispatch 61) | Gate 0 (Fuda scoping), dependency maps, Linear integration, D6-D9.
1.2.0 | 2026-04-12 | Smoke (Dispatch 89) | Replace Qodo Command CLI (discontinued) with Claude-native test generation (genji dispatches + analyze-coverage-mcp) + Stryker mutation testing. ADR-5 added.
1.3.0 | 2026-05-15 | Smoke (Dispatch 1318) | §6.0.6 V-3 randomised-order pre-lock discipline (ADV-4 codification); §6.0.7 STOP-after-D best-available ship pattern (V-3 ladder STOP rule codification).
1.4.0 | 2026-05-18 | Smoke (Dispatch 1349) | §6.0.6 scope clarification: --sequence.shuffle.files vs --sequence.shuffle.tests vs --sequence.shuffle shorthand semantics distinguished. V-3 closure gate codified to --sequence.shuffle.files only per D1347 genji + D1348 raiden convergent diagnosis on AK-365 (describe-block data-coupling = class (e), outside §6.0.6 a-d taxonomy).
1.5.0 | 2026-05-19 | Smoke (Dispatch 1392) | §6.0.6 class (e) intra-file describe-block data-coupling formally promoted from informal v1.4.0 reference to full taxonomy member alongside classes (a)-(d). 4-way convergence empirical foundation cited (D1303 + D1347 + D1348 + D1387; D1387 Wave 3 n=30 V-2 26.7% [14.2-44.4%] + n=30 V-3 30.0% [16.7-47.8%] Wilson 95% CIs + 483 file-run snapshots listener_count==0). Detection antidote codified (per-describe-independent refactor OR --sequence.shuffle.tests audited gate post-refactor). AK-367 Phase 2 referenced as canonical class (e) refactor exemplar.

Reading Guide

Audience | Recommended Sections | Purpose
Tobi-san / Kage | 01, 02, 05, 08, 13, 14, 16 | Strategic overview, repo boundaries, migration flow, acceptance, risks, open issues
LISA (governance) | All sections, focus on 09, 10, 17 | Governance integration + ADR rationale
Raiden (verification lead) | 06, 08, 13 | Gate design, per-repo migration phases, acceptance
Genji (implementation) | 04, 06, 07, 09 | Architecture, gate mechanics, tool wiring, governance hooks
Gray Fox (security + custom rules) | 06, 09, 10 | Semgrep rule scoping, security posture
Smoke (vault governance) | 05, 09, propagation list in §18 | Downstream impacts

02. SYSTEM CONTEXT

System Boundary

The Clean Code Pipeline is an enforcement layer that sits between the operator (Tobi-san + LISA and her dispatched sub-agents) and the five cipher-shinobi private repositories. It does not produce application code; it enforces quality invariants on code that other agents produce.

            +----------------+
            |   Operator     |
            | (Tobi + LISA)  |
            +-------+--------+
                    |
                    v
      +-------------+-------------+
      |  Claude Code workstation  |
      |  -------------------------|
      |  Gate 1: Semgrep MCP      |   <-- write-time advisory
      |  (post_write_code hook)   |
      +-------------+-------------+
                    |
                    v
      +-------------+-------------+
      |  git pre-push hook        |
      |  -------------------------|
      |  Gate 2a: Coverage check  |   <-- pre-push blocking
      |  Gate 2b: CodeRabbit CLI  |
      +-------------+-------------+
                    |
                    v (git push)
      +-------------+-------------+
      |  GitHub (cipher-shinobi)  |
      |  -------------------------|
      |  Gate 3: Actions workflow |
      |    - lint                 |
      |    - typecheck            |
      |    - test + coverage      |
      |    - Semgrep CI           |
      |    - no-new-code-wo-tests |
      |  + CodeRabbit App review  |   <-- server-side mandatory
      |  + CODEOWNERS review      |
      +-------------+-------------+
                    |
                    v
            main branch (protected)

Pipeline invariant: No gate is individually sufficient. Gate 0 ensures scoping rigour before code is written; Gates 1-3 enforce quality on the code itself. A bypass at any local gate (Gate 1 dismissal without Gray Fox adjudication, Gate 2 --no-verify, SKIP_GATES=1) is still caught at Gate 3. Gate 0 is structurally enforced — report_dispatch rejects without fuda_id. Gate 3 is the authoritative code-quality enforcement surface; Gates 1 and 2 exist to shorten feedback loops.


Actors

Actor | Role | Surface
Tobi-san | Author of all PRs; operator of the Claude Code workstation; risk-threshold sign-off at Gate 0 | Gates 0-3
LISA | Gate 0 coordinator (dispatches, Linear posting, review cycle); sole human CODEOWNER via 2FA handle the OS core repo | Gate 0 (coordination), Gate 3 (review + merge approval)
Yoshimitsu | Gate 0 Fuda drafter (reads depmap, fills required sections); sprint-queue monitoring + pilot retrospective + credit-burn decisions | Gate 0, Sprint boundaries
Raiden | Gate 0 Fuda completeness/soundness reviewer; verification lead; drives each Gate 2 loop, spot-checks Gate 2a output, triages Gate 2b findings | Gate 0, Gates 2a, 2b, 3
Gray Fox | Gate 0 Fuda security reviewer (security-tagged modules only); Semgrep custom rule author + finding adjudicator | Gate 0 (security-tagged), Gate 1 (rule authoring), Gate 3 (dismissal reviews)
Genji | Implementation dispatches (fixes, test refinement, Q4 CI action authoring); receives fuda_id on all code dispatches | Gates 1-3 as driven by Raiden
Smoke | Vault governance propagation, ArtefactMap maintenance, .clean-code-exceptions.yaml custody | Cross-cutting
CodeRabbit GitHub App | Automated server-side review bot (Pro tier) | Gate 3 only

External System Dependencies

System | Owner | Integration | Failure Mode | Constraints
GitHub (cipher-shinobi org) | Tobi-san (owner: onotobi) | git + Actions + Apps + API | No pushes, no CI, no review automation; local gates still fire | Free plan; private repos; org-level 2FA mandatory
GitHub Actions runners | GitHub | Workflow YAML in each repo | Jobs fail to dispatch; pipeline degrades to pre-push + local-only | Free-tier minute allotment per-month
Semgrep Registry | Semgrep Inc. | semgrep --config auto | Rule downloads fail; custom rules still run offline | No account required for public rules
Semgrep MCP plugin | Semgrep Inc. | Claude Code plugin via semgrep-plugin:setup_semgrep_plugin | Gate 1 write-time hook no-ops; falls back to Gate 2/3 catch | Requires Claude Code plugin runtime
Stryker mutation testing | Stryker OSS | npm (@stryker-mutator/core + @stryker-mutator/vitest-runner) | Mutation score validation unavailable; coverage threshold still enforced by vitest run --coverage | No account required; free open-source
CodeRabbit CLI + GitHub App (Pro tier) | CodeRabbit Inc. | CLI binary + GitHub App install per repo | Gate 2b and Gate 3 App review degrade; CI status checks still hold | Pro tier required for CLAUDE.md context reading
@sofia-open-source/analyze-coverage-mcp | sofia-open-source | npx MCP server in ~/.config/claude/mcp.json | Coverage visibility degrades; coverage threshold enforced server-side via workflow step instead | Consumes LCOV output from native test runners
npm registry | npm Inc. | npm install for Stryker, MCP servers, and dev tooling | Install fails; use offline tarball fallback |
PGP signing infrastructure | Tobi-san + LISA | Signing key registered on both GitHub accounts | Unsigned commits rejected by branch protection | Both accounts must have registered keys

Operational Constraints

Performance

  • Gate 1 (write-time Semgrep) must add <2 seconds per file edit; a slower hook erodes developer flow and will be bypassed.
  • Gate 2 (pre-push total) aims for <5 minutes p95 on diff-scope runs. Each CodeRabbit CLI iteration may take 7-30 minutes; this is the accepted tax for local review (see R3).
  • Gate 3 (GitHub Actions total) aims for <15 minutes p95 on a typical PR diff. Parallelise lint + typecheck + Semgrep + test where runner minutes permit.

Compliance

  • No regulatory compliance requirements — all repos are internal to Cipher Shinobi, not customer-facing.
  • Internal compliance obligations: Raw Twin Discipline, Code Discipline Protocol, FCF naming, Dispatch lifecycle discipline. Mechanically enforced via Semgrep custom rules at Gates 1 and 3.

Availability

  • The pipeline is not itself a production service. Availability requirement is "no gate is broken for more than one working day without either a repair or a documented temporary suspension in the operator runbook".
  • GitHub Actions availability flows through from GitHub itself.

Data handling

  • No PII flows through the pipeline. Source code only.
  • Secrets handling: see §10.

Cost

  • Stryker mutation testing: free open-source (npm install). Claude API costs for genji test-generation dispatches: estimated $20-40 one-shot per repo.
  • CodeRabbit Pro: existing cipher-shinobi org subscription.
  • GitHub Actions: free-tier private-repo minute allotment. Budget-aware workflow design.

03. FUNCTIONAL REQUIREMENTS

Feature Catalogue

Features are grouped by the pipeline concern they address.


3.1 Pipeline Enforcement (Deliverable 1: Three-gate enforcement surface)

FR-1.1: Write-time SAST + SCA + secrets scan

Field | Content
User Story | As an agent writing code in Claude Code, I want Semgrep to scan every file I produce at write time so that security defects are caught before they are staged.
Description | Every Write or Edit tool call on a code file triggers the Semgrep MCP post_write_code hook. The hook runs semgrep --config auto plus the Gray-Fox-authored custom rule set, in SAST + SCA + secrets modes. Findings are either auto-fixed by regeneration or escalated to LISA for Gray Fox adjudication. Silent dismissal is prohibited.
Deliverable | Deliverable 1: Three-gate enforcement

Acceptance Criteria:

Given Claude Code is running with the semgrep-plugin enabled
When an agent writes a file containing a Raw Twin Discipline violation
Then the Semgrep MCP hook emits a finding
And  the agent is prompted to regenerate or escalate
And  no silent dismissal is possible

Given Semgrep MCP is not installed on the workstation
When an agent writes any code file
Then Gate 1 is a no-op
And  Gate 2b CodeRabbit CLI or Gate 3 Semgrep CI will still catch the finding on push

FR-1.2: Pre-push test coverage verification (v1.2.0 — Claude-native)

Field | Content
User Story | As an operator about to push a diff, I want vitest run --coverage to verify that coverage has not regressed below the per-repo baseline + 5% floor, and that any new source file meets the 60% floor, so that untested code cannot reach the server.
Description | The pre-push hook invokes vitest run --coverage and parses the LCOV output against the per-repo threshold from .coderabbit.yaml or an equivalent repo-level config file. On regression, the push is blocked with a clear remediation message. Test generation itself is not a pre-push concern; tests are authored by genji sub-agent dispatches during Phase B of each cold-migration sprint, guided by analyze-coverage-mcp for targeting and validated by Stryker mutation testing (>= 70% mutation score).
Deliverable | Deliverable 1

Acceptance Criteria:

Given a diff that adds a source file without a corresponding test file
When the pre-push hook runs Gate 2a
Then vitest reports the new-file coverage as 0%
And  the hook exits non-zero
And  the operator is shown the remediation options (add test, add to .clean-code-exceptions.yaml)

Given a diff whose total coverage is 1 percentage point above the per-repo baseline floor
When the pre-push hook runs Gate 2a
Then the hook passes
And  Gate 2b CodeRabbit CLI fires next
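
The two floors in FR-1.2 can be sketched as a small pass/fail evaluation. This is an illustrative sketch only, not the hook's actual implementation: it assumes "baseline + 5%" means five percentage points above the measured baseline, and the names check_coverage and CoverageResult are hypothetical. Parsing the LCOV output is left out; the function takes already-extracted percentages.

```python
from dataclasses import dataclass

NEW_FILE_FLOOR = 60.0   # per-file floor for newly added source files (from FR-1.2)
BASELINE_MARGIN = 5.0   # ASSUMPTION: percentage points above the measured baseline


@dataclass
class CoverageResult:
    passed: bool
    reasons: list[str]


def check_coverage(total_pct: float, baseline_pct: float,
                   new_file_pct: dict[str, float],
                   exempt: frozenset[str] = frozenset()) -> CoverageResult:
    """Evaluate the repo-level floor and the per-new-file floor."""
    reasons = []
    floor = baseline_pct + BASELINE_MARGIN
    if total_pct < floor:
        reasons.append(
            f"total coverage {total_pct:.1f}% is below floor {floor:.1f}%")
    for path, pct in sorted(new_file_pct.items()):
        if path in exempt:
            continue  # path is listed in .clean-code-exceptions.yaml
        if pct < NEW_FILE_FLOOR:
            reasons.append(
                f"{path}: new-file coverage {pct:.1f}% "
                f"is below {NEW_FILE_FLOOR:.0f}%")
    return CoverageResult(passed=not reasons, reasons=reasons)
```

A hook built on this would exit non-zero whenever passed is False and print the reasons alongside the two remediation options (add a test, or add the path to .clean-code-exceptions.yaml).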

FR-1.3: Pre-push logic + style + architecture review (CodeRabbit CLI)

Field | Content
User Story | As an operator about to push, I want CodeRabbit CLI to review the diff (including any genji-authored tests) for logic, style, and architectural issues, with a bounded iteration loop, so that reviewable defects are caught before they hit GitHub.
Description | coderabbit --agent --base main runs after Gate 2a succeeds. Raiden triages findings into fix-now / fix-in-loop / defer-with-rationale. Max-3 iteration loop. Exit states: CLEAN (proceed with push), BLOCKED (halt push, escalate to Tobi-san), CAPPED (3 iterations exhausted, document outstanding items and push with deferred findings recorded).
Deliverable | Deliverable 1

Acceptance Criteria:

Given CodeRabbit CLI returns BLOCKING findings in iteration 1
When Raiden dispatches Genji to apply fixes and re-run
Then iteration 2 runs on the fixed diff
And  the loop terminates when CLEAN or iteration 3 completes

Given iteration 3 completes with remaining BLOCKING findings
When the operator confirms the CAPPED exit
Then the remaining findings are logged to the sprint migration log
And  the push proceeds
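
The bounded loop and its three exit states can be sketched as follows. This is a minimal sketch under stated assumptions: review and apply_fixes are hypothetical callables standing in for the CodeRabbit CLI run and the Raiden/Genji fix step, and the spec does not say what triggers BLOCKED, so the sketch assumes it is a fix step reporting that the findings cannot be resolved in-loop.

```python
from enum import Enum
from typing import Callable

MAX_ITERATIONS = 3  # "Max-3 iteration loop" from FR-1.3


class Exit(Enum):
    CLEAN = "CLEAN"      # proceed with push
    BLOCKED = "BLOCKED"  # halt push, escalate to Tobi-san
    CAPPED = "CAPPED"    # iterations exhausted, push with deferred findings


def run_review_loop(review: Callable[[], list[str]],
                    apply_fixes: Callable[[list[str]], bool]
                    ) -> tuple[Exit, list[str]]:
    """Run up to MAX_ITERATIONS review passes and report the exit state.

    review() returns the current blocking findings (hypothetical wrapper).
    apply_fixes() returns False when the findings cannot be fixed in-loop,
    which this sketch treats as BLOCKED (an assumption, not spec text).
    """
    findings: list[str] = []
    for iteration in range(1, MAX_ITERATIONS + 1):
        findings = review()
        if not findings:
            return Exit.CLEAN, []
        if iteration == MAX_ITERATIONS:
            break  # no fix attempt after the final review pass
        if not apply_fixes(findings):
            return Exit.BLOCKED, findings
    return Exit.CAPPED, findings
```

On CAPPED the caller would still need the operator's confirmation and would write the returned findings to the sprint migration log before pushing.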

FR-1.4: Server-side enforcement (GitHub Actions + CodeRabbit App)

Field | Content
User Story | As LISA responsible for main-branch integrity, I want every PR to main to pass a deterministic server-side gate (lint, typecheck, tests, coverage, Semgrep, no-new-code-without-tests, CodeRabbit App review) before I can approve it, so that no amount of local bypass can land unclean code.
Description | A single shared .github/workflows/clean-code-gate3.yml runs on every pull_request targeting main. Required status checks are wired into branch protection. The CodeRabbit GitHub App performs server-side review independently. LISA's CODEOWNERS approval is required to merge.
Deliverable | Deliverable 1

Acceptance Criteria:

Given a PR whose diff adds src/foo.ts with no test file
When Gate 3 runs the check-new-code-has-tests action
Then the action fails with a remediation message referencing .clean-code-exceptions.yaml

Given a PR that passes all Gate 3 automated checks
When LISA approves via the OS core repo account
Then the merge button unlocks for Tobi-san

Given a PR where LISA has not approved
When the required-checks report as green
Then the merge button remains locked
And  the CODEOWNERS requirement is the blocking constraint
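
The core of a check-new-code-has-tests step could look roughly like the sketch below. Everything about file matching here is an assumption: the spec does not define which suffixes count as source or how a test file is paired with its source file, so the sketch uses a hypothetical basename convention (foo.test.ts, foo.spec.ts, test_foo.py) and takes the exempt paths as an input.

```python
from pathlib import PurePosixPath

SOURCE_SUFFIXES = {".ts", ".tsx", ".py"}  # ASSUMPTION: what counts as source


def expected_test_names(source: str) -> set[str]:
    """Candidate test file names for a source file (assumed convention)."""
    p = PurePosixPath(source)
    return {f"{p.stem}.test{p.suffix}",
            f"{p.stem}.spec{p.suffix}",
            f"test_{p.name}"}


def is_test_file(path: str) -> bool:
    name = PurePosixPath(path).name
    return ".test." in name or ".spec." in name or name.startswith("test_")


def untested_new_files(added: list[str], all_files: list[str],
                       exempt: set[str]) -> list[str]:
    """Return added source files with no matching test and no exemption."""
    present = {PurePosixPath(f).name for f in all_files}
    missing = []
    for path in added:
        suffix = PurePosixPath(path).suffix
        if suffix not in SOURCE_SUFFIXES or is_test_file(path) or path in exempt:
            continue
        if not (expected_test_names(path) & present):
            missing.append(path)
    return missing
```

A CI step would fail when the returned list is non-empty and point the author at .clean-code-exceptions.yaml for legitimate exemptions.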

3.2 Cold-Migration (Deliverable 2: Clean baseline per repo)

FR-2.1: Sprint 0 baseline analysis

Field | Content
User Story | As the migration lead, I want a single Sprint 0 dispatch that measures every repo's current coverage %, Semgrep finding count, lint/typecheck state, and repair-effort estimate, so that the per-repo thresholds and sprint queue can be set from data rather than guesses.
Description | Sprint 0 runs analyze-coverage-mcp + full-repo Semgrep scan + lint + typecheck + untested-file count on all 5 repos and populates the baseline table in the sprint log. The Q5 repair-effort bracket (<20%, 20-50%, >50%) is computed per repo and feeds the sprint queue.
Deliverable | Deliverable 2

Acceptance Criteria:

Given Sprint 0 has completed baseline analysis
When yoshimitsu publishes the sprint queue
Then every repo has a measured coverage baseline
And  every repo has a Semgrep finding count per severity
And  every repo has a repair-effort bracket recorded

Given a repo falls in the >50% repair-effort bracket
When the sprint queue is proposed
Then that repo is escalated to Tobi-san for a retire / refactor / deprioritise decision before being queued

FR-2.2: Five-phase per-repo migration template

Field | Content
User Story | As Raiden driving a migration sprint, I want a standardised five-phase template (A: Semgrep, B: genji test generation, C: CodeRabbit loop, D: Gate 3 canary, E: merge) so that every repo migration follows the same deterministic sequence.
Description | Each sprint targets a dedicated migration/clean-code-pipeline branch. Phase E merges to main and ratchets the per-repo threshold in .coderabbit.yaml to the D3 target.
Deliverable | Deliverable 2

Acceptance Criteria:

Given a sprint is in Phase B (genji test generation)
When the generated test suite spot-check passes at 80% or higher
Then Phase B is marked complete
And  Phase C (CodeRabbit CLI loop) begins

Given Phase D (Gate 3 canary PR) surfaces a finding that Gate 2 missed
Then the finding is classified as a Gate 2 escape
And  a lesson-learned entry is filed
And  Phase D is not marked complete until the issue is fixed and Gate 3 is re-run green

FR-2.3: Pilot-sprint retrospective (Q3 mandatory gate)

Field | Content
User Story | As Tobi-san committing review bandwidth, I want Sprint 1 to be a mandatory pilot with a yoshimitsu retrospective before Sprint 2 starts, so that guessed effort is replaced with measured effort before 75% of the cold-migration work is committed.
Description | Sprint 1 targets the smallest-complexity repo from the queue. On Phase E close, yoshimitsu runs a retrospective dispatch covering: revised API cost projection, revised effort estimates, tool-config adjustments, pivot options (including Tusk as commercial fallback if Claude-native quality proves insufficient). Sprints 2-4 are gated on explicit Tobi-san greenlight.
Deliverable | Deliverable 2

Acceptance Criteria:

Given Sprint 1 has closed on Phase E
When yoshimitsu's retrospective dispatch completes
Then Tobi-san receives revised projections and recommended adjustments
And  Sprint 2 does not kick off until Tobi-san greenlight is recorded

FR-2.4: Broken-baseline decision rule (Q5)

Field | Content
User Story | As the migration lead, I want a pre-committed effort-based decision rule for broken baselines so that "this repo doesn't compile" does not trigger per-repo re-litigation.
Description | <20%: scope-expand within the sprint. 20-50%: dedicated repair sub-sprint before Phase A. >50%: escalate to Tobi-san (retire / refactor / deprioritise). Thresholds are recalibrated after the Sprint 1 retrospective if pilot data suggests otherwise.
Deliverable | Deliverable 2

Acceptance Criteria:

Given baseline analysis shows a repo in the 20-50% bracket
When the sprint queue is populated
Then a dedicated repair sub-sprint is scheduled before the normal five-phase sprint for that repo
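
The Q5 decision rule maps directly onto a three-way classification. The sketch below is illustrative: the function and enum names are hypothetical, the input is whatever repair-effort percentage Sprint 0 records, and the boundary handling (20 and 50 both falling in the middle bracket) is read off the "<20%, 20-50%, >50%" wording.

```python
from enum import Enum


class Bracket(Enum):
    SCOPE_EXPAND = "scope-expand within sprint"
    REPAIR_SUB_SPRINT = "dedicated repair sub-sprint before Phase A"
    ESCALATE = "escalate to Tobi-san (retire / refactor / deprioritise)"


def classify_repair_effort(effort_pct: float) -> Bracket:
    """Map a measured repair-effort percentage to its Q5 bracket."""
    if effort_pct < 0:
        raise ValueError("repair effort cannot be negative")
    if effort_pct < 20:
        return Bracket.SCOPE_EXPAND
    if effort_pct <= 50:
        return Bracket.REPAIR_SUB_SPRINT
    return Bracket.ESCALATE
```

Keeping the thresholds in one function makes the post-pilot recalibration a single edit.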

3.3 Governance Enforcement (Deliverable 3: Executable governance)

FR-3.1: Semgrep custom rules from CLAUDE.md

Field | Content
User Story | As LISA responsible for Cipher Shinobi code discipline, I want CLAUDE.md governance rules encoded as Semgrep custom rules so that they are enforced mechanically at write time and server time, not trusted to human diligence.
Description | Gray Fox is dispatched in Sprint 0 to author a custom rule set under cipher-shinobi.* namespace. Mandatory coverage: Raw Twin Discipline (write-path MCP tools need *_raw twin on structured fields), dispatch patterns (write_psychic_cache_raw calls inside a dispatch context carry dispatch_id), Code Discipline Protocol (prohibited patterns: silent catch, unhandled promise rejection, hardcoded gateway URLs, hardcoded credentials), FCF naming conventions (filename segment grammar check, commit message check). Rules smoke-tested against every repo's Sprint 0 baseline before Sprint 1.
Deliverable | Deliverable 3

Acceptance Criteria:

Given Gray Fox has authored the custom rule set
When a smoke test runs against all 5 repo baselines
Then the false-positive count is below the Sprint 0 budget (documented at authoring time)
And  every CLAUDE.md invariant from the mandatory list has at least one corresponding rule

Given a PR introduces a new write_psychic_cache_raw call without dispatch_id inside a dispatch context
When Gate 1 or Gate 3 Semgrep runs
Then the finding is raised as CRITICAL

FR-3.2: .clean-code-exceptions.yaml schema

Field | Content
User Story | As Tobi-san adding a type-only declaration file, I want a single governed escape hatch from the no-new-code-without-tests check so that legitimate exemptions are committed, reviewed, and auditable.
Description | Repo-root YAML file with exceptions: list; each entry requires path, reason, added. Schema documented in §07. The Q4 CI action reads this file before emitting failures.
Deliverable | Deliverable 3

Acceptance Criteria:

Given a new source file added to .clean-code-exceptions.yaml with a reason
When Gate 3 check-new-code-has-tests runs
Then the file is exempt
And  the exemption is visible to LISA in the PR diff
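
A validation pass over the parsed exceptions file might look like the sketch below. It relies only on the three required keys named in FR-3.2; the full schema lives in §07 and is not reproduced here. The sketch assumes the YAML has already been loaded into a Python dict by some loader, and it assumes added is an ISO date (either a date object or a YYYY-MM-DD string), which the spec does not state.

```python
from datetime import date

REQUIRED_KEYS = ("path", "reason", "added")  # required per FR-3.2


def validate_exceptions(doc: object) -> tuple[set[str], list[str]]:
    """Return (exempt paths, validation errors) for a parsed exceptions file."""
    exempt: set[str] = set()
    errors: list[str] = []
    entries = doc.get("exceptions") if isinstance(doc, dict) else None
    if not isinstance(entries, list):
        return exempt, ["top-level 'exceptions' must be a list"]
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            errors.append(f"entry {i}: must be a mapping")
            continue
        missing = [k for k in REQUIRED_KEYS if not entry.get(k)]
        if missing:
            errors.append(f"entry {i}: missing {', '.join(missing)}")
            continue
        added = entry["added"]
        if not isinstance(added, date):
            try:
                date.fromisoformat(str(added))  # ASSUMPTION: ISO date format
            except ValueError:
                errors.append(f"entry {i}: 'added' is not an ISO date")
                continue
        exempt.add(str(entry["path"]))
    return exempt, errors
```

Only entries that pass every check contribute to the exempt set, so a malformed entry cannot silently exempt a file.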

3.4 Fuda Scoping and Dependency Tracking (Deliverable 4: Upfront change discipline)

FR-4.1: Gate 0 Fuda requirement on code dispatches

Field | Content
User Story | As LISA coordinating implementation dispatches, I want every dispatch that produces code to reference an approved Fuda (Linear issue ID) via fuda_id so that no implementation proceeds without an upfront scoping contract.
Description | The gateway's report_dispatch endpoint requires fuda_id (Linear issue ID) on every dispatch that will produce code artefacts. The gateway rejects dispatches without it. The fuda skill orchestrates Fuda creation: reads depmap.yaml, dispatches yoshimitsu to draft, posts to Linear, dispatches raiden + gray-fox for parallel review.
Deliverable | Deliverable 4

Acceptance Criteria:

Given a dispatch that will produce code artefacts
When LISA calls report_dispatch without fuda_id
Then the gateway rejects the call with a clear error

Given an approved Fuda exists as a Linear issue
When LISA calls report_dispatch with the fuda_id
Then the dispatch is created
And  the gateway posts an automatic comment to the Linear issue
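
The structural check in FR-4.1 can be sketched as a guard at the top of the dispatch handler. This is not the gateway's real interface: the Dispatch fields, the produces_code flag, the DispatchRejected error, and the comment text are all hypothetical, and the identifier pattern is an assumption modelled on issue IDs such as AK-365 that appear elsewhere in this document.

```python
import re
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: Linear issue identifiers look like "AK-365".
FUDA_ID_PATTERN = re.compile(r"^[A-Z][A-Z0-9]*-\d+$")


class DispatchRejected(ValueError):
    """Raised when a dispatch fails the Gate 0 structural check."""


@dataclass
class Dispatch:
    agent: str
    task: str
    produces_code: bool            # hypothetical flag
    fuda_id: Optional[str] = None


def report_dispatch(dispatch: Dispatch) -> Optional[str]:
    """Validate a dispatch; return the Linear comment text, if any."""
    if dispatch.produces_code and not dispatch.fuda_id:
        raise DispatchRejected(
            "fuda_id is required on dispatches that produce code artefacts")
    if dispatch.fuda_id is None:
        return None  # non-code dispatch with no Fuda: nothing to post
    if not FUDA_ID_PATTERN.match(dispatch.fuda_id):
        raise DispatchRejected(
            f"fuda_id {dispatch.fuda_id!r} is not a valid issue identifier")
    return f"[{dispatch.fuda_id}] dispatched to {dispatch.agent}: {dispatch.task}"
```

Checking that the referenced Fuda is actually approved would need a Linear lookup, which this sketch leaves out.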

FR-4.2: Dependency map maintenance per repo

Field | Content
User Story | As yoshimitsu drafting a Fuda, I want every tracked repo to maintain a machine-readable depmap.yaml so that dependency impact analysis is based on measured data rather than guesses.
Description | Every tracked repo maintains depmap.yaml (YAML, repo root, auto-refreshed on merge to main via CI step) and DEPMAP.md (generated from YAML, Obsidian-friendly, derived not authoritative). Coverage: static imports/exports (module-level), runtime dependencies (services, env vars, config files), cross-repo dependencies.
Deliverable | Deliverable 4

Acceptance Criteria:

Given a merge to main on any tracked repo
When the CI step runs
Then depmap.yaml is regenerated from the current repo state
And  DEPMAP.md is regenerated from the YAML
And  both files are committed if changed

Given yoshimitsu is drafting a Fuda
When he reads depmap.yaml for the target repo
Then the dependency impact analysis section is populated from measured data
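
Dependency impact analysis over a depmap amounts to a reverse-reachability walk. The sketch below assumes the static-import portion of depmap.yaml has been loaded into a plain mapping from module name to the modules it imports; the real depmap.yaml schema is not defined in this section, so that shape and the function name are hypothetical.

```python
from collections import deque


def impacted_modules(imports: dict[str, list[str]],
                     changed: set[str]) -> set[str]:
    """Return modules that transitively depend on any changed module."""
    # Invert the import graph: dependency -> modules that import it.
    dependents: dict[str, set[str]] = {}
    for module, deps in imports.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    impacted: set[str] = set()
    queue = deque(changed)
    while queue:  # breadth-first walk over reverse edges
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in impacted and dependent not in changed:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted
```

The size of the changed set plus the impacted set is one plausible input to the file-count thresholds used at Gate 0, though the spec does not say exactly how files are counted.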

FR-4.3: Gate 0 scoping review before implementation

Field | Content
User Story | As Tobi-san committing review bandwidth, I want a structured scoping review (yoshimitsu draft + raiden/gray-fox parallel review) to complete before any implementation dispatch so that fundamental scoping failures are caught upfront rather than at Gates 1-3.
Description | Gate 0 enforcement model: Layer 1 (structural): fuda_id required on report_dispatch; Layer 2 (analytical): yoshimitsu drafts, raiden reviews all Fuda for completeness/soundness, gray-fox reviews Fuda touching security-tagged modules; Layer 3 (human): risk threshold triggers Tobi-san sign-off. Fast-track: <=2 files, single module, no cross-module deps. Max 2 revision rounds, then escalate.
Deliverable | Deliverable 4

Acceptance Criteria:

Given a Fuda touching >5 files in the depmap graph
When raiden completes his review
Then the Fuda is flagged for mandatory Tobi-san sign-off before implementation

Given a Fuda touching <=2 files in a single module with no cross-module deps
When the fuda skill runs
Then gray-fox review is skipped (fast-track)
And  yoshimitsu drafts + raiden reviews alone
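
The fast-track and risk-threshold rules can be sketched as one routing function. The thresholds come from FR-4.3; the type and field names are hypothetical. One point is an assumption rather than spec text: the section does not say what happens when a fast-track-sized Fuda touches a security-tagged module, and this sketch follows the literal fast-track criterion (gray-fox skipped), which may not be the intended behaviour.

```python
from dataclasses import dataclass

FAST_TRACK_MAX_FILES = 2  # "<=2 files" from FR-4.3
SIGN_OFF_FILE_LIMIT = 5   # ">5 files in the depmap graph" from FR-4.3


@dataclass
class FudaScope:
    files: int
    modules: int
    cross_module_deps: bool
    security_tagged: bool


@dataclass
class ReviewPlan:
    fast_track: bool
    reviewers: list[str]
    needs_tobi_sign_off: bool


def plan_review(scope: FudaScope) -> ReviewPlan:
    """Decide reviewers and sign-off for a drafted Fuda."""
    fast_track = (scope.files <= FAST_TRACK_MAX_FILES
                  and scope.modules == 1
                  and not scope.cross_module_deps)
    reviewers = ["raiden"]  # raiden reviews every Fuda
    # ASSUMPTION: fast-track takes precedence over the security tag.
    if scope.security_tagged and not fast_track:
        reviewers.append("gray-fox")
    return ReviewPlan(fast_track, reviewers,
                      needs_tobi_sign_off=scope.files > SIGN_OFF_FILE_LIMIT)
```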

FR-4.4: Linear issue lifecycle tracking

Field | Content
User Story | As Tobi-san tracking mission progress, I want Fuda lifecycle state to be tracked end-to-end in Linear so that every change contract's status is visible without checking multiple systems.
Description | Linear issue state tracks the Fuda lifecycle: Planned -> In Progress -> In Review -> Merged -> Done. Gateway posts automatic comments on report_dispatch and report_complete. Comment format: dispatch ID, agent name, task description, status, duration, deliverable links.
Deliverable | Deliverable 4

Acceptance Criteria:

Given a dispatch completes successfully
When the gateway calls report_complete
Then an automatic comment is posted to the Linear issue
And  the comment includes dispatch ID, agent name, result summary, and duration
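
The lifecycle and the completion comment can be sketched as below. The state names come from FR-4.4; treating the lifecycle as strictly forward-only is an assumption, as are the function names and the exact comment layout, which the spec describes only as a list of fields.

```python
LIFECYCLE = ["Planned", "In Progress", "In Review", "Merged", "Done"]


def advance(state: str) -> str:
    """Return the next lifecycle state (ASSUMPTION: forward-only)."""
    idx = LIFECYCLE.index(state)  # raises ValueError for an unknown state
    if idx == len(LIFECYCLE) - 1:
        raise ValueError(f"{state!r} is the terminal state")
    return LIFECYCLE[idx + 1]


def format_completion_comment(dispatch_id: int, agent: str,
                              summary: str, duration_s: int) -> str:
    """Build the text of a report_complete comment (hypothetical layout)."""
    minutes, seconds = divmod(duration_s, 60)
    return "\n".join([
        f"Dispatch: {dispatch_id}",
        f"Agent: {agent}",
        f"Result: {summary}",
        f"Duration: {minutes}m {seconds:02d}s",
    ])
```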

Traceability Matrix

Plan Deliverable | Feature IDs
Deliverable 1: Three-gate enforcement surface | FR-1.1, FR-1.2, FR-1.3, FR-1.4
Deliverable 2: Clean baseline per repo | FR-2.1, FR-2.2, FR-2.3, FR-2.4
Deliverable 3: Executable governance | FR-3.1, FR-3.2
Deliverable 4: Upfront change discipline | FR-4.1, FR-4.2, FR-4.3, FR-4.4

04. SYSTEM ARCHITECTURE

4.1 Architecture Overview

The Clean Code Pipeline is a defence-in-depth quality gate implemented as four composable layers. Each layer is independently authoritative at its point of execution, and no layer is optional. The guiding principles are:

  1. Scope before code (v1.1.0): Gate 0 (Fuda) enforces upfront change scoping — dependency impact analysis, security assessment, rationale interrogation — before any code is written. Downstream gates fine-tune; they do not discover fundamental scoping failures.
  2. Shift-left: catch findings as close to the point of authorship as possible (Gate 1 at write-time), so feedback loops are tight and remediation is cheap.
  3. Server-authoritative: no client-side gate is trusted to run; Gate 3 re-validates everything. A missed install, a --no-verify, or a SKIP_GATES=1 still lands at Gate 3.
  4. Executable governance: CLAUDE.md rules become Semgrep custom rules. A rule that is documented in a Docu but not enforced by a tool is considered undefended and must be migrated to a custom rule.
  5. Measured enforcement: thresholds are derived from measured baselines, not imposed from industry folklore. The D3 per-repo + 5% floor + 60% new-file floor is the concrete instance of this principle. Dependency impact analysis at Gate 0 is driven by measured depmap.yaml, not guesses.
  6. Bounded iteration: every loop has an explicit termination condition (Gate 0 max-2 revision rounds, Gate 2b max-3, Sprint 1 retro, broken-baseline escalation) so the pipeline cannot spin indefinitely on unfixable defects.

Architecturally, the pipeline is stateless between gates — each gate reads the repo state, runs its deterministic check, and writes a pass/fail signal. State that needs to persist across gates (coverage baselines, custom rules, exceptions) lives in the repo itself as versioned files (.coderabbit.yaml, .clean-code-exceptions.yaml, .semgrepignore, cipher-shinobi/semgrep-rules/).


4.2 Four-Gate Pipeline Diagram

  PRE-CODE               WRITE-TIME                PRE-PUSH                SERVER-SIDE
   (Gate 0)                (Gate 1)                 (Gate 2)                 (Gate 3)
  ----------             ------------             ----------              -------------

fuda skill trigger     Claude Code editor       local git hook          GitHub Actions runner
       |                      |                       |                         |
       | /fuda or auto        | post_write_code       | pre-push                | pull_request
       v                      v                       v                         v
+----------------+     +----------------+      +----------------+        +------------------+
| Fuda Scoping   |     | Semgrep MCP    |      | Gate 2a: Cov.  |        | workflow:        |
| - depmap.yaml  |     | - auto rules   |      | - coverage +5% |        |  lint            |
| - yoshimitsu   |     | - custom rules |--+   | - new-file 60% |        |  typecheck       |
|   drafts       |     | - SAST/SCA     |  |   +-------+--------+        |  test + coverage |
| - raiden +     |     | - secrets      |  |           |                  |  Semgrep CI      |
|   gray-fox     |     +----------------+  |           v                  |  new-code-check  |
|   review       |            |            |   +----------------+         +---------+--------+
| - Linear issue |            v            |   | Gate 2b:       |                   |
| - risk check   |       clean / escalate  |   | CodeRabbit CLI |                   v
+-------+--------+                         |   | - logic        |         +------------------+
        |                                  |   | - style        |         | CodeRabbit App   |
        v                                  |   | - architecture |         | - server review  |
  fuda_id on                               |   | max-3 loop     |         +---------+--------+
  report_dispatch                          |   +-------+--------+                   |
        |                                  |           |                            v
        +-------->  implementation  -------+           v                  +------------------+
                                           |       git push               | CODEOWNERS       |
                                           |           |                  | @the OS core repo|
                                           +----->-----+----> origin ---->+ approval         |
                                                                          +---------+--------+
                                                                                    |
                                                                                    v
                                                                              merge to main

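Read the diagram as: data flows left-to-right through four gates. Gate 0 runs before code is written (scoping contract) and hard-blocks report_dispatch when no fuda_id is present; Gate 1 is advisory within Claude Code at write-time; Gates 2 and 3 are hard blocking, except the Gate 3 CodeRabbit App review, which is advisory unless it raises a BLOCKING finding (see §4.3). Gate 0's fuda_id is carried on report_dispatch and threads through all subsequent gates as the change contract's identity.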


4.3 Gate Execution Contract

Each gate is characterised by five fields:

| Gate | Trigger | Input | Output | Blocking? | Recovery |
|---|---|---|---|---|---|
| 0 | fuda skill (manual /fuda or auto-trigger on code write/modify/delete) | depmap.yaml + change description | Approved Fuda (Linear issue) with fuda_id | Hard block on report_dispatch (gateway rejects without fuda_id) | Draft → review → revise (max 2 rounds) → escalate to Tobi-san |
| 1 | post_write_code hook on file write | Single file just written | {status: clean or finding} per rule | Advisory (agent regenerates or escalates) | Escalate to LISA → Gray Fox adjudication; no silent dismissal |
| 2a | pre-push hook | Diff vs origin/main | {pass, new-file-coverage-ok, coverage-delta} | Hard block | Fix tests or add to .clean-code-exceptions.yaml |
| 2b | pre-push hook after 2a | Diff vs origin/main | {CLEAN, BLOCKED, CAPPED} + findings | Hard block on BLOCKED | Raiden iteration loop (max 3); then CAPPED with documented deferrals |
| 3 (Actions) | pull_request event | Full repo on PR ref | Status check green/red per step | Hard block via required status checks | Fix on PR branch, re-run |
| 3 (App) | pull_request event | Diff | Review comments + summary | Advisory unless BLOCKING | Address review, re-request |
| 3 (CODEOWNERS) | merge button click | N/A | LISA approval required | Hard block via branch protection | LISA reviews and approves via the OS core repo |
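
The blocking semantics in the table can be sketched as a small TypeScript model. This is an illustration only: the type name GateOutcome, the gate labels, and the function mayProceed are assumptions made for this sketch and do not come from the pipeline's code; only the status values are taken from the table.

```typescript
// Hypothetical model of the gate contract in section 4.3.
// Names are illustrative; only the status values come from the table.
type GateOutcome =
  | { gate: "1"; status: "clean" | "finding" }
  | { gate: "2a"; pass: boolean }
  | { gate: "2b"; status: "CLEAN" | "BLOCKED" | "CAPPED" }
  | { gate: "3-actions"; green: boolean };

// True when the outcome does not hard-block the change.
function mayProceed(outcome: GateOutcome): boolean {
  switch (outcome.gate) {
    case "1":
      return true; // advisory: findings are regenerated or escalated, never a hard block
    case "2a":
      return outcome.pass;
    case "2b":
      return outcome.status !== "BLOCKED"; // CAPPED proceeds with documented deferrals
    case "3-actions":
      return outcome.green;
  }
}

console.log(mayProceed({ gate: "2b", status: "CAPPED" }));
```

Modelling each gate as its own variant keeps the advisory/blocking distinction in one place, so a new gate cannot be added without deciding how it blocks.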

4.4 Technology Stack

| Layer | Technology | Version | Rationale |
|---|---|---|---|
| Pre-code scoping (Gate 0) | fuda skill + Linear MCP tools | v1 (v1.1.0) | Orchestrates Fuda creation, agent review, Linear posting; fuda_id threads through dispatch lifecycle |
| Dependency mapping | madge (TS/JS), pydeps (Python) + CI refresh step | Latest | Generates depmap.yaml consumed by Gate 0 Fuda scoping |
| Write-time SAST/SCA/secrets | Semgrep MCP plugin + Semgrep CLI | Latest | Deterministic, multi-language, rule-authorable, free for public rules |
| Write-time rule set | Custom cipher-shinobi.* rules (Gray Fox, Sprint 0) | v1 | Encodes CLAUDE.md, Code Discipline Protocol, Raw Twin Discipline |
| Coverage visibility | @sofia-open-source/analyze-coverage-mcp | npm latest | Primitive consumed by agents at all phases; LCOV-based |
| Test generation | Claude-native (genji sub-agent dispatches + analyze-coverage-mcp) | N/A | Targeted test authoring per module, guided by coverage gaps, respecting codebase conventions |
| Test quality validation | Stryker mutation testing (@stryker-mutator/core + @stryker-mutator/vitest-runner) | npm latest | Mutation score >= 70% acceptance criterion; validates assertion quality beyond line coverage |
| Pre-push coverage verification | vitest run --coverage | per-repo vitest config | Diff-scope coverage regression check at pre-push time |
| Pre-push logic review | CodeRabbit CLI (Pro tier) | Latest | CLAUDE.md-aware agent loop, bounded iteration, reads repo context |
| Server-side review | CodeRabbit GitHub App (Pro tier) | N/A | Automated PR review independent of local pipeline |
| Server-side CI | GitHub Actions | N/A | Native GitHub integration, free-tier allotment sufficient |
| Hook orchestration | pre-push shell script template | v1 (artefact) | Sequences Gate 1 → 2a → 2b; SKIP_GATES=1 emergency bypass with audit log |
| Exception management | .clean-code-exceptions.yaml at repo root | Schema v1 | Committed, reviewed, auditable exemptions from no-new-code-without-tests gate |
| Branch protection | GitHub native | N/A | Required status checks + signed commits + linear history + CODEOWNERS |

05. REPO TOPOLOGY

5.1 Five-Repo Layout

The pipeline governs five cipher-shinobi GitHub repos. D5 mandates Lisa-OS migrates first — the pipeline cannot credibly enforce governance it has not been subjected to itself.

| # | Repo | Purpose | Language(s) | Migration order |
|---|---|---|---|---|
| 1 | Lisa-OS (OS core repo) (new) | LISA's operating system: memory gateway code, agent prompts, skills, governance docs | TypeScript, shell, Markdown | Sprint 1 (pilot) |
| 2 | Application repo A | Full-stack curriculum app | TypeScript, SQL | Sprint 2-4 per queue rank |
| 3 | Application repo B | Standalone HTML deck (Vercel-deployed) | HTML, JS, CSS | Sprint 2-4 per queue rank |
| 4 | Application repo C | PDF pipeline, automation scripts | Python, shell | Sprint 2-4 per queue rank |
| 5 | Application repo D | Media-asset style management tool | TypeScript | Sprint 2-4 per queue rank |

Migration order rationale (D5): Lisa-OS carries the executable governance (Semgrep custom rules are derived from its CLAUDE.md + CS.AK.LISA.Docu.* files). Running Lisa-OS through its own pipeline first is the only honest ratification of the rules. If the rules cannot survive contact with the code that spawned them, they are unfit for the application repos. Sprint 1 pilot data also calibrates API cost projections and tool ergonomics on the most governance-dense repo before the mission spends budget on the others.

Parallel execution: explicitly forbidden (Plan §7 rationale; R7). Tobi-san's review attention is the bottleneck. Sprints run serially.


5.2 Lisa-OS Repo Boundary — ADR-1 Resolution

The single most consequential architectural decision in this TechSpec is the boundary of the Lisa-OS repo. The plan did not resolve it. This TechSpec locks it via ADR-1.

Decision (ADR-1): Lisa-OS scope is option (c) — memory_gateway/ + .claude/ + governance subset. Specifically, the cipher-shinobi/lisa-os repo mirrors the following files and directories from the vault, preserving relative paths from the repo root:

lisa-os/                                   (repo root)
├── CLAUDE.md                              (from vault root, governance entrypoint)
├── memory_gateway/                        (from artefacts/code/lisa/memory_gateway/)
│   ├── server/
│   ├── migrations/
│   ├── test/
│   ├── package.json
│   ├── tsconfig.json
│   └── ...
├── .claude/
│   ├── agents/                            (8 CyberShinobi agent prompts)
│   ├── agent-memory/                      (per-agent domain memory, flat markdown, load-bearing)
│   ├── memory/                            (Tier 1 only: MEMORY.md, feedback_*.md, reference_*.md, project_*.md)
│   ├── skills/                            (custom skills only, not marketplace)
│   ├── plans/                             (.claude/plans/*.md)
│   ├── settings.json                      (secrets-scrubbed)
│   └── settings.local.json                (secrets-scrubbed; .pending.* excluded)
├── entities/
│   └── ENT.Lisa.Compressed.md             (LISA's character reference — in scope)
├── governance/                            (symlinked or mirrored subset)
│   ├── CS.AK.LISA.Docu.CodeDisciplineProtocol.md
│   ├── CS.AK.LISA.Docu.RawTwinDiscipline.md
│   ├── CS.AK.LISA.Docu.VaultGovernance.md
│   ├── CS.AK.LISA.Docu.MemoryArchitecture.md
│   ├── CS.AK.LISA.Docu.LisaOSMap.md
│   ├── CS.AK.LISA.Docu.SecurityOperations.md
│   ├── CS.AK.LISA.Docu.PlanningDiscipline.md
│   ├── CS.AK.LISA.Docu.OperationalProtocols.md
│   └── PER.EX.SAG_SYSX.Docu.FileClassFramework.md
├── .coderabbit.yaml
├── .clean-code-exceptions.yaml
├── .semgrepignore
├── cipher-shinobi/                        (custom rules)
│   └── semgrep-rules/
└── .github/
    ├── workflows/clean-code-gate3.yml
    └── actions/check-new-code-has-tests/

In-scope file categories:

  • Memory gateway code: the TypeScript server, MCP handlers, dispatch + cache + context APIs, migrations, tests, package manifest. This is the load-bearing runtime.
  • .claude/ orchestration: agent prompts (.claude/agents/*.md), custom skills (.claude/skills/**), plan files (.claude/plans/*.md), hooks, settings (secrets scrubbed; see §10).
  • Agent domain memory: .claude/agent-memory/<agent>/** for every CyberShinobi agent. Flat markdown, load-bearing for agent behaviour across sessions, not personal. Full inclusion, no scrub.
  • Tier 1 auto-memory: .claude/memory/{MEMORY.md, feedback_*.md, reference_*.md, project_*.md}. LISA's procedural memory that every session loads in konnichiwagwan Phase A. Slow-changing, governance-adjacent, the substrate the pipeline must protect. Excludes user_profile.md (contains PII: identity, family, faith; kept unversioned, outside every repo; location withheld).
  • Governance subset: CLAUDE.md at the repo root (so CodeRabbit, Semgrep, and every agent can read it) and the load-bearing LISA Docu files + FCF + ENT.Lisa.Compressed.md. These are the rules Semgrep custom rules cite and enforce against the repo.

Out-of-scope file categories (stay in the vault only):

  • Mission-specific Forge files (all CS.AK.ClientA.*, CS.YB.*, PER.LV.*, PER.SA.*, personal daily / weekly notes).
  • Entity files for humans (the operator entity profile, ENT.CipherShinobi.md, etc.) — these contain the operator's personal data and the org's private org-chart.
  • Intel / Meeting / Log files — temporally scoped and not load-bearing for pipeline enforcement.
  • artefacts/media/, artefacts/pdfs/, large binary assets.
  • Other code projects' subtrees under artefacts/code/ (lisa_curriculum, gws-profile-skill, etc.) — those live in their own repos.
  • Tier 2/3 auto-memory: .claude/memory/CS.*.md + .claude/memory/PER.*.md (≈124 files at time of writing). Per-conversation semantic and episodic memory exports; high-velocity, mission-specific, ephemeral-ish. Canonical copies live in the memory gateway SQLite DB; these markdown files are dashboard-friendly projections, not load-bearing for pipeline enforcement. Gitignored.
  • PII auto-memory: user_profile.md. Contains Tobi-san's identity, family, faith. Excluded even from the private repo to keep PII out of git history permanently. Kept unversioned, outside every repo; location withheld.
  • Session ephemera: .claude/settings.pending.json, .claude/settings.pending.md, .claude/.DS_Store, any .bak / .orig variants, paste-cache/, shell-snapshots/, session-env/. Per-session state with no governance relevance.

Mechanism: initial implementation uses one-way vault→repo copy via a Sprint 0 script. Long-term, the governance-subset files may move to living in the Lisa-OS repo as their primary home and get symlinked back into the vault for Obsidian browsing. This symlink direction is a Sprint 1 retro decision point, not locked here.
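
As an illustration of the boundary rules, the Tier 1 memory filter that such a copy script would need can be sketched as follows. The function name shouldMirrorMemoryFile and the sample file names are hypothetical; the patterns are transcribed from the Tier 1 list above, and the real Sprint 0 script may be structured differently.

```typescript
// Hypothetical filter for file names under .claude/memory/ during the
// vault-to-repo copy. Patterns follow the Tier 1 list in section 5.2.
const TIER1_PATTERNS: RegExp[] = [
  /^MEMORY\.md$/,
  /^feedback_.+\.md$/,
  /^reference_.+\.md$/,
  /^project_.+\.md$/,
];

function shouldMirrorMemoryFile(fileName: string): boolean {
  // The PII file is excluded from every repo, whatever else matches.
  if (fileName === "user_profile.md") return false;
  return TIER1_PATTERNS.some((pattern) => pattern.test(fileName));
}

// Sample names are invented for illustration.
const sampleNames = ["MEMORY.md", "feedback_testing.md", "user_profile.md", "CS.AK.Example.md"];
console.log(sampleNames.filter(shouldMirrorMemoryFile));
```

An allow-list is used instead of a deny-list so that Tier 2/3 exports (CS.*.md, PER.*.md) and any future file class stay out of the repo by default.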

Rationale summary (full rationale in ADR-1):

  1. Narrower options ((a) memory_gateway/ only, (b) + .claude/) leave the governance docs that Semgrep rules must cite outside the repo. The pipeline cannot verify that the rules still match the docs they were derived from. That is the same class of failure as the MCP-schema-cache staleness bug — a stale reference that the system cannot self-detect.
  2. Broader options ((d) everything minus user vault) pull in mission-specific Forge files, personal ENT profiles, and Intel/Meeting/Log temporal files. Scan noise explodes, secrets risk rises (ENT files contain operator personal data), and the repo becomes a proxy for the entire vault — collapsing the vault-vs-code separation that Tobi-san's operator preference explicitly maintains.
  3. Option (c) is the minimum closure that makes the executable governance principle honest: the rules, the docs they cite, the runtime they govern, and the agents that execute under them all live in one repo with one version history.

5.3 Repo Relationships

Lisa-OS (OS core repo)       <--- governance source (Semgrep custom rules authored from its contents)
      |
      v (custom rule set published as a shared semgrep-rules repo)
      +-------------+-------------+-------------+
      |             |             |             |
      v             v             v             v
 App repo A    App repo B    App repo C    App repo D

Each application repo consumes the custom rule set through its Semgrep rule configuration
(a --config path or .semgrep.yml rule file sourced from the shared semgrep-rules repo, vendored
or fetched at CI time). .semgrepignore only controls which paths are skipped during a scan.

Gate 1 is per-workstation (Claude Code MCP plugin). Gates 2 and 3 are per-repo but share the same rule set and the same .clean-code-exceptions.yaml schema. The only repo-specific files are .coderabbit.yaml (per-repo thresholds) and the .clean-code-exceptions.yaml list itself.


5.4 Dependency Maps (v1.1.0)

Every tracked repo maintains two dependency map artefacts at its root:

| File | Format | Authority | Refresh mechanism |
|---|---|---|---|
| depmap.yaml | YAML | Authoritative (machine-readable source of truth) | Auto-refreshed on every merge to main via a lightweight CI step |
| DEPMAP.md | Markdown | Derived (generated from YAML for Obsidian/human consumption) | Regenerated alongside depmap.yaml; committed if changed |

Coverage — each depmap.yaml captures three dependency categories:

  1. Static dependency graph: module-level imports and exports. For TypeScript/JavaScript repos, generated via madge --json or equivalent; language-appropriate tooling for Python (pydeps), shell (manual annotation).
  2. Runtime dependencies: services the repo calls at runtime, environment variables it reads, config files it consumes.
  3. Cross-repo dependencies: any import, API call, or shared artefact that crosses repo boundaries within cipher-shinobi.

Schema (minimal valid depmap.yaml):

# depmap.yaml — auto-generated on merge to main
# DO NOT EDIT MANUALLY — regenerated by CI step
 
schema_version: 1
repo: cipher-shinobi/lisa-os
generated_at: 2026-04-10T14:00:00Z
 
modules:
  - path: server/mcp/index.ts
    imports:
      - server/dispatch/index.ts
      - server/psychic-cache/index.ts
    exports:
      - startMcpServer
    tags: []
 
  - path: server/dispatch/index.ts
    imports:
      - server/dispatch/types.ts
      - shared/zod-error.ts
    exports:
      - reportDispatch
      - reportComplete
    tags: [security]   # security-tagged modules trigger gray-fox Gate 0 review
 
runtime_deps:
  - name: SQLite database
    type: service
    env_var: GATEWAY_DB_PATH
    config_file: null
 
  - name: Voyage AI embeddings
    type: api
    env_var: VOYAGE_API_KEY
    config_file: null
 
cross_repo_deps: []

Auto-refresh CI step: a dedicated job in each repo's CI workflow that runs on push to main:

  refresh-depmap:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    permissions:
      contents: write   # the job pushes the refreshed files back to main
    steps:
      - uses: actions/checkout@v4
      - name: Generate depmap
        run: |
          # Language-specific generation (TS/JS example)
          npx madge --json --ts-config tsconfig.json src/ > depmap.raw.json
          node scripts/generate-depmap.js depmap.raw.json > depmap.yaml
          node scripts/generate-depmap-md.js depmap.yaml > DEPMAP.md
      - name: Commit if changed
        run: |
          # Stage first so a newly created (untracked) depmap is detected too
          git add depmap.yaml DEPMAP.md
          git diff --cached --quiet || {
            git config user.name "the OS core repo"
            git config user.email "<bot-noreply>"
            git commit -m "chore: refresh depmap [auto]"
            git push
          }

Gate 0 consumption: the fuda skill reads depmap.yaml to populate the Fuda's dependency impact analysis section. Yoshimitsu uses the static graph to identify affected modules and cross-module dependencies. Gray-fox uses the tags: [security] annotations to determine whether a Fuda requires his review.
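
A minimal sketch of how the static graph could drive the impact analysis, assuming depmap.yaml has already been parsed into a plain object (YAML parsing is omitted). The interface and function names (DepmapModule, Depmap, affectedModules, touchesSecurityTagged) are hypothetical, and the third module entry for server/dispatch/types.ts is added here only so the example graph is closed.

```typescript
// Hypothetical shape of a parsed depmap.yaml, limited to the fields used here.
interface DepmapModule {
  path: string;
  imports: string[];
  exports: string[];
  tags: string[];
}

interface Depmap {
  schema_version: number;
  repo: string;
  modules: DepmapModule[];
}

// Collects the touched files plus every module that imports them, transitively.
function affectedModules(depmap: Depmap, touched: string[]): string[] {
  const affected = new Set<string>(touched);
  let grew = true;
  while (grew) {
    grew = false;
    for (const mod of depmap.modules) {
      const importsAffected = mod.imports.some((imp) => affected.has(imp));
      if (!affected.has(mod.path) && importsAffected) {
        affected.add(mod.path);
        grew = true;
      }
    }
  }
  return Array.from(affected).sort();
}

// True when any directly touched module carries the security tag.
function touchesSecurityTagged(depmap: Depmap, touched: string[]): boolean {
  return depmap.modules.some(
    (mod) => touched.indexOf(mod.path) !== -1 && mod.tags.indexOf("security") !== -1
  );
}

const sampleDepmap: Depmap = {
  schema_version: 1,
  repo: "cipher-shinobi/lisa-os",
  modules: [
    { path: "server/mcp/index.ts", imports: ["server/dispatch/index.ts"], exports: ["startMcpServer"], tags: [] },
    { path: "server/dispatch/index.ts", imports: ["server/dispatch/types.ts"], exports: ["reportDispatch"], tags: ["security"] },
    { path: "server/dispatch/types.ts", imports: [], exports: [], tags: [] },
  ],
};

console.log(affectedModules(sampleDepmap, ["server/dispatch/types.ts"]));
console.log(touchesSecurityTagged(sampleDepmap, ["server/dispatch/index.ts"]));
```

Whether Gray Fox review should also fire when a security-tagged module is only transitively affected is not stated in this section; the sketch checks directly touched modules only.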

Sprint 0 exit criterion: baseline depmap.yaml + DEPMAP.md generated for all 5 repos (see §8.1 items 10-11).


06. GATE DESIGN

6.0 Gate 0 — Fuda Scoping (v1.1.0)

Fuda (札) is a sealed, scoped change contract created before any code is written. The name draws on the Japanese concept of a talisman-seal — once affixed, it defines what the change is permitted to touch and why. Gate 0 shifts the pipeline's centre of gravity from retroactive correction (Gates 1-3 finding defects after the fact) to upfront rigour — downstream gates fine-tune, they do not discover fundamental scoping failures.

Trigger: the fuda skill (~/.claude/skills/fuda/), invoked manually via /fuda or auto-triggered on:

  • Code write/modify/delete in any tracked repo
  • New repo creation
  • Pipeline artefact modification (Semgrep rules, CI workflows, hook templates)
  • Genji dispatch with code deliverables

Does not trigger on: pure research dispatches, vault-only markdown edits, code reading without modification, pipeline already active for the current change.


6.0.1 Required Fuda Sections

Every Fuda must contain the following sections. Yoshimitsu drafts all sections by reading depmap.yaml and the dispatch brief.

| Section | Content | Source |
|---|---|---|
| Change scope | What files, modules, and repos are touched; what the change does and does not do | Dispatch brief + depmap.yaml |
| Dependency impact analysis | Which modules import/export the touched files; cross-module and cross-repo impact; runtime dependency changes | depmap.yaml static graph + runtime deps + cross-repo deps |
| Performance considerations | Expected impact on Gate latency, runtime performance, resource consumption | Yoshimitsu analysis |
| Security considerations | Whether the change touches security-tagged modules; secrets exposure risk; auth/access control implications | depmap.yaml tags: [security] + yoshimitsu analysis |
| Best-practice hygiene | Language-specific idioms, framework patterns, codebase conventions the implementer must follow | CodeDisciplineProtocol + repo CLAUDE.md |
| Rationale interrogation | Why this approach over alternatives; what was considered and rejected; trade-off analysis | Yoshimitsu analysis |

6.0.2 Enforcement Model (Three Layers)

Layer 1 — Structural enforcement (gateway)

fuda_id (Linear issue ID) is a required field on report_dispatch in the gateway schema (dispatch/types.ts :: ReportDispatchInputSchema). The gateway rejects any dispatch call that omits it. This makes Gate 0 structurally inescapable for code-producing dispatches — no Fuda, no dispatch.
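
The "no Fuda, no dispatch" rule can be sketched as a hand-rolled check. This is not the gateway's ReportDispatchInputSchema, which is not reproduced in this document; the names DispatchCheck and checkDispatchHasFuda are hypothetical, and the sketch deliberately avoids any schema library so that it stays self-contained.

```typescript
// Hypothetical stand-in for the gateway's schema check on report_dispatch.
type DispatchCheck =
  | { ok: true; fudaId: string }
  | { ok: false; error: string };

function checkDispatchHasFuda(input: unknown): DispatchCheck {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "dispatch payload must be an object" };
  }
  const fudaId = (input as Record<string, unknown>)["fuda_id"];
  // Only presence is checked here; the ID format is not specified in this section.
  if (typeof fudaId !== "string" || fudaId.trim() === "") {
    return { ok: false, error: "fuda_id is required: no Fuda, no dispatch" };
  }
  return { ok: true, fudaId };
}

console.log(checkDispatchHasFuda({ agent: "genji" }));
console.log(checkDispatchHasFuda({ agent: "genji", fuda_id: "AK-358" }));
```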

Layer 2 — Analytical enforcement (agent review)

Three agents participate in Gate 0, with orthogonal review scopes:

| Agent | Role | Scope | Triggers |
|---|---|---|---|
| Yoshimitsu | Drafts the Fuda | All Fuda: reads depmap.yaml, fills all required sections | Every Fuda |
| Raiden | Reviews for completeness + soundness | All Fuda: checks every section is non-empty, scope matches depmap, rationale is coherent | Every Fuda |
| Gray Fox | Reviews for security implications | Only Fuda that touch modules with tags: [security] in depmap.yaml (auth, crypto, access control, secrets, external APIs) | Security-tagged Fuda only |

LISA coordinates — she dispatches the agents, posts to Linear, manages the review cycle. LISA never drafts and never reviews.

Raiden and gray-fox review in parallel (orthogonal scopes — completeness/soundness vs security). Their review outputs are merged before proceeding to approval.

Layer 3 — Human enforcement (risk threshold)

Mandatory Tobi-san sign-off when any of the following apply:

  • >5 files in the depmap.yaml dependency graph of the touched modules
  • Security domain: change touches security-tagged modules
  • Cross-repo impact: change affects modules in more than one cipher-shinobi repo
  • Governance/pipeline artefact changes: edits to CLAUDE.md, Semgrep rules, CI workflows, agent prompts, this TechSpec
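
The four sign-off triggers can be expressed as a pure function that returns the reasons that apply. The interface FudaRiskProfile and its field names are assumptions for this sketch; how each value is derived from depmap.yaml is left out.

```typescript
// Hypothetical summary of a Fuda's risk-relevant facts.
interface FudaRiskProfile {
  dependencyGraphFiles: number;     // files in the depmap graph of the touched modules
  touchesSecurityTagged: boolean;
  reposAffected: number;
  touchesGovernanceArtefact: boolean;
}

// Returns every Layer 3 criterion that applies; an empty list means no mandatory sign-off.
function signoffReasons(profile: FudaRiskProfile): string[] {
  const reasons: string[] = [];
  if (profile.dependencyGraphFiles > 5) reasons.push("more than 5 files in dependency graph");
  if (profile.touchesSecurityTagged) reasons.push("security domain");
  if (profile.reposAffected > 1) reasons.push("cross-repo impact");
  if (profile.touchesGovernanceArtefact) reasons.push("governance/pipeline artefact change");
  return reasons;
}

console.log(signoffReasons({
  dependencyGraphFiles: 7,
  touchesSecurityTagged: false,
  reposAffected: 1,
  touchesGovernanceArtefact: true,
}));
```

Returning the reasons, not just a boolean, lets the Fuda record why Tobi-san's sign-off was requested.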

6.0.3 Fast-track Threshold

Condition: the change touches <=2 files in a single module with no cross-module dependencies in depmap.yaml.

Fast-track flow: yoshimitsu drafts + raiden reviews alone. Gray-fox skips (no security review needed for single-module, small-scope changes). No second round-trip. Designed to prevent Gate 0 from becoming ceremony on trivial changes.
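
A minimal sketch of the fast-track test and the resulting reviewer set. The names FudaScope, isFastTrack, and reviewersFor are hypothetical. The sketch follows the text literally: on the fast track Gray Fox is skipped. How this interacts with the security criterion in Layer 3 is not specified in this section and is not modelled.

```typescript
// Hypothetical summary of a Fuda's scope, as derived from depmap.yaml.
interface FudaScope {
  filesTouched: number;
  modulesTouched: number;
  crossModuleDeps: number;
  securityTagged: boolean;
}

type Reviewer = "raiden" | "gray-fox";

// Fast-track: at most 2 files, one module, no cross-module dependencies.
function isFastTrack(scope: FudaScope): boolean {
  return scope.filesTouched <= 2 && scope.modulesTouched === 1 && scope.crossModuleDeps === 0;
}

function reviewersFor(scope: FudaScope): Reviewer[] {
  if (isFastTrack(scope)) return ["raiden"];
  return scope.securityTagged ? ["raiden", "gray-fox"] : ["raiden"];
}

console.log(reviewersFor({ filesTouched: 2, modulesTouched: 1, crossModuleDeps: 0, securityTagged: false }));
```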


6.0.4 Revision Cap

Reviewers (raiden, gray-fox) may flag issues on the Fuda. Yoshimitsu revises. Maximum 2 revision rounds. If issues remain after round 2, escalate to Tobi-san for resolution. This prevents infinite drafting loops on genuinely ambiguous scoping questions.


6.0.5 Fuda Lifecycle

Draft  -->  Under Review  -->  Approved  -->  In Progress  -->  Merged  -->  Done
  |              |                                |
  |         (revision)                      (blocked/failed)
  |              |                                |
  +--- max 2 ---+                          Escalate to Tobi-san
       rounds

Each lifecycle state maps to a Linear issue status (see §9.6):

| Fuda state | Linear status |
|---|---|
| Draft | Planned |
| Under Review | Planned (sub-state tracked in comments) |
| Approved | In Progress |
| In Progress | In Progress |
| Merged | Merged |
| Done | Done |
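
The state mapping and the revision cap from §6.0.4 can be sketched together. The identifiers below (FudaState, LINEAR_STATUS, afterReview) are illustrative, not names from the fuda skill; the statuses and the two-round cap are taken from the text above.

```typescript
type FudaState = "Draft" | "Under Review" | "Approved" | "In Progress" | "Merged" | "Done";

// Fuda lifecycle state to Linear issue status, transcribed from the table above.
const LINEAR_STATUS: Record<FudaState, string> = {
  "Draft": "Planned",
  "Under Review": "Planned", // sub-state tracked in comments
  "Approved": "In Progress",
  "In Progress": "In Progress",
  "Merged": "Merged",
  "Done": "Done",
};

const MAX_REVISION_ROUNDS = 2;

// Decides what follows a review, given how many revision rounds are already used.
function afterReview(roundsUsed: number, issuesRemain: boolean): "Approved" | "Revise" | "Escalate" {
  if (!issuesRemain) return "Approved";
  return roundsUsed < MAX_REVISION_ROUNDS ? "Revise" : "Escalate";
}

console.log(LINEAR_STATUS["Under Review"], afterReview(2, true));
```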

6.0.6 V-3 Ladder Randomised-Order Pre-Lock Discipline (v1.3.0)

Mandate: any Fuda whose smallest-viable V-3 config relies on execution-order discipline (e.g. fileParallelism: false, sequence.shuffle: false, --runInBand without state-reset) MUST include a randomised-order pre-lock step at each V-3 ladder rung. Without it, the lock declaration is structurally exposed to stochastic masking — default file-import order may happen to avoid the bug class while the underlying defect persists.

Mechanism class (yoshimitsu evaluates at Fuda drafting time):

| Class | Examples | Pre-lock requirement |
|---|---|---|
| Execution-order-only | fileParallelism: false, sequence.shuffle: false, --runInBand (no state reset) | MANDATORY |
| State-reset-also | pool: 'forks' + isolate: true (module graph reset per file), --testEnvironment node + jest.resetModules() | Recommended but not blocking |
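
For the Vitest case, the classification can be sketched as below. It assumes the config under evaluation is already a V-3 ladder rung that relies on serial execution, and it recognises only the pool: 'forks' + isolate: true combination from the table as a state reset; the names RungConfig, classifyRung, and preLockIsMandatory are hypothetical.

```typescript
// Subset of harness options relevant to the classification (Vitest-style names).
interface RungConfig {
  fileParallelism?: boolean;
  pool?: string;
  isolate?: boolean;
}

type MechanismClass = "execution-order-only" | "state-reset-also";

// A rung counts as state-resetting only when the module graph is reset per file.
function classifyRung(config: RungConfig): MechanismClass {
  const resetsModuleGraph = config.pool === "forks" && config.isolate === true;
  return resetsModuleGraph ? "state-reset-also" : "execution-order-only";
}

function preLockIsMandatory(config: RungConfig): boolean {
  return classifyRung(config) === "execution-order-only";
}

console.log(classifyRung({ fileParallelism: false }));
console.log(classifyRung({ pool: "forks", isolate: true }));
```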

Required Fuda cross-section coherence:

  1. §07.3 V-3 directive — add: Before locking [smallest config], run ONE additional V-2 cycle with randomised file order (sequence.shuffle: true CLI override; NOT persisted to config). Require 10/10 PASS under randomised order. If FAIL, advance to next ladder rung.
  2. §08 risk register — add: [Smallest config] passes 10/10 stochastically while underlying [bug class] persists, masked by sequential-execution timing | Medium | Medium (false-fix lands; class not eliminated) | §07.3 V-3 randomised-order pre-lock mandates verification before lock declaration.
  3. §10 Phase 2 V-3 sub-sequence — add randomised-order pre-lock bullet under smallest-config success branch.
  4. Cross-section verification — §07.3 ↔ §08 ↔ §10 all reference the same sequence.shuffle (or framework equivalent) mechanism with consistent ladder-step language.

Reviewer gate: raiden + gray-fox MUST verify pre-lock is present and cross-section coherent for any V-3 ladder Fuda where the smallest-viable config is execution-order-only. Block as substantive structural gap (not cosmetic) if absent.

Impl-time discipline (genji): execute pre-lock verbatim per Fuda directive. Document outcome in PR body alongside default-order results. If randomised-order FAIL, advance ladder rung — do NOT lock the failing config.
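
The lock rule (10/10 under default order and 10/10 under randomised order, otherwise advance) can be sketched as a ladder walk. The names RungResult, canLock, and firstLockableRung are illustrative. The Config A figures are the AK-358 results cited in this section; the Config B row is invented purely to show a lockable rung.

```typescript
interface RungResult {
  config: string;
  defaultOrderPasses: number;     // out of 10
  randomisedOrderPasses: number;  // out of 10
}

const REQUIRED_PASSES = 10;

// A rung may be locked only when both orderings pass 10/10.
function canLock(result: RungResult): boolean {
  return (
    result.defaultOrderPasses === REQUIRED_PASSES &&
    result.randomisedOrderPasses === REQUIRED_PASSES
  );
}

// Walks the ladder in order; null means the ladder is exhausted.
function firstLockableRung(ladder: RungResult[]): string | null {
  for (const rung of ladder) {
    if (canLock(rung)) return rung.config;
  }
  return null;
}

const ladder: RungResult[] = [
  { config: "A", defaultOrderPasses: 6, randomisedOrderPasses: 0 },   // AK-358 D1303 figures
  { config: "B", defaultOrderPasses: 10, randomisedOrderPasses: 10 }, // hypothetical
];
console.log(firstLockableRung(ladder));
```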

Framework generalisation: pattern applies wherever serial execution is the proposed fix for a singleton/state-leak bug class.

  • Vitest: fileParallelism / pool / sequence.shuffle
  • Jest: --runInBand + --randomize (Jest 30+)
  • pytest: -p no:randomly vs pytest-randomly
  • mocha: --sort vs random ordering
  • Any other framework that offers an equivalent serial-execution or test-ordering control

Empirical validation: AK-358 D1303 — Config A default-order 6/10 PASS (insufficient) and randomised-order 0/10 PASS (falsifying); ADV-4 caught the stochastic-mask false-fix that would otherwise have shipped if happenstance produced default-order 10/10. Per CP#23 §10.1.

Governance reference: feedback memory feedback_v3_ladder_randomised_order_pre_lock (companion at ~/…).

Scope clarification — --sequence.shuffle.files vs --sequence.shuffle.tests vs --sequence.shuffle (v1.4.0, D1349 amendment per D1347 genji + D1348 raiden convergent diagnosis on AK-365):

The Vitest --sequence.shuffle flag family carries three distinct semantics that the V-3 closure gate MUST distinguish. Conflating them risks shipping a false-fix or, worse, blocking on a non-defect:

| Flag | Semantics | Asserts | V-3 closure gate status |
|---|---|---|---|
| --sequence.shuffle.files | Randomises FILE import order; intra-file test order preserved | Module-singleton state does not leak across file boundaries within a worker process | MANDATORY: this is the canonical V-3 closure gate for harness-config ladder work |
| --sequence.shuffle.tests | Randomises TEST order WITHIN each describe block; file order preserved | Intra-file describe-block independence (each describe runs without dependency on prior describe state) | NOT a mandatory closure gate: describe-block data-coupling is a legitimate test design choice (e.g. write-then-read patterns where R-T-1.7 aggregate() depends on R-T-1.2 write() having seeded data in the same describe scope). Fix path is test refactor (AK-367 territory), NOT harness config |
| --sequence.shuffle | Vitest shorthand equivalent to {files: true, tests: true} per Vitest 4.1.5 docs | Both above simultaneously | DO NOT USE as the V-3 closure gate without a prior test-design audit asserting describe-independence. Bare --sequence.shuffle over a codebase with legitimate intra-file coupling produces false failures that mask genuine cross-file leak signal |

Empirical evidence — AK-365 D1347 genji code investigation (verbatim cache):

"the real bug = intra-file describe-block data-coupling (NEW class (e)). Per-file afterAll hygiene rollout REMAINS CORRECT defence-in-depth; V-3 closure gate amended to --sequence.shuffle.files only."

Empirical evidence — AK-365 D1348 raiden empirical correlation (verbatim cache):

"AK-365 V-3 failure empirical correlation D1348 raiden. ... Failure-mode classification: (a) listener residue (leaked event handler firing on next test); (b) mock residue (vi.spyOn not restored); (c) db state residue (rows from prior test polluting current); (d) timer residue (setInterval/setTimeout firing across test boundaries); (e) other."

The convergent diagnosis identified class (e) as describe-block data-coupling — a test-design property, not a singleton-leak. The AK-365 per-file afterAll rollout remains structurally correct as defence-in-depth against classes (a)-(d); the V-3 closure gate scope was narrowed from --sequence.shuffle (both axes) to --sequence.shuffle.files (cross-file axis only) because class (e) falls outside the CCP §6.0.6 taxonomy of singleton/state-leak bug classes.

Fuda authoring rule: any Fuda invoking §6.0.6 V-3 randomised-order pre-lock MUST specify --sequence.shuffle.files as the closure gate command unless an explicit test-design audit certifies describe-independence across all touched files. If --sequence.shuffle.tests (or the bare shorthand) is proposed as the closure gate, the Fuda MUST cite the audit and enumerate the audited files in §07.3.
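
The authoring rule lends itself to a mechanical check at review time. The sketch below is one possible shape; ClosureGate, closureGateProblems, and the auditedFiles field are hypothetical names, and the sample file name is taken from the exemplar later in this section.

```typescript
type ShuffleFlag =
  | "--sequence.shuffle.files"
  | "--sequence.shuffle.tests"
  | "--sequence.shuffle";

// What a Fuda declares as its V-3 closure gate (hypothetical shape).
interface ClosureGate {
  flag: ShuffleFlag;
  auditedFiles: string[]; // files certified describe-independent by a test-design audit
}

// Returns review findings; an empty list means the declaration is acceptable.
function closureGateProblems(gate: ClosureGate): string[] {
  if (gate.flag === "--sequence.shuffle.files") return [];
  if (gate.auditedFiles.length === 0) {
    return [gate.flag + " as closure gate requires a cited audit and an enumerated file list"];
  }
  return [];
}

console.log(closureGateProblems({ flag: "--sequence.shuffle", auditedFiles: [] }));
```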

Outcome class taxonomy — class (e) formalisation (v1.5.0, D1392 amendment per AK-368 Phase 1 closure 4-way convergence verdict):

The flake outcome class taxonomy underpinning §6.0.6 V-3 closure-gate reasoning is now codified as a 5-member set, formally promoting class (e) from informal v1.4.0 reference (where it was framed as "outside §6.0.6 a-d taxonomy") to full taxonomy membership alongside classes (a)-(d). This is necessary because class (e) describes a real, recurring flake mechanism that requires explicit fuda + reviewer treatment — leaving it unenumerated has caused repeated misclassification (AK-358 D1303 singleton-leak misdiagnosis; AK-365 V-3 0/10 stochastic-mask interpretation).

| Class | Mechanism | Closure-gate axis | Fix path |
|---|---|---|---|
| (a) | Listener residue: leaked event handler firing on next test | --sequence.shuffle.files MUST detect | Per-file afterAll listener-drain discipline (AK-365 Wave 1-6 baseline) |
| (b) | Mock residue: vi.spyOn / vi.mock not restored between tests | --sequence.shuffle.files MUST detect | Explicit vi.restoreAllMocks() in afterEach or afterAll |
| (c) | DB state residue: rows from prior test polluting current | --sequence.shuffle.files MUST detect | Fresh openInMemoryDatabase() per test or per describe; explicit truncation between tests |
| (d) | Timer residue: setInterval / setTimeout firing across test boundaries | --sequence.shuffle.files MUST detect | Factory-closure-scoped intervals with .unref() + .stop() correctly paired; vi.useFakeTimers() discipline |
| (e) | Intra-file describe-block data-coupling: describe-block B reads state seeded by describe-block A within the same file (e.g. aggregate() describe queries 'old-skill' row written by write() describe) | --sequence.shuffle.tests ONLY (intra-file axis); NOT detected by --sequence.shuffle.files (which preserves intra-file test order) | Per-describe-independent refactor (move seeds into beforeEach or test body) OR audited --sequence.shuffle.tests gate POST-refactor. Production code is NOT modified: fix is test-design property |

Diagnostic exemplar — class (e) (verbatim D1347 genji + D1387 §08 convergent evidence):

skills-telemetry-repository.test.ts: R-T-1.7 aggregate() describe queries data written by R-T-1.2 write() describe within the same describe scope. Under --sequence.shuffle.tests, aggregate can run before write: expect(...).toBeDefined() fails because data not seeded yet (inverse of data-leaked signature: empty result expected non-empty).

4-way convergence empirical foundation (verbatim D1387 §08 + D1389 §03):

"CONFIRMED via 4-way convergence: D1303 (raiden, prior): singleton-leak hypothesis empirically falsified during AK-358. D1347 (genji, code-side, AK-358 reship): real bug class identified as intra-file describe-block data-coupling, NEW class (e) outside CCP §6.0.6 a-d taxonomy. D1348 (raiden, empirical, AK-358 reship): correlation evidence confirms intra-file mechanism. D1387 / Wave 3 (genji): independent V-2 + V-3 30-run cycles confirm cross-file flake is order-invariant (V-2 26.7% [14.2-44.4%] ≈ V-3 30.0% [16.7-47.8%], CIs overlap), listener-count signal is uniformly 0 (483/483 file-run snapshots), router-stack identity is invariant — all H3 + H1 surrogate signals absent. By elimination, the residual mechanism is intra-file."

Each leg of the convergence employed an independent methodology (raiden empirical falsification at D1303; genji code Read at D1347; raiden per-test empirical correlation at D1348; genji aggregator-based 30-run V-2 + V-3 instrumented cycles at D1387). Per feedback_convergent_investigation_falsification + feedback_minimum_sample_size_cross_file_flake, convergence across n=30+ empirical legs and direct-Read code legs is gold-standard.

Detection antidote — class (e):

  1. Affirmative detection: at fuda drafting time (yoshimitsu §07.3), grep target test files for cross-describe state references (describe(...) blocks whose body reads state declared in a sibling describe at module scope). Flag for refactor pre-Wave.
  2. Empirical detection: run --sequence.shuffle.tests against the file in isolation. If 10/10 PASS → describe-independent (class (e) absent). If <10/10 PASS → class (e) present; refactor pre-V-3 closure.
  3. Refactor pattern: move seeds into beforeEach (each test runs against fresh seeded state) OR move seeds into the test body itself (the test owns its data lifecycle). Production code is NOT modified — V-7 zero-prod-mod inheritance preserved.
  4. Post-refactor gate: after class (e) refactor, the affected files become eligible for --sequence.shuffle.tests as a stretch quality bar. The MANDATORY V-3 closure gate REMAINS --sequence.shuffle.files only (classes (a)-(d)).
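
The class (e) mechanism and its refactor can be reproduced without a test framework. The sketch below is a plain TypeScript simulation, not Vitest code: a shared Map stands in for state seeded at describe scope, and the two orderings are enumerated explicitly (not shuffled) so the outcome is deterministic. All names are illustrative; 'old-skill' is the row name from the exemplar above.

```typescript
type Store = Map<string, number>;

interface SimTest {
  name: string;
  run: (store: Store) => void;
}

function expectSeeded(store: Store): void {
  if (!store.has("old-skill")) throw new Error("expected seeded row, found none");
}

// Coupled: aggregate relies on the row that write left behind in shared state.
const coupled: SimTest[] = [
  { name: "write", run: (s) => { s.set("old-skill", 1); } },
  { name: "aggregate", run: (s) => { expectSeeded(s); } },
];

// Refactored: each test seeds what it needs (same effect as a beforeEach seed).
const independent: SimTest[] = [
  { name: "write", run: (s) => { s.set("old-skill", 1); expectSeeded(s); } },
  { name: "aggregate", run: (s) => { s.set("old-skill", 1); expectSeeded(s); } },
];

// Runs the tests in the given order against one shared store; returns failed names.
function failuresInOrder(tests: SimTest[]): string[] {
  const store: Store = new Map<string, number>();
  const failed: string[] = [];
  for (const test of tests) {
    try {
      test.run(store);
    } catch (_err) {
      failed.push(test.name);
    }
  }
  return failed;
}

console.log(failuresInOrder(coupled));                       // declared order
console.log(failuresInOrder(coupled.slice().reverse()));     // aggregate before write
console.log(failuresInOrder(independent.slice().reverse())); // refactored, reversed
```

In the coupled pair only the reversed order fails, and it fails in aggregate with an empty result, which is the "inverse of data-leaked" signature described above. The refactored pair passes in either order, and no production code is involved.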

Canonical class (e) refactor exemplar — AK-367 Phase 2: see AK-367 (P2 Medium 2pt, parent AK-358; elevated P4 → P2 at D1392 per Phase 1 outcome). Yoshimitsu Phase 2 Fuda drafting in flight via D1393 covering ~10 test files in the AK-368 Wave 3 instrumented set (activity-routes, agent-memory-routes, analytics-routes, auth-middleware, dispatch-routes, knowledge-graph-routes, psychic-cache-routes, search-routes, sessions-routes, tools-routes). Acceptance criterion: per-file fail rate drops to 0/30 across both V-2 + V-3 cycles post-refactor (n=30 Wilson 95% CI per feedback_minimum_sample_size_cross_file_flake).
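
For reference, the Wilson 95% interval behind the n=30 criterion can be computed as follows. This is the standard Wilson score formula; the function name wilsonInterval is illustrative. The 8-in-30 input assumes the quoted 26.7% figure corresponds to 8 failing runs out of 30, which reproduces the quoted [14.2-44.4%] range.

```typescript
// Wilson score interval for a binomial proportion (z = 1.959964 for 95%).
function wilsonInterval(
  failures: number,
  runs: number,
  z: number = 1.959964
): { lower: number; upper: number } {
  if (runs <= 0) throw new Error("runs must be positive");
  const p = failures / runs;
  const z2 = z * z;
  const denom = 1 + z2 / runs;
  const centre = (p + z2 / (2 * runs)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / runs + z2 / (4 * runs * runs))) / denom;
  return { lower: Math.max(0, centre - half), upper: Math.min(1, centre + half) };
}

// 8 failures in 30 runs: roughly 0.142 to 0.444.
console.log(wilsonInterval(8, 30));
// 0 failures in 30 runs: the upper bound is still roughly 0.114.
console.log(wilsonInterval(0, 30));
```

The second call shows why 0/30 is a sample-size statement and not a proof of absence: even a clean 30-run cycle leaves an upper bound of about 11% on the true per-run fail rate.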

Reviewer gate (raiden + gray-fox): a fuda invoking §6.0.6 in a context where class (e) is plausible (test files importing shared event-bus singletons + multiple sibling describes with cross-describe state references) MUST cite class (e) in §07.3 + §08 risk register + provide the affirmative-detection grep evidence. Block as substantive structural gap if missing.

Empirical caveat (D1387 §06 caveat preserved):

"H1 surrogate is a weaker signal than direct setImmediate-queue-depth instrumentation. Negative on surrogate does not strictly falsify H1, but removes its predicted observable."

The class (e) confirmation rests on eliminative inference under the H1 surrogate caveat. The 4-way convergence makes the eliminative inference load-bearing across methodologies (direct code Read at D1347 affirms the intra-file mechanism; D1387's eliminative leg rules out competing classes (a)-(d) at instrumented scale). The class (e) formalisation is robust under the caveat; deeper instrumentation (AK-369 setImmediate-depth) was OBVIATED by the eliminative result and is not pursued.


6.0.7 STOP-after-D Best-Available Ship Pattern (v1.3.0)

Mandate: when a V-3 empirical config ladder (Configs A-D) fully exhausts without any rung achieving the full quality bar (10/10 PASS default-order + 10/10 PASS randomised-order under §6.0.6 + within V-Runtime soft budget), the ladder MAY ship the best-available config under explicit STOP-after-D discipline. Budget breach is captured and handed off via hygiene ticket, not blocking.

STOP-after-D activation criteria (all three required):

  1. Ladder exhaustion: V-3 Configs A through D all empirically failed the full quality bar at their respective ladder rungs
  2. Harness-only V-7 HOLDS: zero production-source modifications; the failing-but-best config is harness-only
  3. Real-fix path routed: structural follow-on hygiene ticket filed (e.g. AK-365 per-file afterAll cleanup rollout for AK-358) before merge
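
The three activation criteria can be read as a precondition check. The following is a minimal sketch under stated assumptions: every type, field, and function name is hypothetical, and the pass counts for Configs B to D in the example are invented for illustration (only the Config A figures come from this spec).

```typescript
// Hypothetical model of one V-3 ladder rung.
interface RungResult {
  config: "A" | "B" | "C" | "D";
  defaultOrderPasses: number; // out of 10
  randomisedOrderPasses: number; // out of 10
  withinSoftBudget: boolean;
}

interface StopAfterDInput {
  rungs: RungResult[];
  productionFilesModified: number; // V-7 requires 0
  hygieneTicketId: string | null; // e.g. "AK-365"
}

function meetsFullBar(r: RungResult): boolean {
  return r.defaultOrderPasses === 10 && r.randomisedOrderPasses === 10 && r.withinSoftBudget;
}

// Returns the reasons STOP-after-D may NOT activate; an empty list means eligible.
function stopAfterDBlockers(input: StopAfterDInput): string[] {
  const blockers: string[] = [];
  const attempted = new Set(input.rungs.map((r) => r.config));
  const required: Array<RungResult["config"]> = ["A", "B", "C", "D"];
  for (const c of required) {
    if (!attempted.has(c)) blockers.push(`Config ${c} was not empirically attempted`);
  }
  for (const r of input.rungs) {
    if (meetsFullBar(r)) blockers.push(`Config ${r.config} meets the full bar; ship that rung instead`);
  }
  if (input.productionFilesModified > 0) blockers.push("V-7 violated: production source modified");
  if (!input.hygieneTicketId) blockers.push("no hygiene follow-on ticket filed");
  return blockers;
}

// Example: Config A figures from AK-358; B-D figures are made up.
const exampleLadder: StopAfterDInput = {
  rungs: [
    { config: "A", defaultOrderPasses: 6, randomisedOrderPasses: 0, withinSoftBudget: true },
    { config: "B", defaultOrderPasses: 8, randomisedOrderPasses: 5, withinSoftBudget: true },
    { config: "C", defaultOrderPasses: 9, randomisedOrderPasses: 7, withinSoftBudget: true },
    { config: "D", defaultOrderPasses: 10, randomisedOrderPasses: 10, withinSoftBudget: false },
  ],
  productionFilesModified: 0,
  hygieneTicketId: "AK-365",
};
console.log(stopAfterDBlockers(exampleLadder).length === 0 ? "eligible" : "blocked");
```

Returning a list of blockers rather than a boolean keeps the failing criterion visible, which fits the transparency gate that follows.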

Three gates of STOP-after-D ship discipline:

Gate | Requirement | Verification
(a) Transparency | PR title carries BEST-AVAILABLE tag or equivalent; Fuda §10 documents the STOP outcome verbatim including the failing-config evidence | LISA verifies in pre-merge PR review
(b) Hygiene routing | Structural follow-on ticket filed in Linear with priority/effort estimate before PR merge; Fuda §10 cross-references the new ticket ID | linear skill confirms ticket exists
(c) Operator ratification | Tobi-san sign-off per §6.0.2 Layer 3 (governance/pipeline artefact change implication where ladder behaviour matters); decision recorded in checkpoint Standing Decisions table | LISA captures decision number + summary

V-Runtime budget breach acceptance: when STOP-after-D fires, the Fuda §03 soft budget is treated as captured trade-off rather than blocking constraint. Document actual runtime in PR body. The runtime cost driver should be identifiable and rooted in the failing-config's mechanism (e.g. isolate: true overhead for vitest; --runInBand serial overhead for jest) so the hygiene follow-through has a clear target for recovery.

Anti-pattern guards:

  • STOP-after-D is NOT a fast-track around full V-3 ladder discipline. Configs A-D must be empirically attempted at their respective rungs before STOP is valid.
  • STOP-after-D MUST NOT be used to ship without a hygiene follow-on. If no real-fix path exists, the change is not ready to merge — escalate to operator for scope re-cut.
  • STOP-after-D MUST NOT ship while production source is modified. Harness-only V-7 HOLDS is non-negotiable; if production singleton restructure is needed, that is a separate Fuda with its own risk-threshold sign-off.

Empirical validation: AK-358 D1303: Config A (default 6/10 + randomised 0/10), B/C explored, Config D' shipped at 28-40s runtime (3-5× the §03 30s soft budget). Three gates honoured: (a) PR title BEST-AVAILABLE tag + Fuda §10 documented STOP; (b) AK-365 per-file afterAll singleton-cleanup rollout P3 3pt filed by D1306 smoke; (c) Tobi-san decision 4.92 ratified standard-squash merge with rationale captured in PR title + AK-365 routed for real-fix. Per CP#23 §10.4 + Standing Decision 4.92.

Companion pattern: §6.0.6 V-3 Ladder Randomised-Order Pre-Lock Discipline (prerequisite — ensures STOP-after-D fires only on genuine ladder exhaustion, not stochastic-mask false-passes that would mask a viable earlier rung).


6.1 Gate 1 — Write-time (Semgrep MCP)

Trigger: Claude Code post_write_code hook on any file matching a source-language extension.

Execution: The Semgrep MCP plugin spawns semgrep --config auto --config ./cipher-shinobi/semgrep-rules/ scoped to the just-written file. SAST, SCA, and secrets rulesets run in a single pass.

Finding handling:

  • CLEAN: agent proceeds.
  • finding: agent is prompted to regenerate or escalate. No silent dismissal; any finding the agent cannot auto-resolve escalates to LISA, who dispatches Gray Fox for adjudication.
  • dismissal: only via Gray Fox adjudication, recorded in the .semgrepignore with a rationale comment.

Custom rule set (Gray Fox, Sprint 0) — mandatory coverage:

Rule | Enforces | Example violation
cipher-shinobi.raw-twin-discipline | Raw-twin requirement for write-path MCP tools | New tool in mcp/index.ts that takes structured fields without a *_raw twin
cipher-shinobi.dispatch-id-required | write_psychic_cache* calls carry dispatch_id inside dispatch context | Cache write in an agent handler with no dispatch_id arg
cipher-shinobi.no-hardcoded-gateway-url | Gateway URL read from env, not hardcoded | const url = "http://the gateway endpoint (tailnet-internal)" literal
cipher-shinobi.no-silent-catch | Catch blocks must log or rethrow | catch (e) {}
cipher-shinobi.no-hardcoded-credentials | No API keys, tokens, passwords in source | Long high-entropy string literals
cipher-shinobi.assemble-context-feedback-pair | Every assemble_context followed by context_feedback | Assembly with no feedback call in same handler
cipher-shinobi.fcf-filename-check | New files under vault-mirror paths match FCF segment grammar | CS.AK.LISA.Wrongtype.Foo.md

Author: Gray Fox, dispatched in Sprint 0 §6.4.


6.2 Gate 2a — Pre-push coverage verification (v1.2.0 — Claude-native)

Trigger: git pre-push hook, first in sequence.

Execution: vitest run --coverage on the repo, with LCOV output parsed against the per-repo threshold. Test generation is not a pre-push concern — tests are authored by genji sub-agent dispatches during Phase B of each cold-migration sprint (see §8.2 Phase B). The pre-push hook only verifies that coverage thresholds are met.

Thresholds (D3, LOCKED):

  • Per-repo baseline plus a 5% floor: coverage must not regress against main (delta >= 0 percentage points) and must stay at or above baseline - 5%. The first-pass baseline is set from Sprint 0 analysis and ratcheted to the D3 target at Phase E.
  • 60% new-file floor: any new source file (added in the diff) must reach >= 60% line coverage before the push is allowed.
  • No universal cross-repo 80% rule. Thresholds are per-repo and derived from measured baselines, not imposed.
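
The 60% new-file floor can be sketched as a check over the LCOV report. This is a minimal illustration, not the pre-push hook's actual implementation: the function names are hypothetical, only the standard SF/LF/LH/end_of_record record fields are read, and treating a new file that is absent from the report as 0% covered is an assumption.

```typescript
// Per-file line coverage extracted from an LCOV tracefile.
interface FileCoverage {
  path: string;
  linesFound: number; // LF
  linesHit: number; // LH
}

function parseLcov(lcov: string): FileCoverage[] {
  const records: FileCoverage[] = [];
  let current: FileCoverage | null = null;
  for (const raw of lcov.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("SF:")) {
      current = { path: line.slice(3), linesFound: 0, linesHit: 0 };
    } else if (current && line.startsWith("LF:")) {
      current.linesFound = Number(line.slice(3));
    } else if (current && line.startsWith("LH:")) {
      current.linesHit = Number(line.slice(3));
    } else if (current && line === "end_of_record") {
      records.push(current);
      current = null;
    }
  }
  return records;
}

function linePct(f: FileCoverage): number {
  // A file with no instrumentable lines is treated as fully covered.
  return f.linesFound === 0 ? 100 : (100 * f.linesHit) / f.linesFound;
}

// Returns the added files that miss the floor (or never appear in the report).
function newFilesBelowFloor(records: FileCoverage[], addedPaths: string[], floorPct = 60): string[] {
  const byPath = new Map<string, FileCoverage>(
    records.map((f): [string, FileCoverage] => [f.path, f]),
  );
  return addedPaths.filter((p) => {
    const rec = byPath.get(p);
    return rec === undefined || linePct(rec) < floorPct;
  });
}

const sampleLcov = [
  "SF:src/a.ts", "LF:10", "LH:7", "end_of_record",
  "SF:src/b.ts", "LF:10", "LH:5", "end_of_record",
].join("\n");
console.log(newFilesBelowFloor(parseLcov(sampleLcov), ["src/a.ts", "src/b.ts", "src/c.ts"]));
```

In the sample, src/a.ts passes at 70%, while src/b.ts (50%) and the unreported src/c.ts would block the push.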

Spot-check gate (Phase B): Tobi-san reviews a 10% stratified random sample of genji-authored tests. Pass rate >= 80% accepts the batch; <80% rejects and escalates to Raiden for quality investigation (see R2).

Mutation score gate (Phase B): Stryker mutation testing (@stryker-mutator/core + @stryker-mutator/vitest-runner) runs on the generated test suite. Acceptance criterion: mutation score >= 70%. Below threshold triggers targeted test refinement by genji before Phase B close.


6.3 Gate 2b — Pre-push (CodeRabbit CLI)

Trigger: git pre-push hook, second in sequence (after 2a succeeds).

Execution: coderabbit --agent --base main runs the Pro-tier agent loop on the branch diff against main (the commits about to be pushed). CodeRabbit reads CLAUDE.md for governance context (Pro requirement). Raiden drives the loop.

Scope boundary (D2, LOCKED): CodeRabbit owns logic, style, architecture. Genji sub-agent dispatches own test generation; vitest run --coverage owns coverage enforcement; Stryker owns test quality validation (mutation score). CodeRabbit reviews genji-authored tests as part of the diff — the tests are not exempt from logic review.

Iteration contract:

  • Up to 3 iterations.
  • Each iteration: CodeRabbit runs → Raiden triages → Genji fixes → re-run.
  • Exit states:
    • CLEAN: no BLOCKING findings remain → proceed.
    • BLOCKED: unresolvable BLOCKING finding → halt push, escalate to Tobi-san.
    • CAPPED: 3 iterations exhausted → push proceeds with outstanding findings documented and deferred with rationale in the sprint migration log.
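
The iteration contract can be sketched as a bounded loop with three exit states. The names below are hypothetical and the review and fix steps are injected as callbacks, so nothing here calls the real CodeRabbit CLI. One assumption is made explicit in the code: no fix round runs after the final review, so the findings from that last run are the ones documented as deferred.

```typescript
type ExitState = "CLEAN" | "BLOCKED" | "CAPPED";

interface ReviewFinding {
  id: string;
  blocking: boolean;
  resolvable: boolean; // false means it needs operator escalation
}

interface LoopOutcome {
  state: ExitState;
  iterations: number;
  deferred: ReviewFinding[];
}

function runReviewLoop(
  review: (iteration: number) => ReviewFinding[],
  fix: (findings: ReviewFinding[]) => void,
  maxIterations = 3,
): LoopOutcome {
  let lastBlocking: ReviewFinding[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    lastBlocking = review(i).filter((f) => f.blocking);
    if (lastBlocking.length === 0) {
      return { state: "CLEAN", iterations: i, deferred: [] };
    }
    if (lastBlocking.some((f) => !f.resolvable)) {
      return { state: "BLOCKED", iterations: i, deferred: [] };
    }
    // Assumption: the final review is not followed by another fix round.
    if (i < maxIterations) fix(lastBlocking);
  }
  return { state: "CAPPED", iterations: maxIterations, deferred: lastBlocking };
}

// A finding that never goes away exhausts the cap.
const stubborn: ReviewFinding = { id: "F-1", blocking: true, resolvable: true };
console.log(runReviewLoop(() => [stubborn], () => undefined).state); // CAPPED
```

Keeping the deferred findings on the outcome object gives the sprint migration log a ready-made list to record with rationale.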

Latency (R3): 7-30+ minutes per iteration is accepted as the tax for local review. Scoped to --base main (diff only). For large migration branches, Gate 2b runs in a dedicated Raiden session, not on the sprint critical path.


6.4 Gate 3 — Server-side (GitHub Actions + CodeRabbit App + CODEOWNERS)

Trigger: pull_request event against main.

Workflow file (single shared .github/workflows/clean-code-gate3.yml deployed to all 5 repos):

name: Clean Code Gate 3
 
on:
  pull_request:
    branches: [main]
 
jobs:
  gate3:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # needed for Semgrep --baseline-ref
 
      - name: Install deps
        run: {{per-repo install command}}
 
      - name: Lint
        run: {{per-repo lint command}}
 
      - name: Typecheck
        run: {{per-repo typecheck command}}
 
      - name: Test with coverage
        run: {{per-repo test command emitting lcov.info}}
 
      - name: Enforce coverage threshold
        run: node .github/scripts/enforce-coverage-threshold.js
 
      - name: Semgrep CI
        uses: returntocorp/semgrep-action@v1
        with:
          config: >-
            p/default
            ./cipher-shinobi/semgrep-rules/
          baseline-ref: main
 
      - name: Check new code has tests
        uses: ./.github/actions/check-new-code-has-tests
        with:
          exceptions-file: .clean-code-exceptions.yaml

Required status checks (branch protection):

  • Lint, Typecheck, Test with coverage, Enforce coverage threshold, Semgrep CI, Check new code has tests.
  • CodeRabbit (from the GitHub App's status check).

Branch protection settings (all 5 repos):

  • Require linear history (true).
  • Require signed commits (true).
  • Restrict force pushes (true).
  • Restrict deletion (true).
  • Required status checks: all from the list above.
  • Require review from CODEOWNERS (true).
  • Require branches to be up to date before merging (true).
  • Do not allow admin bypass (true).

CODEOWNERS (per-repo, at root):

* @the OS core repo

CODEOWNERS review workflow (D4):

  • Tobi-san opens PR under onotobi identity.
  • LISA review is dispatched: a manual-trigger dispatch during cold-migration, with webhook or scheduled-sweep as post-go-live options.
  • LISA runs the Gate 2a + 2b tools locally under the OS core repo identity, posts findings and approval via the GitHub API or CodeRabbit CLI under her own handle, and signs any review comments with her PGP key.
  • Merge button unlocks only after the OS core repo approval is recorded.

6.5 No-new-code-without-tests Action (Q4 layered enforcement)

Authoritative enforcement: Gate 3 CI action .github/actions/check-new-code-has-tests/ (authored Sprint 0 §6.9, deployed Sprint Final §8.3). Logic:

  1. Enumerate files added in the PR diff (not modified, added) under each repo's source roots.
  2. For each new source file, check for a corresponding test file at a conventional path: test/**, __tests__/**, tests/**, *.test.ts|tsx|js, *.spec.ts|tsx|js, *_test.py, etc. Language- and repo-configurable.
  3. Consult .clean-code-exceptions.yaml — any file listed is exempt.
  4. Emit a failure listing untested new files with a remediation hint pointing at the exceptions file.
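
The four steps above can be sketched as a pure function over file lists. This is an illustration under assumptions, not the action's real code: the function names are hypothetical, only a handful of the conventional test paths are checked, and the inputs (added files, repo file list, exempt paths) are assumed to have been collected already from the PR diff and .clean-code-exceptions.yaml.

```typescript
// Conventional test locations for a source file (a small, assumed subset).
function candidateTestPaths(sourcePath: string): string[] {
  const slash = sourcePath.lastIndexOf("/");
  const dir = slash >= 0 ? sourcePath.slice(0, slash + 1) : "";
  const file = slash >= 0 ? sourcePath.slice(slash + 1) : sourcePath;
  const dot = file.lastIndexOf(".");
  const stem = dot > 0 ? file.slice(0, dot) : file;
  const ext = dot > 0 ? file.slice(dot + 1) : "";
  return [
    `${dir}${stem}.test.${ext}`,
    `${dir}${stem}.spec.${ext}`,
    `${dir}__tests__/${stem}.test.${ext}`,
    `test/${stem}.test.${ext}`,
    `tests/${stem}.test.${ext}`,
  ];
}

function isTestFile(path: string): boolean {
  return /(\.test\.|\.spec\.|(^|\/)__tests__\/|(^|\/)tests?\/)/.test(path);
}

// Step 1 input: files ADDED in the PR. Steps 2-4: filter down to untested, non-exempt files.
function findUntestedNewFiles(addedFiles: string[], repoFiles: string[], exemptPaths: string[]): string[] {
  const present = new Set(repoFiles);
  const exempt = new Set(exemptPaths);
  return addedFiles.filter(
    (f) => !isTestFile(f) && !exempt.has(f) && !candidateTestPaths(f).some((t) => present.has(t)),
  );
}

const addedInPr = ["src/a.ts", "src/a.test.ts", "src/b.ts", "src/types/global.d.ts"];
const untested = findUntestedNewFiles(addedInPr, addedInPr, ["src/types/global.d.ts"]);
if (untested.length > 0) {
  console.log(`Untested new files: ${untested.join(", ")} (add tests or list in .clean-code-exceptions.yaml)`);
}
```

A real action would make the path conventions configurable per language and repo, as step 2 requires.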

Advisory coaching: .coderabbit.yaml path instruction (reviews.path_instructions) coaching on test quality — weak assertions, missing edge cases, badly-placed tests. Advisory only; the authoritative gate is the CI action.

Explicitly NOT a pre-commit hook: trivially bypassable with --no-verify. Q4 resolution rejects this approach.


07. TOOL STACK

7.1 Install Matrix

Tool | Install command | Config location | Pipeline gate
Semgrep MCP plugin | /plugin install semgrep (Claude Code); or semgrep-plugin:setup_semgrep_plugin skill | Per-session MCP hook | Gate 1
Semgrep CLI | brew install semgrep (macOS) | cipher-shinobi/semgrep-rules/ in each repo | Gate 3
analyze-coverage-mcp | ~/.config/claude/mcp.json entry: npx -y @sofia-open-source/analyze-coverage-mcp | Global Claude Code MCP config | Gates 1/2/3 visibility
Stryker mutation testing | npm install -D @stryker-mutator/core @stryker-mutator/vitest-runner (per repo) | stryker.config.mjs per repo | Phase B quality validation
CodeRabbit CLI | Per vendor docs, Pro tier | Local config + Pro tier creds | Gate 2b
CodeRabbit GitHub App | GitHub App install per repo, Pro tier | .coderabbit.yaml per repo | Gate 3
GitHub Actions workflow | .github/workflows/clean-code-gate3.yml per repo | Repo-committed | Gate 3
Q4 CI action | .github/actions/check-new-code-has-tests/ per repo | Repo-committed | Gate 3
Pre-push hook template | Shell script at artefacts/code/hooks/pre-push-clean-code-pipeline.sh; symlinked or copied into each repo's .git/hooks/pre-push on the workstation | Workstation-local per repo | Gates 1 + 2 orchestration
Dependency map generator | npx madge (TS/JS); pydeps (Python); manual annotation (shell). CI step scripts/generate-depmap.js per repo | depmap.yaml + DEPMAP.md at repo root | Gate 0 (Fuda scoping: yoshimitsu reads depmap for dependency impact analysis)
fuda skill | ~/.claude/skills/fuda/ (Claude Code skill, manual /fuda + auto-triggers) | Skill definition in SKILL.md | Gate 0 orchestration (pipeline entry point)

7.2 .clean-code-exceptions.yaml schema

# .clean-code-exceptions.yaml
# Files exempt from the no-new-code-without-tests gate (Q4 decision).
# Every exception requires a reason. Reviewers see these in the PR diff and can challenge them.
 
exceptions:
  - path: src/types/global.d.ts
    reason: Type-only declarations, no runtime logic to test
    added: 2026-04-10
 
  - path: src/generated/api-client.ts
    reason: Auto-generated from OpenAPI spec; tests cover the generator, not the output
    added: 2026-04-10

Field contract:

  • path (string, required): repo-root-relative path to the exempt file.
  • reason (string, required): non-empty, human-readable rationale.
  • added (ISO date, required): date the exception was committed. Used for post-go-live audit.

Validation: the Gate 3 action rejects malformed YAML with a clear error; the CodeRabbit App is instructed to surface every exception whose added date falls within the last 30 days for explicit reviewer scrutiny.
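
The field contract and the 30-day scrutiny window can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and the sketch assumes the YAML has already been parsed into plain objects with added kept as a YYYY-MM-DD string (some YAML loaders convert bare dates into Date objects, which would need handling first).

```typescript
interface ExceptionEntry {
  path: string;
  reason: string;
  added: string; // ISO date, YYYY-MM-DD
}

// Returns human-readable errors for one parsed entry; an empty list means valid.
function validateExceptionEntry(entry: unknown, index: number): string[] {
  const errors: string[] = [];
  const obj = (typeof entry === "object" && entry !== null ? entry : {}) as Record<string, unknown>;
  const path = obj["path"];
  const reason = obj["reason"];
  const added = obj["added"];
  if (typeof path !== "string" || path.length === 0 || path.startsWith("/")) {
    errors.push(`exceptions[${index}].path must be a non-empty repo-root-relative string`);
  }
  if (typeof reason !== "string" || reason.trim().length === 0) {
    errors.push(`exceptions[${index}].reason must be a non-empty string`);
  }
  if (typeof added !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(added) || Number.isNaN(Date.parse(added))) {
    errors.push(`exceptions[${index}].added must be an ISO date (YYYY-MM-DD)`);
  }
  return errors;
}

// True when the exception was added within the scrutiny window ending at `today`.
function isRecentException(entry: ExceptionEntry, today: Date, windowDays = 30): boolean {
  const ageMs = today.getTime() - Date.parse(entry.added);
  return ageMs >= 0 && ageMs <= windowDays * 24 * 60 * 60 * 1000;
}

const generatedClient: ExceptionEntry = {
  path: "src/generated/api-client.ts",
  reason: "Auto-generated from OpenAPI spec; tests cover the generator, not the output",
  added: "2026-04-10",
};
console.log(validateExceptionEntry(generatedClient, 0)); // []
console.log(isRecentException(generatedClient, new Date("2026-04-25T00:00:00Z"))); // true
```

Collecting every error per entry, instead of failing on the first, matches the requirement that malformed files are rejected with a clear error.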


7.3 Rejected / out-of-stack tools

Tool | Status | Reason
qodo-cover standalone | Rejected | 10 months stale per D46 Intel
Qodo Command CLI (@qodo/command) | Deprecated (v1.2.0) | CLI discontinued April 2026; test-coverage command removed, migrated to IDE-only Qodo Gen extension not invocable by sub-agents. Replaced by Claude-native test generation (genji + analyze-coverage-mcp) + Stryker mutation testing. See ADR-5.
codecov-mcp | Rejected | Archived Aug 2025 per D46 Intel
PR-Agent | Rejected | Legacy, superseded by CodeRabbit per Qodo docs
Pre-commit hook for no-new-code-without-tests | Rejected | Trivially bypassable with --no-verify (Q4 decision)
CodeRabbit Finishing Touches (@coderabbitai generate unit tests) | Deferred | Trigger-based re-evaluation post-go-live (Q2): enable only if Gate 2a coverage escape rate >5% over first 30 days
Mutation testing (laurigates/claude-plugins) | Deferred | Scoped post-go-live pilot on highest-value repo (Q1)

08. COLD-MIGRATION METHODOLOGY

8.1 Sprint 0 — Foundation

Goal: establish tool stack, baseline every repo, produce sprint queue. No migration work happens here. Full checklist in plan §6.

Key tasks:

  1. Install all six tools and verify on Tobi-san's workstation.
  2. Provision the OS core repo GitHub identity: collaborator on all 5 repos with Write access, PGP signing key registered, 2FA backup codes stored in Tobi-san's password manager, LISA review workflow documented and canary-tested.
  3. Run baseline analysis on all 5 repos: coverage %, Semgrep findings, lint/typecheck state, untested-file count, repair-effort bracket (Q5 rule).
  4. Rank the sprint queue by complexity_score = untested_file_count × (1 + semgrep_finding_count / 100) × language_complexity_multiplier. Lisa-OS is Sprint 1 regardless of score (D5 override).
  5. Gray Fox sub-dispatch: author custom Semgrep rule set under cipher-shinobi/semgrep-rules/, smoke-test against all 5 baselines.
  6. Create migration/clean-code-pipeline branch on each repo.
  7. Verify every repo has a CLAUDE.md readable by CodeRabbit Pro; author minimal ones for application repos that lack one.
  8. Draft (not yet install) the pre-push hook template.
  9. Author the Q4 CI action (.github/actions/check-new-code-has-tests/) and .clean-code-exceptions.yaml schema. Smoke-test against one repo with zero false positives.
  10. (v1.1.0) Generate baseline depmap.yaml + DEPMAP.md for all 5 repos using language-appropriate tooling (madge --json for TS/JS, pydeps for Python, manual annotation for shell). Validate each depmap.yaml parses cleanly and covers all three dependency categories (static, runtime, cross-repo). Sprint 0 exit criterion.
  11. (v1.1.0) Author the fuda skill (~/.claude/skills/fuda/) per the D9 specification. The skill must: identify target repos, read depmap.yaml, dispatch yoshimitsu (draft), post Fuda to Linear, dispatch raiden + gray-fox for parallel review, apply fast-track threshold, check risk threshold, and return fuda_id for report_dispatch. Smoke-test against a dry-run Fuda for the Lisa-OS repo. Sprint 0 exit criterion.
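
The ranking formula in task 4 can be sketched as below. The interface, function names, and repo names are hypothetical. The sort direction (lowest score first) is an assumption, since the spec states only that the queue is ranked by complexity_score; the pinnedFirst flag models the D5 override that places Lisa-OS in Sprint 1 regardless of score.

```typescript
interface RepoBaseline {
  repo: string;
  untestedFileCount: number;
  semgrepFindingCount: number;
  languageComplexityMultiplier: number;
  pinnedFirst?: boolean; // D5 override
}

// complexity_score = untested_file_count × (1 + semgrep_finding_count / 100) × language_complexity_multiplier
function complexityScore(b: RepoBaseline): number {
  return b.untestedFileCount * (1 + b.semgrepFindingCount / 100) * b.languageComplexityMultiplier;
}

function rankSprintQueue(baselines: RepoBaseline[]): RepoBaseline[] {
  return [...baselines].sort((a, b) => {
    const aPinned = a.pinnedFirst === true;
    const bPinned = b.pinnedFirst === true;
    if (aPinned !== bPinned) return aPinned ? -1 : 1;
    // Assumption: ascending order, i.e. least complex repo migrates first.
    return complexityScore(a) - complexityScore(b);
  });
}

// Invented baseline figures for illustration only.
const sampleBaselines: RepoBaseline[] = [
  { repo: "app-b", untestedFileCount: 40, semgrepFindingCount: 50, languageComplexityMultiplier: 2 },
  { repo: "lisa-os", untestedFileCount: 200, semgrepFindingCount: 100, languageComplexityMultiplier: 2, pinnedFirst: true },
  { repo: "app-a", untestedFileCount: 10, semgrepFindingCount: 0, languageComplexityMultiplier: 1 },
];
console.log(rankSprintQueue(sampleBaselines).map((b) => b.repo).join(" > ")); // lisa-os > app-a > app-b
```

For the app-b row, the score works out to 40 × (1 + 50/100) × 2 = 120.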

Sprint 0 exit criteria: all 11 items above signed off. Plan §6 enumerates items 1-9 as a checklist; items 10-11 are v1.1.0 additions that extend the Sprint 0 exit gate.


8.2 Sprints 1-N — Per-repo Five-Phase Template

Each sprint targets one repo on its migration/clean-code-pipeline branch and follows the same five phases.

Phase A — Full Semgrep scan + adjudication

  • Run full-repo Semgrep with custom rules enabled.
  • Classify findings: CRITICAL (fix now), HIGH (fix in sprint), MEDIUM (follow-up ticket), LOW (.semgrepignore with rationale).
  • Genji fixes CRITICAL/HIGH, Raiden verifies.
  • Gray Fox adjudicates ambiguous findings.
  • Phase A done when full scan returns zero CRITICAL/HIGH on the migration branch.

Phase B — Claude-native test generation + spot-check + mutation validation (v1.2.0)

  • LISA dispatches genji in 3-4 batches on the migration branch, output to child branch migration/clean-code-pipeline/genji-tests:
    • Batch 1: security-tagged modules (identified by tags: [security] in depmap.yaml; ~5 per repo).
    • Batch 2: high-import modules (top 10 by import count per depmap.yaml static graph).
    • Batches 3-4: remaining modules, split by directory or language boundary.
    • Shadow Clone parallelism within each batch (max 3 clones per batch).
  • Genji uses analyze-coverage-mcp to identify coverage gaps and targets test generation at uncovered regions.
  • Raiden reviews the summary after each batch.
  • Stryker mutation testing runs on each batch: mutation score >= 70% is the acceptance criterion. Below threshold triggers targeted test refinement by genji.
  • Tobi-san spot-checks 10% stratified random sample (unchanged).
  • < 80% pass rate: reject batch, refine with genji, escalate to Raiden.
  • >= 80% pass rate: accept, merge genji-tests into migration branch.
  • Coverage re-measured; must meet per-repo D3 target.

Phase C — CodeRabbit CLI review loop

  • coderabbit --agent --base main on the migration branch (now including genji-authored tests).
  • Raiden triages: fix-now / fix-in-loop / defer-with-rationale.
  • Max-3 iteration loop.
  • Exit: CLEAN, BLOCKED (escalate), CAPPED (document deferrals).

Phase D — Gate 3 canary PR

  • Open canary PR from migration/clean-code-pipeline to main. Do not merge yet.
  • All GitHub Actions required checks must pass green.
  • CodeRabbit App review surfaces no new CRITICAL/HIGH findings (any that surface = Gate 2 escape, logged as lesson-learned, fixed, re-run).
  • Phase D done when canary is green end-to-end.

Phase E — Merge to main

  • Tobi-san final review.
  • LISA (the OS core repo) CODEOWNERS approval.
  • Rebase-merge per branch-protection convention.
  • Ratchet .coderabbit.yaml coverage threshold from "lenient" to D3 target.
  • Update sprint queue: mark DONE, record final coverage %, credit burn, finding counts per severity.
  • Write intel cache entry with sprint outcome (tags: engineering, deliverable).

8.3 Sprint 1 Pilot Gate (Q3 MANDATORY)

Sprint 1 targets Lisa-OS (D5). On Phase E close, yoshimitsu dispatches a retrospective:

  • What worked, what broke, tool quirks.
  • Revised API cost projection from real data (genji dispatch count, Stryker mutation scores).
  • Revised per-sprint effort estimate.
  • Recommended adjustments to Sprints 2-N.
  • Pivot options: proceed unchanged, proceed with tweaks, pivot, scope-reduce or abandon remaining repos.

Explicit Tobi-san greenlight required before Sprint 2 can kick off. Not a cascade; a gate.


8.4 Broken Baseline Decision Rule (Q5)

Applied during Sprint 0 baseline analysis; per-repo bracket recorded.

Repair effort vs sprint budget | Response
<20% (minor noise, trivial fixes) | Scope-expand within the sprint: fix-compile-first, then Phase A-E normally
20-50% (broken imports, moderate refactor needed) | Dedicated pre-migration repair sub-sprint (scope: just get the repo building), then normal 5-phase sprint
>50% (fundamentally broken) | Escalate to Tobi-san. Options: retire, refactor as separate mission, or deprioritise indefinitely

Thresholds recalibrated after Sprint 1 retro if pilot data suggests otherwise.
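
The bracket rule can be expressed as a small function. The names are hypothetical, and the thresholds are parameters so that a post-retro recalibration would be a one-line change; treating exactly 20% and exactly 50% as part of the middle bracket follows the 20-50% row above.

```typescript
type BaselineResponse = "scope-expand" | "repair-sub-sprint" | "escalate";

// repairEffortPct: estimated repair effort as a percentage of the sprint budget.
function brokenBaselineResponse(
  repairEffortPct: number,
  minorBelowPct = 20,
  escalateAbovePct = 50,
): BaselineResponse {
  if (Number.isNaN(repairEffortPct) || repairEffortPct < 0) {
    throw new RangeError("repairEffortPct must be a non-negative number");
  }
  if (repairEffortPct < minorBelowPct) return "scope-expand";
  if (repairEffortPct <= escalateAbovePct) return "repair-sub-sprint";
  return "escalate";
}

console.log(brokenBaselineResponse(12)); // scope-expand
console.log(brokenBaselineResponse(35)); // repair-sub-sprint
console.log(brokenBaselineResponse(70)); // escalate
```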


8.5 Sprint Final — Go-live

Runs after all per-repo sprints close. Pipeline enablement only; no migration work.

  1. Enable branch protection on main for all 5 repos (linear history, signed commits, no admin bypass, required status checks, CODEOWNERS).
  2. Commit CODEOWNERS (* @the OS core repo) to each repo.
  3. Deploy .github/workflows/clean-code-gate3.yml to each repo.
  4. Deploy .coderabbit.yaml to each repo with per-directory path instructions + Q4 advisory test-quality path instruction. Finishing Touches DEFERRED per Q2.
  5. Install pre-push hook template on Tobi-san's workstation for every local clone.
  6. Author operator runbook at CS.AK.LISA.Docu.CleanCodePipelineRunbook covering normal flow, emergency bypass, troubleshooting, cost monitoring, rollback. Link from each repo's CLAUDE.md.
  7. API cost evaluation across all sprints. Genji dispatch count, Stryker mutation scores, and total Claude API spend recorded.
  8. Pipeline declared live.

09. GOVERNANCE INTEGRATION

9.1 CLAUDE.md Integration

CLAUDE.md is the master governance entrypoint. Post-TechSpec, it MUST gain:

  • Clean Code Pipeline row in the On-Demand Context table pointing at this TechSpec file.
  • CleanCodePipelineRunbook row once the Sprint Final runbook ships.
  • Raw Twin Discipline cross-link: the pipeline enforces Raw Twin Discipline via a custom Semgrep rule; the reverse link from Raw Twin Discipline doc to the pipeline is added.

Lisa-OS repo's root CLAUDE.md is the same file as the vault's (mirrored via ADR-1). Gate 2b CodeRabbit CLI reads it on every pre-push; Gate 3 CodeRabbit App reads it on every PR.


9.2 Agent Roster Integration

The pipeline binds tightly to the CyberShinobi roster:

Agent | Pipeline role
Yoshimitsu | (v1.1.0) Gate 0: Fuda drafter; reads depmap.yaml, fills all required Fuda sections, responds to reviewer flags (max 2 revision rounds). Sprint queue re-ranking, Sprint 1 pilot retrospective, API cost monitoring, mid-mission pivots.
Raiden | (v1.1.0) Gate 0: Fuda completeness + soundness reviewer; reviews every Fuda for section completeness, scope-depmap alignment, rationale coherence. Verification lead. Owns Gate 2a spot-checks, Gate 2b iteration loop, Gate 3 canary verification. Busiest agent in the mission.
Gray Fox | (v1.1.0) Gate 0: Fuda security reviewer; reviews only Fuda touching security-tagged modules (tags: [security] in depmap.yaml); skipped on fast-track. Semgrep custom rule authoring (Sprint 0 sub-dispatch). Finding adjudication (Gate 1 escalation, Gate 3 ambiguous).
Genji | Implementation. All Semgrep fix PRs, all Gate 2b iteration fixes, test refinement, Q4 CI action authoring. Receives fuda_id on implementation dispatches (v1.1.0).
Smoke | Vault governance for runbook, ArtefactMap updates for hook + rules + workflow + Q4 action, propagation of TechSpec downstream impacts.
LISA | (v1.1.0) Gate 0: coordinator; dispatches yoshimitsu/raiden/gray-fox, posts Fuda to Linear, manages review cycle. Never drafts, never reviews. CODEOWNERS approval surface via the OS core repo, cross-agent context seeding, Tobi-san coordination, dispatch lifecycle management.

Agent prompts do not need to be updated for pipeline awareness at this stage — each agent already reaches for its binding reference (Genji/Raiden → CodeDisciplineProtocol; Gray Fox → SecurityOperations) and the pipeline is invoked through those protocols rather than embedded in prompts.


9.3 Psychic Cache Integration

Pipeline state is tracked in the cache as the mission progresses. Specifically:

  • Sprint boundaries: state_change entries mark Sprint 0 → Sprint 1 transitions, pilot gate decisions, Sprint Final go-live.
  • Gate escapes: intel entries record every Gate 2 escape surfaced at Gate 3, tagged engineering + risk, with the sprint ID and lesson learned.
  • Credit burn: intel entries per sprint close, tagged engineering + pricing, with credits consumed and the delta vs projection.
  • Rule adjudications: Gray Fox decision entries whenever a Semgrep finding is dismissed, with the rationale.

Dispatch lifecycle: every migration sprint is a dispatch with plan_slug: clean-code-pipeline-migration. Cache writes during a sprint dispatch carry the dispatch_id from report_dispatch.


9.4 Dispatch Lifecycle Integration

Each plan-milestone row in §11 of the plan (Sprint 0, Sprint 0 sub — Gray Fox, Sprint 1 pilot, pilot retro, Sprint 2-N, Sprint Final) is a future dispatch. LISA fires report_dispatch with plan_slug: clean-code-pipeline-migration. The gateway-assigned dispatch ID flows to every subsequent cache write and progress report for that dispatch.

The pipeline does not interact with dispatch gateway internals during steady-state operation. It is a quality gate on code that happens to include the gateway.


9.5 LisaOSMap Change Impact Matrix

Post-TechSpec, CS.AK.LISA.Docu.LisaOSMap.md §9b gains a new row:

If this changes | Update these | Priority
Clean Code Pipeline TechSpec | CleanCodePipelineRunbook (once authored), .github/workflows/clean-code-gate3.yml in all 5 repos, cipher-shinobi/semgrep-rules/ set, .coderabbit.yaml per repo, pre-push hook template, CLAUDE.md On-Demand Context table | HIGH

Propagation is enumerated in §18.


9.6 Linear Integration Protocol (v1.1.0)

The pipeline uses Linear as the external tracking surface for Fuda lifecycle and dispatch activity. Zero manual tracking overhead — if a dispatch fires, Linear knows.

9.6.1 Fuda = Linear Issue

Every Fuda is created as a Linear issue before any code is written. The issue follows the standard Linear issue format established across all Cipher Shinobi tickets (reference: T0-379, T0-448).

Issue description format: the description MUST begin with the standard header template before any Fuda-specific content:

#### Collaborators
 
* {agent/person} ({role})
 
#### Job Resources
 
* {links to depmap.yaml, governance docs, reference materials}
 
#### Job Output
 
* _(empty at creation — filled when implementation PR is merged)_
 
---
 
{Fuda body: all six required sections from §6.0.1}

For Fuda specifically:

  • Collaborators lists the agents involved: LISA (orchestrator), Yoshimitsu (drafter), Raiden (reviewer), Gray Fox (security reviewer, if applicable)
  • Job Resources links to depmap.yaml, CodeDisciplineProtocol, RawTwinDiscipline, or other reference docs consumed during scoping
  • Job Output is empty at creation; updated with PR link(s) when implementation merges

Metadata fields — ALL required at issue creation time:

Field | Requirement
Labels | Ticket Type label (Task for all Fuda) + one domain label from Ninjutsu/Genjutsu/Sagyojutsu groups. Only ONE label per group: SEC-CYBER and DEV-AI are exclusive within Ninjutsu.
Estimate | Story points (fast-track: 1-2, standard: 3-5, risk-threshold: 5-8)
Assignee | Tobi-san (Kage; owns all Fuda approvals)
Project | Mapped from mission namespace (see table below)
Priority | 0-4 scale (0=None, 1=Urgent, 2=High, 3=Normal, 4=Low). Default: 3 (Normal); risk-threshold Fuda default to 2 (High).
Status | Planned (maps to Fuda lifecycle state: Draft)

Project assignment follows the existing mission project structure:

Mission namespace | Linear project
CS.AK.LISA | Lisa-OS / LISA infrastructure
CS.AK.ClientA | ClientA mission work
CS.YB.* | Per-client Yurei Butai projects

The fuda skill creates the Linear issue via the linear-server MCP tools (save_issue) and returns the issue ID as fuda_id. The skill is responsible for populating all metadata fields and the header template — no manual intervention required.

9.6.2 Dispatches = Automatic Linear Comments

The gateway posts a structured comment to the Fuda's Linear issue on two lifecycle events:

On report_dispatch:

[DISPATCH #{dispatch_id}] Agent: {agent_name}
Task: {task_description}
Mission: {mission_namespace}
Status: Started
Milestones: {milestone_count}

On report_complete:

[DISPATCH #{dispatch_id}] Agent: {agent_name}
Status: {confidence} | Domain affinity: {domain_affinity}
Result: {result_summary}
Duration: {duration}

Comments are posted via the linear-server MCP save_comment tool, called from the gateway's dispatch handlers. The gateway reads fuda_id from the dispatch record to determine the target issue.
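
The two comment templates above can be rendered by pure formatting functions, sketched here. The interface and function names are hypothetical; only the output layout is taken from the templates, and the call to the linear-server save_comment tool is deliberately left out.

```typescript
interface DispatchStarted {
  dispatchId: string;
  agentName: string;
  taskDescription: string;
  missionNamespace: string;
  milestoneCount: number;
}

interface DispatchCompleted {
  dispatchId: string;
  agentName: string;
  confidence: string;
  domainAffinity: string;
  resultSummary: string;
  duration: string;
}

// Body of the comment posted on report_dispatch.
function formatDispatchStartedComment(d: DispatchStarted): string {
  return [
    `[DISPATCH #${d.dispatchId}] Agent: ${d.agentName}`,
    `Task: ${d.taskDescription}`,
    `Mission: ${d.missionNamespace}`,
    "Status: Started",
    `Milestones: ${d.milestoneCount}`,
  ].join("\n");
}

// Body of the comment posted on report_complete.
function formatDispatchCompletedComment(d: DispatchCompleted): string {
  return [
    `[DISPATCH #${d.dispatchId}] Agent: ${d.agentName}`,
    `Status: ${d.confidence} | Domain affinity: ${d.domainAffinity}`,
    `Result: ${d.resultSummary}`,
    `Duration: ${d.duration}`,
  ].join("\n");
}

// Invented values for illustration.
console.log(
  formatDispatchStartedComment({
    dispatchId: "1401",
    agentName: "genji",
    taskDescription: "Refactor seeds into beforeEach",
    missionNamespace: "CS.AK.LISA",
    milestoneCount: 3,
  }),
);
```

Keeping the formatting separate from the posting step makes the comment text testable without a Linear connection.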

9.6.3 Gateway Schema Extension

fuda_id is added as a required field on ReportDispatchInputSchema in dispatch/types.ts:

// dispatch/types.ts — ReportDispatchInputSchema extension (v1.1.0)
fuda_id: z.string().describe(
  'Linear issue ID for the Fuda (scoped change contract). '
  + 'Gateway posts automatic comments to this issue on dispatch and completion.'
),

The gateway validates fuda_id is non-empty and posts the dispatch comment before returning the dispatch_id to the caller. If the Linear comment fails (network, auth), the dispatch still proceeds — Linear integration is best-effort, not a dispatch blocker. Failures are logged as warnings.

9.6.4 Issue State Lifecycle

Linear issue state tracks the Fuda lifecycle end-to-end:

Fuda state | Linear status | Transition trigger
Draft | Planned | Fuda created by fuda skill
Under Review | Planned | Review dispatches fire (tracked in comments)
Approved | In Progress | All reviewers approve; fuda skill updates status
In Progress | In Progress | Implementation dispatch fires with fuda_id
Merged | Merged | PR merged to main
Done | Done | Sprint close / manual confirmation

State transitions from Planned to In Progress are automated by the fuda skill. Transitions to Merged and Done are manual or driven by future GitHub webhook integration (post-go-live enhancement).
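
The lifecycle mapping can be captured as a typed lookup, sketched below with hypothetical names. The helper encodes only the rule stated above: a transition is automated by the fuda skill when the Linear status moves from Planned to In Progress.

```typescript
type FudaState = "Draft" | "Under Review" | "Approved" | "In Progress" | "Merged" | "Done";
type LinearStatus = "Planned" | "In Progress" | "Merged" | "Done";

const LINEAR_STATUS_BY_FUDA_STATE: Record<FudaState, LinearStatus> = {
  "Draft": "Planned",
  "Under Review": "Planned",
  "Approved": "In Progress",
  "In Progress": "In Progress",
  "Merged": "Merged",
  "Done": "Done",
};

// True when the fuda skill is expected to update the Linear status itself.
function isAutomatedTransition(from: FudaState, to: FudaState): boolean {
  return (
    LINEAR_STATUS_BY_FUDA_STATE[from] === "Planned" &&
    LINEAR_STATUS_BY_FUDA_STATE[to] === "In Progress"
  );
}

console.log(isAutomatedTransition("Under Review", "Approved")); // true
console.log(isAutomatedTransition("In Progress", "Merged")); // false
```

Using Record<FudaState, LinearStatus> means the compiler flags any Fuda state added later without a Linear status.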


10. SECURITY POSTURE

10.1 Secrets Handling

No secret ever lands in any of the 5 repos, committed or not.

  • API keys (CodeRabbit): stored in CodeRabbit local creds on the workstation only; never in repo files. Semgrep secrets rule set catches any accidental commit at Gates 1 and 3.
  • Gateway URL: read from environment variables; never hardcoded. Enforced by cipher-shinobi.no-hardcoded-gateway-url custom rule.
  • Claude Code settings.json: the Lisa-OS repo's .claude/settings.json mirror is a secrets-scrubbed variant. The working-copy settings.json on Tobi-san's workstation contains permissions, MCP tokens, and hook permissions that must not be committed. A Sprint 0 script produces the scrubbed variant by stripping any field matching *token*, *key*, *secret*, or *password*, and only that variant is committed.
  • PGP signing keys: private keys stay on the workstation. Public keys registered on both onotobi and the OS core repo GitHub accounts.
  • GitHub tokens: gh auth login stores tokens in the macOS keychain; never in repo files.
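
The settings.json scrub rule can be illustrated with a small recursive filter. This is a sketch under assumptions, not the Sprint 0 script: the names `Json`, `SENSITIVE`, `isSensitive`, and `scrub` are hypothetical, and matching is assumed to be case-insensitive on field names at any nesting depth.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Field-name fragments taken from the glob list (*token*, *key*, *secret*, *password*).
const SENSITIVE = ["token", "key", "secret", "password"];

function isSensitive(fieldName: string): boolean {
  const lower = fieldName.toLowerCase(); // case-insensitivity is an assumption
  return SENSITIVE.some((fragment) => lower.includes(fragment));
}

// Returns a deep copy with every sensitive field removed at any nesting depth.
function scrub(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map(scrub);
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [name, child] of Object.entries(value)) {
      if (!isSensitive(name)) {
        out[name] = scrub(child);
      }
    }
    return out;
  }
  return value;
}
```

A substring match on "key" also drops harmless fields such as a hypothetical `keybindings` entry. Over-scrubbing is the safer failure mode here, but the committed variant should be checked once by eye.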

10.2 .gitignore Scope (Lisa-OS)

Mandatory entries for the Lisa-OS repo:

# node runtime
node_modules/
*.log

# environment and secrets
.env
.env.*
*.key
*.pem
*.pfx
credentials.json

# runtime data
.data/
*.sqlite
*.sqlite-*

# Claude Code runtime state
.claude/projects/
.claude/history/
.claude/cache/

# OS
.DS_Store

# IDE
.vscode/
.idea/

# build artefacts
dist/
build/
*.tsbuildinfo

# agent memory (written at runtime)
# decision: track indexes only, not content
.claude/agent-memory/*/*
!.claude/agent-memory/*/MEMORY.md

10.3 Two-Factor Authentication

  • onotobi: 2FA mandatory (org-enforced).
  • the OS core repo: 2FA mandatory; backup codes stored in Tobi-san's password manager (not only on the enrolling device); TOTP app on Tobi-san's phone.

10.4 PGP-signed Commits

  • Required by branch protection on all 5 repos.
  • onotobi key already registered.
  • the OS core repo key registered in Sprint 0.
  • Local git config uses commit.gpgsign=true per account.

10.5 No Admin Bypass

Branch protection is configured to apply to repo admins including onotobi. Emergency bypass is explicitly disallowed at the GitHub level; the operator runbook documents the temporary-disable-and-re-enable procedure as the sanctioned path for legitimate emergencies, with a post-incident review obligation.

10.6 Threat Surface — Lisa-OS Specifics

The Lisa-OS repo carries three categories of elevated sensitivity over the application repos:

  1. Agent prompts (.claude/agents/*.md): prompts are attack surface. Prompt-injection content reaching an agent is a security concern. The repo is private, and the Semgrep custom rule cipher-shinobi.no-prompt-injection-markers (post-go-live enhancement) scans prompt files for known injection patterns.
  2. MCP gateway code (memory_gateway/server/): the runtime that agents call. Any code change here is subject to the full four-gate pipeline, including the Raw Twin Discipline rule.
  3. Governance docs (CS.AK.LISA.Docu.*.md): the source of truth for enforcement. A malicious edit to CLAUDE.md could attempt to disable rules. CODEOWNERS on the governance/ path set is a post-go-live enhancement that adds LISA as a required reviewer on governance changes (she already is the sole reviewer under D4's * @the OS core repo pattern, but a path-scoped entry would surface governance changes with a distinct review flag).

11. OBSERVABILITY

11.1 GitHub Actions Dashboards

Each repo's Actions tab provides the canonical dashboard:

  • Workflow run history, green/red rate, average duration per job.
  • Failed-step drill-down.
  • Per-PR status check summary.

For cross-repo rollup, a simple aggregator script could query the GitHub API for the last 30 days of clean-code-gate3.yml runs across all 5 repos and publish a daily summary to the dispatch dashboard. This is a post-go-live enhancement and is outside this TechSpec's scope.

11.2 Coverage

  • analyze-coverage-mcp provides per-repo, per-file coverage visibility on demand.
  • The Phase E ratchet records the per-repo D3 target in .coderabbit.yaml; diffs to this file are the historical ledger of threshold movement.
  • Post-go-live enhancement: a weekly launchd job that runs analyze-coverage-mcp on every repo and writes the totals to an intel cache entry tagged engineering + data. Not in this TechSpec.

11.3 Security Alerts

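  • GitHub's native Dependabot alerts are enabled on all 5 repos (free for private repos on the Free plan). Secret Scanning alerts are enabled wherever the plan makes them available; GitHub offers them on private repos only with a paid security licence, so the Semgrep secrets ruleset remains the primary control for secrets.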
  • Semgrep CI on Gate 3 publishes SARIF output that surfaces in GitHub's Security tab.
  • Any Gate 1 or Gate 3 finding that Gray Fox dismisses is logged to .semgrepignore with a rationale comment — that file is the audit trail.

11.4 Pipeline Health Signals

The operator runbook (Sprint Final §8.6) documents the health signals Tobi-san and LISA check weekly during the first month post-go-live:

  • Gate 3 green rate (target: >= 95% per week).
  • Gate 2 → Gate 3 escape rate (target: 0 escapes).
  • SKIP_GATES=1 usage (target: 0; any use triggers post-incident review).
  • API cost (genji dispatches) vs projection.
  • CodeRabbit CLI average iteration count (proxy for code-quality trend).
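
The three numeric targets above lend themselves to a mechanical weekly check. The sketch below is illustrative only: `WeeklySignals` and `breachedTargets` are hypothetical names, and the escape rate is assumed to be recorded as a count of changes that passed Gate 2 but failed Gate 3.

```typescript
interface WeeklySignals {
  gate3Runs: number;     // total Gate 3 runs in the week
  gate3Green: number;    // runs that finished green
  gate2Escapes: number;  // changes that passed Gate 2 but failed Gate 3
  skipGatesUses: number; // SKIP_GATES=1 invocations
}

// Returns a human-readable line per breached target; an empty list means healthy.
function breachedTargets(week: WeeklySignals): string[] {
  const breaches: string[] = [];
  // Integer comparison (green/runs < 95/100) avoids floating-point edge cases.
  // A week with zero runs is treated as healthy.
  if (week.gate3Green * 100 < week.gate3Runs * 95) {
    breaches.push(`Gate 3 green rate below 95% (${week.gate3Green}/${week.gate3Runs})`);
  }
  if (week.gate2Escapes > 0) {
    breaches.push(`Gate 2 -> Gate 3 escapes: ${week.gate2Escapes} (target 0)`);
  }
  if (week.skipGatesUses > 0) {
    breaches.push(`SKIP_GATES=1 used ${week.skipGatesUses} time(s); post-incident review required`);
  }
  return breaches;
}
```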

12. NON-FUNCTIONAL REQUIREMENTS

12.1 Performance

| NFR | Target | Percentile | Conditions |
|---|---|---|---|
| NFR-P1 Gate 1 write-time scan | < 2 s | p95 | per single-file write |
| NFR-P2 Gate 2a vitest run --coverage | < 3 min | p95 | typical diff of 1-20 files |
| NFR-P3 Gate 2b CodeRabbit CLI iteration | < 10 min | p50, accepted 30 min p95 | diff-scope, Pro tier |
| NFR-P4 Gate 3 Actions total | < 15 min | p95 | typical PR |
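
One way to verify a percentile target against recorded durations is a nearest-rank percentile followed by a budget comparison. The sketch assumes durations are collected in seconds; `percentile` and `meetsBudget` are hypothetical helper names, and the spec does not prescribe a percentile method.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so integer inputs stay exact.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// True when the chosen percentile is strictly under the budget,
// matching the "<" targets in the table.
function meetsBudget(samplesSeconds: number[], p: number, budgetSeconds: number): boolean {
  return percentile(samplesSeconds, p) < budgetSeconds;
}
```

For example, NFR-P1 would be checked as `meetsBudget(gate1Durations, 95, 2)`, where `gate1Durations` is whatever sample set the operator has recorded.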

12.2 Availability

  • Pipeline is not a production service. Target: "no gate broken > 1 working day without repair or documented suspension".
  • GitHub Actions availability follows GitHub's SLA.

12.3 Security

  • No secret ever committed to any repo (see §10).
  • Every PR to main passes full Gate 3 (required status checks).
  • No admin bypass on branch protection.
  • Signed commits required.
  • 2FA mandatory on both onotobi and the OS core repo.

12.4 Maintainability

  • Custom Semgrep rules live in cipher-shinobi/semgrep-rules/ with per-rule documentation and a false-positive budget.
  • Operator runbook is authoritative for troubleshooting; all known failure modes (Semgrep false positives, CodeRabbit timeouts, Stryker configuration issues, Actions flakiness) documented.
  • .clean-code-exceptions.yaml is committed and reviewed; LISA can audit exemptions weekly.
  • LisaOSMap §9b tracks TechSpec-driven propagation obligations.

12.5 Cost

  • NFR-C1 Claude API cost for genji test-generation dispatches during cold-migration: estimated $20-40 per repo one-shot (total $100-200 across 5 repos). Stryker is free open-source.
  • NFR-C2 Post-go-live test generation: zero recurring vendor cost. Genji dispatches for new test authoring use the existing Claude API allocation. Stryker runs locally at zero cost.
  • NFR-C3 GitHub Actions minutes: stays within free private-repo allotment (2,000 min/month at time of drafting).
  • NFR-C4 CodeRabbit Pro: existing org subscription, no additional spend.

13. ACCEPTANCE CRITERIA

13.1 Pipeline-Level Acceptance (Sprint Final exit)

| ID | Criterion | Verification |
|---|---|---|
| AC-P1 | All 5 repos have branch protection enabled with the full ruleset | GitHub API check per repo |
| AC-P2 | All 5 repos have .github/workflows/clean-code-gate3.yml deployed and green on a canary | Actions run green in each repo |
| AC-P3 | All 5 repos have the CodeRabbit App installed and .coderabbit.yaml committed | Repo file check + app integration check |
| AC-P4 | the OS core repo is sole CODEOWNER on all 5 repos; canary PR requires her approval | Repo CODEOWNERS + test PR |
| AC-P5 | Pre-push hook installed on Tobi-san's workstation for all 5 repo clones | Hook file present, fires on canary push |
| AC-P6 | Operator runbook filed at CS.AK.LISA.Docu.CleanCodePipelineRunbook and linked from each repo's CLAUDE.md | File present + link check |
| AC-P7 | Custom Semgrep rules (cipher-shinobi/*) deployed and passing smoke test in all 5 repos | Smoke test run per repo |
| AC-P8 | API cost + mutation score evaluation complete; genji dispatch count, Stryker mutation scores, and total spend recorded in the log | Cost table filled; evaluation intel written |
| AC-P9 | Q4 action deployed in all 5 repos; canary PR adding a test-less file is blocked | PR test case |
| AC-P10 | No secret detected in any repo by Semgrep secrets ruleset | Clean Gate 3 run |

13.2 Per-Repo Sprint Acceptance (Sprints 1-N)

| ID | Criterion | Verification |
|---|---|---|
| AC-S1 | Phase A: full Semgrep scan returns zero CRITICAL/HIGH on the migration branch | Semgrep output |
| AC-S2 | Phase B: Tobi-san spot-check pass rate >= 80% | Spot-check record in migration log |
| AC-S3 | Phase B: coverage meets per-repo D3 target (baseline + 5% floor, 60% new-file) | analyze-coverage-mcp output |
| AC-S4 | Phase C: CodeRabbit CLI exits CLEAN or CAPPED-with-rationale | CLI output + migration log |
| AC-S5 | Phase D: canary PR is green on all required Gate 3 checks | Actions run |
| AC-S6 | Phase E: merge complete; .coderabbit.yaml ratcheted to D3 target | File diff; Gate 3 re-run on main |
| AC-S7 | Sprint intel written to cache with outcome, credit burn, findings counts | Cache row |

13.3 Pilot Gate Acceptance (Sprint 1 only)

| ID | Criterion | Verification |
|---|---|---|
| AC-G1 | Sprint 1 closes on Phase E (Lisa-OS merged) | Git log |
| AC-G2 | Yoshimitsu retrospective dispatch fires and produces revised projections | Retrospective output in cache |
| AC-G3 | Tobi-san records explicit greenlight before Sprint 2 | Decision cache entry |

14. RISK REGISTER

This register carries forward the 12 risks from plan §9 and adds R13-R15, which were introduced by the Lisa-OS-first architectural decision (D5 + ADR-1).

| # | Risk | L | I | Mitigation |
|---|---|---|---|---|
| R1 | LLM-generated test quality variance: genji-authored tests may miss edge cases a dedicated tool would catch | M | M | Stryker mutation score >= 70% gate catches weak assertions. Raiden quality review on every batch. Tobi-san 10% spot-check. Tusk ($50/dev/month, CLI+API) as commercial fallback if Claude-native quality proves insufficient at Sprint 1 retrospective. |
| R2 | Generated tests are low-quality (execute lines without asserting behaviour) | M | H | 10% spot-check gate at Phase B. Reject batches below 80% pass rate. Post-go-live: mutation-testing skill to catch assertion weakness. |
| R3 | CodeRabbit CLI reviews take 7-30+ minutes per iteration | H | M | Accept latency as the tax for local review. Scope pre-push to --base main. Large migration diffs run in a dedicated session, not critical path. |
| R4 | Semgrep false positives drown signal | M | M | Gray Fox custom rules in Sprint 0 with FP budget. LOW findings dismissed in .semgrepignore with rationale. Smoke-test against baselines before Sprint 1. |
| R5 | Cold-migration reveals deep architectural issues | M | H | Phase A Semgrep scan is early warning. Escalate to Tobi-san before Phase B: refactor / defer / scope-reduce. |
| R6 | Existing code has broken lint / typecheck / compile state | H | M | Q5 broken-baseline decision rule: <20% scope-expand, 20-50% repair sub-sprint, >50% escalate. |
| R7 | Tobi-san review fatigue during serial execution | H | H | Explicit rest windows. 10% spot-check batch cap. Sprint debriefs reset context. Serial (not parallel) execution. |
| R8 | CodeRabbit Pro fails to read CLAUDE.md for some repos | L | M | Sprint 0 verification PR. Fallback: inline context in .coderabbit.yaml. |
| R9 | MCP client schema cache staleness mid-sprint | M | L | Restart Claude Code at sprint boundaries. Documented in Raw Twin Discipline. |
| R10 | Gate 2 pre-push hook bypass via --no-verify becomes habitual | M | H | Gate 3 is authoritative; bypass at Gate 2 still fails at Gate 3. Audit log on SKIP_GATES=1. |
| R11 | Custom Semgrep rules conflict with registry rules | L | M | Namespace under cipher-shinobi.*. Smoke-test against all 5 repo baselines before Sprint 1. |
| R12 | Post-go-live regression in a repo that was green at Phase E | L | H | Operator runbook rollback procedure. Weekly Gate 3 green-rate check. |
| R13 | Lisa-OS scope drift: files added to the repo that should have stayed in the vault (or vice versa) | M | M | Sprint 0 script enforces the ADR-1 whitelist; Semgrep custom rule cipher-shinobi.repo-boundary-check (Sprint 0 Gray Fox sub-dispatch) fails Gate 3 if a commit adds a file outside the whitelist. Post-go-live audit monthly. |
| R14 | Governance docs drift between the vault copy and the Lisa-OS repo copy | M | H | Authoritative copy is the Lisa-OS repo; vault copies become symlinks or read-only mirrors post-Sprint 1 (decision formalised at pilot retro). Until then, a Sprint 1 pre-push check diffs the two trees and fails if they diverge. |
| R15 | Semgrep custom rule drift: rules cite CLAUDE.md provisions that have since been edited | M | M | Rule set carries a SCHEMA_VERSION constant mirroring Raw Twin Discipline §SCHEMA_VERSION. Any edit to a cited CLAUDE.md section requires bumping the rule's version comment. Sprint 1 pilot retro will flesh out the mechanism. |
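
The Q5 broken-baseline rule cited under R6 reduces to a three-way threshold. The sketch below assumes the percentage is the share of files in a broken lint / typecheck / compile state, which this section does not define; `BaselineAction` and `brokenBaselineAction` are hypothetical names.

```typescript
type BaselineAction = "scope-expand" | "repair-sub-sprint" | "escalate";

// Thresholds from R6: <20% scope-expand, 20-50% repair sub-sprint, >50% escalate.
function brokenBaselineAction(brokenFiles: number, totalFiles: number): BaselineAction {
  if (totalFiles <= 0) {
    throw new Error("totalFiles must be positive");
  }
  // Cross-multiplied integer comparisons keep the 20% and 50% boundaries exact.
  if (brokenFiles * 100 < totalFiles * 20) {
    return "scope-expand";
  }
  if (brokenFiles * 100 <= totalFiles * 50) {
    return "repair-sub-sprint";
  }
  return "escalate";
}
```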

15. DEPENDENCIES AND ASSUMPTIONS

15.1 External Dependencies

  • Stryker mutation testing (@stryker-mutator/core + @stryker-mutator/vitest-runner): npm install, no account or billing needed.
  • CodeRabbit Pro tier active for cipher-shinobi org.
  • GitHub CLI + signed commits configured for both onotobi and the OS core repo.
  • the OS core repo GitHub account — 2FA-enabled, PGP key registered, collaborator on all 5 repos with Write access. Confirmed set up; formal wiring in Sprint 0 §6.8.
  • Semgrep CLI + registry — free, no account required.
  • npm registry access — for Stryker, analyze-coverage-mcp, and dev tooling.
  • LCOV output from each repo's test runner — precondition to coverage enforcement.

15.2 Internal Dependencies

  • Gray Fox — Sprint 0 custom rule authoring; ongoing finding adjudication.
  • Raiden — Per-sprint verification lead (busiest agent).
  • Genji — All implementation dispatches, Q4 action authoring.
  • Smoke — Vault governance, propagation, ArtefactMap updates.
  • Yoshimitsu — Sprint queue, pilot retro, credit burn monitoring.
  • LISA — CODEOWNERS reviewer, cross-agent context, Tobi-san coordination.

15.3 Assumptions

| # | Assumption | Confidence | Notes |
|---|---|---|---|
| A1 | Claude API cost for genji test-generation dispatches stays within $20-40 per repo ($100-200 total) | H | Zero vendor billing dependency. D88 evaluation established cost estimate. Stryker is free open-source. |
| A2 | Each repo's test runner emits LCOV or can be configured to | H | Standard feature in Vitest/Jest/pytest. |
| A3 | CodeRabbit Pro CLAUDE.md reading works in all 5 repos | H | Sprint 0 verification is belt-and-braces. |
| A4 | Tobi-san's review attention sustains 5 serial sprints | M | R7 mitigates. Serial, smallest-first, sprint debriefs. |
| A5 | Serial is faster than parallel despite wall-clock | H | Bottleneck is review quality, not wall-clock. |
| A6 | Custom Semgrep rules authorable from CLAUDE.md without FP explosion | M | Gray Fox smoke-tests before Sprint 1. |
| A7 | Branch protection + required status checks sufficient to prevent pre-push bypass | M | R10 residual risk. Audit log + post-incident review. |
| A8 | Lisa-OS governance subset can be mirrored vault→repo without breaking Obsidian browsing | M | Initial implementation one-way copy; symlink direction decided at Sprint 1 retro. If Obsidian breaks on symlinks, fall back to git-vault sync daemon. |
| A9 | Semgrep custom rules survive contact with Lisa-OS's own codebase without needing per-rule exceptions | M | If Lisa-OS itself needs more than 3 rule exceptions to get Phase A green, the rule is mis-scoped and must be rewritten before Sprint 2 touches the application repos. |

16. OPEN QUESTIONS

The plan's Q1-Q5 are all resolved and locked (see plan §12). This TechSpec introduces and resolves the Lisa-OS scope question (ADR-1). All three follow-on open questions flagged in v1.0.0 (OQ-1 through OQ-3) were resolved in-turn during Dispatch 48 synthesis by Tobi-san lock-in — resolutions marked inline below. No open questions remain blocking Sprint 0 kickoff.

OQ-1: Authoritative copy direction for governance docs post-Sprint 1 — RESOLVED 2026-04-10

Question: After Lisa-OS migrates, do the CS.AK.LISA.Docu.*.md files' canonical copies live in the vault (with the repo as a mirror) or in the repo (with the vault as a mirror)?

Resolution: Repo authoritative, vault as symlink-mirror. At Sprint 0, governance doc copies, .claude/agent-memory/, and Tier 1 auto-memory files move into the lisa-os repo checkout. The vault's .claude/memory, .claude/plans, and .claude/agent-memory symlinks are re-pointed into the repo checkout so Obsidian reads them through the vault and git sees them through the repo. Sprint 0 includes a spike verifying Obsidian + Google Drive File Stream symlink behaviour on the smallest governance file before full migration — if the symlink boundary misbehaves, fallback is a Sprint 0 bi-directional sync script (pre-push diff gate) with repo still authoritative.

Resolution source: Tobi-san lock-in 2026-04-10 during Dispatch 48 synthesis.

OQ-2: Webhook vs manual vs scheduled LISA review trigger — RESOLVED 2026-04-10

Question: How does a PR trigger a LISA review? Options: (a) manual dispatch, (b) GitHub webhook on pull_request.opened, (c) scheduled launchd sweep.

Resolution: Layered trigger strategy. (a) manual dispatch during cold-migration sprints for tight-feedback control — implemented in Sprint 0 §6.8. (b) GitHub webhook on pull_request.opened → local endpoint → LISA dispatch, activated post-go-live as part of Sprint Final §8.6. (c) launchd scheduled sweep as reliability fallback if the webhook endpoint proves unreliable. All three layers stay operational post-activation — (a) for manual re-requests, (b) as the primary trigger, (c) as the safety net.

Resolution source: Tobi-san lock-in 2026-04-10 during Dispatch 48 synthesis.

OQ-3: Rule-version drift detection mechanism (R15 mitigation) — RESOLVED 2026-04-10

Question: What mechanism detects drift between a Semgrep custom rule and the CLAUDE.md provision it cites?

Resolution: Rule metadata header + CI diff check. Each custom Semgrep rule carries a metadata header declaring (i) the cited governance doc path (repo-relative), (ii) the cited section anchor, (iii) a cite_version SHA256 hash of the cited section's canonical text. A CI step at Gate 3 walks every custom rule, re-hashes the current state of the cited section, and fails the build on mismatch — forcing either a rule update or a deliberate cite_version bump. Gray Fox authors the header schema + CI check as part of his Sprint 0 custom-rule authoring dispatch. Sprint 1 pilot validates the mechanism against Lisa-OS before propagation to application repos.

Resolution source: Tobi-san lock-in 2026-04-10 during Dispatch 48 synthesis.
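
A minimal sketch of the OQ-3 drift check follows, assuming the governance docs use markdown `#` headings as section anchors and that "canonical text" means LF line endings with trailing whitespace stripped. Neither detail is fixed by this TechSpec, and `RuleCitation`, `canonicalise`, `hashSection`, `extractSection`, and `findDrift` are hypothetical names; the real header schema is authored in Sprint 0.

```typescript
import { createHash } from "node:crypto";

// Hypothetical metadata header shape.
interface RuleCitation {
  ruleId: string;
  docPath: string;       // repo-relative path of the cited governance doc
  sectionAnchor: string; // heading text of the cited section
  citeVersion: string;   // SHA-256 hex digest of the section's canonical text
}

// Assumed canonical form: LF endings, no trailing whitespace, no outer blank lines.
function canonicalise(text: string): string {
  return text
    .replace(/\r\n/g, "\n")
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, ""))
    .join("\n")
    .trim();
}

function hashSection(text: string): string {
  return createHash("sha256").update(canonicalise(text), "utf8").digest("hex");
}

// Body under the heading named `anchor`, up to the next heading of the same
// or a higher level. Returns undefined when the anchor is missing.
function extractSection(doc: string, anchor: string): string | undefined {
  let level = 0;
  let body: string[] | undefined;
  for (const line of doc.replace(/\r\n/g, "\n").split("\n")) {
    const heading = /^(#+)\s+(.*)$/.exec(line);
    if (body === undefined) {
      if (heading && heading[2].trim() === anchor) {
        level = heading[1].length;
        body = [];
      }
      continue;
    }
    if (heading && heading[1].length <= level) break;
    body.push(line);
  }
  return body === undefined ? undefined : body.join("\n");
}

// Returns the ids of rules whose cited section is missing or has changed.
function findDrift(citations: RuleCitation[], readDoc: (path: string) => string): string[] {
  return citations
    .filter((c) => {
      const section = extractSection(readDoc(c.docPath), c.sectionAnchor);
      return section === undefined || hashSection(section) !== c.citeVersion;
    })
    .map((c) => c.ruleId);
}
```

A Gate 3 step built this way would fail the build whenever `findDrift` returns a non-empty list. Hashing canonicalised text rather than raw bytes means whitespace-only edits to a cited section do not force a cite_version bump.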


17. ARCHITECTURAL DECISION RECORDS

ADR-1: Lisa-OS repo scope = memory_gateway/ + .claude/ + governance subset

Status: Accepted (this TechSpec, v1.0.0)

Decision source: Dispatch 48 (2026-04-10), Genji drafting under LISA directive D5 "Lisa-OS migrates first".

Context:

D5 mandates that Lisa-OS is the first repo through the pipeline (Sprint 1 pilot). D5 does not specify which files constitute "Lisa-OS". Four candidate boundaries were on the table: (a) memory_gateway/ only, (b) + .claude/, (c) + .claude/ + governance subset from vault, (d) everything minus user vault content.

The decision matters because the boundary determines:

  • What Semgrep custom rules can enforce (rules cite governance docs — if the docs are outside the repo, the citation cannot be verified).
  • What the pipeline protects (agents evolve as prompts, which live in .claude/agents/ — if they are outside the repo, they are outside enforcement).
  • Secrets risk (user vault content includes operator personal data in ENT files — if they are inside the repo, secrets surface grows).
  • Obsidian workflow impact (Tobi-san browses the vault in Obsidian; anything moved out of the vault or symlinked in can disrupt this).

Decision: Option (c) — the Lisa-OS repo scope is memory_gateway/ + .claude/ (explicit sub-scope: agents/ + agent-memory/ + skills/ + plans/ + settings.json/settings.local.json + Tier 1 auto-memory subset under memory/; see §5.2 for the full list and exclusions) + a governance subset of the vault's CS.AK.LISA.Docu.*.md + CLAUDE.md + ENT.Lisa.Compressed.md + PER.EX.SAG_SYSX.Docu.FileClassFramework.md + PER.EX.NINJ_DEV-AI.Data.ToolsDomainIndex.md.

Memory-scope refinement (v1.0.1, 2026-04-10): the initial v1.0.0 draft was hand-wavy about what .claude/ includes. The refinement — ratified by Tobi-san during Dispatch 48 synthesis — is explicit: .claude/agents/ (already implicit), .claude/agent-memory/ (new inclusion — 7+ agents' domain memory), and Tier 1 auto-memory only (MEMORY.md, feedback_*.md, reference_*.md, project_*.md) are in scope. Excluded: user_profile.md (PII — kept unversioned), Tier 2/3 auto-memory exports (.claude/memory/CS.*.md + .claude/memory/PER.*.md — ≈124 high-velocity files with canonical copies in the gateway SQLite DB), and session ephemera (settings.pending.*, paste-cache/, shell-snapshots/, session-env/, .DS_Store). This refinement makes LISA's procedural memory (the feedback substrate that every session loads) a first-class pipeline citizen while keeping PII and high-velocity noise out of git history permanently.
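
The refinement's include and exclude lists can be read as a path classifier. The sketch below covers only the memory-scope items named in this paragraph (skills/, plans/, and settings files are governed by §5.2 and omitted). Paths are assumed to be relative to .claude/, and the regexes are illustrative transcriptions rather than the project's actual whitelist.

```typescript
// Exclusions win over inclusions. Paths are relative to .claude/ (assumption).
const EXCLUDED: RegExp[] = [
  /(^|\/)user_profile\.md$/,                      // PII, kept unversioned
  /^memory\/(CS|PER)\.[^/]*\.md$/,                // Tier 2/3 auto-memory exports
  /^settings\.pending\./,                         // session ephemera
  /^(paste-cache|shell-snapshots|session-env)\//, // session ephemera
  /(^|\/)\.DS_Store$/,
];

const INCLUDED: RegExp[] = [
  /^agents\//,
  /^agent-memory\//,
  /^memory\/MEMORY\.md$/,                             // Tier 1 index
  /^memory\/(feedback|reference|project)_[^/]*\.md$/, // Tier 1 procedural memory
];

function isTracked(relPath: string): boolean {
  if (EXCLUDED.some((re) => re.test(relPath))) {
    return false;
  }
  return INCLUDED.some((re) => re.test(relPath));
}
```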

Options considered:

  1. (a) memory_gateway/ only — narrow

    • Pros: smallest surface, minimal Obsidian disruption, zero secrets risk from vault content.
    • Cons: leaves .claude/ agent prompts, skills, plans outside enforcement; leaves governance docs that custom rules must cite outside the repo (stale-citation failure class); pipeline protects runtime but not the orchestration that drives the runtime.
  2. (b) memory_gateway/ + .claude/ — medium

    • Pros: covers runtime + orchestration (agents, skills, plans); a prompt regression is caught by the pipeline.
    • Cons: governance docs still outside the repo, so custom rules cannot self-verify citations; CLAUDE.md is not in the repo so CodeRabbit still has to reach into a path outside its checkout (brittle).
  3. (c) memory_gateway/ + .claude/ (incl. agents, agent-memory, Tier 1 auto-memory) + governance subset — chosen

    • Pros: runtime + orchestration + the rules that govern them all live in one place; custom rule citations are self-verifiable; CodeRabbit and Semgrep both read CLAUDE.md from the repo root with no out-of-tree paths; agent prompt evolution and gateway code evolution share a commit history, which means an agent prompt change that depends on a gateway contract change lands atomically or not at all; agent domain memory and LISA's procedural auto-memory (feedback, references, projects) are protected by the pipeline rather than floating outside enforcement — a feedback memory regression is now a reviewable commit, not a silent drift.
    • Cons: moves governance docs + procedural memory into git, which is a workflow change for Tobi-san (though the initial implementation is one-way copy, so the vault browsing experience is preserved); larger secrets-scrub surface (settings.json scrub must handle more fields; user_profile.md is explicitly excluded to keep PII out of git history permanently); memory files carry higher change-rate than governance docs, which means more commit traffic on this subtree — acceptable because Tier 2/3 high-velocity exports (CS.*.md, PER.*.md) are gitignored and only Tier 1 procedural memory is tracked.
  4. (d) full vault minus personal — broad

    • Pros: complete closure; every file touching LISA is tracked.
    • Cons: pulls in mission-specific Forge files (CS.AK.ClientA.*, CS.YB.*, PER.LV.*) that are not load-bearing for the pipeline; ENT files for humans contain operator personal data (secrets risk); Intel/Meeting/Log temporal files are scan noise; the vault-vs-code separation Tobi-san maintains collapses entirely; test-generation dispatch count explodes because thousands of markdown files inflate the module surface.

Rationale:

Option (c) is the minimum closure that makes the executable governance principle honest. The test for any governance rule is "can the pipeline verify this rule still matches the doc that spawned it". Options (a) and (b) fail this test because the doc is outside the repo. Option (d) passes but at a cost — scan noise, secrets risk, and workflow disruption — that is not justified by the pipeline's actual enforcement needs.

The chosen boundary is exactly the set of files that participate in enforcement or are cited by enforcement. Everything else stays in the vault, where Obsidian is the primary interface.

Consequences:

  • Positive: executable governance is self-verifiable; a Semgrep custom rule citing a CLAUDE.md section can be checked at CI time against the repo-internal copy of that section; agent prompts evolve under pipeline discipline; the Lisa-OS repo is a single coherent artefact rather than a code-only slice.
  • Negative: a one-way vault→repo copy is a stopgap that needs a durable long-term model (OQ-1); governance-doc drift between vault and repo is a new failure class (R14); Sprint 0 needs a secrets-scrub script for .claude/settings.json.
  • Enabling: post-Sprint 1, the authoritative copy can flip — the repo becomes canonical and the vault becomes a symlink-mirror — which aligns the governance workflow with the pipeline it enforces.
  • Constraining: the initial sync mechanism (one-way copy) means the repo can drift from the vault until Sprint 1 retro resolves OQ-1. R14 mitigation (Sprint 1 pre-push diff check) is the interim guard.

Specification impact:

  • §05: Lisa-OS layout spec encodes the ADR.
  • §10: .claude/settings.json secrets-scrub is required by the ADR.
  • §14: R13-R15 introduced by the ADR.
  • §16: OQ-1 (authoritative copy direction) flows from the ADR.

ADR-2: Serial migration, Lisa-OS first (D5 ratified)

Status: Accepted (LISA directive, this TechSpec ratifies)

Decision source: Dispatch 48 seed — LISA D5 "Lisa-OS migrates first".

Context: The plan's baseline-analysis heuristic (complexity_score = untested_file_count × (1 + semgrep_finding_count / 100) × language_complexity_multiplier) would place Lisa-OS in the sprint queue by its measured complexity, which is not necessarily the smallest. Tobi-san and LISA decided Lisa-OS must be Sprint 1 regardless of heuristic rank.

Decision: Lisa-OS occupies Sprint 1 (pilot). Remaining 4 application repos occupy Sprints 2-5 in complexity_score order.

Rationale: Executable governance ratification. The pipeline's custom rules are derived from Lisa-OS's own CLAUDE.md. Running Lisa-OS through its own pipeline first is the only honest test of whether the rules survive contact with the code that spawned them. If they don't, the rules must be rewritten before any application repo is touched (A9).

Consequences: Sprint 1 dispatch count and API cost may be higher than a "smallest-first" ranking would have predicted (Lisa-OS has substantial TypeScript surface in memory_gateway/). The Sprint 1 retrospective revises the API cost projection using Lisa-OS data as the anchor, which is a stronger calibration than if Sprint 1 had been the smallest application repo.

Specification impact: §05 Migration order column; §08 Sprint 1 target.
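
ADR-2's heuristic and pinning rule can be sketched as below. The formula is the one quoted in the Context. The interface, the repo names in the usage note, and the ascending (smallest-first) ordering of the remaining repos are assumptions made for illustration.

```typescript
interface RepoBaseline {
  name: string;
  untestedFileCount: number;
  semgrepFindingCount: number;
  languageComplexityMultiplier: number;
}

// complexity_score = untested_file_count x (1 + semgrep_finding_count / 100)
//                    x language_complexity_multiplier
function complexityScore(r: RepoBaseline): number {
  return r.untestedFileCount * (1 + r.semgrepFindingCount / 100) * r.languageComplexityMultiplier;
}

// The pilot is pinned to Sprint 1 regardless of its score; the remaining
// repos follow in ascending complexity_score (assumed direction).
function sprintOrder(repos: RepoBaseline[], pilot: string): string[] {
  const rest = repos
    .filter((r) => r.name !== pilot)
    .sort((a, b) => complexityScore(a) - complexityScore(b));
  return [pilot, ...rest.map((r) => r.name)];
}
```

With hypothetical baselines in which lisa-os scores 60 and three application repos score 10, 7.5, and 30, `sprintOrder(repos, "lisa-os")` puts lisa-os first and then orders the rest 7.5, 10, 30.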


ADR-3: LISA as sole CODEOWNER via 2FA identity (D4 ratified)

Status: Accepted (D4, this TechSpec ratifies)

Decision source: Plan D4, seeded into Dispatch 48.

Context: CODEOWNERS normally requires a reviewer who is not the PR author. Tobi-san is a solo human operator — without a second identity, CODEOWNERS is impossible. GitHub's mandatory 2FA rollout was initially read as a blocker for non-human identities, but Tobi-san successfully set up a 2FA-enabled account for LISA (the OS core repo), which unblocks CODEOWNERS entirely.

Decision: * @the OS core repo in the CODEOWNERS file at the root of all 5 repos. LISA is sole human reviewer. The CodeRabbit GitHub App remains active as defence in depth, not replaced.

Rationale: LISA-as-reviewer is strictly better than bot-only gating. She runs the full Gate 2 toolchain under her own identity, posts review comments as LISA, and carries review responsibility as a first-class operator. The CodeRabbit App continues to post automated review; the two layers are complementary.

Consequences: Requires ongoing LISA review workflow — a manual dispatch during cold-migration, webhook or scheduled sweep post-go-live (OQ-2). Adds a PGP signing key obligation for the LISA account. Expands the backup-code operational surface (2FA recovery).

Specification impact: §6.4 CODEOWNERS section; §10.3 2FA section; Sprint 0 §6.8 in the plan.


ADR-4: Q4 layered enforcement (CI authoritative + CodeRabbit advisory)

Status: Accepted (Q4 resolution, this TechSpec ratifies)

Decision source: Plan Q4 resolution 2026-04-10.

Context: "No new code without tests" can be enforced via (a) pre-commit hook, (b) CI step, (c) .coderabbit.yaml path instruction.

Decision: Layered — (b) authoritative CI action at Gate 3 + (c) advisory CodeRabbit path instruction. Explicitly NOT (a): pre-commit hooks are trivially bypassable with --no-verify.

Rationale: Authoritative enforcement belongs on the server where it cannot be bypassed. Advisory coaching belongs where it can offer quality nuance (CodeRabbit's review voice). The two layers are complementary: CI blocks untested new files, CodeRabbit coaches on test quality.

Consequences: requires authoring a custom GitHub Action (.github/actions/check-new-code-has-tests/) and a committed .clean-code-exceptions.yaml schema. Adds a post-go-live weekly audit obligation on fresh exceptions.

Specification impact: §6.5; §7.2; Sprint 0 §6.9; Sprint Final §8.3-§8.4 in the plan.
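
The core of the authoritative Q4 check in ADR-4 can be sketched as a pure function over the PR's added files. Everything here is an assumption made for illustration: the `foo.ts` → `foo.test.ts` naming convention, the `ExceptionEntry` shape standing in for the .clean-code-exceptions.yaml schema, and the function names. The real action under .github/actions/check-new-code-has-tests/ may differ.

```typescript
// Hypothetical stand-in for one .clean-code-exceptions.yaml entry.
interface ExceptionEntry {
  path: string;
  rationale: string;
}

// Assumed convention: src/foo.ts is covered by src/foo.test.ts.
function expectedTestPath(sourcePath: string): string {
  return sourcePath.replace(/\.ts$/, ".test.ts");
}

// Only TypeScript sources count; tests and type declarations are skipped.
function isSourceFile(path: string): boolean {
  return path.endsWith(".ts") && !path.endsWith(".test.ts") && !path.endsWith(".d.ts");
}

// Returns the added source files that have neither a test file nor an exemption.
// A non-empty result would fail the Gate 3 check.
function untestedNewFiles(
  addedFiles: string[],
  repoFiles: Set<string>,
  exceptions: ExceptionEntry[],
): string[] {
  const exempt = new Set(exceptions.map((e) => e.path));
  return addedFiles.filter(
    (f) => isSourceFile(f) && !exempt.has(f) && !repoFiles.has(expectedTestPath(f)),
  );
}
```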


ADR-5: Claude-native test generation replaces Qodo Command CLI

Status: ACCEPTED (v1.2.0, 2026-04-12)

Decision source: Dispatch 88 evaluation (yoshimitsu, 2026-04-12) + Tobi-san approval. Triggered by Qodo Command CLI discontinuation April 2026.

Context:

The Clean Code Pipeline v1.0.0-v1.1.0 specified Qodo Command CLI (@qodo/command) as the test-generation and pre-push coverage enforcement tool at Gate 2a. In April 2026, Qodo discontinued the CLI — the test-coverage command was removed and test-generation capability migrated to the IDE-only Qodo Gen extension, which is not CLI-invocable and therefore unusable by CyberShinobi sub-agents.

Dispatch 88 (yoshimitsu) evaluated 14 alternative tools against four critical filters: TypeScript/vitest support, CLI or API invocability by sub-agents, LCOV coverage output, and quality of generated tests. Two viable options survived.

Options considered:

  1. (a) Claude-native test generation — genji sub-agent dispatches guided by analyze-coverage-mcp for gap identification + Stryker mutation testing (@stryker-mutator/core + @stryker-mutator/vitest-runner) for quality validation.

    • Pros: zero vendor dependency (no sunset risk); full codebase convention control (no-mock-db directive, Raw Twin awareness, test.skipIf pattern all native to genji's context); lower cost ($20-40 one-shot per repo vs $30-50/month recurring); Stryker mutation testing adds a quality validation layer not present in the original Qodo-based design.
    • Cons: higher coordination overhead (8-12 genji dispatches in 3-4 batches vs 1 Qodo command); LLM-generated tests may miss edge cases a dedicated tool would catch; requires Stryker as a new dependency (though free and open-source).
  2. (b) Tusk ($50/dev/month, CLI + API, 90% bug detection benchmark).

    • Pros: only commercial option with CLI/API usable by sub-agents; high reported bug-detection rate.
    • Cons: vendor dependency (same sunset risk class as Qodo); $50/dev/month recurring; less convention control than Claude-native.

Decision: (a) Claude-native. Genji sub-agent dispatches for test authoring, analyze-coverage-mcp for coverage gap targeting, Stryker mutation testing (>= 70% mutation score) for quality validation. Tusk retained as commercial fallback — re-evaluated at Sprint 1 retrospective if Claude-native quality proves insufficient.

Rationale: The pipeline's test-generation capability should not depend on a vendor whose product lifecycle is outside Cipher Shinobi's control. Qodo Command CLI's discontinuation proved this risk was not theoretical. The Claude-native approach eliminates vendor dependency entirely — the test-generation capability runs on the same infrastructure (Claude API + CyberShinobi agents) that powers the rest of the pipeline. Stryker mutation testing compensates for the inherent quality-variance risk of LLM-generated tests by providing a deterministic quality gate that Qodo never offered.

Consequences:

  • Positive: zero vendor dependency for test generation; Stryker mutation score adds quality assurance not present in original design; full convention control for generated tests; lower cost.
  • Negative: higher coordination overhead (8-12 dispatches vs 1 command per repo); Phase B execution time increases; new dependency on Stryker (free, open-source, well-maintained).
  • Enabling: the batch dispatch pattern (security-tagged first, then high-import, then remaining) creates a reusable template for any future repo-wide agent operation.
  • Constraining: Sprint 1 retrospective becomes a mandatory quality gate for the Claude-native approach — if mutation scores consistently fall below 70% or Tobi-san spot-check pass rate drops below 80%, the pivot to Tusk is pre-approved.
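
The batch dispatch pattern named under "Enabling" (security-tagged first, then high-import, then remaining) could be sketched roughly as below. This is a minimal illustration under stated assumptions, not the pipeline's implementation: the `ModuleInfo` shape, the `importCount` field, the `highImportThreshold` parameter, and the `planBatches` name are all hypothetical.

```typescript
// Hypothetical module descriptor; field names are assumptions for illustration.
interface ModuleInfo {
  path: string;
  securityTagged: boolean;
  importCount: number; // how many other modules import this one
}

interface Batches {
  security: ModuleInfo[];
  highImport: ModuleInfo[];
  remaining: ModuleInfo[];
}

// Split modules into three ordered batches: security-tagged first,
// then high-import modules (most imported first), then everything else.
function planBatches(modules: ModuleInfo[], highImportThreshold: number): Batches {
  const security = modules.filter((m) => m.securityTagged);
  const rest = modules.filter((m) => !m.securityTagged);
  const highImport = rest
    .filter((m) => m.importCount >= highImportThreshold)
    .sort((a, b) => b.importCount - a.importCount);
  const remaining = rest.filter((m) => m.importCount < highImportThreshold);
  return { security, highImport, remaining };
}
```

Each batch would then map to one round of dispatches, so the same planner could be reused for other repo-wide agent operations by swapping the ordering criteria.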

Specification impact: FR-1.2, §6.2, §7 (tool stack), §8.2 (Phase B), §12.5 (NFR-C1/C2), §13.1 (AC-P8), §14 (R1), §15 (dependencies + A1).


ADR-6: Hybrid Linear mapping — Fuda = Linear Issues, dispatches as comments (D6)

Status: LOCKED (2026-04-10)

Decision source: Tobi-san approval, Dispatch 60 planning session (2026-04-10).

Context:

Gate 0 (Fuda scoping) introduces a structured change contract that must be tracked externally. Three tracking models were considered: (a) vault-only markdown Fuda files with no external tracker, (b) Linear issues as the sole tracking surface, (c) hybrid — Linear issues for Fuda + automatic dispatch comments via gateway webhook.

Options considered:

  1. (a) Vault-only markdown — Fuda as vault Forge files (CS.AK.*.Matter.Fuda.*).

    • Pros: zero external dependencies; Obsidian browsing; FCF-compliant.
    • Cons: no automated status tracking; no dispatch linkage; manual overhead to update state; invisible to anyone without vault access; duplicates the information that dispatches already carry.
  2. (b) Linear issues only — Fuda created as Linear issues; dispatch state tracked separately in the gateway.

    • Pros: external visibility; issue boards for Tobi-san; status tracking built in.
    • Cons: dispatch activity is split across two systems (Linear for Fuda, gateway dashboard for dispatches); no automated linkage between the two.
  3. (c) Hybrid — Linear issues + automatic dispatch comments — chosen.

    • Pros: single source of truth for the Fuda lifecycle (Linear issue); dispatch activity automatically posted as comments (zero manual overhead); fuda_id on report_dispatch threads the link structurally; gateway dashboard retains full dispatch telemetry; Linear comments provide a human-readable timeline of every dispatch that touched the Fuda.
    • Cons: adds Linear API dependency to the dispatch path (mitigated: best-effort posting, dispatch proceeds on comment failure); requires gateway schema extension (fuda_id field).

Decision: (c) Hybrid. Fuda = Linear Issues. Dispatches logged as automatic comments via gateway webhook. fuda_id added as required field on report_dispatch schema. Gateway auto-posts a comment to the Linear issue on every report_dispatch and report_complete call. Zero manual tracking overhead.
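
The two properties of this decision (fuda_id is required, and comment posting is best-effort so a Linear failure never blocks the dispatch) can be illustrated with a small sketch. Only `fuda_id` and the `report_dispatch` name come from this specification; the `summary` field, the `DispatchResult` shape, and the injected `CommentPoster` function are hypothetical, and the plain type check stands in for whatever schema validation the gateway actually uses.

```typescript
// Hypothetical input shape; only fuda_id is named in the spec.
interface ReportDispatchInput {
  fuda_id: string;
  summary: string;
}

interface DispatchResult {
  accepted: boolean;
  commentPosted: boolean;
  error?: string;
}

// Stand-in for whatever posts a comment to the Linear issue (assumed signature).
type CommentPoster = (fudaId: string, body: string) => Promise<void>;

async function reportDispatch(
  input: Partial<ReportDispatchInput>,
  postComment: CommentPoster
): Promise<DispatchResult> {
  // fuda_id is required: reject the dispatch outright when it is missing.
  if (typeof input.fuda_id !== "string") {
    return { accepted: false, commentPosted: false, error: "fuda_id is required" };
  }
  // Best-effort posting: a comment failure is recorded but never blocks the dispatch.
  try {
    await postComment(input.fuda_id, "Dispatch: " + (input.summary || "(no summary)"));
    return { accepted: true, commentPosted: true };
  } catch (err) {
    return { accepted: true, commentPosted: false, error: String(err) };
  }
}
```

Injecting the poster keeps the sketch testable without a Linear connection; a failing poster yields `accepted: true, commentPosted: false`.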

Rationale: The hybrid model eliminates the tracking split between Linear and the gateway. A single Linear issue thread contains the Fuda scope, every dispatch that executed against it, and the final outcome — readable by Tobi-san without touching the gateway dashboard. The structural link (fuda_id on dispatches) makes the relationship queryable, not just narrative.

Consequences: gateway schema gains fuda_id (required on report_dispatch); Linear MCP tools (save_issue, save_comment) become dispatch-path dependencies (best-effort); every dispatch comment is machine-parseable for future dashboard integration.

Specification impact: §6.0; §9.6; §7.1; §18.


ADR-7: YAML+MD dependency maps per repo (D7)

Status: LOCKED (2026-04-10)

Decision source: Tobi-san approval, Dispatch 60 planning session (2026-04-10).

Context:

Gate 0 Fuda scoping requires dependency impact analysis. The analysis must be based on measured data, not guesses. Three dependency-tracking approaches were considered: (a) manual annotation in each Fuda, (b) runtime dependency scanning on every Fuda creation, (c) pre-computed dependency maps maintained per repo.

Options considered:

  1. (a) Manual annotation — yoshimitsu reads the codebase and manually lists dependencies.

    • Pros: zero tooling overhead; works for any language.
    • Cons: error-prone on large codebases; slow; depends on yoshimitsu's ability to trace imports correctly; no machine-readable output for validation.
  2. (b) On-demand scanning — run madge or equivalent at Fuda creation time.

    • Pros: always fresh; no stale data.
    • Cons: adds latency to Gate 0 (scanning a full repo takes seconds to minutes); requires tooling installed on every workstation; no cached baseline for diff analysis.
  3. (c) Pre-computed depmap.yaml + derived DEPMAP.md — chosen.

    • Pros: zero latency at Fuda creation (file is pre-computed); machine-readable YAML for programmatic consumption; human-readable markdown for Obsidian browsing; auto-refreshed on merge to main via CI step; baseline always current with main; diff between depmap.yaml versions shows dependency evolution over time.
    • Cons: stale between merges (mitigated: CI refresh on every merge; staleness window is one PR cycle); requires per-repo CI step; YAML schema must be maintained.

Decision: (c) Pre-computed maps. depmap.yaml per repo root (machine-readable source of truth, auto-refreshed on merge to main via CI step). DEPMAP.md generated from YAML for Obsidian/human consumption. YAML authoritative; markdown derived. Both committed.
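
As a rough sketch of the "YAML authoritative; markdown derived" rule, the function below renders a markdown view from an already-parsed dependency map. The `DepmapModule` shape is an assumption (this section does not define the depmap.yaml schema), YAML parsing is left out, and `renderDepmapMarkdown` is a hypothetical name, not one of the scripts listed in this specification.

```typescript
// Hypothetical parsed form of one depmap.yaml entry; the schema is assumed.
interface DepmapModule {
  path: string;
  dependsOn: string[];
  tags: string[];
}

// Render the human-readable view from the authoritative data, never the reverse.
function renderDepmapMarkdown(modules: DepmapModule[]): string {
  const lines: string[] = ["# DEPMAP", ""];
  // Sort a copy so output is stable regardless of input order.
  const sorted = modules.slice().sort((a, b) => a.path.localeCompare(b.path));
  for (const m of sorted) {
    const tags = m.tags.length > 0 ? " [" + m.tags.join(", ") + "]" : "";
    lines.push("## " + m.path + tags);
    if (m.dependsOn.length === 0) {
      lines.push("- (no dependencies)");
    } else {
      for (const dep of m.dependsOn) lines.push("- " + dep);
    }
    lines.push("");
  }
  return lines.join("\n");
}
```

Because the output is a pure function of the parsed data with a stable sort, regenerating it in CI produces no diff unless the dependency data itself changed.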

Rationale: Gate 0 latency is critical — adding a full-repo scan at Fuda creation time would make the scoping step feel slower than the code it gates. Pre-computed maps trade freshness (stale by one PR cycle) for instant availability. The CI refresh on merge ensures the map is always current with main, which is the branch that matters for scoping analysis.

Consequences: every repo gains two files at root (depmap.yaml, DEPMAP.md); CI workflows gain a refresh-depmap job; Sprint 0 must generate baseline maps for all 5 repos; the fuda skill reads depmap.yaml programmatically.

Specification impact: §5.4; §6.0; §7.1; §8.1 item 10.


ADR-8: Yoshimitsu/Raiden/Gray-Fox Gate 0 enforcement model (D8)

Status: LOCKED (2026-04-10)

Decision source: Tobi-san approval, Dispatch 60 planning session (2026-04-10).

Context:

Gate 0 Fuda review requires analytical enforcement — someone must verify that the scoping contract is complete, sound, and security-aware. Three models were considered: (a) LISA reviews all Fuda directly, (b) a single agent reviews, (c) multi-agent parallel review with role separation.

Options considered:

  1. (a) LISA reviews directly — LISA drafts and reviews Fuda herself.

    • Pros: simplest; no dispatch overhead.
    • Cons: LISA is a coordinator, not a domain specialist; review quality depends on LISA's analytical depth in security and planning; blocks LISA's main thread during review.
  2. (b) Single agent (yoshimitsu) — yoshimitsu drafts and self-reviews.

    • Pros: low dispatch overhead; yoshimitsu has planning domain expertise.
    • Cons: no separation of concerns; security review absent; self-review is inherently weaker than peer review.
  3. (c) Multi-agent parallel review — chosen.

    • Pros: separation of concerns (drafting vs completeness review vs security review); parallel execution (raiden + gray-fox are orthogonal); security expertise applied only where relevant (gray-fox triggers on tags: [security]); fast-track for trivial changes avoids ceremony overhead.
    • Cons: higher dispatch overhead for complex Fuda (3 dispatches); requires coordination logic in the fuda skill.

Decision: (c) Multi-agent parallel review. Yoshimitsu drafts each Fuda (reads depmap.yaml, fills all required sections). Raiden reviews every Fuda for completeness and soundness. Gray-fox reviews only Fuda touching security-tagged modules. LISA coordinates and never drafts or reviews. Raiden and gray-fox review in parallel because their concerns are orthogonal. Fast-track threshold: <=2 files in a single module with no cross-module deps → yoshimitsu drafts and raiden alone reviews; gray-fox is skipped. Everything else gets the full ceremony. Max 2 revision rounds, then escalate to Tobi-san. Risk threshold (mandatory Tobi-san sign-off): >5 files in depmap, security domain, cross-repo impact, governance doc changes.
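
The fast-track and risk thresholds in this decision amount to a small classification rule, sketched below. The `FudaScope` fields and the `planReview` name are hypothetical. Two interpretations are assumptions rather than statements of the spec: "security domain" is mapped to touching a security-tagged module, and a security-tagged change is never fast-tracked (so gray-fox is not skipped for it).

```typescript
// Hypothetical summary of what a Fuda touches, derived from the depmap.
interface FudaScope {
  filesTouched: number;
  modulesTouched: number;
  hasCrossModuleDeps: boolean;
  touchesSecurityTagged: boolean;
  crossRepoImpact: boolean;
  changesGovernanceDocs: boolean;
}

interface ReviewPlan {
  fastTrack: boolean;
  reviewers: string[];
  needsTobiSignOff: boolean;
}

function planReview(scope: FudaScope): ReviewPlan {
  // Fast-track: <=2 files in a single module with no cross-module deps.
  // Assumption: security-tagged changes never qualify for fast-track.
  const fastTrack =
    scope.filesTouched <= 2 &&
    scope.modulesTouched === 1 &&
    !scope.hasCrossModuleDeps &&
    !scope.touchesSecurityTagged;

  // Raiden reviews every Fuda; gray-fox joins only for security-tagged modules.
  const reviewers = ["raiden"];
  if (scope.touchesSecurityTagged) reviewers.push("gray-fox");

  // Risk threshold: any single condition triggers mandatory sign-off (assumed "any of").
  const needsTobiSignOff =
    scope.filesTouched > 5 ||
    scope.touchesSecurityTagged ||
    scope.crossRepoImpact ||
    scope.changesGovernanceDocs;

  return { fastTrack, reviewers, needsTobiSignOff };
}
```

Keeping the rule as a pure function of the scope summary means the thresholds can be unit-tested independently of any agent dispatch.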

Rationale: The pipeline already enforces separation of concerns at Gates 1-3 (Gray Fox for rules, Raiden for verification, Genji for implementation). Extending this pattern to Gate 0 is consistent. The fast-track threshold prevents Gate 0 from being heavier than the code change it scopes — a 2-file single-module fix should not require a 3-agent review ceremony. The revision cap prevents infinite loops on genuinely ambiguous scope questions.

Consequences: fuda skill must implement the dispatch orchestration (yoshimitsu → parallel raiden + gray-fox → merge → approve/revise); depmap.yaml tags: [security] field drives gray-fox trigger logic; fast-track detection must read depmap programmatically.
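
The orchestration named above (yoshimitsu → parallel raiden + gray-fox → merge → approve/revise) together with the two-round revision cap could look roughly like this. It is a sketch only: the `Agents` interface, the `runGateZero` name, and the "all reviewers must approve" merge rule are assumptions, and real agent dispatch is replaced by injected functions.

```typescript
type Verdict = "approve" | "revise";
type Outcome = "approved" | "escalated";

// Injected agent calls; real dispatch mechanics are outside this sketch.
interface Agents {
  draft: (round: number) => Promise<string>;
  raidenReview: (draft: string) => Promise<Verdict>;
  grayFoxReview: (draft: string) => Promise<Verdict>;
}

const MAX_REVISION_ROUNDS = 2;

async function runGateZero(
  agents: Agents,
  securityTagged: boolean
): Promise<{ outcome: Outcome; revisions: number }> {
  let revisions = 0;
  let draft = await agents.draft(0);
  while (true) {
    // raiden and gray-fox are orthogonal, so their reviews run in parallel.
    const reviews: Promise<Verdict>[] = [agents.raidenReview(draft)];
    if (securityTagged) reviews.push(agents.grayFoxReview(draft));
    const verdicts = await Promise.all(reviews);

    // Merge (assumed rule): approved only if every reviewer approves.
    if (verdicts.every((v) => v === "approve")) return { outcome: "approved", revisions };
    // Revision cap reached: escalate to a human instead of looping again.
    if (revisions >= MAX_REVISION_ROUNDS) return { outcome: "escalated", revisions };
    revisions += 1;
    draft = await agents.draft(revisions);
  }
}
```

With this shape, an initial draft plus at most two revised drafts are reviewed before the outcome becomes `"escalated"`.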

Specification impact: §6.0.2, §6.0.3, §6.0.4; §9.2 (agent roster gains Gate 0 roles).


ADR-9: fuda skill as pipeline entry point (D9)

Status: LOCKED (2026-04-10)

Decision source: Tobi-san approval, Dispatch 60 planning session (2026-04-10).

Context:

Gate 0 enforcement needs an orchestration surface — something that reads the depmap, dispatches agents, posts to Linear, manages the review cycle, and returns a fuda_id. Three approaches were considered: (a) manual LISA orchestration per dispatch, (b) a gateway endpoint that handles the full flow, (c) a Claude Code skill.

Options considered:

  1. (a) Manual LISA orchestration — LISA runs the Gate 0 sequence manually in the main thread.

    • Pros: no new code; uses existing dispatch primitives.
    • Cons: blocks LISA's main thread; error-prone on complex flows (parallel review, revision loops); no auto-trigger capability; every code dispatch requires manual ceremony.
  2. (b) Gateway endpoint — a new /api/fuda/create endpoint that handles the full flow server-side.

    • Pros: centralised; no client-side orchestration.
    • Cons: the gateway is a data persistence layer, not an orchestration engine; adding dispatch logic to the gateway violates its current architectural boundary; difficult to test end-to-end.
  3. (c) Claude Code skill — chosen.

    • Pros: runs in the Claude Code context where agents are dispatched; natural fit for multi-agent orchestration; auto-trigger hooks available (post_write_code and equivalent); manual /fuda invocation for explicit use; skill definition is a vault artefact (governed, versioned); no gateway architectural violation.
    • Cons: skill must be installed on every workstation (single workstation currently — non-issue); skill logic is in markdown (limited expressiveness — mitigated by dispatching to agents for the heavy lifting).

Decision: (c) Claude Code skill. The skill is named fuda and can be invoked manually as /fuda. It auto-triggers on: code write/modify/delete in any tracked repo, new repo creation, pipeline artefact modification, genji dispatch with code deliverables. It does NOT trigger on: pure research, vault-only markdown, code reading without modification, pipeline already active. The skill orchestrates: identify repos → read depmap.yaml → dispatch yoshimitsu (draft) → post to Linear → dispatch raiden + gray-fox parallel → fast-track check → approve/revise → risk threshold → implementation dispatch with fuda_id.
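
The trigger and non-trigger lists in this decision can be read as a single predicate. The sketch below is illustrative only: the `WorkKind` labels, the `WorkEvent` shape, and the `shouldTriggerFuda` name are hypothetical, and how the skill actually detects each kind of work is not specified here.

```typescript
// Hypothetical labels for the kinds of work listed in the decision.
type WorkKind =
  | "code_write"
  | "code_modify"
  | "code_delete"
  | "repo_create"
  | "pipeline_artefact_modify"
  | "genji_code_dispatch"
  | "research"
  | "vault_markdown"
  | "code_read";

interface WorkEvent {
  kind: WorkKind;
  inTrackedRepo: boolean;
  pipelineActive: boolean; // a Fuda pipeline is already running for this work
}

function shouldTriggerFuda(event: WorkEvent): boolean {
  // Never re-trigger while the pipeline is already active.
  if (event.pipelineActive) return false;
  switch (event.kind) {
    // Code changes trigger only inside a tracked repo.
    case "code_write":
    case "code_modify":
    case "code_delete":
      return event.inTrackedRepo;
    case "repo_create":
    case "pipeline_artefact_modify":
    case "genji_code_dispatch":
      return true;
    // Research, vault-only markdown, and read-only work do not trigger.
    default:
      return false;
  }
}
```

Checking `pipelineActive` first keeps the skill from recursing when an implementation dispatch that already carries a fuda_id writes code.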

Rationale: The skill pattern is the established orchestration surface in the Cipher Shinobi stack. Skills are governed artefacts (SkillApprovalGate, SkillDraftGuide), testable (skill-grader), and composable. The fuda skill is the pipeline's front door — it is the first thing that runs when code work begins and the last thing that must succeed before report_dispatch accepts the fuda_id.

Consequences: Sprint 0 must author the skill (§8.1 item 11); the skill requires Linear MCP tools (save_issue, save_comment, get_issue); auto-trigger logic must distinguish code-producing from non-code work; the skill becomes the canonical Gate 0 entry point documented in the operator runbook.

Specification impact: §6.0; §7.1; §8.1 item 11; §18.


18. PROPAGATION TARGETS

Per the LisaOSMap §9b Change Impact Matrix, this TechSpec introduces a new governance doc and locks decisions that propagate to the following downstream files. The sub-agent propagation signal [PROPAGATION_REQUIRED] accompanies the dispatch completion.

| # | Target | Change type | Why |
|---|--------|-------------|-----|
| 1 | CLAUDE.md (vault root) | Add row to On-Demand Context table pointing at CS.AK.LISA.TechSpec.CleanCodePipeline.md for "Clean Code Pipeline work or gate authoring" | New canonical governance doc |
| 2 | CS.AK.LISA.Docu.LisaOSMap.md §9b | Add Change Impact Matrix row for "Clean Code Pipeline TechSpec → downstream targets" | Map itself is binding for propagation |
| 3 | CS.AK.LISA.Docu.LisaOSMap.md §2 (governance docs table) | Add TechSpec row | Registry-level visibility |
| 4 | ~/… | Add pointer to TechSpec + note "Lisa-OS is Sprint 1 per D5 + ADR-1"; update repo count from 4 to 5; add R13-R15 carry-forward note | Plan must reflect ADR-1 + D5 |
| 5 | ~/.claude/projects/.../memory/reference_github_org.md | Add cipher-shinobi/lisa-os to repo list with ADR-1 scope note; update repo count from 4 to 5 | Reference memory drives GitHub context |
| 6 | CS.AK.LISA.Docu.CodeDisciplineProtocol.md | Cross-link from the Four-Gate Unified Review Protocol section to this TechSpec's §06 | Binding ref for Genji/Raiden; the pipeline is the operational instantiation of the protocol |
| 7 | CS.AK.LISA.Docu.RawTwinDiscipline.md | Cross-link to the custom Semgrep rule cipher-shinobi.raw-twin-discipline that will enforce it mechanically | Raw Twin Discipline is the first rule Gray Fox authors |
| 8 | CS.AK.LISA.Data.ArtefactMap.md | Add entries for upcoming Sprint 0 artefacts: artefacts/code/hooks/pre-push-clean-code-pipeline.sh, artefacts/code/semgrep/cipher-shinobi/, artefacts/code/github-actions/check-new-code-has-tests/ | ArtefactMap maintenance protocol |
| 9 | Future: CS.AK.LISA.Docu.CleanCodePipelineRunbook.md | Authored at Sprint Final §8.6; linked from each repo's CLAUDE.md | Operator runbook; out of scope for this TechSpec, flagged for Sprint Final |
| 10 | Future: each repo's CLAUDE.md (all 5) | Link to runbook once authored | Runbook linkage |
| 11 | (v1.1.0) ~/.claude/skills/fuda/ (fuda skill creation) | Author per D9 specification (§8.1 item 11); register in Skills Registry | Gate 0 pipeline entry point; Sprint 0 exit criterion |
| 12 | (v1.1.0) artefacts/code/lisa/memory_gateway/server/dispatch/types.ts (ReportDispatchInputSchema) | Add fuda_id: z.string() as required field | Gateway structural enforcement for Gate 0 (D6) |
| 13 | (v1.1.0) CLAUDE.md (vault root), Dispatch Execution Checklist | Insert Fuda steps before existing step 1: read depmap, draft Fuda via /fuda, post to Linear, risk threshold check, obtain fuda_id before report_dispatch | Gate 0 integrated into the canonical dispatch flow |
| 14 | (v1.1.0) Gateway dispatch handlers (server/dispatch/index.ts) | Add Linear comment posting on report_dispatch and report_complete via linear-server MCP tools | D6 hybrid Linear mapping: automatic dispatch comments |
| 15 | (v1.1.0) Per-repo CI workflow: refresh-depmap job | Add depmap auto-refresh CI step template to each repo's workflow; generate scripts/generate-depmap.js + scripts/generate-depmap-md.js | D7 dependency map auto-refresh on merge to main |
| 16 | (v1.1.0) depmap.yaml + DEPMAP.md: baseline generation for all 5 repos | Sprint 0 task; generate initial dependency maps using language-appropriate tooling | D7 Sprint 0 exit criterion (§8.1 item 10) |

Propagation dispatch recommendation: because items 1-8 span multiple domains (CLAUDE.md, governance docs, memory, plan, ArtefactMap), LISA should dispatch Smoke to execute propagation as a single follow-up sprint. Items 9-10 belong to Sprint Final and are tracked there. Items 11-16 (v1.1.0 additions) span engineering + vault + governance: items 11-12 and 14-15 are Genji implementation work; item 13 is a CLAUDE.md governance edit (LISA or Smoke); item 16 is Sprint 0 foundation work (yoshimitsu coordination).


End of specification.
