Gateway Module — `vault-index/`
1. Purpose
This module is the vault-document indexer (D361 / Fuda AK-287). It walks the Sage Library vault,
FCF-classifies each markdown file, heading-chunks it (1,500-token cap), embeds and
FTS-indexes every chunk, and exposes the result as a `SearchAdapter` so that the
context-assembly fanout can retrieve vault knowledge as a first-class source
(the `vault_index` store). Scans are differential (mtime-gated, hash-checked,
deletion-reconciled, and path-traversal-guarded). The module also maintains a per-FCF-type
effectiveness table, fed by the RAG feedback loop, that measures which document types
(ENT, Docu, Intel, …) actually get used after being assembled.
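The path-traversal guard can be illustrated with a minimal sketch. It assumes the guard's job is to reject any candidate path that resolves outside the vault root (via `..` segments or an absolute path elsewhere). `isInsideVaultRoot` is a hypothetical name, not the export from `scanner.ts`, and this sketch does not resolve symlinks.

```typescript
import * as path from "node:path";

// Hypothetical helper (not the real scanner.ts export): returns true only when
// `candidate` resolves to the vault root itself or to a path beneath it.
export function isInsideVaultRoot(vaultRoot: string, candidate: string): boolean {
  const root = path.resolve(vaultRoot);
  // Relative candidates are interpreted against the vault root.
  const target = path.resolve(root, candidate);
  const rel = path.relative(root, target);
  if (rel === "") return true; // the root itself
  // A ".." segment at the start means the target escaped the root. Checking the
  // separator too keeps legitimate names such as "..notes.md" inside the vault.
  const escapes = rel === ".." || rel.startsWith(".." + path.sep);
  // On Windows, a different drive yields an absolute `rel`; treat that as outside.
  return !escapes && !path.isAbsolute(rel);
}
```

Comparing via `path.relative` rather than a string `startsWith(root)` avoids the classic bug where `/vault2/a.md` passes a prefix check against `/vault`.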
2. File Inventory
| File | Lines | Responsibility |
|---|---|---|
| `repository.ts` | 332 | `createVaultIndexRepository`: CRUD on `vault_documents`/`vault_chunks`/`vault_vec`/`vault_fts`; `createSearchAdapter` (vector + FTS + RRF for the context fanout); `runTypeEffectivenessEval` + `getTypeEffectiveness` |
| `scanner.ts` | 250 | `scanVault` differential walk: exclude-pattern filter, path-traversal guard, mtime gating, content-hash change detection, chunk + embed, deletion reconciliation; in-memory `scanInProgress` mutex + `isScanInProgress` |
| `chunker.ts` | 168 | `chunkMarkdown`: strip frontmatter, heading-stack segmentation, token-cap splitting (section → paragraph → word boundaries) |
| `fcf-filter.ts` | 140 | `extractFcfType` (filename segments), `extractFcfTypeFromFrontmatter` (YAML fallback), `extractMissionNamespace`, `classifyFile`; whitelist of 8 indexable types |
| `types.ts` | 87 | `FCF_TYPE_WHITELIST`, `VaultDocumentSchema`, `VaultChunkSchema`, `ScanConfigSchema` (defaults + exclude patterns), `ScanResultSchema` |
| `index.ts` | 85 | `createVaultIndexLayer` factory + router (`/scan`, `/status`, `/type-effectiveness`, `/type-effectiveness/eval`) |
| Total | 1,062 | |
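The chunking strategy listed for `chunker.ts` (strip frontmatter, segment by heading stack, then split oversize sections on paragraph and then word boundaries) can be sketched as follows. This is a simplified illustration, not the module's code: `sketchChunkMarkdown`, `SketchChunk`, and `roughTokens` are hypothetical names, tokens are approximated as `ceil(chars / 4)` (an assumption; the real module uses `estimateTokens` from `shared/search-utils`), and fenced code blocks are not special-cased.

```typescript
// Hypothetical chunk shape: heading path plus body text.
export interface SketchChunk {
  headingPath: string[];
  text: string;
}

// Assumption: roughly 4 characters per token. The real module uses estimateTokens().
export const roughTokens = (s: string): number => Math.ceil(s.length / 4);

// Remove a leading YAML frontmatter block delimited by '---' lines.
function stripFrontmatter(md: string): string {
  return md.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "");
}

// Split text under maxTokens: pack whole paragraphs first; a paragraph that is
// too big on its own falls back to word boundaries. (A single word longer than
// the cap still becomes its own oversize piece.)
function splitToCap(text: string, maxTokens: number): string[] {
  if (roughTokens(text) <= maxTokens) return [text];
  const out: string[] = [];
  let buf = "";
  const flush = () => { if (buf.trim()) out.push(buf.trim()); buf = ""; };
  for (const para of text.split(/\n\s*\n/)) {
    if (roughTokens(para) > maxTokens) {
      flush();
      for (const word of para.split(/\s+/)) {
        const next = buf ? `${buf} ${word}` : word;
        if (roughTokens(next) > maxTokens && buf) { flush(); buf = word; }
        else buf = next;
      }
      flush();
      continue;
    }
    const next = buf ? `${buf}\n\n${para}` : para;
    if (roughTokens(next) > maxTokens) { flush(); buf = para; }
    else buf = next;
  }
  flush();
  return out;
}

export function sketchChunkMarkdown(md: string, maxTokens = 1500): SketchChunk[] {
  const stack: string[] = []; // stack[i] holds the current heading at level i + 1
  const chunks: SketchChunk[] = [];
  let body: string[] = [];
  const emit = () => {
    const text = body.join("\n").trim();
    for (const piece of text ? splitToCap(text, maxTokens) : []) {
      // filter(Boolean) drops holes left by skipped levels (e.g. H1 -> H3).
      chunks.push({ headingPath: stack.filter(Boolean), text: piece });
    }
    body = [];
  };
  for (const line of stripFrontmatter(md).split(/\r?\n/)) {
    const m = /^(#{1,6})\s+(.*)$/.exec(line);
    if (m) {
      emit(); // close the previous section before the heading path changes
      const level = m[1].length;
      stack.length = level - 1; // pop same-level and deeper headings
      stack[level - 1] = m[2].trim();
      continue;
    }
    body.push(line);
  }
  emit();
  return chunks;
}
```

Carrying the heading path on each chunk keeps a paragraph-split fragment attributable to its section, which is the main reason to segment by heading before applying the token cap.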
3. Public API Surface
REST Endpoints
Mounted at `/api/vault-index`. Auth per AK-415: GETs require a Bearer token with write or
read-only scope; writes require a Bearer token with write scope. `/api/vault-index` was explicitly
dropped from the former GET-exemption set (D913 Fix 2), so its GETs are now Bearer-gated too.
| Method | Path | Auth | Body / Query | Response | Side Effects |
|---|---|---|---|---|---|
| POST | `/scan` | Bearer (write) | None (vault root comes from `VAULT_ROOT`, else a resolved fallback path) | `{data: ScanResult}`; 409 if a scan is in progress; 500 on error | Walks vault; upserts docs/chunks; embeds into `vault_vec`; FTS maintained by trigger; deletes stale docs |
| GET | `/status` | Bearer (write or read-only) | None | `{scanning, documents, chunks}` | None |
| GET | `/type-effectiveness` | Bearer (write or read-only) | None | `{data: TypeEffectiveness[]}` | None |
| POST | `/type-effectiveness/eval` | Bearer (write) | None | `{data: TypeEffectiveness[]}` | Scans the 7-day `context_assembly_log` window; upserts `vault_type_effectiveness` |
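The AK-415 rule reduces to a small decision table. The sketch below is illustrative only: the `TokenScope` type and `isAuthorised` function are hypothetical names, and a request with no Bearer token is assumed to be represented as `null`.

```typescript
// Hypothetical token scopes; the real auth middleware may model these differently.
export type TokenScope = "write" | "read-only";

// AK-415 as described: GETs accept a write or read-only Bearer token, while
// writes accept only a write-scoped token. A missing token never passes,
// because /api/vault-index is no longer GET-exempt (D913 Fix 2).
export function isAuthorised(method: string, scope: TokenScope | null): boolean {
  if (scope === null) return false;
  if (method.toUpperCase() === "GET") return scope === "write" || scope === "read-only";
  return scope === "write";
}
```

For example, a read-only token can poll `GET /status` but is refused on `POST /scan`.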
MCP Tools
None. This module exposes no MCP tool of its own; its retrieval reaches agents
indirectly through `assemble_context` (as the `vault_index` source store in the
context fanout). Scans are triggered by launchd/cron or a manual REST call, never via MCP.
4. Internal API
- `createVaultIndexLayer(db, embeddingProvider)`: returns `{repo, router, scan(vaultRoot)}`.
- `VaultIndexRepository`: full CRUD + `createSearchAdapter()` (consumed by the `context/index.ts` fanout as `vaultIndexSearch`) + effectiveness eval. Document and chunk CRUD covers path lookup, listing, upsert, chunk insertion (including the vector), deletion, and count helpers.
- `fcf-filter` exports (`extractFcfType`, `extractMissionNamespace`, `classifyFile`, `extractFcfTypeFromFrontmatter`): reusable FCF parsing, currently used only inside the scanner.
- `chunkMarkdown` + `Chunk`: reusable heading chunker.
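`createSearchAdapter()` fuses vector and FTS results with reciprocal rank fusion (RRF). The sketch below shows the standard RRF formula, score(d) = Σ 1 / (k + rank_i(d)) over every result list containing d, with the commonly used k = 60. It is a generic illustration under that assumption; `rrfFuse` is a hypothetical name, not the `reciprocalRankFusion` helper from `shared/search-utils`, whose constant and signature may differ.

```typescript
// Generic reciprocal rank fusion. Each input list holds ids ordered best-first.
// A document's fused score is the sum of 1 / (k + rank) over every list it
// appears in (rank is 1-based). k = 60 is a common default, assumed here.
export function rrfFuse(lists: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

With vector hits `["c1", "c2", "c3"]` and FTS hits `["c3", "c1", "c4"]`, the fused order is `c1, c3, c2, c4`: chunks found by both retrievers rise to the top without needing the two raw score scales to be comparable.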
5. Background Services
- Differential scan (`scanVault`): not self-scheduled; invoked by `POST /api/vault-index/scan` (on a launchd/cron cadence) or by the module's `scan()` method. Guarded by an in-memory `scanInProgress` mutex, so overlapping triggers get a 409 or a rejected result. Per file: mtime-gate skip, size-cap skip (512 KB), FCF-classify skip, then chunk + embed; deletion reconciliation runs at the end.
- Type-effectiveness eval (`POST /type-effectiveness/eval`): scans a 168-hour `context_assembly_log` window, tallies assembled-vs-used counts per FCF type (parsed from the `[FcfType]` content prefix), and upserts `vault_type_effectiveness`. This feeds a per-source-type signal-to-noise measure back to the operator.
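The per-file gating of the differential scan can be sketched as a pure decision function. This is a hypothetical illustration: `decideFile`, `FileFacts`, `StoredDoc`, and `stalePaths` are invented names, the placement of the hash check after the documented mtime, size, and FCF skips is an assumption, and SHA-256 is assumed as the content hash.

```typescript
import { createHash } from "node:crypto";

// Hypothetical inputs: what the walker knows about a file on disk...
export interface FileFacts {
  relPath: string;
  mtimeMs: number;
  sizeBytes: number;
  fcfType: string | null; // null when classification found no FCF type
  content: string;
}
// ...and what the index already holds for that path, if anything.
export interface StoredDoc {
  mtimeMs: number;
  contentHash: string;
}

export type Decision = "skip-unchanged" | "skip-too-large" | "skip-unclassified" | "reindex";

export const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");

const WHITELIST = new Set(["ENT", "Docu", "Seed", "MissionBrief", "Intel", "TechSpec", "IDEA", "IN"]);

export function decideFile(f: FileFacts, stored: StoredDoc | undefined, maxBytes = 512000): Decision {
  // 1. mtime gate: an unchanged mtime means no work (cheapest check first).
  if (stored && stored.mtimeMs === f.mtimeMs) return "skip-unchanged";
  // 2. Size cap (ScanConfigSchema.max_file_size_bytes).
  if (f.sizeBytes > maxBytes) return "skip-too-large";
  // 3. Only whitelisted FCF types are indexed.
  if (f.fcfType === null || !WHITELIST.has(f.fcfType)) return "skip-unclassified";
  // 4. mtime moved but bytes are identical (e.g. a touch): skip re-embedding.
  if (stored && stored.contentHash === sha256(f.content)) return "skip-unchanged";
  return "reindex";
}

// Deletion reconciliation: indexed paths not seen during this walk are stale.
export function stalePaths(indexed: Iterable<string>, seen: Set<string>): string[] {
  return Array.from(indexed).filter((p) => !seen.has(p));
}
```

Ordering the checks from cheapest to most expensive means most files in a steady-state vault exit at the mtime gate without being hashed, chunked, or embedded.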
6. Data Contracts
- `FCF_TYPE_WHITELIST` = `['ENT', 'Docu', 'Seed', 'MissionBrief', 'Intel', 'TechSpec', 'IDEA', 'IN']` (8 indexable types; other types are skipped).
- `ScanConfigSchema` = `{vault_root, max_file_size_bytes = 512000, max_chunk_tokens = 1500, exclude_patterns = [artefacts/**, .obsidian/**, .claude/**, .trash/**, templates/**, temporal/**, media/**, node_modules/**]}`.
- `ScanResultSchema` = `{total_files_scanned, files_indexed, files_skipped, files_deleted, chunks_created, duration_ms, errors[]}`.
- `VaultDocumentSchema` / `VaultChunkSchema`: row shapes.
- `TypeEffectiveness` = `{fcf_type, assembled_count, used_count, effectiveness, last_eval_at}`.
Owned tables (declared in `memory/db.ts`): `vault_documents`, `vault_chunks`
(plus index and FK cascade), `vault_fts` (FTS5), `vault_vec` (vec0), `vault_type_effectiveness`,
and the sidecar triggers `vault_chunks_ai/ad_sidecar`, which keep `vault_fts` maintained.
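The effectiveness tally can be sketched as follows. It assumes each `context_assembly_log` entry can be reduced to a list of assembled `vault_index` contents, each flagged as used or not, and that `effectiveness = used_count / assembled_count`; the `AssembledItem` and `TypeTally` shapes, the `tallyByType` name, and that ratio are all assumptions rather than confirmed module behaviour. The sketch also shows why the `[FcfType]` prefix is load-bearing: content without the literal prefix simply drops out of the tally.

```typescript
// Hypothetical reduced view of one assembled vault_index item from the log.
export interface AssembledItem {
  content: string; // expected to start with "[FcfType]"
  used: boolean;
}

// Mirrors the documented TypeEffectiveness fields (minus last_eval_at).
export interface TypeTally {
  fcf_type: string;
  assembled_count: number;
  used_count: number;
  effectiveness: number; // assumed: used_count / assembled_count
}

// Matches a leading "[Type]" tag, e.g. "[ENT] ..." -> "ENT".
const PREFIX = /^\[([A-Za-z]+)\]/;

export function tallyByType(items: AssembledItem[]): TypeTally[] {
  const acc = new Map<string, { assembled: number; used: number }>();
  for (const item of items) {
    const m = PREFIX.exec(item.content);
    if (!m) continue; // no literal prefix: silently uncounted (see Known Limitations)
    const t = acc.get(m[1]) ?? { assembled: 0, used: 0 };
    t.assembled += 1;
    if (item.used) t.used += 1;
    acc.set(m[1], t);
  }
  return Array.from(acc, ([fcf_type, t]) => ({
    fcf_type,
    assembled_count: t.assembled,
    used_count: t.used,
    effectiveness: t.used / t.assembled, // assembled >= 1 for every key present
  }));
}
```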
7. Dependencies
- Gateway modules consumed: `memory/types` (`EmbeddingProvider`), `context/types` (`SearchAdapter`, `SearchResult`: the adapter output contract), `shared/search-utils` (`estimateTokens`, `buildFts5Query`, `reciprocalRankFusion`, `normaliseRrfScores`). The effectiveness eval also reads `context_assembly_log` (the context module's table).
- External libraries: `better-sqlite3`, `sqlite-vec`, `express`, `zod`, Node `crypto` + `fs/promises` + `path`.
- Environment variables: `VAULT_ROOT` (scan root; falls back to `.` in the config default and to a 6-levels-up path resolve in the `/scan` route).
8. Test Coverage
| Layer | File | Tests |
|---|---|---|
| Repository + layer + adapter | `test/vault-index.test.ts` | 29 |
| FCF parser (`fcf-filter`) | `test/vault-parser.test.ts` | 19 |
| Scanner (walk, mtime gate, traversal, deletion) | `test/vault-scanner.test.ts` | 24 |
| Chunker (`chunkMarkdown`) | (covered within the vault-index and parser suites) | n/a |
72 targeted tests across the module's three behavioural surfaces.
9. Known Limitations
- Scan is manual/cron-triggered, not event-driven. The index drifts stale between scans, and a freshly written vault note is invisible to `assemble_context` until the next scan; with mtime gating, the freshness ceiling equals the scan cadence.
- In-memory mutex only. `scanInProgress` is per-process, so a multi-process deployment (not current) could run concurrent scans.
- Glob matching is prefix-only. `matchesExcludePattern` supports `dir/**` prefixes and exact matches, not full glob (no `*.ext`, no mid-path wildcards).
- 8-type whitelist. Matter, Data, Code, Log, Meeting, CMD, TMPL, etc. are not indexed, so vault knowledge stored in those types is invisible to RAG.
- Effectiveness eval parses a content prefix. It relies on the literal `[FcfType]` at the start of the assembled `vault_index` content, so a formatting change in the adapter's prefix would silently break the eval tally.
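The prefix-only exclude matching described in the limitations can be sketched like this. `matchesExcludeSketch` is a hypothetical stand-in for `matchesExcludePattern`, assuming vault-relative paths; it deliberately supports only `dir/**` prefixes and exact matches, so a `*.png` pattern matches only a file literally named `*.png`.

```typescript
// Hypothetical stand-in for matchesExcludePattern: supports only "dir/**"
// prefix patterns and exact path matches, mirroring the documented limitation.
export function matchesExcludeSketch(relPath: string, patterns: string[]): boolean {
  const p = relPath.replace(/\\/g, "/"); // normalise Windows separators
  return patterns.some((pat) => {
    if (pat.endsWith("/**")) {
      const dir = pat.slice(0, -3); // "media/**" -> "media"
      // Require the separator so "mediafile.md" does not match "media/**".
      return p === dir || p.startsWith(dir + "/");
    }
    return p === pat; // anything else is treated as a literal path
  });
}
```

Because `dir/**` is anchored at the vault root, an excluded directory name nested deeper (for example `notes/media/`) is not excluded either.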
10. Change History
| Date | Dispatch | Summary |
|---|---|---|
| 2026-07-04 | 2295 | Initial per-module spec filed (audit campaign Phase 4) |
| (historical) | D361 (AK-287) | Module created — vault scan/chunk/embed/index + SearchAdapter + path-traversal guard |
| (historical) | D913 Fix 2 | /api/vault-index moved off the former GET-exemption set (now Bearer-gated) |