LISAOS // DOCS

Gateway Module — `vault-index/`

1. Purpose

This module is the vault-document indexer (D361 / Fuda AK-287). It walks the Sage Library vault, FCF-classifies each markdown file, splits it into heading-based chunks (1,500-token cap), embeds and FTS-indexes every chunk, and exposes the result as a SearchAdapter so the context-assembly fanout can retrieve vault knowledge as a first-class source (the vault_index store). Scans are differential: mtime-gated, hash-checked, deletion-reconciled, and path-traversal-guarded. The module also maintains a per-FCF-type effectiveness table, fed by the RAG feedback loop, that measures which document types (ENT, Docu, Intel, …) are actually used after being assembled.
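The mtime-then-hash gate behind the differential scan can be sketched as below. This is a minimal sketch, not the scanner's code: StoredDoc, ScanAction, contentHash and decideAction are hypothetical names, and SHA-256 is an assumed choice (the module only lists Node crypto as a dependency).

```typescript
import { createHash } from "node:crypto";

// Hypothetical record of what the index remembers about a previously scanned file.
interface StoredDoc {
  mtimeMs: number;
  contentHash: string;
}

type ScanAction = "skip-mtime" | "skip-hash" | "reindex";

// Assumption: SHA-256 hex digest of the UTF-8 content.
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Cheap check first (mtime), then the content hash; only real changes are re-chunked.
function decideAction(
  stored: StoredDoc | undefined,
  mtimeMs: number,
  readContent: () => string,
): ScanAction {
  if (stored !== undefined && stored.mtimeMs === mtimeMs) return "skip-mtime"; // file not even read
  const hash = contentHash(readContent());
  if (stored !== undefined && stored.contentHash === hash) return "skip-hash"; // touched, not changed
  return "reindex"; // new file or changed content
}
```

The ordering is the point of the design: files whose mtime is unchanged are never read, and files that were touched without a content change are hashed but not re-chunked or re-embedded.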

2. File Inventory

| File | Lines | Responsibility |
|---|---|---|
| repository.ts | 332 | createVaultIndexRepository: CRUD on vault_documents/vault_chunks/vault_vec/vault_fts; createSearchAdapter (vector + FTS + RRF for the context fanout); runTypeEffectivenessEval + getTypeEffectiveness |
| scanner.ts | 250 | scanVault differential walk: exclude-pattern filter, path-traversal guard, mtime gating, content-hash change detection, chunk + embed, deletion reconciliation; in-memory scanInProgress mutex + isScanInProgress |
| chunker.ts | 168 | chunkMarkdown: strip frontmatter, heading-stack segmentation, token-cap splitting (section → paragraph → word boundaries) |
| fcf-filter.ts | 140 | extractFcfType (filename segments), extractFcfTypeFromFrontmatter (YAML fallback), extractMissionNamespace, classifyFile; whitelist of 8 indexable types |
| types.ts | 87 | FCF_TYPE_WHITELIST, VaultDocumentSchema, VaultChunkSchema, ScanConfigSchema (defaults + exclude patterns), ScanResultSchema |
| index.ts | 85 | createVaultIndexLayer factory + router (/scan, /status, /type-effectiveness, /type-effectiveness/eval) |
| Total | 1,062 | |

3. Public API Surface

REST Endpoints

Mounted at /api/vault-index. Auth follows AK-415: GET requests accept a Bearer token with either write or read-only scope; writes (POST) require a write-scoped Bearer token. /api/vault-index was explicitly dropped from the former GET-exemption set (D913 Fix 2), so its GETs are now Bearer-gated too.

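| Method | Path | Auth | Body / Query | Response | Side Effects |
|---|---|---|---|---|---|
| POST | /scan | Bearer (write) | None (vault root from VAULT_ROOT or the resolved fallback) | {data: ScanResult}; 409 if a scan is in progress; 500 on failure | Walks vault; upserts docs/chunks; embeds into vault_vec; FTS maintained by trigger; deletes stale docs |
| GET | /status | Bearer (write or read-only) | None | {scanning, documents, chunks} | None |
| GET | /type-effectiveness | Bearer (write or read-only) | None | {data: TypeEffectiveness[]} | None |
| POST | /type-effectiveness/eval | Bearer (write) | None | {data: TypeEffectiveness[]} | Scans the 7-day context_assembly_log window; upserts vault_type_effectiveness |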
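The AK-415 rule for this router reduces to a small predicate. A minimal sketch, assuming the Bearer token has already been resolved to a scope; TokenScope and isAllowed are hypothetical names, not the gateway's actual auth middleware.

```typescript
// Hypothetical scopes; how tokens map to scopes is not described in this spec.
type TokenScope = "write" | "read-only";

// GETs accept either scope; every other method needs a write-scoped token.
function isAllowed(method: string, scope: TokenScope | undefined): boolean {
  if (scope === undefined) return false; // no Bearer token: rejected (no GET exemption since D913 Fix 2)
  if (method.toUpperCase() === "GET") return true;
  return scope === "write";
}
```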

MCP Tools

None. This module exposes no MCP tool of its own; its retrieval reaches agents indirectly via assemble_context (the vault_index source store in the context fanout). Scans are triggered by launchd/cron or a manual REST call, not by MCP.

4. Internal API

  • createVaultIndexLayer(db, embeddingProvider): {repo, router, scan(vaultRoot)}.
  • VaultIndexRepository: document and chunk CRUD (path lookup, list, upsert, chunk insertion including the vector, deletion, and count helpers), createSearchAdapter() (consumed by the context/index.ts fanout as vaultIndexSearch), and the effectiveness eval (runTypeEffectivenessEval, getTypeEffectiveness).
  • fcf-filter exports (extractFcfType, extractMissionNamespace, classifyFile, extractFcfTypeFromFrontmatter): reusable FCF parsing, currently used only inside the scanner.
  • chunkMarkdown and the Chunk type: a reusable heading-based chunker.
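The chunking steps attributed to chunker.ts (strip frontmatter, heading-stack segmentation, token-cap splitting from section to paragraph to word boundaries) can be sketched as follows. This illustrates the technique under stated assumptions and is not the module's chunkMarkdown: the Chunk fields are hypothetical, token counts use an assumed 4-characters-per-token heuristic in place of estimateTokens, and fenced code blocks are not special-cased.

```typescript
// Hypothetical chunk shape; the module's real Chunk fields are not documented here.
interface Chunk {
  headingPath: string[];
  content: string;
  tokens: number;
}

// Assumed heuristic (~4 chars per token) standing in for estimateTokens.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Remove a leading YAML frontmatter block delimited by "---" lines.
function stripFrontmatter(md: string): string {
  return md.replace(/^---\r?\n(?:[\s\S]*?\r?\n)?---(?:\r?\n|$)/, "");
}

// Greedily pack units under the cap; an oversized paragraph falls back to word splitting.
function splitToCap(text: string, maxTokens: number): string[] {
  if (estimateTokens(text) <= maxTokens) return [text];
  const paragraphs = text.split(/\n{2,}/).filter((p) => p.trim() !== "");
  const byParagraph = paragraphs.length > 1;
  const units = byParagraph ? paragraphs : text.split(/\s+/).filter((w) => w !== "");
  if (units.length <= 1) return [text]; // one oversized word: kept whole, never cut mid-word
  const sep = byParagraph ? "\n\n" : " ";
  const out: string[] = [];
  let current = "";
  for (const unit of units) {
    const candidate = current ? current + sep + unit : unit;
    if (estimateTokens(candidate) <= maxTokens) {
      current = candidate;
      continue;
    }
    if (current) out.push(current);
    if (estimateTokens(unit) > maxTokens) {
      out.push(...splitToCap(unit, maxTokens)); // paragraph -> word boundaries
      current = "";
    } else {
      current = unit;
    }
  }
  if (current) out.push(current);
  return out;
}

function chunkMarkdown(markdown: string, maxTokens = 1500): Chunk[] {
  const chunks: Chunk[] = [];
  const stack: Array<{ level: number; title: string }> = [];
  let headingPath: string[] = [];
  let buffer: string[] = [];

  // Emit the buffered section body (if any) under the current heading path.
  const flush = (): void => {
    const text = buffer.join("\n").trim();
    buffer = [];
    if (text === "") return;
    for (const piece of splitToCap(text, maxTokens)) {
      chunks.push({ headingPath, content: piece, tokens: estimateTokens(piece) });
    }
  };

  for (const line of stripFrontmatter(markdown).split(/\r?\n/)) {
    const heading = /^(#{1,6})\s+(.*)$/.exec(line);
    if (!heading) {
      buffer.push(line);
      continue;
    }
    flush();
    const level = heading[1].length;
    // Pop siblings and deeper headings so the stack holds only the new heading's ancestors.
    while (stack.length > 0 && stack[stack.length - 1].level >= level) stack.pop();
    stack.push({ level, title: heading[2].trim() });
    headingPath = stack.map((h) => h.title);
  }
  flush();
  return chunks;
}
```

A heading closes every open heading at the same or deeper level, so the stack always holds the current section's ancestry, which each chunk carries as its heading path.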

5. Background Services

  • Differential scan (scanVault): not self-scheduled; invoked by POST /api/vault-index/scan (on a launchd/cron cadence) or by the layer's scan() method. Guarded by an in-memory scanInProgress mutex: an overlapping trigger gets a 409 over REST or a rejected result from a direct call. Per file: skip if the mtime is unchanged, skip if the file exceeds the size cap (max_file_size_bytes, 512,000 bytes), skip if the FCF type is not whitelisted, otherwise chunk and embed. Deletion reconciliation runs once, after the walk completes.
  • Type-effectiveness eval (POST /type-effectiveness/eval): scans a 168-hour (7-day) context_assembly_log window, tallies assembled vs. used counts per FCF type (parsed from the [FcfType] content prefix), and upserts vault_type_effectiveness. The result gives the operator a signal-to-noise measure per source type.
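The in-memory scan guard can be sketched like this. A minimal sketch, assuming a module-level flag; runExclusive and ScanInProgressError are hypothetical names (only scanInProgress and isScanInProgress appear in the module inventory). Because the check and the set happen synchronously with no await between them, two triggers in the same process cannot both pass; a second process would have its own flag, which is the limitation listed under Known Limitations.

```typescript
// Module-level flag: guards scans within this one Node process only.
let scanInProgress = false;

function isScanInProgress(): boolean {
  return scanInProgress;
}

class ScanInProgressError extends Error {
  constructor() {
    super("vault scan already in progress");
    this.name = "ScanInProgressError";
  }
}

// Rejects overlapping triggers instead of queueing them (a REST route would map this to 409).
async function runExclusive<T>(scan: () => Promise<T>): Promise<T> {
  if (scanInProgress) throw new ScanInProgressError();
  scanInProgress = true;
  try {
    return await scan();
  } finally {
    scanInProgress = false; // released even when the scan throws
  }
}
```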

6. Data Contracts

  • FCF_TYPE_WHITELIST = ['ENT','Docu','Seed','MissionBrief','Intel','TechSpec','IDEA','IN'] (8 indexable types; other types are skipped).
  • ScanConfigSchema = {vault_root, max_file_size_bytes=512000, max_chunk_tokens=1500, exclude_patterns=[artefacts/**, .obsidian/**, .claude/**, .trash/**, templates/**, temporal/**, media/**, node_modules/**]}.
  • ScanResultSchema = {total_files_scanned, files_indexed, files_skipped, files_deleted, chunks_created, duration_ms, errors[]}.
  • VaultDocumentSchema / VaultChunkSchema — row shapes.
  • TypeEffectiveness = {fcf_type, assembled_count, used_count, effectiveness, last_eval_at}.
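The eval's tally can be sketched against the TypeEffectiveness shape above. This is a sketch under assumptions: the AssembledItem row shape (source, content, used, createdAtMs) is hypothetical because the context_assembly_log columns are not documented here, and effectiveness is assumed to be used_count / assembled_count.

```typescript
// Hypothetical log-row shape; the real context_assembly_log columns may differ.
interface AssembledItem {
  source: string; // e.g. "vault_index"
  content: string; // vault_index content starts with "[FcfType]"
  used: boolean; // whether the feedback loop marked the item as used
  createdAtMs: number;
}

interface TypeEffectiveness {
  fcf_type: string;
  assembled_count: number;
  used_count: number;
  effectiveness: number; // assumption: used_count / assembled_count
  last_eval_at: string;
}

const WINDOW_MS = 168 * 60 * 60 * 1000; // 168 hours = 7 days
const PREFIX = /^\[([A-Za-z]+)\]/;

function tallyTypeEffectiveness(items: AssembledItem[], nowMs: number): TypeEffectiveness[] {
  const counts = new Map<string, { assembled: number; used: number }>();
  for (const item of items) {
    if (item.source !== "vault_index" || nowMs - item.createdAtMs > WINDOW_MS) continue;
    const match = PREFIX.exec(item.content);
    if (!match) continue; // no prefix: silently uncounted
    const c = counts.get(match[1]) ?? { assembled: 0, used: 0 };
    c.assembled += 1;
    if (item.used) c.used += 1;
    counts.set(match[1], c);
  }
  const evalAt = new Date(nowMs).toISOString();
  return Array.from(counts.entries()).map(([fcf_type, c]) => ({
    fcf_type,
    assembled_count: c.assembled,
    used_count: c.used,
    effectiveness: c.assembled > 0 ? c.used / c.assembled : 0,
    last_eval_at: evalAt,
  }));
}
```

The `continue` on a missing prefix is exactly why a change to the adapter's content format would skew the tally without raising an error.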

Owned tables (declared in memory/db.ts): vault_documents, vault_chunks (+index, FK cascade), vault_fts (fts5), vault_vec (vec0), vault_type_effectiveness, and sidecar triggers vault_chunks_ai/ad_sidecar (FTS trigger-maintained).

7. Dependencies

  • Gateway modules consumed: memory/types (EmbeddingProvider), context/types (SearchAdapter, SearchResult — the adapter output contract), shared/search-utils (estimateTokens, buildFts5Query, reciprocalRankFusion, normaliseRrfScores). Reads context_assembly_log (context module's table) for effectiveness eval.
  • External libraries: better-sqlite3, sqlite-vec, express, zod, Node crypto + fs/promises + path.
  • Environment variables: VAULT_ROOT (scan root). When it is unset, the config default falls back to the current directory (.) and the /scan route falls back to a path resolved six directory levels up.
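createSearchAdapter merges vector and FTS hits with reciprocal rank fusion via the shared search utilities. The sketch below shows the standard RRF formula, score(d) = Σ 1/(k + rank), with the conventional k = 60. rrfFuse and normalise are hypothetical stand-ins for reciprocalRankFusion and normaliseRrfScores, whose real signatures, k value and normalisation rule are not documented here; dividing by the top score is an assumption.

```typescript
// Standard RRF: each ranked list contributes 1 / (k + rank), with rank 1-based.
function rrfFuse(lists: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return scores;
}

// Assumption: normalise by the top score so the best hit scores 1.0, then sort descending.
function normalise(scores: Map<string, number>): Array<{ id: string; score: number }> {
  const max = Math.max(0, ...Array.from(scores.values()));
  return Array.from(scores.entries())
    .map(([id, s]) => ({ id, score: max > 0 ? s / max : 0 }))
    .sort((a, b) => b.score - a.score);
}
```

RRF uses only ranks, so vector distances and FTS5 bm25 scores never need to be put on a common scale; a chunk found by both retrievers simply accumulates two contributions.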

8. Test Coverage

| Layer | File | Tests |
|---|---|---|
| Repository + layer + adapter | test/vault-index.test.ts | 29 |
| FCF parser (fcf-filter) | test/vault-parser.test.ts | 19 |
| Scanner (walk, mtime gate, traversal, deletion) | test/vault-scanner.test.ts | 24 |
| Chunker (chunkMarkdown) | (covered within the vault-index and parser suites) | |

72 targeted tests across the module's three behavioural surfaces.

9. Known Limitations

  • Scan is manual/cron-triggered, not event-driven: the index goes stale between scans, and a freshly written vault note is invisible to assemble_context until the next scan. With mtime gating, the freshness ceiling equals the scan cadence.
  • In-memory mutex only: scanInProgress is per-process, so a multi-process deployment (not current) could run concurrent scans.
  • Glob matching is prefix-only: matchesExcludePattern supports dir/** prefixes and exact matches, not full glob syntax (no *.ext, no mid-path wildcards).
  • 8-type whitelist: Matter, Data, Code, Log, Meeting, CMD, TMPL, etc. are not indexed, so vault knowledge stored in those types is invisible to RAG.
  • Effectiveness eval parses a content prefix: it relies on the literal [FcfType] at the start of the assembled vault_index content, so a formatting change to the adapter's prefix would silently break the eval tally.
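The two path filters the scanner applies can be sketched as follows, mirroring the documented prefix-only semantics. matchesExcludePattern reuses the module's name but is a sketch, not its code; isInsideVault is a hypothetical name. The traversal check is purely lexical (path.resolve does not follow symlinks; fs.realpath would be needed for that), and treating dir/** as also matching dir itself is an assumption.

```typescript
import * as path from "node:path";

// Prefix-only exclude matching: "dir/**" matches dir and anything under it;
// any other pattern must equal the vault-relative path exactly.
function matchesExcludePattern(relPath: string, pattern: string): boolean {
  const rel = relPath.split(path.sep).join("/"); // normalise Windows separators
  if (pattern.endsWith("/**")) {
    const prefix = pattern.slice(0, -3);
    return rel === prefix || rel.startsWith(prefix + "/");
  }
  return rel === pattern;
}

// Lexical path-traversal guard: the candidate must resolve to the vault root or below it.
function isInsideVault(vaultRoot: string, candidate: string): boolean {
  const root = path.resolve(vaultRoot);
  const rel = path.relative(root, path.resolve(root, candidate));
  if (rel === "") return true;
  return rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel);
}
```

With these semantics node_modules/** does not exclude notes/node_modules/x.js, which is the missing mid-path wildcard in practice.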

10. Change History

| Date | Dispatch | Summary |
|---|---|---|
| 2026-07-04 | 2295 | Initial per-module spec filed (audit campaign Phase 4) |
| (historical) | D361 (AK-287) | Module created: vault scan/chunk/embed/index, SearchAdapter, path-traversal guard |
| (historical) | D913 Fix 2 | /api/vault-index moved off the former GET-exemption set (now Bearer-gated) |
