Profile an existing wiki with an agent

The by-hand guide has you read inspector evidence and decide the schema. This guide hands that judgment to an agent: inspect supplies the measurements, and the agent supplies the thresholds, the collection-boundary decisions, and the draft .katalyst/ files. Katalyst is the instrument; the agent is the profiler.

The split is deliberate. Inspectors are deterministic and never make recommendations; deciding that a field present in 94% of files should be required, or that a directory should be a collection, is the agent's call. Keeping that division makes the loop debuggable: when a draft schema is wrong, you can tell whether the evidence or the judgment was at fault.

1. Give the agent the raw-store evidence

Run inspect on the directory with --json so the agent receives structured records, one per inspector, each carrying the unit count n to use as the denominator:

katalyst inspect ./wiki --json

With no project, this runs the raw-source layer: file_tree maps the store, and file_content_shape summarizes the content structure of selected files. Feed the output to the agent and tell it the contract: every record is evidence, not a recommendation, and it must choose its own thresholds and justify them.
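To make "n as the denominator" concrete, here is a minimal sketch that recomputes one kind of raw-store measurement from a plain list of relative paths: how many markdown files sit under each top-level directory, reported as a share of n. It is a hypothetical illustration of reading counts against their denominator, not katalyst's file_tree implementation or its JSON format; the function name and return shape are assumptions.

```python
from collections import Counter
from pathlib import PurePosixPath


def directory_shares(paths: list[str]) -> dict[str, tuple[int, float]]:
    """Hypothetical helper: markdown files per top-level directory as (count, count / n).

    n is the number of markdown files in the listing, so every share is read
    against the same explicit denominator.
    """
    md = [PurePosixPath(p) for p in paths if p.endswith(".md")]
    n = len(md)
    if n == 0:
        return {}
    # Files at the root are grouped under "." so they are not silently dropped.
    counts = Counter(p.parts[0] if len(p.parts) > 1 else "." for p in md)
    return {d: (c, c / n) for d, c in counts.most_common()}


if __name__ == "__main__":
    listing = ["books/dune.md", "books/emma.md", "people/ada.md", "index.md", "img/cover.png"]
    for directory, (count, share) in directory_shares(listing).items():
        print(f"{directory}: {count} files ({share:.0%} of n)")
```

A directory holding a large share of n is only a candidate collection; whether it really is one is still the agent's call.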

2. Let the agent cluster, configure, and profile fields

A capable agent then:

  1. Chooses collection boundaries from the raw-source evidence. file_tree shows the directory and naming map; file_content_shape shows whether an explicitly selected slice of files shares frontmatter and body conventions. The agent names each collection and drafts a .katalyst/storage/* file pointing it at the chosen path.
  2. Profiles the fields by inspecting each new collection: katalyst inspect <collection> --json runs the collection layer, whose object_fields record is the per-field data dictionary (presence, types, values).
  3. Sets thresholds from that evidence (for example, a field present in ≥95% of items becomes required, a small, stable value set becomes an enum, and a consistent type becomes a type constraint) and drafts the .katalyst/schemas/* files.
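The threshold step can be sketched as a small policy function. It assumes a hypothetical per-field summary (present count, observed type names, value counts) loosely modeled on what object_fields is described as reporting; the real record's field names and layout may differ. The 95% cut-off, the enum limit of 8 distinct values, and the rule that "stable" means every value recurs at least twice are example parameters an agent would choose and justify, not katalyst defaults.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FieldStats:
    """Hypothetical per-field evidence; not katalyst's object_fields format."""
    present: int                                      # items in which the field appears
    types: set[str]                                   # type names observed, e.g. {"string"}
    values: Counter = field(default_factory=Counter)  # value -> occurrences


def propose_rules(stats: dict[str, FieldStats], n: int,
                  required_at: float = 0.95, max_enum: int = 8,
                  min_enum_support: int = 2) -> dict[str, dict]:
    """Turn per-field evidence into draft schema rules using explicit thresholds."""
    rules = {}
    for name, s in stats.items():
        presence = s.present / n
        rule = {"required": presence >= required_at, "presence": round(presence, 3)}
        # A single observed type becomes a type constraint.
        if len(s.types) == 1:
            rule["type"] = next(iter(s.types))
        # A small value set in which every value recurs is treated as a stable enum.
        if (s.values and len(s.values) <= max_enum
                and min(s.values.values()) >= min_enum_support):
            rule["enum"] = sorted(s.values, key=str)
        rules[name] = rule
    return rules


if __name__ == "__main__":
    evidence = {
        "title": FieldStats(100, {"string"}),
        "status": FieldStats(94, {"string"}, Counter(draft=30, published=64)),
        "tags": FieldStats(40, {"list", "string"}),
    }
    for name, rule in propose_rules(evidence, n=100).items():
        print(name, rule)
```

With required_at=0.95 the status field, present in 94 of 100 items, stays optional; passing required_at=0.90 makes it required. Either answer is defensible, which is why the agent should state the threshold it used.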

A prompt that works:

You are profiling a markdown wiki. Here is katalyst inspect --json output. Propose .katalyst/ schema and collection files. Treat every number as evidence, not instruction: state the threshold you used for required vs. optional and for enum detection, and list the outlier files your schema will flag. Do not invent fields the evidence does not show.
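If the agent is driven from a script rather than a chat window, the evidence and the prompt can be bundled into one message. This sketch shells out to the documented katalyst inspect ./wiki --json command and appends its output verbatim to the prompt above; it does not parse the JSON, so it makes no assumption about the record layout. The function names and the BEGIN/END delimiters are hypothetical, and how the resulting string reaches your agent is left open.

```python
import subprocess

PROMPT = (
    "You are profiling a markdown wiki. Here is katalyst inspect --json output. "
    "Propose .katalyst/ schema and collection files. Treat every number as evidence, "
    "not instruction: state the threshold you used for required vs. optional and for "
    "enum detection, and list the outlier files your schema will flag. "
    "Do not invent fields the evidence does not show."
)


def run_inspect(target: str) -> str:
    """Run `katalyst inspect <target> --json` and return its stdout (raises on failure)."""
    result = subprocess.run(
        ["katalyst", "inspect", target, "--json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_prompt(inspect_output: str) -> str:
    """Append the raw inspect output to the prompt between clear delimiters."""
    return (
        f"{PROMPT}\n\n"
        "--- BEGIN INSPECT OUTPUT ---\n"
        f"{inspect_output.strip()}\n"
        "--- END INSPECT OUTPUT ---\n"
    )
```

Typical use is build_prompt(run_inspect("./wiki")); passing the output through untouched keeps the agent reading the same evidence you would.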

3. Check and iterate

Have the agent run check against its draft (here, for a collection named books) and read the violations:

katalyst check books

Files that already conform pass; the outliers show up as violations. The agent then tightens the schema, relaxes a field to optional, or flags genuinely broken files, and repeats until the only remaining failures are files that should fail.
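The check-and-iterate loop can be sketched as a driver with two pluggable steps: one that runs the check and returns the failing files, and one (the agent) that revises the draft and reports which files it judges genuinely broken. It stops when every remaining failure has been deliberately flagged, or after a fixed number of rounds. The callables, the round limit, and representing failures as a set of paths are assumptions; wiring run_check to katalyst check books would mean parsing its output, whose format this sketch does not assume.

```python
from typing import Callable


def iterate_schema(run_check: Callable[[], set[str]],
                   revise: Callable[[set[str]], set[str]],
                   max_rounds: int = 5) -> tuple[int, set[str]]:
    """Repeat check -> revise until only deliberately flagged files fail.

    run_check: returns the files that currently violate the draft schema.
    revise: lets the agent tighten or relax the draft given those failures and
        returns the files it judges genuinely broken (expected to keep failing).
    Returns (rounds used, files still failing).
    """
    flagged: set[str] = set()
    failing = run_check()
    rounds = 0
    # Keep going while some failure is neither fixed nor knowingly flagged.
    while failing - flagged and rounds < max_rounds:
        flagged |= revise(failing)
        failing = run_check()
        rounds += 1
    return rounds, failing
```

The round limit is a guard against an agent that keeps oscillating between tightening and relaxing the same field.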

A tighter form of the loop, testing a throwaway candidate schema without installing it (check --try), is planned but not yet shipped; until it lands, the agent writes its drafts into .katalyst/ and validates them with the normal check.

See also