Profile an existing wiki by hand

You have a directory of Markdown (a vault, a docs tree, a knowledge base) and you want a Katalyst schema for it. Rather than guessing the conventions, let inspect measure them. This guide turns an existing corpus into a draft schema by reading the evidence yourself. To hand that judgment to an agent instead, see Profile an existing wiki with an agent.

inspect reports evidence (counts and distributions), never recommendations. Reading the evidence and deciding the schema is your call. It runs in two layers: point it at a directory to profile a raw store (no project needed), or at a configured collection to profile its items. The onboarding loop uses both.

1. Survey the directory (raw-source layer)

Point inspect at the directory. With no .katalyst/ project it runs the raw-source inspectors:

katalyst inspect ./wiki

file_tree reports the file types and naming conventions per directory. Use it to decide which directory or prefix you want to inspect more closely. Then run file_content_shape over that explicit slice:

# Inspection report: ./wiki

## Structural

### file_content_shape (n=5)

_Profile selected files by text, tabular, and tree content structure._

----------------------------------------
selection:
  expression    : ext = ".md"
  files         : 5
  directories   : 1
  readable      : 5
  unsupported   : 0
  parse failures: 0

----------------------------------------
file types:
  TYPE  FILES
  .md   5

----------------------------------------
coherence:
  status: coherent

----------------------------------------
common structure:
  - 5/5 Markdown files have an H1
  - 4/5 Markdown files have frontmatter key author
  - 5/5 Markdown files have frontmatter key status
  - 5/5 Markdown files have frontmatter key title
  - 4/5 Markdown files have section Review

----------------------------------------
variation:
  - frontmatter key author appears in 4/5 Markdown files

----------------------------------------
text:
  files  : 5
  with H1: 5
  frontmatter keys:
  KEY     FILES
  status  5
  title   5
  author  4

----------------------------------------
tabular:
  no CSV files selected

----------------------------------------
tree:
  no JSON files selected

----------------------------------------
read/parse issues:
  none

This layer reports store and content facts, not candidate collections. Here the Markdown files share enough structure that you can reasonably treat ./wiki as a single books collection and keep the file with the missing author in mind as cleanup work.
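If you want to cross-check the frontmatter counts by hand, a rough stand-alone tally can be sketched in Python. This is not how Katalyst computes its report: the function names `frontmatter_keys` and `tally_keys` are hypothetical, and the parser is a deliberately naive assumption that frontmatter is a leading block delimited by `---` lines with one top-level `key: value` pair per line.

```python
from collections import Counter
from pathlib import Path


def frontmatter_keys(text: str) -> list[str]:
    """Return top-level keys from a leading '---' frontmatter block, if any."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Top-level keys only: skip indented lines, comments, and list items.
        if line and not line[0].isspace() and not line.startswith(("#", "-")) and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return []  # no closing delimiter: treat the file as having no frontmatter


def tally_keys(root: str) -> tuple[int, Counter]:
    """Count how many .md files under root carry each frontmatter key."""
    counts: Counter = Counter()
    n = 0
    for path in sorted(Path(root).rglob("*.md")):
        n += 1
        # set() so a key repeated inside one file is counted once for that file.
        counts.update(set(frontmatter_keys(path.read_text(encoding="utf-8"))))
    return n, counts
```

Calling `tally_keys("./wiki")` returns the file count and a per-key counter that you can compare against the "frontmatter keys" table in the report. Nested keys and multi-line values are ignored by this sketch.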

2. Configure the collection

Point a collection at the directory so the field-level layer can run. Minimal config:

# .katalyst/storage/local.yaml
type: filesystem
root: .
collections:
  books:
    path: wiki

3. Inspect the collection (collection layer)

Now inspect the collection by name. Inside the project, inspect runs the collection inspectors over its items:

katalyst inspect books

object_fields is a data dictionary over the items’ frontmatter. For each field it reports presence over n, observed types, value cardinality, and the common values when the set is small:

# Inspection report: books

## Object

### object_fields (n=5)

_A data dictionary over item frontmatter: per-field presence, types, cardinality, and common values._

- author:
  - cardinality: 4
  - present: 4
  - types:
    - string: 4
  - values:
    - Frank Herbert: 1
    - Isaac Asimov: 1
    - Neal Stephenson: 1
    - William Gibson: 1
- status:
  - cardinality: 3
  - present: 5
  - types:
    - string: 5
  - values:
    - read: 3
    - reading: 1
    - to-read: 1
- title:
  - cardinality: 5
  - present: 5
  - types:
    - string: 5
  - values:
    - Dune: 1
    - Dune Messiah: 1
    - Foundation: 1
    - Neuromancer: 1
    - Snow Crash: 1

markdown_body reports the body conventions: the single-H1 and H1-matches-title rates, plus recurring section headings. For a machine-readable form, add --json; to save the report, use -o report.md.

4. Read the evidence

Translate the counts into schema decisions yourself; the threshold is your judgment, not the tool’s:

| Evidence | What it tells you | A reasonable reading |
| --- | --- | --- |
| `object_fields` `present` / `n` | how often a field appears | nearly every item → `required`; sometimes → optional |
| `object_fields` `values` | a small, stable value set | an `enum` |
| `object_fields` `types` | observed types per field | one consistent type → a `type` constraint; mixed → a field to clean up first |
| `markdown_body` heading shape | single-H1, H1-matches-title | `markdown_single_h1`, `markdown_title_matches_h1` |
| `markdown_body` sections | recurring section headings | a `markdown_required_section` |
| `file_tree` naming (step 1) | casing, spaces, extensions | `filesystem_name_case` (`style: kebab`), `filesystem_path_charset` (`deny: [" "]`) |
| `file_content_shape` common structure (step 1) | shared frontmatter keys and sections in the selected slice | confidence that the slice is coherent enough to configure as one collection |

The denominator n is always reported, so you decide what “nearly every item” means. The one item missing author, which also has spaces in its name, is exactly the kind of file a schema will flag.
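To make your own thresholds explicit, the reading above can be written down as a small function. The sketch below is an illustration, not part of Katalyst: `draft_schema`, `required_at`, and `enum_max` are hypothetical names, the 0.8 and 3 defaults are arbitrary assumptions, and the `OBSERVED` dictionary is the object_fields report from step 3 retyped by hand.

```python
# Counts retyped from the object_fields report in step 3 (n=5).
OBSERVED = {
    "author": {"present": 4, "types": {"string": 4},
               "values": {"Frank Herbert": 1, "Isaac Asimov": 1,
                          "Neal Stephenson": 1, "William Gibson": 1}},
    "status": {"present": 5, "types": {"string": 5},
               "values": {"read": 3, "reading": 1, "to-read": 1}},
    "title": {"present": 5, "types": {"string": 5},
              "values": {"Dune": 1, "Dune Messiah": 1, "Foundation": 1,
                         "Neuromancer": 1, "Snow Crash": 1}},
}


def draft_schema(fields: dict, n: int, required_at: float = 0.8, enum_max: int = 3) -> dict:
    """Turn per-field presence, type, and value counts into a draft object schema.

    required_at and enum_max are illustrative thresholds, not Katalyst defaults.
    """
    schema = {"type": "object", "required": [], "properties": {}}
    for name, stats in sorted(fields.items()):
        # Presence over n at or above the threshold reads as "required".
        if stats["present"] / n >= required_at:
            schema["required"].append(name)
        types = list(stats["types"])
        values = stats.get("values", {})
        # A value that repeats hints at a closed set rather than free text.
        repeated = any(count > 1 for count in values.values())
        if len(types) != 1:
            # Mixed types: leave the field unconstrained and clean the data first.
            schema["properties"][name] = {}
        elif values and len(values) <= enum_max and repeated:
            schema["properties"][name] = {"enum": sorted(values)}
        else:
            schema["properties"][name] = {"type": types[0]}
    return schema
```

With these assumed thresholds, `draft_schema(OBSERVED, n=5)` marks all three fields as required, makes status an enum of its three observed values, and types title and author as strings. Raising `required_at` above 0.8 would drop author from the required list, which is exactly the judgment call the tool leaves to you.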

5. Draft a schema and check

Add the schema and bind it to the collection:

# .katalyst/schemas/book.yaml
type: object
required: [title, author, status]
properties:
  title:  { type: string }
  author: { type: string }
  status: { enum: [read, reading, to-read] }

# .katalyst/storage/local.yaml  (extend the collection from step 2)
type: filesystem
root: .
collections:
  books:
    path: wiki
    schema: book
    checks:
      - kind: markdown_single_h1
      - kind: filesystem_name_case
        style: kebab

See Add a schema for the binding details. Then run check against the draft:

katalyst check books

The files that already follow the conventions pass; the outliers the evidence flagged light up as violations. From there you tighten the schema, relax a field to optional, or fix the stray files, then re-run. That loop, inspect → draft → check → fix the holdouts, is the whole onboarding.
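When fixing stray files, it can help to list the names a kebab-case rule is likely to flag. The sketch below is a stand-alone approximation and not Katalyst's implementation of `filesystem_name_case`: the regular expression is an assumed definition of kebab-case (lowercase letters and digits joined by single hyphens), and `is_kebab` and `name_outliers` are hypothetical helper names.

```python
import re
from pathlib import Path

# Assumed definition of kebab-case: lowercase letters and digits joined by single hyphens.
KEBAB = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_kebab(stem: str) -> bool:
    """True if a file name (without its extension) matches the assumed kebab pattern."""
    return KEBAB.fullmatch(stem) is not None


def name_outliers(root: str) -> list[str]:
    """List .md files under root whose stem is not kebab-case."""
    return [str(p) for p in sorted(Path(root).rglob("*.md")) if not is_kebab(p.stem)]
```

Calling `name_outliers("./wiki")` gives a rename list to work through. Treat `katalyst check` as the authority on what actually counts as a violation.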