Reconcile

Run a configured pair and understand the output flags that control format, streaming, and audit behavior.

Running a reconciliation is a single command once your config is valid. The command streams the left source against an indexed right source and emits every match, difference, and unmatched row to a file or stdout. The flags you choose here determine how the output is structured, how much memory the run uses, and whether the result is reproducible.

What you'll learn

By the end of this guide, you will know:

how to run a basic reconciliation and write results to a file,
when to use streaming formats and why they matter for large files,
and how to override input files per run without editing your config.

How it works

Reconify indexes the right source first, then streams the left source row by row and emits events as it goes. See Overview for the full pipeline and Matching for how individual rows are classified.

Command

reconify reconcile \
  --config reconify.yaml \
  --pair bank_vs_stripe \
  --out results.json

Steps

Run a basic reconciliation

--pair must match a key under pairs in your config. --out writes the results to a file; omit it, or pass -o -, to write to stdout instead.

reconify reconcile \
  --config reconify.yaml \
  --pair bank_vs_stripe \
  --out results.json

Choose an output format

Pass --format to control how results are structured. The default is json.

reconify reconcile \
  --config reconify.yaml \
  --pair bank_vs_stripe \
  --format ndjson \
  --out results.ndjson

Format	When to use
`json`	Default. Full result as a single JSON object. Good for files under ~500k rows.
`json-stream`	Each section emitted as a separate JSON object. Lower peak memory.
`ndjson`	One event per line. Best for piping into `jq` or downstream tools.
`csv`	Flat row-per-event. Best for loading into spreadsheets or databases.
`table`	Human-readable terminal output. Not suitable for machine consumption.

For files above roughly 500k rows, prefer json-stream, ndjson, or csv.

Override input files without editing the config

To reconcile a specific pair of files instead of the glob from your config, pass --left-file and --right-file:

reconify reconcile \
  --config reconify.yaml \
  --pair bank_vs_stripe \
  --left-file data/bank/january.csv \
  --right-file data/stripe/january.csv \
  --out results.json

This is useful in CI pipelines where filenames include a date or run ID.
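As a rough sketch of that CI use case, the override flags can be wrapped in a small shell function that builds the paths from a date stamp. The directory layout (data/bank/<date>.csv, data/stripe/<date>.csv), the results-<date>.json output name, and the RUN_DATE variable are assumptions made for illustration; only the reconify flags come from this page.

```shell
# Reconcile the two files for one date stamp.
# Assumed (hypothetical) layout: data/<source>/<YYYY-MM-DD>.csv
reconcile_for_date() {
  # $1 = date stamp, e.g. 2024-01-31
  reconify reconcile \
    --config reconify.yaml \
    --pair bank_vs_stripe \
    --left-file "data/bank/$1.csv" \
    --right-file "data/stripe/$1.csv" \
    --out "results-$1.json"
}

# Example call from a pipeline step; RUN_DATE is a hypothetical CI variable
# that falls back to today's date when unset:
# reconcile_for_date "${RUN_DATE:-$(date +%Y-%m-%d)}"
```

Keeping the date in a single argument means the input and output names cannot drift apart between runs.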

Enable progress logging for large runs

reconify reconcile \
  --config reconify.yaml \
  --pair bank_vs_stripe \
  --format ndjson \
  --progress \
  --progress-every 1000000 \
  --out results.ndjson

Progress logs go to stderr, not the output file, so they don't interfere with piped output.
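Because the two streams are separate, a run can send events to a pipe while keeping the progress lines in a log. The sketch below omits --out so that results go to stdout, as described under "Run a basic reconciliation"; the progress.log filename and the function name are hypothetical.

```shell
# Emit NDJSON events on stdout and capture progress lines in a log file.
# progress.log is a hypothetical filename chosen for this example.
run_with_progress_log() {
  reconify reconcile \
    --config reconify.yaml \
    --pair bank_vs_stripe \
    --format ndjson \
    --progress \
    2> progress.log
}

# Example: forward only the events to a downstream tool.
# run_with_progress_log | jq -c '.'
```

The 2> redirection applies only to stderr, so whatever reads the pipe sees events alone.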

Control the token-match buffer

When name_mode: "tokens" is set in your pair config, unmatched rows are buffered for token matching after reference matching completes. Use --max-token-buffer to cap the size of that buffer:

reconify reconcile \
  --config reconify.yaml \
  --pair bank_vs_stripe \
  --max-token-buffer 100000

Set --max-token-buffer 0 for unlimited buffering, but only when you have enough memory for the full unmatched set.

Verify it worked

Open results.json (or pipe NDJSON through jq) and check the summary section first:

{
  "type": "summary",
  "matched_count": 842,
  "unmatched_left_count": 3,
  "unmatched_right_count": 1,
  "amount_diff_count": 0,
  "timing_diff_count": 2
}

A healthy run has a high matched_count and low unmatched and diff counts. If unmatched_left_count or unmatched_right_count is unexpectedly high, inspect the unmatched rows for reference format mismatches, which are the most common cause. See Read Results for the full investigation workflow.
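In an automated run, the same check can be scripted with jq. This is a minimal sketch, assuming the NDJSON output contains exactly one object with "type": "summary" shaped like the example above; the function name and the threshold are hypothetical, not part of reconify.

```shell
# Succeed only when the total number of unmatched rows is within a limit.
# Assumes one summary object per file, shaped like the example above.
check_unmatched() {
  # $1 = NDJSON results file, $2 = maximum allowed unmatched rows
  jq -e --argjson max "$2" \
    'select(.type == "summary")
     | (.unmatched_left_count + .unmatched_right_count) <= $max' \
    "$1" > /dev/null
}

# Example: fail a pipeline step when more than 10 rows are unmatched.
# check_unmatched results.ndjson 10 || exit 1
```

jq -e sets a non-zero exit status when the last result is false, and also when no summary object is found, so a missing summary fails the check rather than passing silently.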
