From ca2a278241d03e0769a1f2552c95100db81a9967 Mon Sep 17 00:00:00 2001
From: Miguel Jacq
Date: Wed, 17 Dec 2025 22:32:48 -0600
Subject: [PATCH] Add enroll harvest

---
 enroll-harvest.md | 189 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 189 insertions(+)
 create mode 100644 enroll-harvest.md

diff --git a/enroll-harvest.md b/enroll-harvest.md
new file mode 100644
index 0000000..fb247b6
--- /dev/null
+++ b/enroll-harvest.md
@@ -0,0 +1,189 @@
+# enroll harvest
+
+Harvest system/service/package/config/user state from a Debian host into a “harvest bundle” (`state.json` plus harvested file artifacts).
+
+---
+
+## Synopsis
+
+```bash
+enroll harvest [--out ] [--dangerous] [--sops ] [--remote-host [--remote-user ] [--remote-port ] [--no-sudo]]
+```
+
+---
+
+## What it produces
+
+A harvest bundle always contains:
+
+- `state.json`: structured snapshot of discovered services/packages/users/files
+- `artifacts/`: file copies referenced from `state.json` (per-role subtrees)
+
+### Output formats
+
+**Plain (directory)**
+- Output is a directory containing `state.json` and `artifacts/`.
+
+**SOPS (single encrypted file)**
+- Output is a single SOPS-encrypted tarball: `harvest.tar.gz.sops`.
+- Internally, it contains the same layout (`state.json`, `artifacts/`), just bundled and encrypted.
+
+---
+
+## Options
+
+### `--out `
+Where to write the harvest output.
+
+Behavior depends on whether you’re in **plain** or **SOPS** mode:
+
+- **Plain mode (no `--sops`)**
+  - `--out` is a **directory**.
+  - **Required for local harvests.**
+  - Optional for remote harvests (see “Cache defaults” below).
+
+- **SOPS mode (`--sops ...`)**
+  - `--out` may be:
+    - a **directory** → the file `harvest.tar.gz.sops` is created inside it
+    - a **file path** → that exact file is written
+  - If omitted, `enroll` writes into a secure per-user cache dir (see below).
+
+### `--dangerous`
+Harvest files more aggressively.
+
+This disables the built-in “likely secret” safety checks, including:
+- denylisted paths (e.g. `/etc/shadow`, `/etc/ssl/private/*`, `/etc/ssh/ssh_host_*`, `/etc/letsencrypt/*`)
+- heuristic content scanning for common secret patterns (private keys, “password=”, “token”, “secret”, etc.)
+- some other conservative skipping logic
+
+**Use with care**, especially in plaintext mode.
+
+### `--sops `
+Encrypt the harvest output as a SOPS-encrypted tarball.
+
+- Provide **one or more** GPG fingerprints.
+- Requires `sops` available on `PATH`.
+- Output becomes a single file: `harvest.tar.gz.sops` (unless you choose another filename via `--out`).
+
+### Remote harvesting
+
+#### `--remote-host `
+Run the harvest on a remote host over SSH and pull the results locally.
+
+When `--remote-host` is set:
+- the harvest is executed on the remote machine
+- the bundle is written locally (in `--out` or the cache directory)
+
+#### `--remote-user `
+SSH username. Default is the local `$USER`.
+
+#### `--remote-port `
+SSH port. Default is `22`.
+
+#### `--no-sudo`
+Don’t use sudo on the remote host.
+
+This may cause a **partial harvest** (missing files/metadata) if the SSH user can’t read everything.
+
+---
+
+## Cache defaults (when `--out` is omitted)
+
+`enroll` has a “secure cache” feature for harvest output, but it only applies in specific cases:
+
+- **Remote harvest (plain mode)**: if `--out` is omitted, a cache dir is created and used.
+- **Any harvest with `--sops`** (local or remote): if `--out` is omitted, a cache dir is created and used.
+
+The cache base is:
+- `$XDG_CACHE_HOME/enroll/harvest/` if `XDG_CACHE_HOME` is set
+- otherwise `~/.local/cache/enroll/harvest/`
+
+Each run gets a timestamped directory with an unpredictable suffix, e.g.
+`~/.local/cache/enroll/harvest/20251218-...-/`.
+
+**Note:** Local plaintext harvests **require** `--out`.
+
+---
+
+## Runtime notes / expectations
+
+- **Root recommended:** If not running as root (or remote sudo is disabled), `enroll` may miss files or fail to capture correct ownership/mode metadata.
+- **Symlinks/binaries/large files:** Harvesting skips files that are symlinks, “binary-like”, or above a size cap (unless you use `--dangerous`, which relaxes some checks).
+- **Output is deterministic enough to diff:** The bundle is designed so that comparing two harvests is meaningful (via `enroll diff`).
+
+---
+
+## Permutations (valid combinations)
+
+Below are the common “flag permutations” you’ll typically use.
+
+### Local harvest, plaintext (safe)
+```bash
+enroll harvest --out /path/to/harvest-dir
+```
+
+### Local harvest, plaintext (`--dangerous`)
+```bash
+enroll harvest --out /path/to/harvest-dir --dangerous
+```
+
+### Local harvest, SOPS-encrypted (safe)
+`--out` may be a **dir**:
+```bash
+enroll harvest --sops --out /path/to/output-dir
+# writes /path/to/output-dir/harvest.tar.gz.sops
+```
+
+…or a **file**:
+```bash
+enroll harvest --sops --out /path/to/harvest.tar.gz.sops
+```
+
+If you omit `--out`, it writes into the per-user cache:
+```bash
+enroll harvest --sops
+```
+
+### Local harvest, SOPS-encrypted (`--dangerous`)
+```bash
+enroll harvest --dangerous --sops --out /path/to/output-dir
+```
+
+---
+
+### Remote harvest, plaintext (safe)
+With explicit output dir:
+```bash
+enroll harvest --remote-host host.example.com --out /path/to/harvest-dir
+```
+
+Using the cache (omit `--out`):
+```bash
+enroll harvest --remote-host host.example.com
+```
+
+### Remote harvest, plaintext (`--dangerous`)
+```bash
+enroll harvest --remote-host host.example.com --out /path/to/harvest-dir --dangerous
+```
+
+### Remote harvest, plaintext without sudo
+```bash
+enroll harvest --remote-host host.example.com --out /path/to/harvest-dir --no-sudo
+```
+
+### Remote harvest, SOPS-encrypted (safe)
+```bash
+enroll harvest --remote-host host.example.com --sops --out /path/to/output-dir
+# writes /path/to/output-dir/harvest.tar.gz.sops
+```
+
+### Remote harvest, SOPS-encrypted (`--dangerous`)
+```bash
+enroll harvest --remote-host host.example.com --dangerous --sops --out /path/to/output-dir
+```
+
+### Remote harvest, SOPS-encrypted without sudo
+```bash
+enroll harvest --remote-host host.example.com --no-sudo --sops --out /path/to/output-dir
+```
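
Several of the permutations above omit `--out` and rely on the cache, so it can help to know where that output is likely to land. The sketch below mirrors the rule stated under "Cache defaults" in plain shell. The function name `harvest_cache_base` is hypothetical and not part of `enroll`, and treating an empty `XDG_CACHE_HOME` the same as an unset one is an assumption; the tool's actual lookup may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with enroll: prints the cache base
# directory as described in the "Cache defaults" section of the document.
harvest_cache_base() {
  # Assumption: an empty XDG_CACHE_HOME is treated like an unset one.
  if [ -n "${XDG_CACHE_HOME:-}" ]; then
    printf '%s\n' "${XDG_CACHE_HOME}/enroll/harvest"
  else
    printf '%s\n' "${HOME}/.local/cache/enroll/harvest"
  fi
}

# Print the base directory for the current environment.
harvest_cache_base
```

Each run's bundle would then sit in a timestamped subdirectory of the printed path, per the naming scheme described in the document.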