Add enroll harvest

Miguel Jacq 2025-12-17 22:32:48 -06:00
parent 7fbbd29f6a
commit ca2a278241

# enroll harvest
Harvest system/service/package/config/user state from a Debian host into a “harvest bundle” (`state.json` plus harvested file artifacts).

---
## Synopsis
```bash
enroll harvest [--out <DIR|FILE>] [--dangerous] [--sops <GPG_FPR...>] [--remote-host <HOST> [--remote-user <USER>] [--remote-port <PORT>] [--no-sudo]]
```
---
## What it produces
A harvest bundle always contains:
- `state.json` — structured snapshot of discovered services/packages/users/files
- `artifacts/` — file copies referenced from `state.json` (per-role subtrees)
### Output formats
**Plain (directory)**

- Output is a directory containing `state.json` and `artifacts/`.

**SOPS (single encrypted file)**

- Output is a single SOPS-encrypted tarball, named `harvest.tar.gz.sops` by default.
- Internally, it has the same layout (`state.json`, `artifacts/`), bundled and encrypted.
---
## Options
### `--out <path>`
Where to write the harvest output.
Behavior depends on whether you’re in **plain** or **SOPS** mode:

- **Plain mode (no `--sops`)**
  - `--out` is a **directory**.
  - **Required for local harvests.**
  - Optional for remote harvests (see “Cache defaults” below).
- **SOPS mode (`--sops ...`)**
  - `--out` may be:
    - a **directory** → the file `harvest.tar.gz.sops` is created inside it
    - a **file path** → that exact file is written
  - If omitted, `enroll` writes into a secure per-user cache directory (see below).
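
The `--out` rules above can be restated as a small decision function. This is a hypothetical sketch (`resolve_out` is not an enroll command or API), and it assumes that an *existing* directory is what distinguishes the directory case from the file case in SOPS mode; enroll may draw that line differently.

```bash
# Hypothetical helper, not part of enroll: it only restates the documented
# --out rules. Arguments: mode (plain|sops), where (local|remote), optional out.
resolve_out() {
  local mode="$1" where="$2" out="${3:-}"
  if [ -z "$out" ]; then
    # Local plaintext harvests have no cache fallback.
    if [ "$mode" = plain ] && [ "$where" = local ]; then
      echo "error: --out is required for local plaintext harvests" >&2
      return 1
    fi
    echo "(per-user cache directory)"
    return 0
  fi
  # Assumption: an existing directory means "write harvest.tar.gz.sops inside it".
  if [ "$mode" = sops ] && [ -d "$out" ]; then
    echo "${out%/}/harvest.tar.gz.sops"
  else
    # Plain mode: $out is the bundle directory. SOPS mode: $out is the exact file.
    echo "$out"
  fi
}
```

For example, `resolve_out plain remote` prints `(per-user cache directory)`, while `resolve_out plain local` fails because local plaintext harvests have no cache fallback.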
### `--dangerous`
Harvest files more aggressively.
This disables the built-in “likely secret” safety checks, including:
- denylisted paths (e.g. `/etc/shadow`, `/etc/ssl/private/*`, `/etc/ssh/ssh_host_*`, `/etc/letsencrypt/*`)
- heuristic content scanning for common secret patterns (private keys, “password=”, “token”, “secret”, etc.)
- some other conservative skipping logic

**Use with care**, especially in plaintext mode, where harvested secrets are written to disk unencrypted.
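
To make the safe-mode checks concrete, here is a rough approximation of the kind of filtering that `--dangerous` turns off. `looks_secret` is a hypothetical helper, and its path list and patterns are only the examples named above; enroll’s real denylist and content heuristics may differ.

```bash
# Illustrative only: roughly what "likely secret" checks could look like.
# The denylist and regexes are the examples from this page, not enroll's
# actual lists. Returns 0 (true) when the path looks sensitive.
looks_secret() {
  local path="$1"
  case "$path" in
    /etc/shadow|/etc/ssl/private/*|/etc/ssh/ssh_host_*|/etc/letsencrypt/*)
      return 0 ;;  # denylisted path
  esac
  # Content heuristics: PEM private key headers and credential keywords.
  [ -f "$path" ] && grep -Eqi \
    -e '-----BEGIN [A-Z ]*PRIVATE KEY-----' \
    -e 'password=' -e 'token' -e 'secret' \
    -- "$path"
}
```

For example, `looks_secret /etc/ssh/ssh_host_ed25519_key && echo skip` prints `skip` from the path rule alone, without reading the file.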
### `--sops <GPG_FINGERPRINT...>`
Encrypt the harvest output as a SOPS-encrypted tarball.
- Provide **one or more** GPG fingerprints.
- Requires `sops` to be available on `PATH`.
- Output becomes a single file: `harvest.tar.gz.sops` (unless you choose another filename via `--out`).
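
To inspect a SOPS bundle later, you can decrypt it and unpack the tarball. This is a sketch that assumes the tarball was encrypted as a SOPS binary file and that your keyring holds a private key for one of the fingerprints passed to `--sops`; `unpack_harvest` is a hypothetical helper name.

```bash
# Hypothetical helper: decrypt a harvest.tar.gz.sops bundle and extract it.
# Assumes SOPS binary encryption and a matching GPG private key in your keyring.
unpack_harvest() {
  local bundle="$1" dest="$2"
  mkdir -p "$dest"
  # sops writes the decrypted tarball to stdout; tar reads it from stdin.
  sops --decrypt --input-type binary --output-type binary "$bundle" \
    | tar -xzf - -C "$dest"
}
```

Usage: `unpack_harvest ./harvest.tar.gz.sops ./harvest-plain`. The unpacked copy is plaintext, so treat it as sensitive, particularly if the harvest used `--dangerous`.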
### Remote harvesting
#### `--remote-host <host>`
Run the harvest on a remote host over SSH and pull the results locally.
When `--remote-host` is set:
- the harvest is executed on the remote machine
- the bundle is written locally (in `--out` or the cache directory)
#### `--remote-user <user>`
SSH username. Default is the local `$USER`.
#### `--remote-port <port>`
SSH port. Default is `22`.
#### `--no-sudo`
Don’t use sudo on the remote host.
This may cause a **partial harvest** (missing files/metadata) if the SSH user can’t read everything.
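
Before choosing between sudo and `--no-sudo`, it can help to check whether the SSH user can run sudo without a password prompt. This sketch uses only standard `ssh` and `sudo -n`; `can_remote_sudo` is a hypothetical helper, its defaults mirror the documented ones (`$USER`, port `22`), and whether enroll itself can handle a sudo password prompt is not covered here.

```bash
# Hypothetical pre-flight check, not an enroll feature.
# Usage: can_remote_sudo HOST [USER] [PORT]
can_remote_sudo() {
  local host="$1" user="${2:-$USER}" port="${3:-22}"
  # BatchMode=yes: fail instead of prompting for SSH credentials.
  # sudo -n: fail instead of prompting for a sudo password.
  ssh -o BatchMode=yes -p "$port" "$user@$host" sudo -n true
}

# Example: fall back to --no-sudo (possibly a partial harvest) when needed.
# if can_remote_sudo host.example.com; then
#   enroll harvest --remote-host host.example.com --out ./h
# else
#   enroll harvest --remote-host host.example.com --out ./h --no-sudo
# fi
```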

---
## Cache defaults (when `--out` is omitted)
`enroll` has a “secure cache” feature for harvest output, but it only applies in specific cases:
- **Remote harvest (plain mode)**: if `--out` is omitted, a cache dir is created and used.
- **Any harvest with `--sops`** (local or remote): if `--out` is omitted, a cache dir is created and used.

The cache base is:

- `$XDG_CACHE_HOME/enroll/harvest/` if `XDG_CACHE_HOME` is set
- otherwise `~/.local/cache/enroll/harvest/`

Each run gets its own timestamped directory with an unpredictable suffix, e.g.
`~/.local/cache/enroll/harvest/20251218-...-<random>/`.

**Note:** Local plaintext harvests **require** `--out`.
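
The cache location rules map directly onto shell parameter expansion. The sketch below is an approximation: both function names are hypothetical, the `chmod 700` on the base directory is an assumption about what “secure” means, and enroll’s exact timestamp format and suffix length may differ.

```bash
# Sketch of the documented cache base; not enroll's code.
harvest_cache_base() {
  printf '%s\n' "${XDG_CACHE_HOME:-$HOME/.local/cache}/enroll/harvest"
}

# Create a fresh, owner-only, timestamped run directory with a random suffix.
new_harvest_run_dir() {
  local base
  base=$(harvest_cache_base)
  mkdir -p "$base" && chmod 700 "$base"
  # mktemp -d replaces the trailing X's with random characters and creates
  # the directory with owner-only permissions, then prints its path.
  mktemp -d "$base/$(date +%Y%m%d-%H%M%S)-XXXXXXXX"
}
```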

---
## Runtime notes / expectations
- **Root recommended:** If not running as root (or if remote sudo is disabled), `enroll` may miss files or fail to capture correct ownership/mode metadata.
- **Symlinks/binaries/large files:** Harvesting skips symlinks, “binary-like” files, and files above a size cap; `--dangerous` relaxes some of these checks.
- **Output is deterministic enough to diff:** The bundle is designed so that comparing two harvests (via `enroll diff`) is meaningful.
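
The skip rules can be approximated with standard tools. In this sketch, `would_harvest` and the 1 MiB `MAX_BYTES` cap are hypothetical (the real cap is not stated on this page), and “binary-like” is assumed to mean “contains a NUL byte”, a common heuristic that is not necessarily the one enroll uses.

```bash
# Illustrative filter approximating the documented skip rules.
# MAX_BYTES is a hypothetical cap; enroll's real limit is not documented here.
MAX_BYTES=${MAX_BYTES:-1048576}

would_harvest() {
  local f="$1" size
  [ -L "$f" ] && return 1                   # skip symlinks
  [ -f "$f" ] || return 1                   # only regular files
  size=$(( $(wc -c < "$f") ))               # arithmetic strips wc padding
  [ "$size" -le "$MAX_BYTES" ] || return 1  # skip files over the cap
  # Binary-like: deleting NUL bytes changes the byte count.
  [ $(( $(LC_ALL=C tr -d '\000' < "$f" | wc -c) )) -eq "$size" ] || return 1
  return 0
}
```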
---
## Permutations (valid combinations)
Below are the common “flag permutations” you’ll typically use.
### Local harvest, plaintext (safe)
```bash
enroll harvest --out /path/to/harvest-dir
```
### Local harvest, plaintext (`--dangerous`)
```bash
enroll harvest --out /path/to/harvest-dir --dangerous
```
### Local harvest, SOPS-encrypted (safe)
`--out` may be a **dir**:
```bash
enroll harvest --sops <FPR1> --out /path/to/output-dir
# writes /path/to/output-dir/harvest.tar.gz.sops
```
…or a **file**:
```bash
enroll harvest --sops <FPR1> --out /path/to/harvest.tar.gz.sops
```
If you omit `--out`, it writes into the per-user cache:
```bash
enroll harvest --sops <FPR1>
```
### Local harvest, SOPS-encrypted (`--dangerous`)
```bash
enroll harvest --dangerous --sops <FPR1> --out /path/to/output-dir
```
---
### Remote harvest, plaintext (safe)
With explicit output dir:
```bash
enroll harvest --remote-host host.example.com --out /path/to/harvest-dir
```
Using the cache (omit `--out`):
```bash
enroll harvest --remote-host host.example.com
```
### Remote harvest, plaintext (`--dangerous`)
```bash
enroll harvest --remote-host host.example.com --out /path/to/harvest-dir --dangerous
```
### Remote harvest, plaintext without sudo
```bash
enroll harvest --remote-host host.example.com --out /path/to/harvest-dir --no-sudo
```
### Remote harvest, SOPS-encrypted (safe)
```bash
enroll harvest --remote-host host.example.com --sops <FPR1> --out /path/to/output-dir
# writes /path/to/output-dir/harvest.tar.gz.sops
```
### Remote harvest, SOPS-encrypted (`--dangerous`)
```bash
enroll harvest --remote-host host.example.com --dangerous --sops <FPR1> --out /path/to/output-dir
```
### Remote harvest, SOPS-encrypted without sudo
```bash
enroll harvest --remote-host host.example.com --no-sudo --sops <FPR1> --out /path/to/output-dir
```
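
The remote permutations compose naturally in a loop when you want one bundle per host. This is a sketch that uses only the flags documented above; `harvest_hosts` is a hypothetical helper and the host names are placeholders.

```bash
# Hypothetical helper: one plaintext remote harvest per host, each into
# its own directory under OUTDIR. Failures are reported but do not stop the loop.
# Usage: harvest_hosts OUTDIR HOST...
harvest_hosts() {
  local outdir="$1" host
  shift
  for host in "$@"; do
    enroll harvest --remote-host "$host" --out "$outdir/$host" \
      || echo "harvest failed for $host" >&2
  done
}
```

For example, `harvest_hosts ./harvests web1.example.com db1.example.com` should produce `./harvests/web1.example.com` and `./harvests/db1.example.com`. Adding `--sops <FPR1>` to the `enroll` line would give one encrypted file per host instead.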