Archived

This repository has been archived on 2026-06-22. You can view files and clone it, but you cannot make any changes to its state, such as pushing and creating new issues, pull requests or comments.

Miguel Jacq 90e863df40

Add DEVELOPMENT.md

2026-06-21 13:03:26 +10:00


Enroll Development Guide

Interested in the internals of Enroll?

This guide describes the current enroll codebase for maintainers. It focuses on how the project is organised, what calls what, how harvest state flows into generated configuration-management output, and which invariants matter when changing the code.

1. What Enroll does

enroll is a Linux host inspection and configuration-management generation tool.

Its core pipeline is:

Running Linux host
  |
  | enroll harvest
  v
Harvest bundle
  state.json
  artifacts/<role>/<path-relative-to-root>
  |
  | enroll manifest --target ansible|puppet|salt
  v
Generated configuration-management output
  Ansible roles/playbook
  Puppet modules/site.pp/Hiera data
  Salt states/pillar data

The harvest bundle is deliberately target-neutral. Ansible, Puppet, and Salt renderers all consume the same state.json shape and the same harvested artifacts. Renderer code should translate harvest state into the target's idioms; it should not invent source facts that belong in the harvest.

enroll diff is also built around harvest bundles. It compares two harvests and, when --enforce is requested, can generate a temporary manifest from the old harvest and apply it locally with the selected target:

enroll diff --old ./baseline --new ./current --enforce --target ansible
enroll diff --old ./baseline --new ./current --enforce --target puppet
enroll diff --old ./baseline --new ./current --enforce --target salt

For enforcement, the user is responsible for having the chosen local apply tool on PATH: ansible-playbook, puppet, or salt-call.
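A pre-flight check for the apply tool could look roughly like the sketch below. The tool names come from the sentence above; `find_apply_tool()` and the `APPLY_TOOLS` mapping are hypothetical names used for illustration and are not taken from diff.py.

```python
import shutil
from typing import Callable, Optional

# Tool names are from the text above; the mapping itself is illustrative.
APPLY_TOOLS = {
    "ansible": "ansible-playbook",
    "puppet": "puppet",
    "salt": "salt-call",
}


def find_apply_tool(
    target: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the full path of the apply tool for target, or raise."""
    tool = APPLY_TOOLS[target]
    path = which(tool)
    if path is None:
        raise RuntimeError(f"{tool} not found on PATH; needed for --target {target}")
    return path
```

The `which` parameter is injectable only so the lookup can be exercised in tests without the real tools installed.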

2. Repository layout

The project is a single Python package under enroll/ with tests under tests/.

enroll/
  __main__.py                 python -m enroll entry point
  cli.py                      argparse CLI and subcommand dispatcher
  version.py                  package version lookup

  harvest.py                  top-level local harvest orchestration and runtime helpers
  harvest_types.py            dataclasses persisted into state.json
  harvest_collectors/         feature-specific collectors used by harvest.py
    context.py                HarvestContext and HarvestCollector base
    runtime.py                root-only runtime state collector wrapper
    cron_logrotate.py         cron/logrotate unification collector
    services.py               systemd service + manual package collector
    users.py                  users, SSH public files, Flatpak, Snap collector
    package_manager.py        apt/dnf/yum config collectors
    container_images.py       Docker/Podman image collector
    paths.py                  /usr/local and --include-path collectors

  manifest.py                 target router and SOPS manifest wrapper
  ansible.py                  Ansible renderer
  puppet.py                   Puppet renderer
  salt.py                     Salt renderer
  cm.py                       renderer-neutral CMModule model and grouping helpers
  role_names.py               reserved singleton role-name protection

  accounts.py                 users, SSH public files, Flatpak and Snap discovery
  platform.py                 OS/package-backend abstraction
  debian.py                   dpkg/apt helpers
  rpm.py                      rpm/dnf/yum helpers
  systemd.py                  systemctl wrappers and parsers
  system_paths.py             known config paths and filesystem scanners
  package_hints.py            service/package name and config attribution helpers

  capture.py                  safe file/symlink capture into artifacts/
  fsutil.py                   file md5 + owner/group/mode helpers
  ignore.py                   secret/noise avoidance policy
  pathfilter.py               --include-path / --exclude-path matching and expansion
  state.py                    state.json load/write helpers
  yamlutil.py                 YAML helpers used by renderers/JinjaTurtle
  jinjaturtle.py              optional config-file templating integration

  diff.py                     harvest comparison, notifications, and target-selected enforcement
  explain.py                  human/JSON explanation of harvest contents
  validate.py                 schema and artifact consistency validation
  remote.py                   Paramiko remote harvest implementation
  cache.py                    secure local cache directories for harvests
  sopsutil.py                 SOPS binary encryption/decryption helpers
  schema/state.schema.json    JSON Schema for harvest state

tests/
  test_*.py                   unit tests grouped mostly by module/feature

The installed command is configured in pyproject.toml:

[tool.poetry.scripts]
enroll = "enroll.cli:main"

python -m enroll calls the same CLI through enroll/__main__.py.

3. Main runtime flows

3.1 CLI entry flow

All user-facing commands enter through enroll.cli.main().

enroll command
  -> enroll.cli.main()
     -> builds argparse parser and subparsers
     -> discovers optional INI config file
     -> injects config-derived argv defaults before user argv
     -> parses final argv
     -> dispatches by args.cmd
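Injecting config-derived defaults before the user's argv works because argparse keeps the last occurrence of a repeated option. A minimal sketch of that behaviour, assuming a toy parser and a hypothetical `merge_argv()` helper (the real CLI has subcommands and more flags):

```python
import argparse


def merge_argv(config_defaults, user_argv):
    """Place config-derived arguments first so explicit user flags win."""
    return list(config_defaults) + list(user_argv)


def build_parser():
    # Toy parser for illustration only; not the parser built in cli.py.
    parser = argparse.ArgumentParser(prog="enroll-sketch")
    parser.add_argument("--target", default="ansible")
    parser.add_argument("--out")
    return parser


config_defaults = ["--target", "puppet", "--out", "/tmp/from-config"]
user_argv = ["--target", "salt"]
args = build_parser().parse_args(merge_argv(config_defaults, user_argv))
# args.target comes from the user; args.out falls back to the config value.
```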

The supported subcommands are:

harvest       collect a harvest bundle from a local or remote host
manifest      generate Ansible/Puppet/Salt output from a harvest bundle
single-shot   run harvest and manifest in one command
diff          compare two harvest bundles and optionally enforce old state
explain       produce a human/JSON explanation of a harvest
validate      validate state.json and referenced artifacts

cli.py should stay orchestration-heavy, not domain-heavy. It should parse flags, handle config/SOPS/remote branching, and then call the relevant module. It should not contain the meaning of a service, package, user, file, renderer resource, or harvest snapshot.

3.2 Subcommand call graph

flowchart TD
  A[enroll.cli.main] --> B{args.cmd}
  B -->|harvest local| C[harvest.harvest]
  B -->|harvest remote| D[remote.remote_harvest]
  B -->|manifest| E[manifest.manifest]
  B -->|single-shot local| C
  B -->|single-shot remote| D
  C --> E
  D --> E
  B -->|diff| F[diff.compare_harvests]
  F --> G[diff.format_report]
  F --> H{--enforce?}
  H -->|yes| I[diff.enforce_old_harvest]
  I --> J[manifest.manifest target=ansible|puppet|salt]
  J --> K[ansible-playbook or puppet apply or salt-call]
  B -->|explain| L[explain.explain_state]
  B -->|validate| M[validate.validate_harvest]

Important dependency direction:

cli.py
  depends on harvest.py, manifest.py, diff.py, explain.py, validate.py, remote.py

harvest.py
  depends on harvest_collectors, platform backends, capture policy, system scanners

manifest.py
  depends on ansible.py, puppet.py, salt.py

ansible.py / puppet.py / salt.py
  depend on state.py, cm.py, harvested artifacts, and target-specific helpers

4. Harvest bundles

A plaintext harvest bundle is a directory:

<bundle>/
  state.json
  artifacts/
    <role_name>/
      etc/...
      usr/local/...
      sysctl/...
      firewall/...

state.json is written by enroll.state.write_state() and loaded by enroll.state.load_state().

The renderer relies on this invariant:

state.json roles.*.managed_files[*].src_rel
  must correspond to
artifacts/<artifact_role>/<src_rel>

For example, a captured /etc/nginx/nginx.conf in role nginx normally becomes:

{
  "path": "/etc/nginx/nginx.conf",
  "src_rel": "etc/nginx/nginx.conf",
  "owner": "root",
  "group": "root",
  "mode": "0644",
  "reason": "modified_conffile"
}

and the artifact is copied to:

artifacts/nginx/etc/nginx/nginx.conf

Renderer role/module names can differ from artifact roles, especially when common grouping is enabled. Copy helpers must therefore pass the original artifact role, not blindly use the generated renderer module name.
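The src_rel invariant can be checked with a few lines of path arithmetic. The helpers below are a sketch under stated assumptions: `src_rel_for()`, `artifact_path()`, and `missing_artifacts()` are hypothetical names, and the caller is assumed to know the original artifact role already.

```python
import tempfile
from pathlib import Path


def src_rel_for(abs_path: str) -> str:
    """Destination path with the leading slash removed."""
    return abs_path.lstrip("/")


def artifact_path(bundle_dir: str, artifact_role: str, src_rel: str) -> Path:
    """Expected location of the harvested copy inside the bundle."""
    return Path(bundle_dir) / "artifacts" / artifact_role / src_rel


def missing_artifacts(bundle_dir, artifact_role, managed_files):
    """Return src_rel values whose artifact file is absent."""
    return [
        mf["src_rel"]
        for mf in managed_files
        if not artifact_path(bundle_dir, artifact_role, mf["src_rel"]).is_file()
    ]


# Demo against a throwaway bundle directory.
bundle = tempfile.mkdtemp()
target = artifact_path(bundle, "nginx", src_rel_for("/etc/nginx/nginx.conf"))
target.parent.mkdir(parents=True)
target.write_text("worker_processes auto;\n")
managed = [
    {"path": "/etc/nginx/nginx.conf", "src_rel": "etc/nginx/nginx.conf"},
    {"path": "/etc/nginx/missing.conf", "src_rel": "etc/nginx/missing.conf"},
]
missing = missing_artifacts(bundle, "nginx", managed)
```

Passing the artifact role explicitly, rather than deriving it from a renderer module name, mirrors the rule stated above.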

5. `state.json` shape and snapshot dataclasses

The top-level state assembled by harvest.harvest() is:

{
  "enroll": {
    "version": "...",
    "harvest_time": 123456789
  },
  "host": {
    "hostname": "...",
    "os": "debian|redhat|unknown",
    "pkg_backend": "dpkg|rpm|unknown",
    "os_release": {}
  },
  "inventory": {
    "packages": {}
  },
  "roles": {
    "users": {},
    "flatpak": {},
    "snap": {},
    "container_images": {},
    "services": [],
    "packages": [],
    "apt_config": {},
    "dnf_config": {},
    "firewall_runtime": {},
    "sysctl": {},
    "etc_custom": {},
    "usr_local_custom": {},
    "extra_paths": {}
  }
}

The dataclasses that are serialised into state.json live in enroll/harvest_types.py.

| Dataclass | Purpose |
| --- | --- |
| `ManagedFile` | A file to recreate, with destination path, artifact path, owner, group, mode, and reason. |
| `ManagedLink` | A symlink to recreate, such as `sites-enabled` entries. |
| `ManagedDir` | A directory to ensure exists, with owner/group/mode. |
| `ExcludedFile` | A path that was considered but skipped, with a reason. |
| `ServiceSnapshot` | One enabled systemd service and its packages/config/state. |
| `PackageSnapshot` | One manual package and related config. `has_config=False` is used when the package should still be installed but no config was found. |
| `UsersSnapshot` | Human users, groups, managed SSH/dotfiles, and per-user Flatpak data. |
| `FlatpakSnapshot` | System Flatpaks and system Flatpak remotes. |
| `SnapSnapshot` | System Snap installs. |
| `ContainerImagesSnapshot` | Docker/Podman image metadata. |
| `AptConfigSnapshot` / `DnfConfigSnapshot` | Package-manager configuration. |
| `EtcCustomSnapshot` | Unowned/custom `/etc` config not attributed elsewhere. |
| `UsrLocalCustomSnapshot` | Selected `/usr/local/etc` files and executable `/usr/local/bin` files. |
| `ExtraPathsSnapshot` | User-requested `--include-path` files/directories. |
| `FirewallRuntimeSnapshot` | Generated artifacts from live ipset/iptables state. |
| `SysctlSnapshot` | Generated `/etc/sysctl.d/99-enroll.conf` from live writable sysctls. |

The JSON Schema in enroll/schema/state.schema.json is the validation contract for persisted harvests.

6. Harvest orchestration

The local harvest entry point is:

enroll.harvest.harvest(
    bundle_dir,
    policy=None,
    dangerous=False,
    include_paths=None,
    exclude_paths=None,
)

It returns the path to the written state.json.

6.1 High-level harvest order

The order matters because harvest maintains a global set of captured destination paths. Once a path is captured into one role, later collectors normally skip it.

flowchart TD
  A[harvest.harvest] --> B[Build IgnorePolicy and PathFilter]
  B --> C[detect_platform + get_backend]
  C --> D[backend.build_etc_index]
  D --> E[RuntimeStateCollector]
  E --> F[CronLogrotateCollector]
  F --> G[ServicePackageCollector]
  G --> H[UsersCollector]
  H --> I[ContainerImagesCollector]
  I --> J[PackageManagerConfigCollector]
  J --> K[etc_custom scan inside harvest.py]
  K --> L[UsrLocalCustomCollector]
  L --> M[ExtraPathsCollector]
  M --> N[Build inventory.packages]
  N --> O[Add parent ManagedDir entries]
  O --> P[state.write_state]

6.2 `HarvestContext`

HarvestContext lives in harvest_collectors/context.py. Collectors receive it as a single argument instead of a long list of individual dependencies.

@dataclass
class HarvestContext:
    bundle_dir: str
    policy: IgnorePolicy
    path_filter: PathFilter
    platform: Dict[str, Any]
    backend: Any
    installed_pkgs: Dict[str, Any]
    installed_names: Set[str]
    owned_etc: Set[str]
    etc_owner_map: Dict[str, str]
    topdir_to_pkgs: Dict[str, Set[str]]
    pkg_to_etc_paths: Dict[str, List[str]]
    captured_global: Set[str]

New collectors should generally accept a HarvestContext and return dataclass snapshots from harvest_types.py.

6.3 Global de-duplication

The harvester tries to avoid two generated roles owning the same destination path. This avoids duplicate config-manager resources and confusing diffs.

captured_global is passed into capture.capture_file() and capture.capture_link(). If a destination path has already been seen, later collection attempts return without capturing it again.

This is one of the most important invariants in the project:

A destination path should normally appear in only one generated role.

Puppet and Salt also run cm.resolve_catalog_conflicts() after renderer role collection because they compile a single global catalog and duplicate resources are hard failures.
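The effect of captured_global can be illustrated with a plain set. This is a sketch of the idea only: `capture_once()` is a hypothetical stand-in, not the signature of capture.capture_file().

```python
from typing import Dict, List, Set


def capture_once(
    path: str,
    role: str,
    captured_global: Set[str],
    by_role: Dict[str, List[str]],
) -> bool:
    """Record path for role unless an earlier role already owns it."""
    if path in captured_global:
        return False  # an earlier collector won; skip silently
    captured_global.add(path)
    by_role.setdefault(role, []).append(path)
    return True


captured_global: Set[str] = set()
by_role: Dict[str, List[str]] = {}

# Collector order decides ownership: the first caller keeps the path.
first = capture_once("/etc/logrotate.d/nginx", "logrotate", captured_global, by_role)
second = capture_once("/etc/logrotate.d/nginx", "nginx", captured_global, by_role)
```

This is why the harvest order in section 6.1 matters: moving a collector earlier changes which role ends up owning a shared path.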

7. File capture and safety policy

7.1 `capture_file()`

capture.capture_file() decides whether to copy a file into artifacts/ and record it in a snapshot.

capture_file(abs_path, role_name, reason, policy, path_filter, ...)
  -> skip if already seen globally or in this role
  -> skip if --exclude-path matches
  -> ask IgnorePolicy.deny_reason(abs_path)
  -> stat owner/group/mode with fsutil.stat_triplet()
  -> copy to artifacts/<role_name>/<abs_path without leading slash>
  -> append ManagedFile
  -> mark seen in role/global

fsutil.stat_triplet() returns owner, group, and a zero-padded octal mode string. It falls back to numeric uid/gid strings if user/group names cannot be resolved.
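A helper with the behaviour described for stat_triplet() could be written as below. This is an independent sketch, not the code in fsutil.py; the name `stat_triplet_sketch()` and the choice to follow symlinks are assumptions.

```python
import grp
import os
import pwd
import stat
import tempfile
from typing import Tuple


def stat_triplet_sketch(path: str) -> Tuple[str, str, str]:
    """Return (owner, group, mode) with mode as zero-padded octal text."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # fall back to the numeric uid
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # fall back to the numeric gid
    mode = format(stat.S_IMODE(st.st_mode), "04o")
    return owner, group, mode


with tempfile.NamedTemporaryFile() as tmp:
    os.chmod(tmp.name, 0o640)
    owner, group, mode = stat_triplet_sketch(tmp.name)
```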

7.2 `capture_link()`

capture.capture_link() records symlinks as ManagedLink entries rather than copying their targets. It is used for meaningful enablement symlinks, especially in nginx/apache-style trees such as:

/etc/nginx/sites-enabled/*
/etc/nginx/modules-enabled/*
/etc/apache2/conf-enabled/*
/etc/apache2/mods-enabled/*
/etc/apache2/sites-enabled/*

7.3 User shell dotfiles

capture.capture_user_shell_dotfiles() is called by UsersCollector, but it only captures files when the harvest policy is dangerous.

In dangerous mode:

.bashrc, .profile, and .bash_logout are captured only if they differ from /etc/skel baselines.
.bash_aliases is captured if present because there may be no skel baseline.

Outside dangerous mode, Enroll records a note explaining that shell dotfiles were not auto-harvested. Users can still include specific files via --include-path, but the normal IgnorePolicy still applies unless --dangerous is also used.

7.4 `IgnorePolicy`

ignore.IgnorePolicy is the default secret/noise avoidance layer.

By default it skips likely sensitive or low-value files such as:

/etc/shadow, /etc/gshadow, and backup variants,
SSH host private keys,
private SSL/Let's Encrypt material,
log files and editor backups,
files larger than max_file_bytes (256_000 by default),
binary-like files except known keyring formats,
sampled non-comment content that looks sensitive, such as private keys, password=, token, secret, or api_key.

--dangerous sets policy.dangerous = True, disabling deny-globs and content sniffing. This is intentional and should remain explicit.

The policy has separate methods for different kinds of filesystem object:

deny_reason(path) for regular files,
deny_reason_dir(path) for directories,
deny_reason_link(path) for symlinks.
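The layering of the regular-file check can be sketched as follows. Everything here is illustrative: the glob list is a small sample, the reason strings are made up, and applying the size cap even in dangerous mode is an assumption rather than documented behaviour of ignore.py.

```python
import fnmatch
import os
import re
import tempfile
from typing import Optional

MAX_FILE_BYTES = 256_000  # default quoted above
# Illustrative patterns only; the real deny list is longer.
DENY_GLOBS = ["/etc/shadow*", "/etc/gshadow*", "*.log", "*~"]
SENSITIVE = re.compile(rb"(?i)PRIVATE KEY|password\s*=|token|secret|api_key")


def deny_reason_sketch(path: str, dangerous: bool = False) -> Optional[str]:
    """Return a short reason when path should be skipped, else None."""
    if os.path.getsize(path) > MAX_FILE_BYTES:
        return "too_large"  # assumption: the cap applies in every mode
    if dangerous:
        return None  # deny-globs and content sniffing are disabled
    if any(fnmatch.fnmatch(path, pat) for pat in DENY_GLOBS):
        return "denied_path"
    with open(path, "rb") as fh:
        sample = fh.read(64 * 1024)
    if b"\x00" in sample:
        return "binary_like"
    for line in sample.splitlines():
        stripped = line.strip()
        if stripped.startswith(b"#"):
            continue  # only non-comment content is sniffed
        if SENSITIVE.search(stripped):
            return "sensitive_content"
    return None


def _write_temp(text: str) -> str:
    fd, name = tempfile.mkstemp(suffix=".conf")
    with os.fdopen(fd, "w") as fh:
        fh.write(text)
    return name


secret_file = _write_temp("user = app\npassword = hunter2\n")
plain_file = _write_temp("# password = example\nlisten 80;\n")
```

With these inputs the first file is rejected for its content, the second passes because the only match sits in a comment, and dangerous mode lets the first file through.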

7.5 `PathFilter`

pathfilter.PathFilter implements user-supplied path controls:

--include-path adds extra files/directories to the extra_paths role.
--exclude-path removes matching paths from all harvesting.
Excludes always win over includes.

Pattern styles:

/plain/path        exact path or directory-prefix match
glob:/path/**/*.x  forced glob
/path/**/*.x       inferred glob because it contains glob characters
re:^/path/...$     regex
regex:^/path/...$  regex

expand_includes() is conservative: it ignores symlinks, respects excludes, caps file counts, and returns notes for unmatched patterns or caps.
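The pattern styles listed above amount to a small classification step. The sketch below shows one way to express it; `classify_pattern()` and `prefix_match()` are hypothetical names, and the set of characters treated as glob markers is an assumption.

```python
from typing import Tuple

GLOB_CHARS = set("*?[")  # assumed markers for an inferred glob


def classify_pattern(raw: str) -> Tuple[str, str]:
    """Return (kind, pattern) where kind is 'regex', 'glob', or 'prefix'."""
    for prefix in ("re:", "regex:"):
        if raw.startswith(prefix):
            return "regex", raw[len(prefix):]
    if raw.startswith("glob:"):
        return "glob", raw[len("glob:"):]
    if GLOB_CHARS & set(raw):
        return "glob", raw
    return "prefix", raw


def prefix_match(pattern: str, path: str) -> bool:
    """Exact match, or match on a directory boundary below pattern."""
    base = pattern.rstrip("/")
    return path == base or path.startswith(base + "/")
```

Matching on a directory boundary keeps `/etc/nginx` from also matching `/etc/nginx-extra`.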

8. Platform and package backends

platform.py abstracts distribution-specific package behaviour.

platform.detect_platform()
  -> reads /etc/os-release
  -> returns PlatformInfo(os_family, pkg_backend, os_release)

platform.get_backend(info)
  -> DpkgBackend for Debian-like systems
  -> RpmBackend for RedHat/Fedora-like systems

The backend interface is PackageBackend:

owner_of_path(path)
list_manual_packages()
installed_packages()
build_etc_index()
specific_paths_for_hints()
is_pkg_config_path(path)
modified_paths(pkg, paths)

8.1 Debian backend

DpkgBackend delegates to debian.py.

It uses dpkg/apt data to provide package ownership, manual package lists, installed package inventory, /etc indexes, conffile hashes, and packaged-file md5 baselines.

DpkgBackend.modified_paths() identifies:

modified_conffile when a dpkg conffile hash differs,
modified_packaged_file when a packaged file md5 differs.

It deliberately leaves /etc/apt-style package-manager configuration for the apt_config role.

8.2 RPM backend

RpmBackend delegates to rpm.py.

It provides package ownership, manual package lists, installed package inventory, /etc indexes, RPM config file lists, and rpm -V style modified-file detection.

RPM-family package-manager config paths such as /etc/dnf, /etc/yum, /etc/yum.conf, /etc/yum.repos.d, and /etc/pki/rpm-gpg are collected into dnf_config, not arbitrary package roles.

8.3 Adding a new package backend

To support another package system:

implement a PackageBackend subclass,
route it from platform.get_backend(),
provide ownership lookup, manual package listing, installed package inventory, /etc indexing, modified config detection, and package-manager config exclusion,
add backend tests comparable to test_debian.py, test_rpm.py, and test_platform.py.
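A skeleton for a new backend might start like this. Only three of the interface methods are shown, their signatures and return types are guesses, and `PackageBackendSketch` and `InMemoryBackend` are hypothetical classes backed by in-memory data rather than a real package database.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class PackageBackendSketch(ABC):
    """Subset of the interface from section 8; signatures are guesses."""

    @abstractmethod
    def owner_of_path(self, path: str) -> Optional[str]:
        """Package that owns path, or None when unowned."""

    @abstractmethod
    def list_manual_packages(self) -> List[str]:
        """Packages the administrator installed explicitly."""

    @abstractmethod
    def is_pkg_config_path(self, path: str) -> bool:
        """True for package-manager config kept out of package roles."""


class InMemoryBackend(PackageBackendSketch):
    """Hypothetical backend that answers from dictionaries."""

    def __init__(self, owners: Dict[str, str], manual: List[str], pm_roots: List[str]):
        self._owners = owners
        self._manual = manual
        self._pm_roots = pm_roots

    def owner_of_path(self, path: str) -> Optional[str]:
        return self._owners.get(path)

    def list_manual_packages(self) -> List[str]:
        return sorted(self._manual)

    def is_pkg_config_path(self, path: str) -> bool:
        return any(path == r or path.startswith(r + "/") for r in self._pm_roots)


backend = InMemoryBackend(
    owners={"/etc/nginx/nginx.conf": "nginx"},
    manual=["nginx", "curl"],
    pm_roots=["/etc/examplepm"],
)
```

An in-memory implementation like this is also a convenient shape for backend tests, since it needs no package tooling on the test host.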

9. Harvest collectors in detail

Collectors live under enroll/harvest_collectors/.

9.1 `RuntimeStateCollector`

File: harvest_collectors/runtime.py

This wrapper collects root-only live runtime state:

writable sysctl state,
live ipset state,
live IPv4 iptables state,
live IPv6 iptables state.

The actual helper implementations currently live in harvest.py:

_collect_sysctl_snapshot(),
_collect_firewall_runtime_snapshot(),
_parse_sysctl_a_output(),
_iptables_save_has_state(),
_ipset_save_has_state().

If the process is not root, runtime capture returns empty snapshots with explanatory notes.

Sysctl capture

Sysctl capture runs sysctl -a, filters to writable/persistable single-line keys, and writes a generated artifact:

artifacts/sysctl/sysctl/99-enroll.conf

The destination managed by renderers is:

/etc/sysctl.d/99-enroll.conf

The filter skips volatile/action/identity keys and inactive mutually-exclusive zero values. This avoids generating config that fails or is noisy on replay.

Firewall runtime capture

Runtime firewall capture is a fallback. Enroll first checks for persistent firewall config such as:

/etc/iptables/rules.v4
/etc/iptables/rules.v6
/etc/sysconfig/iptables
/etc/sysconfig/ip6tables
/etc/ipset.conf
/etc/ipset/*

If persistent files exist for a family, live runtime capture for that family is skipped. If no persistent file exists and live state is meaningful, Enroll writes generated artifacts such as:

artifacts/firewall_runtime/firewall/ipset.save
artifacts/firewall_runtime/firewall/iptables.v4
artifacts/firewall_runtime/firewall/iptables.v6

Renderers should only create a firewall runtime role when at least one runtime artifact exists. When firewall runtime is rendered, Ansible/Puppet/Salt also create an enroll_runtime role/module/state to own /etc/enroll before /etc/enroll/firewall.
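The fallback decision can be sketched per family. The paths are those listed above, but grouping them into three families, the `families_to_capture()` name, and the `root` parameter (added so the demo does not read the real /etc) are assumptions.

```python
import glob
import os
import tempfile
from typing import Dict, List

# Paths come from the list above; the grouping by family is an assumption.
PERSISTENT_GLOBS: Dict[str, List[str]] = {
    "ipset": ["/etc/ipset.conf", "/etc/ipset/*"],
    "iptables_v4": ["/etc/iptables/rules.v4", "/etc/sysconfig/iptables"],
    "iptables_v6": ["/etc/iptables/rules.v6", "/etc/sysconfig/ip6tables"],
}


def families_to_capture(live_has_state: Dict[str, bool], root: str = "") -> List[str]:
    """Families with meaningful live state and no persistent config file."""
    selected = []
    for family, patterns in PERSISTENT_GLOBS.items():
        has_persistent = any(glob.glob(root + pat) for pat in patterns)
        if not has_persistent and live_has_state.get(family, False):
            selected.append(family)
    return selected


# Demo against a throwaway root that only has a persistent IPv4 rules file.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "etc/iptables"))
open(os.path.join(demo_root, "etc/iptables/rules.v4"), "w").close()
result = families_to_capture(
    {"ipset": False, "iptables_v4": True, "iptables_v6": True},
    root=demo_root,
)
```

Here IPv4 is skipped because a persistent file exists, ipset is skipped because there is no live state, and only IPv6 would be captured from runtime.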

9.2 `CronLogrotateCollector`

File: harvest_collectors/cron_logrotate.py

This collector runs before service/package collection to prevent cron and logrotate snippets from being scattered across unrelated roles.

It detects cron packages such as cron, cronie, cronie-anacron, vixie-cron, and fcron, and detects logrotate separately.

It captures cron-related paths such as:

/etc/crontab
/etc/cron.d/*
/etc/cron.hourly/*
/etc/cron.daily/*
/var/spool/cron/*
/var/spool/crontabs/*
/var/spool/anacron/*

It captures logrotate paths such as:

/etc/logrotate.conf
/etc/logrotate.d/*

It returns PackageSnapshot objects for cron and logrotate when those packages exist.

9.3 `ServicePackageCollector`

File: harvest_collectors/services.py

This collector produces:

ServiceSnapshot objects for enabled systemd services,
PackageSnapshot objects for manual packages not already covered by services,
alias maps used by later /etc attribution,
seen_by_role state reused by later collectors.

For each enabled service it:

derives a safe role name from the unit,
queries systemd metadata,
infers packages from the unit fragment owner, ExecStart, and related /etc topdirs,
collects unit drop-ins, environment files, distro-specific likely config files, and modified package-owned config,
collects related unowned /etc/<hint> and /etc/<hint>.d files,
captures candidates with capture_file(),
builds a ServiceSnapshot.

It also collects timer override files. If a timer triggers a known service, timer files are attached to that service snapshot. Otherwise, the timer is associated with inferred packages.

Manual packages are processed after services. Packages already covered by service snapshots are not duplicated as standalone package roles. Packages with no detected config are still represented with has_config=False so renderers can install them.

Known enablement symlinks for nginx/apache are captured as ManagedLink entries at the end of the collector.

9.4 `UsersCollector`

File: harvest_collectors/users.py

This collector returns a UsersCollection containing:

UsersSnapshot,
FlatpakSnapshot,
SnapSnapshot.

User discovery is in accounts.collect_non_system_users(). It reads /etc/login.defs, /etc/passwd, /etc/group, home directories, and user Flatpak installs. It filters out root, nobody, users with a UID below UID_MIN, and users whose shell is a non-login shell such as nologin or /bin/false.
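The user filter can be pictured as a pass over passwd entries. This is a simplified sketch: `non_system_users()` is a hypothetical function, it takes passwd text directly, and the default of 1000 stands in for a UID_MIN value that would normally be read from /etc/login.defs.

```python
from typing import List

NON_LOGIN_SHELLS = ("nologin", "false")
SKIP_NAMES = {"root", "nobody"}


def non_system_users(passwd_text: str, uid_min: int = 1000) -> List[str]:
    """Return login names that pass the UID, name, and shell filters."""
    users = []
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 7:
            continue  # malformed entry
        name, uid, shell = fields[0], int(fields[2]), fields[6]
        if name in SKIP_NAMES or uid < uid_min:
            continue
        if shell.rsplit("/", 1)[-1] in NON_LOGIN_SHELLS:
            continue
        users.append(name)
    return users


sample = """root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
svc:x:1001:1001::/home/svc:/bin/false
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
"""
found = non_system_users(sample)
```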

Default user file capture is intentionally narrow:

authorized_keys,
safe public SSH material where supported by helpers.

Automatic shell dotfile capture only runs in dangerous mode.

The same collector discovers:

system Flatpaks,
system Flatpak remotes,
per-user Flatpaks,
per-user Flatpak remotes,
system Snaps.

9.5 `ContainerImagesCollector`

File: harvest_collectors/container_images.py

This collector inspects Docker and Podman image caches when the relevant engine exists.

For each engine it:

runs <engine> image ls -q --no-trunc,
inspects images in chunks with <engine> image inspect ...,
normalises image IDs, tags, digests, OS/architecture/platform fields, and tag aliases,
prefers digest-pinned pull refs from RepoDigests.

Renderers only enforce exact pull state for images with a usable digest. Images with only local tags and no digest are represented with notes rather than fake reproducibility.
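Choosing a reproducible pull reference from inspect output might look like this. The `RepoDigests` and `RepoTags` keys are standard fields in `docker image inspect` JSON; the `pull_ref()` helper and the sample records are illustrative.

```python
from typing import Any, Dict, Optional


def pull_ref(image: Dict[str, Any]) -> Optional[str]:
    """Return a digest-pinned reference, or None when only tags exist."""
    for ref in image.get("RepoDigests") or []:
        if "@sha256:" in ref:
            return ref
    return None


pinned = {
    "RepoTags": ["nginx:1.27"],
    "RepoDigests": ["nginx@sha256:" + "a" * 64],
}
local_only = {"RepoTags": ["myapp:dev"], "RepoDigests": []}
```

Returning None for tag-only images lets the caller record a note instead of emitting a pull that may resolve to different content later.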

9.6 `PackageManagerConfigCollector`

File: harvest_collectors/package_manager.py

This collector emits a dedicated package-manager config snapshot:

apt_config on dpkg systems,
dnf_config on rpm systems.

APT capture includes /etc/apt, sources, .sources files, trusted keyrings, and keyrings referenced through signed-by / Signed-By.

DNF/YUM capture includes /etc/dnf, /etc/yum, /etc/yum.conf, /etc/yum.repos.d/*.repo, and /etc/pki/rpm-gpg/*.

9.7 `etc_custom` scan

etc_custom is still assembled inside harvest.harvest() rather than in its own collector.

It captures:

essential system config from system_paths.iter_system_capture_paths(),
remaining unowned config-like files found by walking /etc.

Before adding shared snippets such as /etc/logrotate.d/* or /etc/cron.d/* to etc_custom, _target_role_for_shared_snippet() tries to attach them to a more meaningful service/package role.

9.8 `UsrLocalCustomCollector`

File: harvest_collectors/paths.py

This collector creates usr_local_custom from:

files under /usr/local/etc,
executable files under /usr/local/bin.

It respects IgnorePolicy, PathFilter, and global de-duplication.

9.9 `ExtraPathsCollector`

File: harvest_collectors/paths.py

This collector handles --include-path and --exclude-path and creates extra_paths.

For included directories, it records directory metadata as ManagedDir entries while walking. For included files, it relies on expand_includes() and then capture_file().

10. Path scanners and package hints

system_paths.py contains known path lists and filesystem scanners.

Important functions and constants:

ALLOWED_UNOWNED_EXTS decides which unowned /etc files look config-like.
MAX_FILES_CAP and MAX_UNOWNED_FILES_PER_ROLE cap broad scans.
is_confish() checks whether a path looks like configuration.
scan_unowned_under_roots() finds unowned files under candidate roots.
iter_matching_files() expands glob specs and walks directory hits.
iter_apt_capture_paths() and iter_dnf_capture_paths() collect package-manager config.
iter_system_capture_paths() returns fixed essential system config candidates.
persistent_ipset_globs(), persistent_iptables_v4_globs(), and persistent_iptables_v6_globs() support runtime firewall fallback decisions.

package_hints.py turns package/unit names into stable role names and attempts to infer relationships.

Important helpers:

safe_name(),
role_id(),
role_name_from_unit(),
role_name_from_pkg(),
package_section_from_installations(),
hint_names(),
add_pkgs_from_etc_topdirs(),
maybe_add_specific_paths().

SHARED_ETC_TOPDIRS in package_hints.py prevents shared directories such as /etc/default, /etc/pam.d, /etc/systemd, /etc/ssh, /etc/apt, and /etc/dnf from being attributed too broadly to one package.

role_names.py protects singleton role names such as users, flatpak, snap, container_images, apt_config, dnf_config, firewall_runtime, sysctl, etc_custom, usr_local_custom, and extra_paths from collisions with package/service-derived roles.

11. Manifest orchestration

manifest.py is a target router and SOPS wrapper. It does not render target resources itself.

Entry point:

manifest(
    bundle_dir,
    out,
    fqdn=None,
    jinjaturtle="auto",
    sops_fingerprints=None,
    no_common_roles=False,
    target="ansible",
)

Plain mode dispatches to:

target=ansible -> ansible.manifest_from_bundle_dir(..., jinjaturtle=..., no_common_roles=...)
target=puppet  -> puppet.manifest_from_bundle_dir(..., jinjaturtle=..., no_common_roles=...)
target=salt    -> salt.manifest_from_bundle_dir(..., jinjaturtle=..., no_common_roles=...)

SOPS mode:

accepts an already-decrypted bundle directory or a SOPS-encrypted harvest tarball,
decrypts/extracts with safe tar extraction when needed,
renders target output into a secure temp directory,
tars the manifest directory under a manifest/ prefix,
encrypts the tarball with SOPS,
returns the encrypted output path.

The renderers do not know about SOPS.
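The "tar under a manifest/ prefix" step maps onto the `arcname` argument of `tarfile.TarFile.add()`. A minimal sketch, assuming an uncompressed tarball and a hypothetical `tar_manifest_dir()` helper; the SOPS encryption step is left out:

```python
import os
import tarfile
import tempfile


def tar_manifest_dir(manifest_dir: str, tar_path: str) -> None:
    """Archive manifest_dir so every member sits under 'manifest/'."""
    with tarfile.open(tar_path, "w") as tar:
        tar.add(manifest_dir, arcname="manifest")


# Demo with a throwaway rendered directory.
work = tempfile.mkdtemp()
rendered = os.path.join(work, "rendered")
os.makedirs(os.path.join(rendered, "roles"))
with open(os.path.join(rendered, "playbook.yml"), "w") as fh:
    fh.write("---\n")
tar_path = os.path.join(work, "manifest.tar")
tar_manifest_dir(rendered, tar_path)
with tarfile.open(tar_path) as tar:
    names = sorted(tar.getnames())
```

A fixed prefix means the temp directory name never leaks into the archive and extraction always lands in a predictable folder.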

12. The renderer-neutral `CMModule` model

File: cm.py

CMModule is the shared resource model used heavily by Puppet and Salt and partially by Ansible.

@dataclass
class CMModule:
    role_name: str
    module_name: str
    packages: Set[str]
    groups: Set[str]
    users: Dict[str, Dict[str, Any]]
    dirs: Dict[str, Dict[str, Any]]
    files: Dict[str, Dict[str, Any]]
    links: Dict[str, Dict[str, Any]]
    services: Dict[str, Dict[str, Any]]
    firewall_runtime: Dict[str, Any]
    notes: List[str]

Important methods and helpers include:

add_managed_dir(), add_managed_file(), add_managed_link(),
add_package_snapshot(),
add_service_snapshot_state(),
user_records_from_snapshot(),
add_flatpak_snapshot(), add_snap_snapshot(),
add_firewall_runtime_snapshot(),
package_service_entries(),
active_service_units_by_package(),
active_service_units_for_package_snapshot(),
remove_directory_resource_conflicts().

12.1 Common role grouping

CMModule.package_service_entries() is the shared grouping mechanism for package and service snapshots.

use_common_roles=True groups package/service snapshots into section/group roles such as Debian Section or RPM Group labels. use_common_roles=False preserves one generated role/module/state per package or service snapshot.

Default behaviour:

normal manifest, no --no-common-roles: group package/service roles
--fqdn mode: no common grouping
--no-common-roles: no common grouping

--fqdn implies no common roles because host-specific output should preserve per-host state rather than merging unrelated resources into shared roles.
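The default behaviour above reduces to one boolean expression. In this sketch `effective_use_common_roles()` is a hypothetical name; the parameters mirror the `fqdn` and `no_common_roles` arguments of manifest().

```python
from typing import Optional


def effective_use_common_roles(fqdn: Optional[str], no_common_roles: bool) -> bool:
    """Group into common roles only for a non-fqdn manifest without the opt-out."""
    return not no_common_roles and not fqdn
```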

12.2 Catalog conflict resolution

resolve_catalog_conflicts() runs for Puppet and Salt.

It removes duplicates across generated modules/states for:

packages,
groups,
users,
directories,
files,
symlinks,
services.

It also removes directory resources that conflict with a file or link at the same path. This matters because Puppet and Salt compile a single catalog; duplicates that Ansible might tolerate can fail hard there.
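The two passes described above can be sketched with plain dictionaries. This is not the code in cm.py: modules are modelled as dicts of dicts, `resolve_conflicts_sketch()` is a hypothetical name, and "the first module to declare a resource keeps it" is an assumed tie-break.

```python
from typing import Any, Dict, List

KINDS = ("packages", "groups", "users", "dirs", "files", "links", "services")
Module = Dict[str, Dict[str, Any]]


def resolve_conflicts_sketch(modules: List[Module]) -> List[Module]:
    """Drop duplicate resources, then dirs that clash with files or links."""
    seen = {kind: set() for kind in KINDS}
    for mod in modules:
        for kind in KINDS:
            resources = mod.get(kind, {})
            for name in list(resources):
                if name in seen[kind]:
                    del resources[name]  # an earlier module owns it
                else:
                    seen[kind].add(name)
    not_dirs = seen["files"] | seen["links"]
    for mod in modules:
        dirs = mod.get("dirs", {})
        for path in list(dirs):
            if path in not_dirs:
                del dirs[path]  # a file or link exists at this path
    return modules


modules = [
    {
        "packages": {"nginx": {}},
        "files": {"/etc/nginx/nginx.conf": {}},
        "dirs": {"/etc/app/current": {}},
    },
    {
        "packages": {"nginx": {}},
        "files": {"/etc/nginx/nginx.conf": {}},
        "links": {"/etc/app/current": {}},
    },
]
resolve_conflicts_sketch(modules)
```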

13. Ansible renderer

File: ansible.py

Entry point:

ansible.manifest_from_bundle_dir(
    bundle_dir,
    out_dir,
    fqdn=None,
    jinjaturtle="auto",
    no_common_roles=False,
)

It instantiates AnsibleManifestRenderer(...).render().

13.1 Ansible render flow

flowchart TD
  A[AnsibleManifestRenderer.render] --> B[AnsibleRole.load_state]
  B --> C[roles_from_state + inventory_packages_from_state]
  C --> D[_prepare_ansible_context]
  D --> E[_write_site_scaffold]
  E --> F[_collect_ansible_roles]
  F --> G[_render_managed_file_roles]
  F --> H[_render_users_role]
  F --> I[_render_flatpak_role]
  F --> J[_render_snap_role]
  F --> K[_render_container_images_role]
  F --> L[_render_sysctl_role]
  F --> M[_render_firewall_runtime_role]
  M --> N[_render_enroll_runtime_role if firewall runtime exists]
  F --> O[_render_service_roles]
  F --> P[_render_common_ansible_roles]
  F --> Q[_render_package_roles]
  Q --> R[_write_manifest_playbook]
  R --> S[README.md]

13.2 Output layout

Default single-site output:

<out>/
  ansible.cfg
  playbook.yml
  README.md
  requirements.yml
  roles/
    <role>/
      tasks/main.yml
      handlers/main.yml
      defaults/main.yml
      meta/main.yml
      files/...
      templates/...

--fqdn site-mode output adds inventory and host vars:

<out>/
  inventory/
    hosts.yml
    host_vars/<fqdn>/<role>/
      main.yml
      .files/...
  roles/<role>/...

In default mode, variables normally live in roles/<role>/defaults/main.yml and raw files live under roles/<role>/files/.

In --fqdn mode, host-specific values and artifacts live under inventory/host_vars/<fqdn>/<role>/, while reusable role scaffolding remains under roles/.

13.3 Role ordering

Ansible playbook roles are ordered intentionally:

package-manager config roles (apt_config, dnf_config),
common grouped roles,
standalone package roles,
service roles,
custom file roles (etc_custom, usr_local_custom, extra_paths),
Flatpak, Snap, container images, users,
cron/logrotate moved toward the end when present,
runtime roles (enroll_runtime, sysctl, firewall_runtime).

enroll_runtime is rendered only when firewall runtime is rendered.

13.4 Role tags

Generated playbooks tag roles with role_<safe_role_name>. diff --enforce --target ansible uses these tags to narrow enforcement to roles relevant to the drift report when it can.

Puppet and Salt enforcement do not currently narrow to per-role tags; they run the full generated local manifest/state tree.
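Narrowing an Ansible run by tag comes down to passing `--tags` to ansible-playbook. The sketch below assumes a simple sanitising rule for the safe role name and omits inventory and connection options; `role_tag()` and `playbook_command()` are hypothetical helpers.

```python
import re
from typing import Iterable, List


def role_tag(role: str) -> str:
    """Build a role_<safe_role_name> tag; the sanitising rule is assumed."""
    return "role_" + re.sub(r"[^A-Za-z0-9_]", "_", role)


def playbook_command(playbook: str, roles: Iterable[str]) -> List[str]:
    """Return an ansible-playbook argv, narrowed by tag when roles are given."""
    cmd = ["ansible-playbook", playbook]
    tags = sorted({role_tag(r) for r in roles})
    if tags:
        cmd += ["--tags", ",".join(tags)]
    return cmd
```

With an empty role list the command runs the whole playbook, which matches the "when it can" fallback described above.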

13.5 Ansible and JinjaTurtle

Ansible uses jinjaturtle.jinjify_managed_files().

When JinjaTurtle is enabled and supports a harvested config file, the renderer can write:

a Jinja2 template under templates/,
variables in defaults/main.yml or inventory/host_vars/<fqdn>/<role>/main.yml.

If JinjaTurtle is unavailable in auto mode, fails, emits missing variables, or does not support the path, Ansible falls back to copying the raw harvested file.

14. Puppet renderer

File: puppet.py

Entry point:

puppet.manifest_from_bundle_dir(
    bundle_dir,
    out_dir,
    fqdn=None,
    no_common_roles=False,
    jinjaturtle="auto",
)

It instantiates PuppetManifestRenderer(...).render().

14.1 Puppet render flow

flowchart TD
  A[PuppetManifestRenderer.render] --> B[PuppetRole.load_state]
  B --> C[resolve_jinjaturtle_mode]
  C --> D[_collect_puppet_roles]
  D --> E[resolve_catalog_conflicts]
  E --> F[_sync_service_notifications]
  F --> G[write modules/<module>/manifests/init.pp]
  G --> H[write metadata.json]
  H --> I{fqdn?}
  I -->|no| J[write manifests/site.pp with node default]
  I -->|yes| K[write hiera.yaml]
  K --> L[write data/nodes/<fqdn>.yaml]
  L --> M[write Hiera-driven site.pp]
  J --> N[README.md]
  M --> N

14.2 `PuppetRole`

PuppetRole extends CMModule and converts snapshots into Puppet-friendly resources. It handles:

packages,
users and groups,
managed dirs/files/symlinks,
services,
sysctl apply execs,
Flatpak remotes/apps via guarded exec,
Snap installs via guarded exec,
Docker/Podman images by digest via guarded exec,
firewall runtime files and refresh-only restore execs,
JinjaTurtle ERB templates and class/Hiera parameter values.

_puppet_name() sanitises module names and avoids Puppet reserved words such as default, class, node, site, and init.
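
As a rough illustration of that kind of sanitising, the sketch below lower-cases a raw name, replaces characters Puppet does not allow, and prefixes reserved or digit-leading names. The function name `puppet_safe_name()`, the `enroll_` prefix, and the exact rules are assumptions made for this example; the real `_puppet_name()` may behave differently.

```python
import re

# Reserved words named in this guide; the real list may be longer.
RESERVED = {"default", "class", "node", "site", "init"}


def puppet_safe_name(raw: str) -> str:
    """Return a lowercase identifier usable as a Puppet module/class name."""
    # Keep only lowercase letters, digits, and underscores.
    name = re.sub(r"[^a-z0-9_]", "_", raw.lower())
    # Collapse runs of underscores and trim them from both ends.
    name = re.sub(r"_+", "_", name).strip("_")
    if not name:
        return "enroll_module"
    # Puppet module names must start with a lowercase letter.
    if not name[0].isalpha():
        name = f"enroll_{name}"
    # Hypothetical prefix to step around reserved words.
    if name in RESERVED:
        name = f"enroll_{name}"
    return name
```

For example, `puppet_safe_name("python3-apt")` gives `python3_apt`, and `puppet_safe_name("default")` gives `enroll_default`.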

14.3 Output layout

Default mode:

<out>/
  manifests/site.pp
  README.md
  modules/
    <module>/
      metadata.json
      manifests/init.pp
      files/...
      templates/...

Default site.pp includes generated classes in manifest order under a node default or named node block.

14.4 Puppet `--fqdn` / Hiera mode

When --fqdn is supplied, Puppet output switches to Hiera-style node data:

<out>/
  hiera.yaml
  manifests/site.pp
  data/
    common.yaml
    nodes/<fqdn>.yaml
  modules/
    <module>/
      metadata.json
      manifests/init.pp
      files/nodes/<fqdn>/...
      templates/...

In this mode:

site.pp includes classes from Hiera key enroll::classes,
data/nodes/<fqdn>.yaml contains class list and parameter data,
module classes are data-driven via Automatic Parameter Lookup,
node-specific raw file artifacts live under modules/<module>/files/nodes/<fqdn>/...,
JinjaTurtle ERB template values are written into node Hiera data.

Re-running Enroll with another --fqdn into the same output directory is intended to add or replace that node's YAML without deleting existing node data.

14.5 Puppet and JinjaTurtle

Puppet now participates in the shared JinjaTurtle integration.

When enabled, Puppet calls jinjaturtle with ERB-specific options:

--template-engine erb
--puppet-class <module_name>

The resulting template is written under:

modules/<module>/templates/<src_rel>.erb

Static single-node mode renders class parameters with defaults and uses:

content => template('<module>/<src_rel>.erb')

Hiera mode writes template parameter values into data/nodes/<fqdn>.yaml and renders data-driven file resources.

jinjaturtle.missing_erb_template_vars() checks that ERB instance variables such as @main_key have matching context/Hiera data. If variables are missing, Enroll falls back to raw file copying rather than emitting a broken Puppet template.
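
The idea behind that check can be sketched with two regular expressions: one to find ERB tags, one to find instance variables inside them. This is an approximation for illustration only. `missing_erb_vars()` is a hypothetical stand-in, and it assumes context keys are the variable names without the leading `@`.

```python
import re

# Matches the body of ERB tags such as <%= ... %> or <% ... -%>.
ERB_TAG = re.compile(r"<%[=-]?(.*?)-?%>", re.DOTALL)
# Matches Ruby instance variables such as @main_key.
INSTANCE_VAR = re.compile(r"@([a-z_][a-zA-Z0-9_]*)")


def missing_erb_vars(template_text: str, context: dict) -> set:
    """Return instance-variable names used in ERB tags but absent from context."""
    used = set()
    for tag_body in ERB_TAG.findall(template_text):
        used.update(INSTANCE_VAR.findall(tag_body))
    return used - set(context)
```

Only text inside ERB tags is scanned, so a literal `admin@example.com` elsewhere in the file is not mistaken for a variable. A non-empty result would be the signal to fall back to a raw file copy.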

15. Salt renderer

File: salt.py

Entry point:

salt.manifest_from_bundle_dir(
    bundle_dir,
    out_dir,
    fqdn=None,
    no_common_roles=False,
    jinjaturtle="auto",
)

It instantiates SaltManifestRenderer(...).render().

15.1 Salt render flow

flowchart TD
  A[SaltManifestRenderer.render] --> B[SaltRole.load_state]
  B --> C[resolve_jinjaturtle_mode]
  C --> D[_collect_salt_roles]
  D --> E[resolve_catalog_conflicts]
  E --> F[write states/roles/<role>/init.sls]
  F --> G{fqdn?}
  G -->|no| H[write states/top.sls target '*']
  G -->|yes| I[write pillar node data]
  I --> J[write states/top.sls and pillar/top.sls]
  H --> K[write config/master.d/enroll.conf]
  J --> K
  K --> L[README.md]

15.2 `SaltRole`

SaltRole extends CMModule and changes managed_owner_attr to user, because Salt file.managed uses user rather than owner.

It prepares:

packages as pkg.installed,
groups as group.present,
users as user.present,
dirs/files/symlinks as Salt file.* states,
services as service.running or service.dead,
Flatpaks/Snaps via guarded cmd.run,
Docker/Podman images via guarded cmd.run,
firewall runtime restore commands,
optional Jinja templates for managed files.

15.3 Output layout

Default mode:

<out>/
  README.md
  config/master.d/enroll.conf
  states/
    top.sls
    roles/<role>/
      init.sls
      files/...
      templates/...

--fqdn mode:

<out>/
  states/
    top.sls
    roles/<role>/init.sls
  pillar/
    top.sls
    nodes/<sanitised-fqdn>_<digest>.sls

The Salt renderer can accumulate node data in --fqdn mode and preserves existing top data where appropriate.

15.4 Salt and JinjaTurtle

Salt uses jinjaturtle.jinjify_artifact() directly. When successful, a managed file becomes a Salt file.managed with:

source: salt://roles/<role>/templates/<src_rel>.j2
template: jinja
context: {...}

Salt has one additional compatibility step: _saltify_jinjaturtle_template() rewrites Ansible-oriented to_json(...) filters emitted by JinjaTurtle into Salt-safe context variables or tojson filters.

If templating fails or is unsupported, the renderer falls back to a literal file copy under files/.
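
One of the two rewrite strategies mentioned above, swapping Ansible's `to_json` filter for Jinja2's built-in `tojson`, might look roughly like this. It is a simplified sketch: `saltify_filters()` is a hypothetical name, it drops any filter arguments, and it does not cover the alternative of moving values into context variables.

```python
import re

# Matches "| to_json" with an optional argument list such as (indent=2).
TO_JSON_FILTER = re.compile(r"\|\s*to_json\b(\s*\([^)]*\))?")


def saltify_filters(template_text: str) -> str:
    """Replace Ansible's to_json filter with Jinja2's built-in tojson filter."""
    return TO_JSON_FILTER.sub("| tojson", template_text)
```

Dropping the arguments can change whitespace in the rendered JSON, so a rewrite like this is only safe where the consuming application does not care about formatting.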

16. Shared JinjaTurtle integration

File: jinjaturtle.py

JinjaTurtle mode is resolved by:

resolve_jinjaturtle_mode("auto" | "on" | "off")

Semantics:

auto: use jinjaturtle when it exists on PATH; otherwise copy raw files.
on: require jinjaturtle; error if missing.
off: never use it.
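
Those three modes reduce to a small decision function. The sketch below is an illustration under assumptions: `resolve_mode()`, its return value, and the injectable `which` parameter are hypothetical and may not match the real `resolve_jinjaturtle_mode()` signature.

```python
import shutil


def resolve_mode(mode: str, which=shutil.which):
    """Return the jinjaturtle executable path, or None when templating is off.

    `which` is injectable so the PATH lookup can be faked in tests.
    """
    if mode not in ("auto", "on", "off"):
        raise ValueError(f"unknown jinjaturtle mode: {mode!r}")
    if mode == "off":
        return None
    exe = which("jinjaturtle")
    if exe is None and mode == "on":
        # "on" is a hard requirement, so a missing executable is an error.
        raise RuntimeError("jinjaturtle was required but is not on PATH")
    # In auto mode this may be None, which means "copy raw files".
    return exe
```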

Supported path types include structured config suffixes:

.ini .cfg .json .toml .yaml .yml .xml .repo

and systemd unit-like suffixes:

.service .socket .target .timer .path .mount .automount .slice .swap .scope .link .netdev .network

Special format forcing is used for:

main.cf -> postfix,
systemd unit files -> systemd,
sshd_config, ssh_config, and matching *.conf snippets under sshd_config.d / ssh_config.d -> ssh.
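
The forcing rules above can be expressed as a lookup on the destination path. This is a sketch, not the code in jinjaturtle.py: `forced_format()` and `SYSTEMD_SUFFIXES` are hypothetical names, and the real matching may be stricter or looser.

```python
from pathlib import PurePosixPath

# The systemd unit-like suffixes listed in this guide.
SYSTEMD_SUFFIXES = {
    ".service", ".socket", ".target", ".timer", ".path", ".mount",
    ".automount", ".slice", ".swap", ".scope", ".link", ".netdev", ".network",
}


def forced_format(dest_path: str):
    """Return 'postfix', 'systemd', 'ssh', or None when no format is forced."""
    path = PurePosixPath(dest_path)
    if path.name == "main.cf":
        return "postfix"
    if path.suffix in SYSTEMD_SUFFIXES:
        return "systemd"
    if path.name in ("sshd_config", "ssh_config"):
        return "ssh"
    # *.conf snippets count only inside the SSH drop-in directories.
    if path.suffix == ".conf" and path.parent.name in ("sshd_config.d", "ssh_config.d"):
        return "ssh"
    return None
```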

The central helper is:

jinjify_artifact(
    bundle_dir,
    artifact_role,
    src_rel,
    dest_path,
    template_root,
    jt_exe=...,
    jt_enabled=...,
    template_engine="jinja2" | "erb",
    puppet_class=...,      # Puppet only
)

Ansible uses jinjify_managed_files() because it merges variables into role defaults or host vars. Salt uses jinjify_artifact() directly because context lives with each file.managed. Puppet uses jinjify_artifact(..., template_engine="erb", puppet_class=<module>) so variables line up with Puppet class/Hiera names.

Safety checks:

missing_jinja_template_vars() rejects Jinja2 templates that reference absent variables.
missing_erb_template_vars() rejects ERB templates that reference absent Puppet/Hiera variables.

When checks fail, Enroll deletes obsolete generated templates when appropriate and falls back to raw file copying.

17. Diff, notifications, and enforcement

File: diff.py

17.1 Inputs

compare_harvests() accepts:

bundle directories,
direct state.json paths,
plain .tar.gz / .tgz bundles,
SOPS-encrypted bundles when sops_mode=True or the name ends with .sops.

Bundle resolution is handled by _bundle_from_input(), which reuses remote._safe_extract_tar() for tarball extraction.

17.2 What diff compares

compare_harvests() compares:

package add/remove/version changes,
enabled systemd unit add/remove/state/package changes,
user add/remove/field changes,
managed file add/remove/content/metadata changes.

File content changes are detected by hashing artifacts.
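
A hash-based content comparison can be sketched as follows. The choice of SHA-256 and the helper names `file_sha256()` and `content_changed()` are assumptions for illustration; the guide does not state which digest diff.py uses.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large artifacts are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def content_changed(old_artifact: Path, new_artifact: Path) -> bool:
    """Report True when the two artifact files differ in content."""
    return file_sha256(old_artifact) != file_sha256(new_artifact)
```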

--exclude-path filtering applies only to file drift reporting, not package/service/user diffs.

--ignore-package-versions suppresses package version-only drift from both the report and has_changes, but package additions/removals are still reported.

Reports are formatted by:

format_report(report, fmt="text" | "markdown" | "json")

17.3 Enforcement decision

has_enforceable_drift() is intentionally conservative.

Enforceable drift includes:

packages that were removed from the current host but existed in the baseline,
baseline services that were removed or changed in meaningful non-package fields,
baseline users that were removed or changed,
baseline files that were removed or changed.

Not enforceable:

newly installed packages,
package version changes alone,
newly enabled services,
newly added users,
newly added managed files.

This keeps --enforce focused on restoring baseline state rather than deleting unknown current state or downgrading packages.
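
The two lists above amount to a filter over the diff report. The sketch below uses a hypothetical report layout (`removed`, `changed`, `added`, and `version_changed` keys per category) that is not Enroll's real schema, and it simplifies the "meaningful non-package fields" rule for services to any recorded change.

```python
def sketch_has_enforceable_drift(report: dict) -> bool:
    """Return True only for drift that restoring the baseline can fix.

    The report layout used here is hypothetical, not Enroll's real schema.
    """
    packages = report.get("packages", {})
    services = report.get("services", {})
    users = report.get("users", {})
    files = report.get("files", {})
    # Every "added" list and packages["version_changed"] is deliberately
    # left out: enforcement restores the baseline, it does not delete or
    # downgrade things that appeared later.
    return any(
        (
            packages.get("removed"),  # in the baseline, missing now
            services.get("removed"),
            services.get("changed"),
            users.get("removed"),
            users.get("changed"),
            files.get("removed"),
            files.get("changed"),
        )
    )
```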

17.4 Target-selected enforcement

enforce_old_harvest() now accepts target="ansible" | "puppet" | "salt".

It performs:

resolve the old/baseline harvest,
build a best-effort enforcement plan from the diff report,
generate a temporary manifest from the old harvest using the selected target,
run the matching local apply tool,
attach enforcement metadata to the diff report.

Target commands:

ansible -> ansible-playbook -i localhost, -c local playbook.yml
puppet  -> puppet apply --modulepath ./modules [--hiera_config ./hiera.yaml] manifests/site.pp
salt    -> salt-call --local --file-root ./states [--pillar-root ./pillar] state.apply

Only Ansible uses generated per-role tags to narrow the apply scope. Puppet and Salt enforcement deliberately run the full generated local manifest/state tree for now. The JSON report keeps target-specific compatibility fields such as ansible_playbook, puppet, or salt_call.
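
The target commands listed above can be assembled as argv lists. This sketch is not `_enforcement_command()` itself: the function name is hypothetical, and deciding on the optional flags by checking for `hiera.yaml` or a `pillar/` directory is an assumption about when they apply. Ansible tag narrowing is omitted.

```python
from pathlib import Path


def sketch_enforcement_command(target: str, manifest_dir: Path) -> list:
    """Build the argv for a local apply, following the commands listed above."""
    if target == "ansible":
        return ["ansible-playbook", "-i", "localhost,", "-c", "local", "playbook.yml"]
    if target == "puppet":
        argv = ["puppet", "apply", "--modulepath", "./modules"]
        # Assumption: pass the Hiera config only when one was generated.
        if (manifest_dir / "hiera.yaml").exists():
            argv += ["--hiera_config", "./hiera.yaml"]
        return argv + ["manifests/site.pp"]
    if target == "salt":
        argv = ["salt-call", "--local", "--file-root", "./states"]
        # Assumption: pass the pillar root only when pillar data was generated.
        if (manifest_dir / "pillar").is_dir():
            argv += ["--pillar-root", "./pillar"]
        return argv + ["state.apply"]
    raise ValueError(f"unsupported target: {target!r}")
```

Because the paths are relative, a caller would run the result with the generated manifest directory as the working directory, for example `subprocess.run(argv, cwd=manifest_dir)`.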

17.5 Notifications

diff.py also supports webhooks and email notifications:

post_webhook() sends JSON/text/markdown payloads with optional extra headers.
send_email() uses SMTP when configured or local sendmail when SMTP is omitted.

Notifications configured through the CLI are sent only when differences exist, unless --notify-always is set.

18. Explanation and validation

18.1 `explain.py`

explain_state() reads a harvest and produces text or JSON explaining:

host metadata,
role summaries,
users,
services,
package snapshots,
runtime firewall,
sysctl,
custom files,
inventory packages,
notes and exclusion reasons.

This is intended to answer “what did Enroll collect and why?”

18.2 `validate.py`

validate_harvest() checks:

state.json exists,
it parses as JSON,
it validates against the vendored schema unless --no-schema is set,
every managed_file.src_rel points to an artifact file,
firewall runtime generated artifacts exist,
there are no unreferenced artifact files, reported as warnings.

It returns a ValidationResult with errors, warnings, ok(), to_dict(), and to_text().
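
A result object with that surface could look like the sketch below. The class name `ValidationResultSketch`, the `fail_on_warnings` parameter on `ok()`, and the text layout are assumptions for illustration; check validate.py for the real fields and output.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResultSketch:
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    def ok(self, fail_on_warnings: bool = False) -> bool:
        """Pass when there are no errors (and no warnings, if those are fatal)."""
        if self.errors:
            return False
        return not (fail_on_warnings and self.warnings)

    def to_dict(self) -> dict:
        """Return a JSON-serialisable summary."""
        return {
            "ok": self.ok(),
            "errors": list(self.errors),
            "warnings": list(self.warnings),
        }

    def to_text(self) -> str:
        """Return one line per finding followed by an overall verdict."""
        lines = [f"ERROR: {msg}" for msg in self.errors]
        lines += [f"WARNING: {msg}" for msg in self.warnings]
        lines.append("OK" if self.ok() else "FAILED")
        return "\n".join(lines)
```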

The CLI supports local schema override with --schema, warning failure with --fail-on-warnings, JSON/text output, and --out.

19. Remote harvesting

File: remote.py

Remote mode is called from cli.py when --remote-host is supplied.

Public entry point:

remote_harvest(...)

It wraps _remote_harvest() and handles:

optional sudo password prompting,
optional SSH key passphrase prompting or environment variable lookup,
retrying when remote sudo requires a password,
retrying when an encrypted SSH private key needs a passphrase.

19.1 Remote harvest flow

flowchart TD
  A[remote_harvest] --> B[resolve sudo password]
  B --> C[resolve SSH key passphrase]
  C --> D[_remote_harvest]
  D --> E[build local enroll.pyz zipapp]
  E --> F[connect with Paramiko]
  F --> G[upload zipapp]
  G --> H[run remote enroll harvest]
  H --> I[tar/gzip remote bundle]
  I --> J[download tarball]
  J --> K[_safe_extract_tar locally]
  K --> L[return local state.json path]

_build_enroll_pyz() packages the local enroll Python package into a zipapp and uses enroll.cli:main as its entry point.
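
Python's standard `zipapp` module supports this pattern directly. The sketch below shows the general technique, assuming a staging directory so that the package sits one level below the archive root; `build_pyz()` is a hypothetical helper and not the body of `_build_enroll_pyz()`.

```python
import shutil
import tempfile
import zipapp
from pathlib import Path


def build_pyz(package_dir: Path, target: Path, entry_point: str) -> Path:
    """Bundle one Python package directory into an executable .pyz archive."""
    with tempfile.TemporaryDirectory() as staging:
        # The archive root must contain the package as a subdirectory so that
        # "import <package>" works when the archive is on sys.path.
        shutil.copytree(
            package_dir,
            Path(staging) / package_dir.name,
            ignore=shutil.ignore_patterns("__pycache__"),
        )
        # main="pkg.module:function" makes zipapp generate the __main__.py.
        zipapp.create_archive(staging, target=target, main=entry_point)
    return target
```

With Enroll's layout the call would be along the lines of `build_pyz(Path("enroll"), Path("enroll.pyz"), "enroll.cli:main")`, and the result runs as `python3 enroll.pyz harvest ...` on the remote host.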

19.2 SSH config support

--remote-ssh-config enables Paramiko SSHConfig support for settings such as:

HostName,
Port,
User,
IdentityFile,
ConnectTimeout,
ProxyCommand,
AddressFamily,
HostKeyAlias where supported by the connection logic.

Unknown host keys are rejected by default through Paramiko's reject policy. Users should have valid host keys for the remote host in their known_hosts file.

19.3 Safe tar extraction

_safe_extract_tar() validates tar members before extraction and rejects:

absolute paths,
.. traversal,
symlinks,
hardlinks,
device nodes,
anything resolving outside the destination.

This helper is reused by remote harvest, manifest SOPS extraction, and diff bundle resolution.
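
A validate-then-extract helper in this spirit can be sketched with the standard `tarfile` module. This is an illustration of the checks listed above, not the body of `_safe_extract_tar()`; `UnsafeTarMember`, `check_member()`, and `safe_extract()` are hypothetical names.

```python
import tarfile
from pathlib import Path, PurePosixPath


class UnsafeTarMember(Exception):
    """Raised when an archive member must not be extracted."""


def check_member(member: tarfile.TarInfo, dest: Path) -> None:
    """Raise UnsafeTarMember unless member is a plain file or directory inside dest."""
    name = PurePosixPath(member.name)
    if name.is_absolute():
        raise UnsafeTarMember(f"absolute path: {member.name}")
    if ".." in name.parts:
        raise UnsafeTarMember(f"path traversal: {member.name}")
    if member.issym() or member.islnk():
        raise UnsafeTarMember(f"link member: {member.name}")
    if not (member.isfile() or member.isdir()):
        # Covers character/block devices and FIFOs.
        raise UnsafeTarMember(f"special file: {member.name}")
    # Belt and braces: the resolved target must stay under the destination.
    root = dest.resolve()
    target = (root / member.name).resolve()
    if target != root and root not in target.parents:
        raise UnsafeTarMember(f"escapes destination: {member.name}")


def safe_extract(archive: Path, dest: Path) -> None:
    """Validate every member first, then extract only if all of them pass."""
    with tarfile.open(archive, "r:*") as tar:
        members = tar.getmembers()
        for member in members:
            check_member(member, dest)
        # Use the stdlib "data" extraction filter as well where it exists.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, members=members, **extra)
```

Validating the whole member list before extracting anything means a hostile archive leaves no partial output behind.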

20. SOPS support

File: sopsutil.py

SOPS support is binary tarball encryption, not field-level YAML encryption.

20.1 Harvest SOPS mode

enroll harvest --sops <fingerprint...>:

harvests into a secure temp directory,
tars the bundle,
encrypts it with SOPS binary mode,
writes harvest.tar.gz.sops or the requested output file.

20.2 Manifest SOPS mode

enroll manifest --sops <fingerprint...>:

decrypts/extracts the harvest if needed,
generates the chosen target manifest in a temp directory,
tars the generated output,
encrypts it as a single SOPS file.

20.3 Helpers

sopsutil.py provides:

find_sops_cmd(),
require_sops_cmd(),
encrypt_file_binary(),
decrypt_file_binary_to().

Encryption/decryption helpers write via temp files and default to mode 0600.

21. Configuration file support

cli.py supports optional INI config files.

Discovery order:

--no-config disables config loading,
--config PATH or -c PATH,
$ENROLL_CONFIG,
./enroll.ini,
./.enroll.ini,
$XDG_CONFIG_HOME/enroll/enroll.ini,
~/.config/enroll/enroll.ini.

Config sections are translated into argv tokens by _inject_config_argv():

[enroll] for global options,
[harvest], [manifest], [single-shot], [diff], [explain], [validate] for subcommand options,
[single_shot] is accepted as an alias for [single-shot].

CLI flags win because config-derived tokens are inserted before user-supplied argv tokens.

The translation is argparse-driven, so new flags often gain config-file support automatically as long as they are represented by normal argparse actions.
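
The precedence rule relies on argparse keeping the last value it sees for an option. The sketch below shows that mechanism with a toy parser. `config_tokens()` and `parse_with_config()` are hypothetical helpers that only handle options taking a value, whereas the real `_inject_config_argv()` is driven by the argparse action definitions.

```python
import argparse
import configparser


def config_tokens(ini_text: str, section: str) -> list:
    """Turn "key = value" pairs from one INI section into "--key value" tokens."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    tokens = []
    if parser.has_section(section):
        for key, value in parser.items(section):
            tokens += [f"--{key.replace('_', '-')}", value]
    return tokens


def parse_with_config(cli_argv: list, ini_text: str) -> argparse.Namespace:
    """Parse config-derived tokens followed by the user's own arguments."""
    arg_parser = argparse.ArgumentParser(prog="demo")
    arg_parser.add_argument("--target", default="ansible")
    arg_parser.add_argument("--out")
    # Config tokens go first; argparse keeps the last value seen for an option,
    # so anything the user typed afterwards overrides the config file.
    return arg_parser.parse_args(config_tokens(ini_text, "manifest") + cli_argv)
```

With a `[manifest]` section that sets `target = puppet`, calling `parse_with_config(["--target", "salt"], ini_text)` yields `target == "salt"`, while an empty argv keeps the config value.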

22. CLI flags that affect multiple layers

22.1 `--target`

--target ansible|puppet|salt exists for:

enroll manifest,
enroll single-shot,
enroll diff --enforce.

For manifest and single-shot, it chooses the output renderer. For diff --enforce, it chooses both the temporary manifest target and the local apply tool.

22.2 `--fqdn`

--fqdn changes output semantics, not just filenames:

Ansible: uses inventory/host_vars and host-specific artifacts.
Puppet: uses Hiera node data and Hiera-driven classes.
Salt: uses pillar node data and minion-targeted top files.

--fqdn implies no common role grouping.

22.3 `--no-common-roles`

Disables the default grouping of package/service snapshots by Debian Section or RPM Group. This preserves one generated role/module/state per package or unit snapshot.

22.4 `--jinjaturtle` / `--no-jinjaturtle`

The CLI maps these to renderer mode strings:

no flag           -> auto
--jinjaturtle     -> on
--no-jinjaturtle  -> off

All three manifest targets receive this mode. Puppet uses ERB when JinjaTurtle is enabled; Ansible and Salt use Jinja2.
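
One way to get that three-way mapping out of argparse is two `store_const` flags sharing a destination. This is a sketch of the technique rather than the code in cli.py; the mutually exclusive group and the `build_parser()` name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser that maps the two flags onto auto/on/off."""
    parser = argparse.ArgumentParser(prog="demo")
    group = parser.add_mutually_exclusive_group()
    # Both flags write to the same destination; the default covers "no flag".
    group.add_argument(
        "--jinjaturtle", dest="jinjaturtle",
        action="store_const", const="on", default="auto",
    )
    group.add_argument(
        "--no-jinjaturtle", dest="jinjaturtle",
        action="store_const", const="off", default="auto",
    )
    return parser
```

The renderer then receives a single string and never needs to know which flag, if any, produced it.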

23. Tests and how to navigate them

Run tests with:

poetry install
poetry run pytest

or the repository helper when appropriate:

./tests.sh

Important test files:

| Test file | What it covers |
| --- | --- |
| `test_cli.py` | argparse dispatch, remote flags, manifest target forwarding, single-shot flow. |
| `test_cli_config_and_sops.py`, `test_cli_helpers.py` | config-file injection and SOPS output helpers. |
| `test_harvest.py`, `test_harvest_helpers.py` | harvest orchestration, sysctl/firewall helpers, role naming. |
| `test_harvest_collectors.py` | runtime and container image collectors. |
| `test_harvest_cron_logrotate.py` | cron/logrotate unification. |
| `test_harvest_symlinks.py` | nginx/apache enabled symlink capture. |
| `test_accounts.py` | users, Flatpak, Snap parsing/discovery. |
| `test_ignore.py`, `test_ignore_dir.py` | secret/noise policy. |
| `test_pathfilter.py` | include/exclude matching and expansion. |
| `test_platform.py`, `test_platform_backends.py` | platform detection and backend behaviour. |
| `test_debian.py`, `test_rpm.py`, `test_rpm_run.py` | package manager helpers. |
| `test_manifest.py`, `test_manifest_ansible.py` | Ansible rendering and role behaviour. |
| `test_manifest_puppet.py` | Puppet rendering, Hiera mode, reserved names, firewall/container/Flatpak/Snap/JinjaTurtle support. |
| `test_manifest_salt.py` | Salt rendering, pillar mode, JinjaTurtle, firewall/container/Flatpak/Snap support. |
| `test_manifest_symlinks.py` | symlink manifest output. |
| `test_jinjaturtle.py` | shared template generation and fallback safety. |
| `test_diff_bundle.py`, `test_diff_ignore_versions_exclude_enforce.py`, `test_diff_notifications.py` | diff, bundle resolution, target-selected enforcement, notifications. |
| `test_remote.py` | remote harvest, SSH/sudo prompts, safe tar extraction. |
| `test_explain.py` | harvest explanation output. |
| `test_validate.py` | schema/artifact validation. |
| `test_cm.py` | `CMModule` conflict resolution and service-package helpers. |
| `test_fsutil.py`, `test_fsutil_extra.py` | file hashing and stat metadata helpers. |

When changing behaviour, extend the closest specific tests rather than relying only on broad integration tests.

24. Common maintenance tasks

24.1 Add a new thing to harvest

Add or extend a dataclass in harvest_types.py if existing snapshots cannot represent it.
Add a collector under harvest_collectors/ if it is a distinct feature.
Add the collector to the sequence in harvest.harvest().
Add the snapshot to the state = {...} object in harvest.harvest().
Update schema/state.schema.json.
Update renderers that should emit the new resource.
Update explain.py and validate.py if users need visibility or artifact checks.
Add tests for harvest and each renderer.

24.2 Add a new renderer target

Create <target>.py with manifest_from_bundle_dir().
Load state via CMModule.load_state() or state.load_state().
Consume roles_from_state() and inventory_packages_from_state().
Convert snapshots into renderer-specific role/module/state objects.
Reuse CMModule.package_service_entries() for package/service grouping.
Run conflict resolution if the target compiles a global catalog.
Write target output and README.
Add the target to manifest.manifest() validation and dispatch.
Add CLI choices in _add_common_manifest_args() and diff enforcement if applicable.
Add tests.

24.3 Add a new CLI flag

For harvest-affecting flags:

add the flag to cli.py for harvest and possibly single-shot,
forward it to harvest.harvest() or remote.remote_harvest(),
forward it through remote command construction if remote mode needs it,
check whether config-file injection handles it,
add tests in test_cli.py and feature-specific tests.

For manifest-affecting flags:

add it to _add_common_manifest_args() if all manifest-like commands need it,
forward it through manifest.manifest(),
forward it to target renderers,
add tests for forwarding and output.

For diff enforcement flags:

add argparse support under the diff subparser,
pass values to compare_harvests() or enforce_old_harvest(),
update report formatting if new fields appear,
add tests in test_diff_ignore_versions_exclude_enforce.py or test_diff_notifications.py.

24.4 Change file safety rules

Modify ignore.py and add tests in test_ignore.py / test_ignore_dir.py.

Be careful:

relaxing safety affects secret exposure risk,
tightening safety can make expected config disappear,
binary allowance matters for APT/RPM keyrings,
--dangerous must remain explicit for risky harvesting.

24.5 Change service/package attribution

Most logic is in:

harvest_collectors/services.py,
package_hints.py,
system_paths.py,
package backend modified_paths() implementations.

Preserve these invariants:

cron/logrotate should stay unified when installed,
shared directories should not be attributed too broadly,
package-manager config belongs in apt_config/dnf_config,
captured_global should prevent duplicates,
stopped services should not receive broad restart notifications.

24.6 Change manifest role grouping

Common grouping uses:

CMModule.package_service_entries(),
package_section_label(),
section_label_for_packages().

Remember:

default output (without --fqdn) groups package/service roles unless --no-common-roles is set,
--fqdn implies per-role output,
Ansible, Puppet, and Salt grouping should stay conceptually aligned,
Puppet/Salt need resolve_catalog_conflicts() after grouping.

24.7 Change JinjaTurtle support

Shared path support and safety checks belong in jinjaturtle.py.

Renderer-specific behaviour belongs in the renderer:

Ansible: variables in defaults or host vars, templates under role templates/.
Puppet: ERB templates, class params or Hiera values.
Salt: file.managed context and Salt-safe Jinja rewrites.

Fallback-to-raw-copy is part of the product contract unless JinjaTurtle was explicitly required and missing.

24.8 Change diff enforcement

diff --enforce now has a target dimension.

When changing it, keep these distinctions clear:

has_enforceable_drift() decides whether enforcement should run.
_enforcement_plan() finds relevant baseline roles.
Ansible uses role tags from the plan.
Puppet and Salt currently run a full manifest/state apply.
_enforcement_command() is the source of truth for local apply commands.
cli.py attaches enforcement metadata to the report and formats it.

Do not make enforcement delete newly added packages/users/files/services unless the safety model is explicitly redesigned.

25. Important maintenance hazards

25.1 Renderer output is downstream of harvest state

If a renderer needs information, first ask whether that information belongs in state.json. Avoid papering over missing harvest facts inside a renderer.

25.2 `--fqdn` mode is not cosmetic

--fqdn changes where variables and artifacts live and how target inclusion works.

A change that works in default mode can still break:

Ansible host vars,
Puppet Hiera node data,
Salt pillar node data.

25.3 Puppet and Salt are stricter about duplicates

Ansible often tolerates repeated packages or tasks. Puppet and Salt compile catalogs where duplicate resources can fail. Keep resolve_catalog_conflicts() in mind whenever adding resources.

25.4 Secret avoidance is part of the product contract

Default harvest should avoid likely secrets. --dangerous exists because useful files may contain secrets. Do not silently make risky harvesting the default.

25.5 Runtime state should not override persistent config

Firewall runtime capture is skipped when persistent firewall config exists. Preserve this principle for future runtime snapshots.

25.6 JinjaTurtle is best-effort except when explicitly required

auto mode should not make manifest generation fail merely because templating failed. on should require the executable; unsupported or unsafe individual files should still fall back to raw copy unless code explicitly changes that contract.

25.7 Role names must be sanitised

Raw package/service names can be invalid or reserved in Ansible roles, Puppet classes, or Salt SLS names. Use role-name helpers and singleton collision protection.

25.8 Tests encode edge cases

Many behaviours exist because of previously found edge cases:

non-root/no-sudo harvests,
Puppet reserved words,
Salt Docker module availability limitations,
symlink capture,
JinjaTurtle missing variables,
Salt JSON filter compatibility,
file caps,
SOPS secure temp files,
tar path traversal,
target-selected diff enforcement.

Before simplifying logic, search the tests.

26. Troubleshooting guide

26.1 Generated manifest references a missing artifact

Likely causes:

managed_files[*].src_rel was added without copying into artifacts/,
a renderer used the generated role/module name instead of the artifact role,
a role was renamed after harvest but before artifact lookup,
--fqdn file prefixes are wrong.

Start with:

enroll validate /path/to/harvest

Then inspect:

state.json roles.*.managed_files[*]
artifacts/<role>/<src_rel>

26.2 Puppet fails with duplicate resources

Check:

_collect_puppet_roles(),
resolve_catalog_conflicts(),
role_order_key(),
whether a new resource type needs conflict resolution,
whether a directory resource conflicts with a file/link of the same path.

26.3 Salt fails with duplicate IDs or missing modules

Check:

_state_id() naming,
_collect_salt_roles() grouping,
resolve_catalog_conflicts(),
guarded cmd.run fallbacks for Docker/Podman/Snap/Flatpak.

Salt uses guarded shell commands for some resources because native states/modules are not consistently available across Salt installations.

26.4 Ansible check mode reports unexpected changes

Check:

role ordering,
grouped mode versus --fqdn / --no-common-roles,
handler notifications,
whether runtime roles were emitted without runtime artifacts,
harvested directory/file mode normalisation.

Grouped and per-role output can legitimately produce different numbers of reported changes.

26.5 A file was not harvested

Check, in order:

Was it excluded by --exclude-path?
Was it denied by IgnorePolicy?
Was it too large?
Did it look binary?
Did it contain sensitive-looking content?
Was it already captured by another role via captured_global?
Is it outside known scanned locations?
Would --include-path collect it?
Does it require --dangerous?

enroll explain can show notes and exclusion reasons.

26.6 `diff --enforce` fails

Check:

whether the selected --target tool is on PATH (ansible-playbook for Ansible, puppet for Puppet, salt-call for Salt),
whether the generated temp manifest has the expected target entrypoint,
whether the report contains enforceable drift,
whether package drift is only version changes or additions, which enforcement skips.

26.7 Remote harvest fails with sudo or SSH key prompts

Relevant flags:

--ask-become-pass,
--ask-key-passphrase,
--ssh-key-passphrase-env,
--no-sudo,
--remote-ssh-config.

Interactive sessions can prompt and retry. Non-interactive sessions should pass explicit flags or environment variables.

27. Practical code-reading map

| Feature/question | Start with | Then read |
| --- | --- | --- |
| CLI option behaviour | `cli.py` | called module for `args.cmd` |
| Local harvest ordering | `harvest.py:harvest()` | `harvest_collectors/` |
| Why a file was skipped | `capture.py`, `ignore.py`, `pathfilter.py` | `explain.py` |
| File metadata/hash helpers | `fsutil.py` | `debian.py`, `capture.py` |
| Service/package attribution | `harvest_collectors/services.py` | `package_hints.py`, `platform.py` |
| APT/DNF config capture | `harvest_collectors/package_manager.py` | `system_paths.py` |
| Users and SSH keys | `harvest_collectors/users.py` | `accounts.py` |
| Flatpak/Snap parsing | `accounts.py` | renderer Flatpak/Snap helpers |
| Docker/Podman images | `harvest_collectors/container_images.py` | renderer container image helpers |
| Runtime firewall | `harvest_collectors/runtime.py`, `harvest.py` | renderer firewall helpers |
| Sysctl | `harvest.py` sysctl helpers | renderer sysctl role functions |
| Ansible output | `ansible.py:AnsibleManifestRenderer.render()` | `_render_*` helpers |
| Puppet output | `puppet.py:PuppetManifestRenderer.render()` | `_collect_puppet_roles()` |
| Salt output | `salt.py:SaltManifestRenderer.render()` | `_collect_salt_roles()` |
| Grouping/common roles | `cm.py` | renderer collection functions |
| JinjaTurtle | `jinjaturtle.py` | renderer managed-content code |
| Diff/enforce | `diff.py` | `manifest.py`, target renderer |
| Validation | `validate.py` | schema file and `state.json` |
| Remote mode | `remote.py` | `cli.py` remote branches |
| SOPS | `sopsutil.py` | `cli.py`, `manifest.py`, `diff.py` |

28. Glossary

Harvest bundle: A directory or encrypted tarball containing state.json and artifacts/.

Snapshot: A structured object under roles in state.json, such as a ServiceSnapshot or PackageSnapshot.

Managed file: A file Enroll intends generated CM code to recreate. It has a destination path and a matching artifact file.

Managed link: A symlink Enroll intends generated CM code to recreate.

Managed dir: A directory Enroll intends generated CM code to ensure exists with recorded metadata.

Role: The Enroll logical group for related resources. In Ansible it usually maps to an Ansible role. In Puppet it maps to a module/class. In Salt it maps to an SLS role.

Artifact role: The role directory under artifacts/ that contains a harvested file. This can differ from the generated renderer role when grouping is enabled.

Common/grouped role: A generated role/module/state that merges multiple package/service snapshots by Debian Section or RPM Group.

Site mode / --fqdn mode: Host-specific output mode. Ansible uses host vars, Puppet uses Hiera node data, and Salt uses pillar node data.

Dangerous mode: Explicit opt-in mode that relaxes safety checks and enables risky capture such as user shell dotfiles.

JinjaTurtle: Optional external tool used to convert recognised config files into Jinja2 or ERB templates plus variable defaults/context.

Enforcement target: The config manager chosen for diff --enforce with --target ansible|puppet|salt.

29. Final maintenance model

Most changes should preserve this pipeline:

Collect facts and files safely
  -> represent them in target-neutral state.json
  -> keep artifact references consistent
  -> let each renderer translate the same state into its own idioms
  -> validate the bundle and test each target

Before changing code, ask:

Is this a harvest concern or renderer concern?
Does state.json or the schema need to change?
Does this affect --fqdn mode?
Does this introduce duplicate ownership of a path/resource?
Does this weaken default secret avoidance?
Do Puppet and Salt need conflict handling?
Does JinjaTurtle fallback still behave safely?
Does diff --enforce --target ... still do the conservative thing?
Do existing tests explain why the current behaviour exists?

Keeping those boundaries clear is the main way to maintain Enroll without creating subtle cross-target regressions.
