Enroll Development Guide
Interested in the internals of Enroll?
This guide describes the current enroll codebase for maintainers. It focuses on how the project is organised, what calls what, how harvest state flows into generated configuration-management output, and which invariants matter when changing the code.
1. What Enroll does
enroll is a Linux host inspection and configuration-management generation tool.
Its core pipeline is:
Running Linux host
    |
    | enroll harvest
    v
Harvest bundle
    state.json
    artifacts/<role>/<path-relative-to-root>
    |
    | enroll manifest --target ansible|puppet|salt
    v
Generated configuration-management output
    Ansible roles/playbook
    Puppet modules/site.pp/Hiera data
    Salt states/pillar data
The harvest bundle is deliberately target-neutral. Ansible, Puppet, and Salt renderers all consume the same state.json shape and the same harvested artifacts. Renderer code should translate harvest state into the target's idioms; it should not invent source facts that belong in the harvest.
enroll diff is also built around harvest bundles. It compares two harvests and, when --enforce is requested, can generate a temporary manifest from the old harvest and apply it locally with the selected target:
enroll diff --old ./baseline --new ./current --enforce --target ansible
enroll diff --old ./baseline --new ./current --enforce --target puppet
enroll diff --old ./baseline --new ./current --enforce --target salt
For enforcement, the user is responsible for having the chosen local apply tool on PATH: ansible-playbook, puppet, or salt-call.
2. Repository layout
The project is a single Python package under enroll/ with tests under tests/.
enroll/
  __main__.py                 python -m enroll entry point
  cli.py                      argparse CLI and subcommand dispatcher
  version.py                  package version lookup
  harvest.py                  top-level local harvest orchestration and runtime helpers
  harvest_types.py            dataclasses persisted into state.json
  harvest_collectors/         feature-specific collectors used by harvest.py
    context.py                HarvestContext and HarvestCollector base
    runtime.py                root-only runtime state collector wrapper
    cron_logrotate.py         cron/logrotate unification collector
    services.py               systemd service + manual package collector
    users.py                  users, SSH public files, Flatpak, Snap collector
    package_manager.py        apt/dnf/yum config collectors
    container_images.py       Docker/Podman image collector
    paths.py                  /usr/local and --include-path collectors
  manifest.py                 target router and SOPS manifest wrapper
  ansible.py                  Ansible renderer
  puppet.py                   Puppet renderer
  salt.py                     Salt renderer
  cm.py                       renderer-neutral CMModule model and grouping helpers
  role_names.py               reserved singleton role-name protection
  accounts.py                 users, SSH public files, Flatpak and Snap discovery
  platform.py                 OS/package-backend abstraction
  debian.py                   dpkg/apt helpers
  rpm.py                      rpm/dnf/yum helpers
  systemd.py                  systemctl wrappers and parsers
  system_paths.py             known config paths and filesystem scanners
  package_hints.py            service/package name and config attribution helpers
  capture.py                  safe file/symlink capture into artifacts/
  fsutil.py                   file md5 + owner/group/mode helpers
  ignore.py                   secret/noise avoidance policy
  pathfilter.py               --include-path / --exclude-path matching and expansion
  state.py                    state.json load/write helpers
  yamlutil.py                 YAML helpers used by renderers/JinjaTurtle
  jinjaturtle.py              optional config-file templating integration
  diff.py                     harvest comparison, notifications, and target-selected enforcement
  explain.py                  human/JSON explanation of harvest contents
  validate.py                 schema and artifact consistency validation
  remote.py                   Paramiko remote harvest implementation
  cache.py                    secure local cache directories for harvests
  sopsutil.py                 SOPS binary encryption/decryption helpers
  schema/state.schema.json    JSON Schema for harvest state
tests/
  test_*.py                   unit tests grouped mostly by module/feature
The installed command is configured in pyproject.toml:
[tool.poetry.scripts]
enroll = "enroll.cli:main"
python -m enroll calls the same CLI through enroll/__main__.py.
3. Main runtime flows
3.1 CLI entry flow
All user-facing commands enter through enroll.cli.main().
enroll command
  -> enroll.cli.main()
       -> builds argparse parser and subparsers
       -> discovers optional INI config file
       -> injects config-derived argv defaults before user argv
       -> parses final argv
       -> dispatches by args.cmd
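The effect of placing config-derived defaults before the user's own arguments can be sketched with plain argparse, which keeps the last value it sees for a repeated option. This is a minimal illustration rather than the code in cli.py: build_parser(), merge_argv(), and the flag values shown are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, cut-down parser with a single subcommand.
    parser = argparse.ArgumentParser(prog="enroll-sketch")
    sub = parser.add_subparsers(dest="cmd", required=True)
    manifest = sub.add_parser("manifest")
    manifest.add_argument("--target", default="ansible")
    manifest.add_argument("--out")
    return parser


def merge_argv(cmd, config_defaults, user_args):
    # Config-derived flags go first so that later user flags override them.
    return [cmd, *config_defaults, *user_args]


parser = build_parser()
argv = merge_argv(
    "manifest",
    ["--target", "puppet", "--out", "/tmp/cfg"],  # pretend these came from the INI file
    ["--target", "salt"],                          # what the user actually typed
)
args = parser.parse_args(argv)
print(args.cmd, args.target, args.out)
```

Because the user's --target appears later in the final argv, it wins, while --out still falls back to the config-supplied value.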
The supported subcommands are:
harvest       collect a harvest bundle from a local or remote host
manifest      generate Ansible/Puppet/Salt output from a harvest bundle
single-shot   run harvest and manifest in one command
diff          compare two harvest bundles and optionally enforce old state
explain       produce a human/JSON explanation of a harvest
validate      validate state.json and referenced artifacts
cli.py should stay orchestration-heavy, not domain-heavy. It should parse flags, handle config/SOPS/remote branching, and then call the relevant module. It should not contain the meaning of a service, package, user, file, renderer resource, or harvest snapshot.
3.2 Subcommand call graph
flowchart TD
A[enroll.cli.main] --> B{args.cmd}
B -->|harvest local| C[harvest.harvest]
B -->|harvest remote| D[remote.remote_harvest]
B -->|manifest| E[manifest.manifest]
B -->|single-shot local| C
B -->|single-shot remote| D
C --> E
D --> E
B -->|diff| F[diff.compare_harvests]
F --> G[diff.format_report]
F --> H{--enforce?}
H -->|yes| I[diff.enforce_old_harvest]
I --> J[manifest.manifest target=ansible|puppet|salt]
J --> K[ansible-playbook or puppet apply or salt-call]
B -->|explain| L[explain.explain_state]
B -->|validate| M[validate.validate_harvest]
Important dependency direction:
cli.py
  depends on harvest.py, manifest.py, diff.py, explain.py, validate.py, remote.py
harvest.py
  depends on harvest_collectors, platform backends, capture policy, system scanners
manifest.py
  depends on ansible.py, puppet.py, salt.py
ansible.py / puppet.py / salt.py
  depend on state.py, cm.py, harvested artifacts, and target-specific helpers
4. Harvest bundles
A plaintext harvest bundle is a directory:
<bundle>/
  state.json
  artifacts/
    <role_name>/
      etc/...
      usr/local/...
      sysctl/...
      firewall/...
state.json is written by enroll.state.write_state() and loaded by enroll.state.load_state().
All three renderers rely on this invariant:
state.json roles.*.managed_files[*].src_rel
must correspond to
artifacts/<artifact_role>/<src_rel>
For example, a captured /etc/nginx/nginx.conf in role nginx normally becomes:
{
  "path": "/etc/nginx/nginx.conf",
  "src_rel": "etc/nginx/nginx.conf",
  "owner": "root",
  "group": "root",
  "mode": "0644",
  "reason": "modified_conffile"
}
and the artifact is copied to:
artifacts/nginx/etc/nginx/nginx.conf
Renderer role/module names can differ from artifact roles, especially when common grouping is enabled. Copy helpers must therefore pass the original artifact role, not blindly use the generated renderer module name.
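The invariant can be expressed as a small path helper plus a consistency check. The sketch below is an assumption-laden illustration: artifact_path() and missing_artifacts() are hypothetical names, and the roles mapping is a simplified stand-in for the real state.json layout.

```python
import os
import tempfile


def artifact_path(bundle_dir, artifact_role, src_rel):
    # artifacts/<artifact_role>/<src_rel>, keyed by the ORIGINAL artifact role,
    # not by whatever module name a renderer later generates.
    return os.path.join(bundle_dir, "artifacts", artifact_role, src_rel)


def missing_artifacts(bundle_dir, roles):
    # roles: hypothetical mapping of artifact role -> list of managed_file dicts.
    missing = []
    for role, managed_files in roles.items():
        for mf in managed_files:
            candidate = artifact_path(bundle_dir, role, mf["src_rel"])
            if not os.path.isfile(candidate):
                missing.append(candidate)
    return missing


with tempfile.TemporaryDirectory() as bundle:
    present = artifact_path(bundle, "nginx", "etc/nginx/nginx.conf")
    os.makedirs(os.path.dirname(present))
    with open(present, "w") as fh:
        fh.write("# demo artifact\n")

    roles = {
        "nginx": [
            {"path": "/etc/nginx/nginx.conf", "src_rel": "etc/nginx/nginx.conf"},
            {"path": "/etc/nginx/mime.types", "src_rel": "etc/nginx/mime.types"},
        ]
    }
    missing_rel = [os.path.relpath(p, bundle) for p in missing_artifacts(bundle, roles)]
    print(missing_rel)
```

Only the second entry is reported, because its artifact was never written into the demo bundle.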
5. state.json shape and snapshot dataclasses
The top-level state assembled by harvest.harvest() is:
{
  "enroll": {
    "version": "...",
    "harvest_time": 123456789
  },
  "host": {
    "hostname": "...",
    "os": "debian|redhat|unknown",
    "pkg_backend": "dpkg|rpm|unknown",
    "os_release": {}
  },
  "inventory": {
    "packages": {}
  },
  "roles": {
    "users": {},
    "flatpak": {},
    "snap": {},
    "container_images": {},
    "services": [],
    "packages": [],
    "apt_config": {},
    "dnf_config": {},
    "firewall_runtime": {},
    "sysctl": {},
    "etc_custom": {},
    "usr_local_custom": {},
    "extra_paths": {}
  }
}
The persisted in-memory shapes live in enroll/harvest_types.py.
| Dataclass | Purpose |
|---|---|
| ManagedFile | A file to recreate, with destination path, artifact path, owner, group, mode, and reason. |
| ManagedLink | A symlink to recreate, such as sites-enabled entries. |
| ManagedDir | A directory to ensure exists, with owner/group/mode. |
| ExcludedFile | A path that was considered but skipped, with a reason. |
| ServiceSnapshot | One enabled systemd service and its packages/config/state. |
| PackageSnapshot | One manual package and related config. has_config=False is used when the package should still be installed but no config was found. |
| UsersSnapshot | Human users, groups, managed SSH/dotfiles, and per-user Flatpak data. |
| FlatpakSnapshot | System Flatpaks and system Flatpak remotes. |
| SnapSnapshot | System Snap installs. |
| ContainerImagesSnapshot | Docker/Podman image metadata. |
| AptConfigSnapshot / DnfConfigSnapshot | Package-manager configuration. |
| EtcCustomSnapshot | Unowned/custom /etc config not attributed elsewhere. |
| UsrLocalCustomSnapshot | Selected /usr/local/etc files and executable /usr/local/bin files. |
| ExtraPathsSnapshot | User-requested --include-path files/directories. |
| FirewallRuntimeSnapshot | Generated artifacts from live ipset/iptables state. |
| SysctlSnapshot | Generated /etc/sysctl.d/99-enroll.conf from live writable sysctls. |
The JSON Schema in enroll/schema/state.schema.json is the validation contract for persisted harvests.
6. Harvest orchestration
The local harvest entry point is:
enroll.harvest.harvest(
    bundle_dir,
    policy=None,
    dangerous=False,
    include_paths=None,
    exclude_paths=None,
)
It returns the path to the written state.json.
6.1 High-level harvest order
The order matters because harvest maintains a global set of captured destination paths. Once a path is captured into one role, later collectors normally skip it.
flowchart TD
A[harvest.harvest] --> B[Build IgnorePolicy and PathFilter]
B --> C[detect_platform + get_backend]
C --> D[backend.build_etc_index]
D --> E[RuntimeStateCollector]
E --> F[CronLogrotateCollector]
F --> G[ServicePackageCollector]
G --> H[UsersCollector]
H --> I[ContainerImagesCollector]
I --> J[PackageManagerConfigCollector]
J --> K[etc_custom scan inside harvest.py]
K --> L[UsrLocalCustomCollector]
L --> M[ExtraPathsCollector]
M --> N[Build inventory.packages]
N --> O[Add parent ManagedDir entries]
O --> P[state.write_state]
6.2 HarvestContext
HarvestContext lives in harvest_collectors/context.py. It is passed to collectors instead of passing many individual dependencies.
@dataclass
class HarvestContext:
    bundle_dir: str
    policy: IgnorePolicy
    path_filter: PathFilter
    platform: Dict[str, Any]
    backend: Any
    installed_pkgs: Dict[str, Any]
    installed_names: Set[str]
    owned_etc: Set[str]
    etc_owner_map: Dict[str, str]
    topdir_to_pkgs: Dict[str, Set[str]]
    pkg_to_etc_paths: Dict[str, List[str]]
    captured_global: Set[str]
New collectors should generally accept a HarvestContext and return dataclass snapshots from harvest_types.py.
6.3 Global de-duplication
The harvester tries to avoid two generated roles owning the same destination path. This avoids duplicate config-manager resources and confusing diffs.
captured_global is passed into capture.capture_file() and capture.capture_link(). If a destination path has already been seen, later collection attempts return without capturing it again.
This is one of the most important invariants in the project:
A destination path should normally appear in only one generated role.
Puppet and Salt also run cm.resolve_catalog_conflicts() after renderer role collection because they compile a single global catalog and duplicate resources are hard failures.
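The first-collector-wins behaviour can be illustrated with a shared set. This is a simplified sketch, not the real capture code: capture_once() is a hypothetical helper that only records paths and performs no copying or policy checks.

```python
from typing import Dict, List, Set


def capture_once(
    abs_path: str,
    role_name: str,
    captured_global: Set[str],
    managed: Dict[str, List[str]],
) -> bool:
    # Skip any destination path that an earlier collector already claimed.
    if abs_path in captured_global:
        return False
    captured_global.add(abs_path)
    managed.setdefault(role_name, []).append(abs_path)
    return True


captured_global: Set[str] = set()
managed: Dict[str, List[str]] = {}

results = [
    capture_once("/etc/cron.d/backup", "cron", captured_global, managed),
    capture_once("/etc/nginx/nginx.conf", "nginx", captured_global, managed),
    # A later, broader collector sees the same cron snippet and must not re-own it.
    capture_once("/etc/cron.d/backup", "etc_custom", captured_global, managed),
]
print(results, managed)
```

This is also why collector order matters: whichever role reaches a path first keeps it.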
7. File capture and safety policy
7.1 capture_file()
capture.capture_file() decides whether to copy a file into artifacts/ and record it in a snapshot.
capture_file(abs_path, role_name, reason, policy, path_filter, ...)
  -> skip if already seen globally or in this role
  -> skip if --exclude-path matches
  -> ask IgnorePolicy.deny_reason(abs_path)
  -> stat owner/group/mode with fsutil.stat_triplet()
  -> copy to artifacts/<role_name>/<abs_path without leading slash>
  -> append ManagedFile
  -> mark seen in role/global
fsutil.stat_triplet() returns owner, group, and a zero-padded octal mode string. It falls back to numeric uid/gid strings if user/group names cannot be resolved.
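A function with this contract might look like the following. It is a sketch under stated assumptions: stat_triplet_sketch() is a hypothetical name, and whether the real helper follows symlinks or keeps special bits in the mode string is not confirmed here.

```python
import grp
import os
import pwd
import stat
import tempfile


def stat_triplet_sketch(path):
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid with no passwd entry: fall back to the number
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid with no group entry: fall back to the number
    # Permission bits only, rendered as a zero-padded octal string such as "0644".
    mode = format(stat.S_IMODE(st.st_mode), "04o")
    return owner, group, mode


with tempfile.NamedTemporaryFile() as fh:
    os.chmod(fh.name, 0o640)
    owner, group, mode = stat_triplet_sketch(fh.name)
    print(owner, group, mode)
```

Keeping the mode as a string avoids YAML or JSON consumers reinterpreting a leading zero as a different number base.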
7.2 capture_link()
capture.capture_link() records symlinks as ManagedLink entries rather than copying their targets. It is used for meaningful enablement symlinks, especially in nginx/apache-style trees such as:
/etc/nginx/sites-enabled/*
/etc/nginx/modules-enabled/*
/etc/apache2/conf-enabled/*
/etc/apache2/mods-enabled/*
/etc/apache2/sites-enabled/*
7.3 User shell dotfiles
capture.capture_user_shell_dotfiles() is called by UsersCollector, but only enabled when the harvest policy is dangerous.
In dangerous mode:
- .bashrc, .profile, and .bash_logout are captured only if they differ from /etc/skel baselines.
- .bash_aliases is captured if present because there may be no skel baseline.
Outside dangerous mode, Enroll records a note explaining that shell dotfiles were not auto-harvested. Users can still include specific files via --include-path, but the normal IgnorePolicy still applies unless --dangerous is also used.
7.4 IgnorePolicy
ignore.IgnorePolicy is the default secret/noise avoidance layer.
By default it skips likely sensitive or low-value files such as:
- /etc/shadow, /etc/gshadow, and backup variants,
- SSH host private keys,
- private SSL/Let's Encrypt material,
- log files and editor backups,
- files larger than max_file_bytes (256_000 by default),
- binary-like files except known keyring formats,
- sampled non-comment content that looks sensitive, such as private keys, password=, token, secret, or api_key.
--dangerous sets policy.dangerous = True, disabling deny-globs and content sniffing. This is intentional and should remain explicit.
The policy has separate methods for different filesystem types:
- deny_reason(path) for regular files,
- deny_reason_dir(path) for directories,
- deny_reason_link(path) for symlinks.
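The layering of these checks can be sketched as follows. Everything here is illustrative: IgnorePolicySketch, its glob list, the regular expression, and the reason strings are hypothetical, and the method takes the file content as an argument only to keep the example self-contained. Keeping the size and binary checks active in dangerous mode is also an assumption of this sketch.

```python
import fnmatch
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical deny list; the real policy's globs are not reproduced here.
DENY_GLOBS = ("/etc/shadow*", "/etc/gshadow*", "/etc/ssh/ssh_host_*_key", "*.log", "*~")
SENSITIVE = re.compile(
    r"(password\s*=|token|secret|api_key|BEGIN [A-Z ]*PRIVATE KEY)", re.IGNORECASE
)


@dataclass
class IgnorePolicySketch:
    dangerous: bool = False
    max_file_bytes: int = 256_000

    def deny_reason(self, path: str, content: bytes) -> Optional[str]:
        if len(content) > self.max_file_bytes:
            return "too_large"
        if b"\x00" in content:
            return "binary_like"
        if self.dangerous:
            return None  # deny-globs and content sniffing are switched off
        if any(fnmatch.fnmatch(path, pattern) for pattern in DENY_GLOBS):
            return "denied_path"
        for line in content.decode("utf-8", "replace").splitlines():
            stripped = line.strip()
            # Only non-comment lines are sniffed for sensitive-looking tokens.
            if stripped and not stripped.startswith("#") and SENSITIVE.search(stripped):
                return "sensitive_content"
        return None


default_policy = IgnorePolicySketch()
dangerous_policy = IgnorePolicySketch(dangerous=True)

r_shadow = default_policy.deny_reason("/etc/shadow", b"root:x:...")
r_comment = default_policy.deny_reason("/etc/app.conf", b"# password=example\nport=8080\n")
r_secret = default_policy.deny_reason("/etc/app.conf", b"db_password = hunter2\n")
r_dangerous = dangerous_policy.deny_reason("/etc/app.conf", b"db_password = hunter2\n")
print(r_shadow, r_comment, r_secret, r_dangerous)
```

Returning a reason string rather than a boolean lets the caller record why a path was skipped, which matches the ExcludedFile shape described earlier.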
7.5 PathFilter
pathfilter.PathFilter implements user-supplied path controls:
- --include-path adds extra files/directories to the extra_paths role.
- --exclude-path removes matching paths from all harvesting.
- Excludes always win over includes.
Pattern styles:
/plain/path         exact path or directory-prefix match
glob:/path/**/*.x   forced glob
/path/**/*.x        inferred glob because it contains glob characters
re:^/path/...$      regex
regex:^/path/...$   regex
expand_includes() is conservative: it ignores symlinks, respects excludes, caps file counts, and returns notes for unmatched patterns or caps.
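The pattern styles above can be classified with a few prefix checks. The sketch below is not the PathFilter implementation: classify_pattern() and matches() are hypothetical, the set of characters treated as glob characters is an assumption, and fnmatch is used as a rough stand-in, so its handling of ** may differ from the real matcher.

```python
import fnmatch
import re


def classify_pattern(raw):
    # Explicit prefixes take priority over inference.
    if raw.startswith("re:"):
        return "regex", raw[len("re:"):]
    if raw.startswith("regex:"):
        return "regex", raw[len("regex:"):]
    if raw.startswith("glob:"):
        return "glob", raw[len("glob:"):]
    # Assumption: these characters mark an inferred glob.
    if any(ch in raw for ch in "*?["):
        return "glob", raw
    return "prefix", raw.rstrip("/") or "/"


def matches(raw, path):
    kind, value = classify_pattern(raw)
    if kind == "regex":
        return re.search(value, path) is not None
    if kind == "glob":
        return fnmatch.fnmatch(path, value)
    # Plain paths match exactly or as a directory prefix, never as a string prefix.
    prefix = value if value == "/" else value + "/"
    return path == value or path.startswith(prefix)


print(classify_pattern("/etc/nginx"))
print(classify_pattern("glob:/srv/**/*.conf"))
print(matches("/etc/nginx", "/etc/nginx/conf.d/a.conf"))
print(matches("/etc/nginx", "/etc/nginx-extra/a.conf"))
```

The directory-prefix check appends a slash before comparing so that /etc/nginx does not accidentally match /etc/nginx-extra.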
8. Platform and package backends
platform.py abstracts distribution-specific package behaviour.
platform.detect_platform()
  -> reads /etc/os-release
  -> returns PlatformInfo(os_family, pkg_backend, os_release)
platform.get_backend(info)
  -> DpkgBackend for Debian-like systems
  -> RpmBackend for RedHat/Fedora-like systems
The backend interface is PackageBackend:
owner_of_path(path)
list_manual_packages()
installed_packages()
build_etc_index()
specific_paths_for_hints()
is_pkg_config_path(path)
modified_paths(pkg, paths)
8.1 Debian backend
DpkgBackend delegates to debian.py.
It uses dpkg/apt data to provide package ownership, manual package lists, installed package inventory, /etc indexes, conffile hashes, and packaged-file md5 baselines.
DpkgBackend.modified_paths() identifies:
- modified_conffile when a dpkg conffile hash differs,
- modified_packaged_file when a packaged file md5 differs.
It deliberately leaves /etc/apt-style package-manager configuration for the apt_config role.
8.2 RPM backend
RpmBackend delegates to rpm.py.
It provides package ownership, manual package lists, installed package inventory, /etc indexes, RPM config file lists, and rpm -V style modified-file detection.
RPM-family package-manager config paths such as /etc/dnf, /etc/yum, /etc/yum.conf, /etc/yum.repos.d, and /etc/pki/rpm-gpg are collected into dnf_config, not arbitrary package roles.
8.3 Adding a new package backend
To support another package system:
- implement a PackageBackend subclass,
- route it from platform.get_backend(),
- provide ownership lookup, manual package listing, installed package inventory, /etc indexing, modified config detection, and package-manager config exclusion,
- add backend tests comparable to test_debian.py, test_rpm.py, and test_platform.py.
9. Harvest collectors in detail
Collectors live under enroll/harvest_collectors/.
9.1 RuntimeStateCollector
File: harvest_collectors/runtime.py
This wrapper collects root-only live runtime state:
- writable sysctl state,
- live ipset state,
- live IPv4 iptables state,
- live IPv6 iptables state.
The actual helper implementations currently live in harvest.py:
- _collect_sysctl_snapshot(),
- _collect_firewall_runtime_snapshot(),
- _parse_sysctl_a_output(),
- _iptables_save_has_state(),
- _ipset_save_has_state().
If the process is not root, runtime capture returns empty snapshots with explanatory notes.
Sysctl capture
Sysctl capture runs sysctl -a, filters to writable/persistable single-line keys, and writes a generated artifact:
artifacts/sysctl/sysctl/99-enroll.conf
The destination managed by renderers is:
/etc/sysctl.d/99-enroll.conf
The filter skips volatile/action/identity keys and inactive mutually-exclusive zero values. This avoids generating config that fails or is noisy on replay.
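The filtering idea can be sketched against synthetic sysctl -a style text. This is not _parse_sysctl_a_output(): parse_sysctl_sketch(), the volatile prefix list, and the mutually-exclusive key set are hypothetical examples, and the input below is made up rather than captured from a host.

```python
# Hypothetical examples only; the real skip lists are not reproduced here.
VOLATILE_PREFIXES = ("kernel.random.",)
MUTUALLY_EXCLUSIVE_ZERO = {
    "vm.dirty_bytes",
    "vm.dirty_ratio",
    "vm.dirty_background_bytes",
    "vm.dirty_background_ratio",
}


def parse_sysctl_sketch(text):
    settings = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if " = " not in line:
            # Continuation of a multi-line value: drop the key it belongs to.
            if last_key is not None:
                settings.pop(last_key, None)
            continue
        key, value = (part.strip() for part in line.split(" = ", 1))
        last_key = key
        if key.startswith(VOLATILE_PREFIXES):
            continue
        if key in MUTUALLY_EXCLUSIVE_ZERO and value == "0":
            continue  # the inactive half of a mutually-exclusive pair
        settings[key] = value
    return settings


sample = """net.ipv4.ip_forward = 1
kernel.random.uuid = 1b2c3d4e
vm.dirty_bytes = 0
vm.dirty_ratio = 20
example.multiline = first line
second line
vm.swappiness = 10
"""
parsed = parse_sysctl_sketch(sample)
print(parsed)
```

Only keys that are safe to write back as a single key = value line survive, which is what keeps the generated 99-enroll.conf replayable.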
Firewall runtime capture
Runtime firewall capture is a fallback. Enroll first checks for persistent firewall config such as:
/etc/iptables/rules.v4
/etc/iptables/rules.v6
/etc/sysconfig/iptables
/etc/sysconfig/ip6tables
/etc/ipset.conf
/etc/ipset/*
If persistent files exist for a family, live runtime capture for that family is skipped. If no persistent file exists and live state is meaningful, Enroll writes generated artifacts such as:
artifacts/firewall_runtime/firewall/ipset.save
artifacts/firewall_runtime/firewall/iptables.v4
artifacts/firewall_runtime/firewall/iptables.v6
Renderers should only create a firewall runtime role when at least one runtime artifact exists. When firewall runtime is rendered, Ansible/Puppet/Salt also create an enroll_runtime role/module/state to own /etc/enroll before /etc/enroll/firewall.
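The per-family fallback decision can be sketched as a glob check followed by a live-state check. The helper below is hypothetical, and the assignment of each persistent path to a family is an assumption based on the path names; the root argument exists only so the example can run against a temporary directory instead of the real /etc.

```python
import glob
import os
import tempfile

# Assumed grouping of the persistent paths listed above (relative to a root).
PERSISTENT_GLOBS = {
    "iptables_v4": ["etc/iptables/rules.v4", "etc/sysconfig/iptables"],
    "iptables_v6": ["etc/iptables/rules.v6", "etc/sysconfig/ip6tables"],
    "ipset": ["etc/ipset.conf", "etc/ipset/*"],
}


def runtime_families_to_capture(root, live_state):
    selected = []
    for family, patterns in PERSISTENT_GLOBS.items():
        persistent = any(glob.glob(os.path.join(root, pat)) for pat in patterns)
        # Capture live state only when nothing persistent exists for this family
        # and the live state is meaningful.
        if not persistent and live_state.get(family, False):
            selected.append(family)
    return selected


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "etc", "iptables"))
    with open(os.path.join(root, "etc", "iptables", "rules.v4"), "w") as fh:
        fh.write("*filter\nCOMMIT\n")
    live = {"iptables_v4": True, "iptables_v6": True, "ipset": True}
    families = runtime_families_to_capture(root, live)
    print(families)
```

In the demo, IPv4 is skipped because a persistent rules.v4 exists, while the other two families fall back to runtime capture.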
9.2 CronLogrotateCollector
File: harvest_collectors/cron_logrotate.py
This collector runs before service/package collection to prevent cron and logrotate snippets from being scattered across unrelated roles.
It detects cron packages such as cron, cronie, cronie-anacron, vixie-cron, and fcron, and detects logrotate separately.
It captures cron-related paths such as:
/etc/crontab
/etc/cron.d/*
/etc/cron.hourly/*
/etc/cron.daily/*
/var/spool/cron/*
/var/spool/crontabs/*
/var/spool/anacron/*
It captures logrotate paths such as:
/etc/logrotate.conf
/etc/logrotate.d/*
It returns PackageSnapshot objects for cron and logrotate when those packages exist.
9.3 ServicePackageCollector
File: harvest_collectors/services.py
This collector produces:
- ServiceSnapshot objects for enabled systemd services,
- PackageSnapshot objects for manual packages not already covered by services,
- alias maps used by later /etc attribution,
- seen_by_role state reused by later collectors.
For each enabled service it:
- derives a safe role name from the unit,
- queries systemd metadata,
- infers packages from the unit fragment owner, ExecStart, and related /etc topdirs,
- collects unit drop-ins, environment files, distro-specific likely config files, and modified package-owned config,
- collects related unowned /etc/<hint> and /etc/<hint>.d files,
- captures candidates with capture_file(),
- builds a ServiceSnapshot.
It also collects timer override files. If a timer triggers a known service, timer files are attached to that service snapshot. Otherwise, the timer is associated with inferred packages.
Manual packages are processed after services. Packages already covered by service snapshots are not duplicated as standalone package roles. Packages with no detected config are still represented with has_config=False so renderers can install them.
Known enablement symlinks for nginx/apache are captured as ManagedLink entries at the end of the collector.
9.4 UsersCollector
File: harvest_collectors/users.py
This collector returns a UsersCollection containing:
- UsersSnapshot,
- FlatpakSnapshot,
- SnapSnapshot.
User discovery is in accounts.collect_non_system_users(). It reads /etc/login.defs, /etc/passwd, /etc/group, home directories, and user Flatpak installs. It filters out users below UID_MIN, root, nobody, and non-login shells such as nologin and /bin/false.
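The filtering rules can be sketched over passwd-format text. This is not accounts.collect_non_system_users(): PasswdEntry, parse_passwd(), and is_human_user() are hypothetical, and the uid_min default of 1000 stands in for the value the real code reads from /etc/login.defs.

```python
from typing import List, NamedTuple


class PasswdEntry(NamedTuple):
    name: str
    uid: int
    home: str
    shell: str


def parse_passwd(text: str) -> List[PasswdEntry]:
    entries = []
    for line in text.splitlines():
        parts = line.split(":")
        if len(parts) != 7:
            continue  # skip blank or malformed lines
        name, _pw, uid, _gid, _gecos, home, shell = parts
        entries.append(PasswdEntry(name, int(uid), home, shell))
    return entries


NON_LOGIN_SHELLS = {"nologin", "false"}


def is_human_user(entry: PasswdEntry, uid_min: int = 1000) -> bool:
    if entry.name in {"root", "nobody"}:
        return False
    if entry.uid < uid_min:
        return False
    # Compare on the shell's basename so /sbin/nologin and /usr/sbin/nologin both match.
    return entry.shell.rsplit("/", 1)[-1] not in NON_LOGIN_SHELLS


sample = """root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
svc:x:1001:1001::/srv/svc:/bin/false
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
"""
humans = [e.name for e in parse_passwd(sample) if is_human_user(e)]
print(humans)
```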
Default user file capture is intentionally narrow:
- authorized_keys,
- safe public SSH material where supported by helpers.
Automatic shell dotfile capture only runs in dangerous mode.
The same collector discovers:
- system Flatpaks,
- system Flatpak remotes,
- per-user Flatpaks,
- per-user Flatpak remotes,
- system Snaps.
9.5 ContainerImagesCollector
File: harvest_collectors/container_images.py
This collector inspects Docker and Podman image caches when the relevant engine exists.
For each engine it:
- runs <engine> image ls -q --no-trunc,
- inspects images in chunks with <engine> image inspect ...,
- normalises image IDs, tags, digests, OS/architecture/platform fields, and tag aliases,
- prefers digest-pinned pull refs from RepoDigests.
Renderers only enforce exact pull state for images with a usable digest. Images with only local tags and no digest are represented with notes rather than fake reproducibility.
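The digest preference can be sketched over dictionaries shaped like image inspect output, which exposes RepoTags and RepoDigests lists. The pull_plan() helper and the shape of its return value are hypothetical, and the digest value is a placeholder.

```python
def pull_plan(image):
    # Prefer a digest-pinned reference; tags alone are not reproducible.
    for ref in image.get("RepoDigests") or []:
        if "@sha256:" in ref:
            return {"enforce": True, "ref": ref, "note": None}
    tags = image.get("RepoTags") or []
    return {
        "enforce": False,
        "ref": None,
        "note": "no digest available for %s; not enforced" % (", ".join(tags) or "untagged image"),
    }


pinned = pull_plan(
    {
        "RepoTags": ["nginx:1.25"],
        "RepoDigests": ["nginx@sha256:" + "0" * 64],  # placeholder digest
    }
)
local_only = pull_plan({"RepoTags": ["myapp:dev"], "RepoDigests": []})
print(pinned["enforce"], local_only["enforce"], local_only["note"])
```

Locally built images typically have no registry digest, which is why they end up as notes instead of enforced pulls.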
9.6 PackageManagerConfigCollector
File: harvest_collectors/package_manager.py
This collector emits a dedicated package-manager config snapshot:
- apt_config on dpkg systems,
- dnf_config on rpm systems.
APT capture includes /etc/apt, sources, .sources files, trusted keyrings, and keyrings referenced through signed-by / Signed-By.
DNF/YUM capture includes /etc/dnf, /etc/yum, /etc/yum.conf, /etc/yum.repos.d/*.repo, and /etc/pki/rpm-gpg/*.
9.7 etc_custom scan
etc_custom is still assembled inside harvest.harvest() rather than in its own collector.
It captures:
- essential system config from system_paths.iter_system_capture_paths(),
- remaining unowned config-like files found by walking /etc.
Before adding shared snippets such as /etc/logrotate.d/* or /etc/cron.d/* to etc_custom, _target_role_for_shared_snippet() tries to attach them to a more meaningful service/package role.
9.8 UsrLocalCustomCollector
File: harvest_collectors/paths.py
This collector creates usr_local_custom from:
- files under /usr/local/etc,
- executable files under /usr/local/bin.
It respects IgnorePolicy, PathFilter, and global de-duplication.
9.9 ExtraPathsCollector
File: harvest_collectors/paths.py
This collector handles --include-path and --exclude-path and creates extra_paths.
For included directories, it records directory metadata as ManagedDir entries while walking. For included files, it relies on expand_includes() and then capture_file().
10. Path scanners and package hints
system_paths.py contains known path lists and filesystem scanners.
Important functions and constants:
- ALLOWED_UNOWNED_EXTS decides which unowned /etc files look config-like.
- MAX_FILES_CAP and MAX_UNOWNED_FILES_PER_ROLE cap broad scans.
- is_confish() checks whether a path looks like configuration.
- scan_unowned_under_roots() finds unowned files under candidate roots.
- iter_matching_files() expands glob specs and walks directory hits.
- iter_apt_capture_paths() and iter_dnf_capture_paths() collect package-manager config.
- iter_system_capture_paths() returns fixed essential system config candidates.
- persistent_ipset_globs(), persistent_iptables_v4_globs(), and persistent_iptables_v6_globs() support runtime firewall fallback decisions.
package_hints.py turns package/unit names into stable role names and attempts to infer relationships.
Important helpers:
- safe_name(),
- role_id(),
- role_name_from_unit(),
- role_name_from_pkg(),
- package_section_from_installations(),
- hint_names(),
- add_pkgs_from_etc_topdirs(),
- maybe_add_specific_paths().
SHARED_ETC_TOPDIRS in package_hints.py prevents shared directories such as /etc/default, /etc/pam.d, /etc/systemd, /etc/ssh, /etc/apt, and /etc/dnf from being attributed too broadly to one package.
role_names.py protects singleton role names such as users, flatpak, snap, container_images, apt_config, dnf_config, firewall_runtime, sysctl, etc_custom, usr_local_custom, and extra_paths from collisions with package/service-derived roles.
11. Manifest orchestration
manifest.py is a target router and SOPS wrapper. It does not render target resources itself.
Entry point:
manifest(
    bundle_dir,
    out,
    fqdn=None,
    jinjaturtle="auto",
    sops_fingerprints=None,
    no_common_roles=False,
    target="ansible",
)
Plain mode dispatches to:
target=ansible -> ansible.manifest_from_bundle_dir(..., jinjaturtle=..., no_common_roles=...)
target=puppet -> puppet.manifest_from_bundle_dir(..., jinjaturtle=..., no_common_roles=...)
target=salt -> salt.manifest_from_bundle_dir(..., jinjaturtle=..., no_common_roles=...)
SOPS mode:
- accepts an already-decrypted bundle directory or a SOPS-encrypted harvest tarball,
- decrypts/extracts with safe tar extraction when needed,
- renders target output into a secure temp directory,
- tars the manifest directory under a
manifest/prefix, - encrypts the tarball with SOPS,
- returns the encrypted output path.
The renderers do not know about SOPS.
12. The renderer-neutral CMModule model
File: cm.py
CMModule is the shared resource model used heavily by Puppet and Salt and partially by Ansible.
@dataclass
class CMModule:
    role_name: str
    module_name: str
    packages: Set[str]
    groups: Set[str]
    users: Dict[str, Dict[str, Any]]
    dirs: Dict[str, Dict[str, Any]]
    files: Dict[str, Dict[str, Any]]
    links: Dict[str, Dict[str, Any]]
    services: Dict[str, Dict[str, Any]]
    firewall_runtime: Dict[str, Any]
    notes: List[str]
Important methods and helpers include:
- add_managed_dir(),
- add_managed_file(),
- add_managed_link(),
- add_package_snapshot(),
- add_service_snapshot_state(),
- user_records_from_snapshot(),
- add_flatpak_snapshot(),
- add_snap_snapshot(),
- add_firewall_runtime_snapshot(),
- package_service_entries(),
- active_service_units_by_package(),
- active_service_units_for_package_snapshot(),
- remove_directory_resource_conflicts().
12.1 Common role grouping
CMModule.package_service_entries() is the shared grouping mechanism for package and service snapshots.
use_common_roles=True groups package/service snapshots into section/group roles such as Debian Section or RPM Group labels. use_common_roles=False preserves one generated role/module/state per package or service snapshot.
Default behaviour:
normal manifest, no --no-common-roles: group package/service roles
--fqdn mode: no common grouping
--no-common-roles: no common grouping
--fqdn implies no common roles because host-specific output should preserve per-host state rather than merging unrelated resources into shared roles.
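The default rules reduce to a small predicate, shown here with a toy grouping step. This is a sketch, not package_service_entries(): both function names and the section field on the snapshot dictionaries are hypothetical.

```python
def use_common_roles(fqdn=None, no_common_roles=False):
    # Grouping is the default; either --fqdn or --no-common-roles turns it off.
    return not fqdn and not no_common_roles


def group_roles(snapshots, fqdn=None, no_common_roles=False):
    if not use_common_roles(fqdn, no_common_roles):
        # One generated role per package/service snapshot.
        return {snap["role_name"]: [snap] for snap in snapshots}
    grouped = {}
    for snap in snapshots:
        # Fall back to the snapshot's own role name when no section is known.
        grouped.setdefault(snap.get("section") or snap["role_name"], []).append(snap)
    return grouped


snapshots = [
    {"role_name": "curl", "section": "web"},
    {"role_name": "wget", "section": "web"},
    {"role_name": "htop", "section": "utils"},
]
grouped_default = group_roles(snapshots)
grouped_fqdn = group_roles(snapshots, fqdn="host.example.com")
print(sorted(grouped_default), sorted(grouped_fqdn))
```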
12.2 Catalog conflict resolution
resolve_catalog_conflicts() runs for Puppet and Salt.
It removes duplicates across generated modules/states for:
- packages,
- groups,
- users,
- directories,
- files,
- symlinks,
- services.
It also removes directory resources that conflict with a file or link at the same path. This matters because Puppet and Salt compile a single catalog; duplicates that Ansible might tolerate can fail hard there.
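The de-duplication pass can be sketched on a reduced module model. This is not resolve_catalog_conflicts(): ModuleSketch carries only three of the resource kinds, and the rule that the first module in order keeps a contested resource is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class ModuleSketch:
    name: str
    packages: Set[str] = field(default_factory=set)
    dirs: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    files: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def resolve_conflicts_sketch(modules: List[ModuleSketch]) -> None:
    seen_packages: Set[str] = set()
    seen_files: Set[str] = set()
    for module in modules:
        # Earlier modules keep contested packages and files (assumed policy).
        module.packages = {p for p in module.packages if p not in seen_packages}
        seen_packages |= module.packages
        module.files = {p: v for p, v in module.files.items() if p not in seen_files}
        seen_files |= set(module.files)
    seen_dirs: Set[str] = set()
    for module in modules:
        # A directory resource loses to a duplicate directory or to a file at the same path.
        module.dirs = {
            p: v
            for p, v in module.dirs.items()
            if p not in seen_dirs and p not in seen_files
        }
        seen_dirs |= set(module.dirs)


web = ModuleSketch(
    "web",
    packages={"nginx", "openssl"},
    dirs={"/etc/nginx": {}},
    files={"/etc/nginx/nginx.conf": {}, "/srv/app/current": {}},
)
tls = ModuleSketch(
    "tls",
    packages={"openssl"},
    dirs={"/etc/nginx": {}, "/srv/app/current": {}},
    files={"/etc/ssl/openssl.cnf": {}},
)
resolve_conflicts_sketch([web, tls])
print(tls.packages, tls.dirs, tls.files)
```

After the pass, each package and path is declared by exactly one module, which is the property a single compiled catalog needs.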
13. Ansible renderer
File: ansible.py
Entry point:
ansible.manifest_from_bundle_dir(
    bundle_dir,
    out_dir,
    fqdn=None,
    jinjaturtle="auto",
    no_common_roles=False,
)
It instantiates AnsibleManifestRenderer(...).render().
13.1 Ansible render flow
flowchart TD
A[AnsibleManifestRenderer.render] --> B[AnsibleRole.load_state]
B --> C[roles_from_state + inventory_packages_from_state]
C --> D[_prepare_ansible_context]
D --> E[_write_site_scaffold]
E --> F[_collect_ansible_roles]
F --> G[_render_managed_file_roles]
F --> H[_render_users_role]
F --> I[_render_flatpak_role]
F --> J[_render_snap_role]
F --> K[_render_container_images_role]
F --> L[_render_sysctl_role]
F --> M[_render_firewall_runtime_role]
M --> N[_render_enroll_runtime_role if firewall runtime exists]
F --> O[_render_service_roles]
F --> P[_render_common_ansible_roles]
F --> Q[_render_package_roles]
Q --> R[_write_manifest_playbook]
R --> S[README.md]
13.2 Output layout
Default single-site output:
<out>/
  ansible.cfg
  playbook.yml
  README.md
  requirements.yml
  roles/
    <role>/
      tasks/main.yml
      handlers/main.yml
      defaults/main.yml
      meta/main.yml
      files/...
      templates/...
--fqdn site-mode output adds inventory and host vars:
<out>/
  inventory/
    hosts.yml
    host_vars/<fqdn>/<role>/
      main.yml
      .files/...
  roles/<role>/...
In default mode, variables normally live in roles/<role>/defaults/main.yml and raw files live under roles/<role>/files/.
In --fqdn mode, host-specific values and artifacts live under inventory/host_vars/<fqdn>/<role>/, while reusable role scaffolding remains under roles/.
13.3 Role ordering
Ansible playbook roles are ordered intentionally:
- package-manager config roles (apt_config, dnf_config),
- common grouped roles,
- standalone package roles,
- service roles,
- custom file roles (etc_custom, usr_local_custom, extra_paths),
- Flatpak, Snap, container images, users,
- cron/logrotate moved toward the end when present,
- runtime roles (enroll_runtime, sysctl, firewall_runtime).
enroll_runtime is rendered only when firewall runtime is rendered.
13.4 Role tags
Generated playbooks tag roles with role_<safe_role_name>. diff --enforce --target ansible uses these tags to narrow enforcement to roles relevant to the drift report when it can.
Puppet and Salt enforcement do not currently narrow to per-role tags; they run the full generated local manifest/state tree.
13.5 Ansible and JinjaTurtle
Ansible uses jinjaturtle.jinjify_managed_files().
When JinjaTurtle is enabled and supports a harvested config file, the renderer can write:
- a Jinja2 template under templates/,
- variables in defaults/main.yml or inventory/host_vars/<fqdn>/<role>/main.yml.
If JinjaTurtle is unavailable in auto mode, fails, emits missing variables, or does not support the path, Ansible falls back to copying the raw harvested file.
14. Puppet renderer
File: puppet.py
Entry point:
puppet.manifest_from_bundle_dir(
    bundle_dir,
    out_dir,
    fqdn=None,
    no_common_roles=False,
    jinjaturtle="auto",
)
It instantiates PuppetManifestRenderer(...).render().
14.1 Puppet render flow
flowchart TD
A[PuppetManifestRenderer.render] --> B[PuppetRole.load_state]
B --> C[resolve_jinjaturtle_mode]
C --> D[_collect_puppet_roles]
D --> E[resolve_catalog_conflicts]
E --> F[_sync_service_notifications]
F --> G[write modules/<module>/manifests/init.pp]
G --> H[write metadata.json]
H --> I{fqdn?}
I -->|no| J[write manifests/site.pp with node default]
I -->|yes| K[write hiera.yaml]
K --> L[write data/nodes/<fqdn>.yaml]
L --> M[write Hiera-driven site.pp]
J --> N[README.md]
M --> N
14.2 PuppetRole
PuppetRole extends CMModule and converts snapshots into Puppet-friendly resources. It handles:
- packages,
- users and groups,
- managed dirs/files/symlinks,
- services,
- sysctl apply execs,
- Flatpak remotes/apps via guarded exec,
- Snap installs via guarded exec,
- Docker/Podman images by digest via guarded exec,
- firewall runtime files and refresh-only restore execs,
- JinjaTurtle ERB templates and class/Hiera parameter values.
_puppet_name() sanitises module names and avoids Puppet reserved words such as default, class, node, site, and init.
14.3 Output layout
Default mode:
<out>/
  manifests/site.pp
  README.md
  modules/
    <module>/
      metadata.json
      manifests/init.pp
      files/...
      templates/...
Default site.pp includes generated classes in manifest order under a node default or named node block.
14.4 Puppet --fqdn / Hiera mode
When --fqdn is supplied, Puppet output switches to Hiera-style node data:
<out>/
  hiera.yaml
  manifests/site.pp
  data/
    common.yaml
    nodes/<fqdn>.yaml
  modules/
    <module>/
      metadata.json
      manifests/init.pp
      files/nodes/<fqdn>/...
      templates/...
In this mode:
- site.pp includes classes from Hiera key enroll::classes,
- data/nodes/<fqdn>.yaml contains class list and parameter data,
- module classes are data-driven via Automatic Parameter Lookup,
- node-specific raw file artifacts live under modules/<module>/files/nodes/<fqdn>/...,
- JinjaTurtle ERB template values are written into node Hiera data.
Re-running Enroll with another --fqdn into the same output directory is intended to add or replace that node's YAML without deleting existing node data.
14.5 Puppet and JinjaTurtle
Puppet participates in the shared JinjaTurtle integration.
When enabled, Puppet calls jinjaturtle with ERB-specific options:
--template-engine erb
--puppet-class <module_name>
The resulting template is written under:
modules/<module>/templates/<src_rel>.erb
Static single-node mode renders class parameters with defaults and uses:
content => template('<module>/<src_rel>.erb')
Hiera mode writes template parameter values into data/nodes/<fqdn>.yaml and renders data-driven file resources.
jinjaturtle.missing_erb_template_vars() checks that ERB instance variables such as @main_key have matching context/Hiera data. If variables are missing, Enroll falls back to raw file copying rather than emitting a broken Puppet template.
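A check of this kind could look roughly like the sketch below. It is an illustration only: missing_erb_vars_sketch is a hypothetical name, and the regular expressions are a simplification that may not match how jinjaturtle.missing_erb_template_vars() actually parses ERB.

```python
import re

# ERB tags such as <%= @main_key %>; text outside tags is ignored.
_ERB_TAG = re.compile(r"<%.*?%>", re.DOTALL)
# Instance variables such as @main_key.
_ERB_IVAR = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)")


def missing_erb_vars_sketch(template_text: str, context: dict) -> set:
    """Return instance-variable names used in ERB tags but absent from context."""
    used = set()
    for tag in _ERB_TAG.findall(template_text):
        used.update(_ERB_IVAR.findall(tag))
    return used - set(context)
```

A non-empty result would be the signal to fall back to raw file copying instead of writing the template.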
15. Salt renderer
File: salt.py
Entry point:
salt.manifest_from_bundle_dir(
bundle_dir,
out_dir,
fqdn=None,
no_common_roles=False,
jinjaturtle="auto",
)
It instantiates SaltManifestRenderer(...).render().
15.1 Salt render flow
flowchart TD
A[SaltManifestRenderer.render] --> B[SaltRole.load_state]
B --> C[resolve_jinjaturtle_mode]
C --> D[_collect_salt_roles]
D --> E[resolve_catalog_conflicts]
E --> F[write states/roles/<role>/init.sls]
F --> G{fqdn?}
G -->|no| H[write states/top.sls target '*']
G -->|yes| I[write pillar node data]
I --> J[write states/top.sls and pillar/top.sls]
H --> K[write config/master.d/enroll.conf]
J --> K
K --> L[README.md]
15.2 SaltRole
SaltRole extends CMModule and changes managed_owner_attr to user, because Salt file.managed uses user rather than owner.
It prepares:
- packages as pkg.installed,
- groups as group.present,
- users as user.present,
- dirs/files/symlinks as Salt file.* states,
- services as service.running or service.dead,
- Flatpaks/Snaps via guarded cmd.run,
- Docker/Podman images via guarded cmd.run,
- firewall runtime restore commands,
- optional Jinja templates for managed files.
15.3 Output layout
Default mode:
<out>/
README.md
config/master.d/enroll.conf
states/
top.sls
roles/<role>/
init.sls
files/...
templates/...
--fqdn mode:
<out>/
states/
top.sls
roles/<role>/init.sls
pillar/
top.sls
nodes/<sanitised-fqdn>_<digest>.sls
The Salt renderer can accumulate node data in --fqdn mode and preserves existing top data where appropriate.
15.4 Salt and JinjaTurtle
Salt uses jinjaturtle.jinjify_artifact() directly. When successful, a managed file becomes a Salt file.managed with:
source: salt://roles/<role>/templates/<src_rel>.j2
template: jinja
context: {...}
Salt has one additional compatibility step: _saltify_jinjaturtle_template() rewrites Ansible-oriented to_json(...) filters emitted by JinjaTurtle into Salt-safe context variables or tojson filters.
If templating fails or is unsupported, the renderer falls back to a literal file copy under files/.
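The filter rewrite mentioned above can be illustrated with a small sketch. It covers only the simplest case (swapping to_json for tojson and dropping any simple arguments); the function name saltify_sketch and the regular expression are assumptions, and the real _saltify_jinjaturtle_template() may also move values into context variables.

```python
import re

# "| to_json", optionally followed by simple (non-nested) arguments.
_TO_JSON = re.compile(r"\|\s*to_json\b(\s*\([^()]*\))?")


def saltify_sketch(template_text: str) -> str:
    """Replace Ansible-style to_json filters with the plain tojson filter."""
    # Note: any arguments passed to to_json(...) are dropped in this sketch.
    return _TO_JSON.sub("| tojson", template_text)
```

For example, "{{ opts | to_json }}" becomes "{{ opts | tojson }}", while templates without the filter are returned unchanged.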
16. Shared JinjaTurtle integration
File: jinjaturtle.py
JinjaTurtle mode is resolved by:
resolve_jinjaturtle_mode("auto" | "on" | "off")
Semantics:
- auto: use jinjaturtle when it exists on PATH; otherwise copy raw files.
- on: require jinjaturtle; error if missing.
- off: never use it.
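These three modes can be sketched as below. The function name resolve_mode_sketch, the (enabled, path) return shape, and the exception types are assumptions; the real resolve_jinjaturtle_mode() may return something different.

```python
import shutil


def resolve_mode_sketch(mode: str, exe: str = "jinjaturtle"):
    """Return (enabled, executable_path_or_None) for the auto/on/off modes."""
    if mode not in ("auto", "on", "off"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "off":
        return False, None
    path = shutil.which(exe)  # None when the executable is not on PATH
    if path is None and mode == "on":
        raise RuntimeError(f"{exe} is required but was not found on PATH")
    # In "auto" mode a missing executable simply disables templating.
    return path is not None, path
```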
Supported path types include structured config suffixes:
.ini .cfg .json .toml .yaml .yml .xml .repo
and systemd unit-like suffixes:
.service .socket .target .timer .path .mount .automount .slice .swap .scope .link .netdev .network
Special format forcing is used for:
- main.cf -> postfix,
- systemd unit files -> systemd,
- sshd_config, ssh_config, and matching *.conf snippets under sshd_config.d/ssh_config.d -> ssh.
The central helper is:
jinjify_artifact(
bundle_dir,
artifact_role,
src_rel,
dest_path,
template_root,
jt_exe=...,
jt_enabled=...,
template_engine="jinja2" | "erb",
puppet_class=..., # Puppet only
)
Ansible uses jinjify_managed_files() because it merges variables into role defaults or host vars. Salt uses jinjify_artifact() directly because context lives with each file.managed. Puppet uses jinjify_artifact(..., template_engine="erb", puppet_class=<module>) so variables line up with Puppet class/Hiera names.
Safety checks:
- missing_jinja_template_vars() rejects Jinja2 templates that reference absent variables.
- missing_erb_template_vars() rejects ERB templates that reference absent Puppet/Hiera variables.
When checks fail, Enroll deletes obsolete generated templates when appropriate and falls back to raw file copying.
17. Diff, notifications, and enforcement
File: diff.py
17.1 Inputs
compare_harvests() accepts:
- bundle directories,
- direct state.json paths,
- plain .tar.gz/.tgz bundles,
- SOPS-encrypted bundles when sops_mode=True or the name ends with .sops.
Bundle resolution is handled by _bundle_from_input(), which reuses remote._safe_extract_tar() for tarball extraction.
17.2 What diff compares
compare_harvests() compares:
- package add/remove/version changes,
- enabled systemd unit add/remove/state/package changes,
- user add/remove/field changes,
- managed file add/remove/content/metadata changes.
File content changes are detected by hashing artifacts.
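A minimal sketch of hash-based content comparison follows. The function names are hypothetical and SHA-256 is an assumption; this guide does not state which digest diff.py or fsutil.py actually uses.

```python
import hashlib
from pathlib import Path


def file_digest_sketch(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def content_changed_sketch(old_artifact: Path, new_artifact: Path) -> bool:
    """True when the two harvested artifacts differ in content."""
    return file_digest_sketch(old_artifact) != file_digest_sketch(new_artifact)
```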
--exclude-path filtering applies only to file drift reporting, not package/service/user diffs.
--ignore-package-versions suppresses package version-only drift from both the report and has_changes, but package additions/removals are still reported.
Reports are formatted by:
format_report(report, fmt="text" | "markdown" | "json")
17.3 Enforcement decision
has_enforceable_drift() is intentionally conservative.
Enforceable drift includes:
- packages that were removed from the current host but existed in the baseline,
- baseline services that were removed or changed in meaningful non-package fields,
- baseline users that were removed or changed,
- baseline files that were removed or changed.
Not enforceable:
- newly installed packages,
- package version changes alone,
- newly enabled services,
- newly added users,
- newly added managed files.
This keeps --enforce focused on restoring baseline state rather than deleting unknown current state or downgrading packages.
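The decision can be sketched as below. The report shape (top-level packages/services/users/files keys with added/removed/changed lists) is a hypothetical simplification, not the real diff report format, and the sketch treats every service change as meaningful rather than filtering out package-only fields.

```python
def has_enforceable_drift_sketch(report: dict) -> bool:
    """True when the current host lost or changed something present in the baseline."""
    # Packages: only removals count; additions and version changes are ignored.
    if report.get("packages", {}).get("removed"):
        return True
    # Services, users and files: removals and changes count, additions do not.
    for section in ("services", "users", "files"):
        data = report.get(section, {})
        if data.get("removed") or data.get("changed"):
            return True
    return False
```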
17.4 Target-selected enforcement
enforce_old_harvest() now accepts target="ansible" | "puppet" | "salt".
It performs:
- resolve the old/baseline harvest,
- build a best-effort enforcement plan from the diff report,
- generate a temporary manifest from the old harvest using the selected target,
- run the matching local apply tool,
- attach enforcement metadata to the diff report.
Target commands:
ansible -> ansible-playbook -i localhost, -c local playbook.yml
puppet -> puppet apply --modulepath ./modules [--hiera_config ./hiera.yaml] manifests/site.pp
salt -> salt-call --local --file-root ./states [--pillar-root ./pillar] state.apply
Only Ansible uses generated per-role tags to narrow the apply scope. Puppet and Salt enforcement deliberately run the full generated local manifest/state tree for now. The JSON report keeps target-specific compatibility fields such as ansible_playbook, puppet, or salt_call.
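The target commands above can be sketched as an argv builder. The function name is hypothetical, and the conditions for the optional flags (presence of hiera.yaml or a pillar directory in the generated output) are assumptions; _enforcement_command() in diff.py remains the source of truth. The commands are assumed to run with the generated manifest directory as the working directory.

```python
from pathlib import Path


def enforcement_command_sketch(target: str, manifest_dir: Path) -> list:
    """Build the local apply argv for a generated manifest directory."""
    if target == "ansible":
        return ["ansible-playbook", "-i", "localhost,", "-c", "local", "playbook.yml"]
    if target == "puppet":
        cmd = ["puppet", "apply", "--modulepath", "./modules"]
        # Assumption: pass the Hiera config only when the renderer wrote one.
        if (manifest_dir / "hiera.yaml").exists():
            cmd += ["--hiera_config", "./hiera.yaml"]
        return cmd + ["manifests/site.pp"]
    if target == "salt":
        cmd = ["salt-call", "--local", "--file-root", "./states"]
        # Assumption: pass the pillar root only when pillar data was generated.
        if (manifest_dir / "pillar").is_dir():
            cmd += ["--pillar-root", "./pillar"]
        return cmd + ["state.apply"]
    raise ValueError(f"unknown target: {target!r}")
```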
17.5 Notifications
diff.py also supports webhooks and email notifications:
- post_webhook() sends JSON/text/markdown payloads with optional extra headers.
- send_email() uses SMTP when configured or local sendmail when SMTP is omitted.
Notifications configured on the CLI are sent only when differences exist, unless --notify-always is set.
18. Explanation and validation
18.1 explain.py
explain_state() reads a harvest and produces text or JSON explaining:
- host metadata,
- role summaries,
- users,
- services,
- package snapshots,
- runtime firewall,
- sysctl,
- custom files,
- inventory packages,
- notes and exclusion reasons.
This is intended to answer “what did Enroll collect and why?”
18.2 validate.py
validate_harvest() checks:
- state.json exists,
- it parses as JSON,
- it validates against the vendored schema unless --no-schema is set,
- every managed_file.src_rel points to an artifact file,
- firewall runtime generated artifacts exist,
- there are no unreferenced artifact files, reported as warnings.
It returns a ValidationResult with errors, warnings, ok(), to_dict(), and to_text().
The CLI supports local schema override with --schema, warning failure with --fail-on-warnings, JSON/text output, and --out.
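The two artifact checks (missing artifacts as errors, unreferenced artifacts as warnings) can be sketched as follows. The input shape is a deliberate simplification: refs maps an artifact role to a list of src_rel strings, which is hypothetical and not how validate_harvest() reads state.json.

```python
from pathlib import Path


def check_artifacts_sketch(bundle_dir: Path, refs: dict):
    """Return (errors, warnings) for artifact references in a harvest bundle.

    refs maps artifact role -> list of src_rel strings (hypothetical shape).
    """
    errors, warnings = [], []
    root = bundle_dir / "artifacts"
    expected = set()
    for role, src_rels in refs.items():
        for src_rel in src_rels:
            path = root / role / src_rel
            expected.add(path)
            if not path.is_file():
                errors.append(f"missing artifact: {role}/{src_rel}")
    if root.is_dir():
        # Anything on disk that no managed file points at is only a warning.
        for path in sorted(root.rglob("*")):
            if path.is_file() and path not in expected:
                rel = path.relative_to(root).as_posix()
                warnings.append(f"unreferenced artifact: {rel}")
    return errors, warnings
```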
19. Remote harvesting
File: remote.py
Remote mode is called from cli.py when --remote-host is supplied.
Public entry point:
remote_harvest(...)
It wraps _remote_harvest() and handles:
- optional sudo password prompting,
- optional SSH key passphrase prompting or environment variable lookup,
- retrying when remote sudo requires a password,
- retrying when an encrypted SSH private key needs a passphrase.
19.1 Remote harvest flow
flowchart TD
A[remote_harvest] --> B[resolve sudo password]
B --> C[resolve SSH key passphrase]
C --> D[_remote_harvest]
D --> E[build local enroll.pyz zipapp]
E --> F[connect with Paramiko]
F --> G[upload zipapp]
G --> H[run remote enroll harvest]
H --> I[tar/gzip remote bundle]
I --> J[download tarball]
J --> K[_safe_extract_tar locally]
K --> L[return local state.json path]
_build_enroll_pyz() packages the local enroll Python package into a zipapp and uses enroll.cli:main as its entry point.
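Building such a zipapp with the standard library can be sketched as below. This is not the actual _build_enroll_pyz() code: the function name, the staging-directory approach, and the interpreter line are assumptions. Only the enroll.cli:main entry point comes from this guide.

```python
import shutil
import tempfile
import zipapp
from pathlib import Path


def build_pyz_sketch(package_dir: Path, out_path: Path) -> Path:
    """Package a Python package directory into a runnable .pyz archive."""
    with tempfile.TemporaryDirectory() as staging:
        # Copy the package into a clean staging root so the archive contains
        # <package>/... at the top level and no __pycache__ directories.
        shutil.copytree(
            package_dir,
            Path(staging) / package_dir.name,
            ignore=shutil.ignore_patterns("__pycache__"),
        )
        # zipapp writes a __main__.py that imports and calls the entry point.
        zipapp.create_archive(
            staging,
            target=out_path,
            interpreter="/usr/bin/env python3",
            main=f"{package_dir.name}.cli:main",
        )
    return out_path
```

The resulting file can be uploaded and run with a Python interpreter on the remote host, provided the remote Python can import everything the package needs.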
19.2 SSH config support
--remote-ssh-config enables Paramiko SSHConfig support for settings such as:
- HostName,
- Port,
- User,
- IdentityFile,
- ConnectTimeout,
- ProxyCommand,
- AddressFamily,
- HostKeyAlias
where supported by the connection logic.
Unknown host keys are rejected by default through Paramiko's reject policy. Users should have valid host keys in known hosts.
19.3 Safe tar extraction
_safe_extract_tar() validates tar members before extraction and rejects:
- absolute paths,
- .. traversal,
- symlinks,
- hardlinks,
- device nodes,
- anything resolving outside the destination.
This helper is reused by remote harvest, manifest SOPS extraction, and diff bundle resolution.
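The validation rules above can be sketched with the standard tarfile module. This is an illustration, not the code of _safe_extract_tar(): the function name is hypothetical, the sketch rejects every member that is not a regular file or directory (slightly broader than the list above), and it does not preserve file modes.

```python
import shutil
import tarfile
from pathlib import Path


def safe_extract_sketch(tar: tarfile.TarFile, dest: Path) -> None:
    """Validate every member first, then extract regular files and directories."""
    dest = dest.resolve()
    members = tar.getmembers()
    for m in members:
        name = Path(m.name)
        if name.is_absolute() or ".." in name.parts:
            raise ValueError(f"unsafe path in tar: {m.name}")
        if not (m.isfile() or m.isdir()):
            # Covers symlinks, hardlinks, device nodes and FIFOs.
            raise ValueError(f"unsupported member type in tar: {m.name}")
        target = (dest / name).resolve()
        if target != dest and dest not in target.parents:
            raise ValueError(f"member escapes destination: {m.name}")
    # Nothing is written until the whole archive has passed validation.
    for m in members:
        target = dest / m.name
        if m.isdir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        with tar.extractfile(m) as src, open(target, "wb") as out:
            shutil.copyfileobj(src, out)
```

Validating all members before writing any of them means a malicious archive leaves no partial extraction behind.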
20. SOPS support
File: sopsutil.py
SOPS support is binary tarball encryption, not field-level YAML encryption.
20.1 Harvest SOPS mode
enroll harvest --sops <fingerprint...>:
- harvests into a secure temp directory,
- tars the bundle,
- encrypts it with SOPS binary mode,
- writes harvest.tar.gz.sops or the requested output file.
20.2 Manifest SOPS mode
enroll manifest --sops <fingerprint...>:
- decrypts/extracts the harvest if needed,
- generates the chosen target manifest in a temp directory,
- tars the generated output,
- encrypts it as a single SOPS file.
20.3 Helpers
sopsutil.py provides:
- find_sops_cmd(),
- require_sops_cmd(),
- encrypt_file_binary(),
- decrypt_file_binary_to().
Encryption/decryption helpers write via temp files and default to mode 0600.
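The temp-file-then-rename pattern with a restrictive mode can be sketched as follows. The function name write_private_sketch is hypothetical, and this shows only the file-writing pattern, not the SOPS invocation in sopsutil.py.

```python
import os
import tempfile
from pathlib import Path


def write_private_sketch(dest: Path, data: bytes, mode: int = 0o600) -> None:
    """Write data to dest via a temp file in the same directory, then rename."""
    # mkstemp creates the file readable and writable by the owner only.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=f".{dest.name}.")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.chmod(tmp_name, mode)
        # Rename within one directory, so readers never see a partial file.
        os.replace(tmp_name, dest)
    except BaseException:
        # Do not leave a half-written temp file behind on failure.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```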
21. Configuration file support
cli.py supports optional INI config files.
Discovery order:
- --no-config disables config loading,
- --config PATH or -c PATH,
- $ENROLL_CONFIG,
- ./enroll.ini,
- ./.enroll.ini,
- $XDG_CONFIG_HOME/enroll/enroll.ini,
- ~/.config/enroll/enroll.ini.
Config sections are translated into argv tokens by _inject_config_argv():
- [enroll] for global options,
- [harvest], [manifest], [single-shot], [diff], [explain], [validate] for subcommand options,
- [single_shot] is accepted as an alias for [single-shot].
CLI flags win because config-derived tokens are inserted before user-supplied argv tokens.
The translation is argparse-driven, so new flags often gain config-file support automatically as long as they are represented by normal argparse actions.
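The precedence rule can be illustrated with a small sketch that turns one INI section into argv tokens. It is a simplification: config_to_argv_sketch is a hypothetical name, the key-to-flag mapping and boolean handling are string heuristics, and the real _inject_config_argv() inspects argparse actions instead.

```python
import configparser


def config_to_argv_sketch(ini_text: str, section: str) -> list:
    """Translate one INI section into a list of long-option argv tokens."""
    cp = configparser.ConfigParser(interpolation=None)
    cp.read_string(ini_text)
    tokens = []
    if not cp.has_section(section):
        return tokens
    for key, value in cp.items(section):
        flag = "--" + key.replace("_", "-")
        lowered = value.strip().lower()
        if lowered in ("true", "yes", "on", "1"):
            tokens.append(flag)            # boolean switch that is enabled
        elif lowered in ("false", "no", "off", "0"):
            continue                       # disabled switch: emit nothing
        else:
            tokens += [flag, value]        # option that takes a value
    return tokens
```

Because argparse keeps the last value it sees for an ordinary option, placing these tokens before the user's own argv means an explicit --target salt on the command line overrides target = puppet from the config file.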
22. CLI flags that affect multiple layers
22.1 --target
--target ansible|puppet|salt exists for:
- enroll manifest,
- enroll single-shot,
- enroll diff --enforce.
For manifest and single-shot, it chooses the output renderer. For diff --enforce, it chooses both the temporary manifest target and the local apply tool.
22.2 --fqdn
--fqdn changes output semantics, not just filenames:
- Ansible: uses inventory/host_vars and host-specific artifacts.
- Puppet: uses Hiera node data and Hiera-driven classes.
- Salt: uses pillar node data and minion-targeted top files.
--fqdn implies no common role grouping.
22.3 --no-common-roles
Disables the default grouping of package/service snapshots by Debian Section or RPM Group. This preserves one generated role/module/state per package or unit snapshot.
22.4 --jinjaturtle / --no-jinjaturtle
The CLI maps these to renderer mode strings:
no flag -> auto
--jinjaturtle -> on
--no-jinjaturtle -> off
All three manifest targets receive this mode. Puppet uses ERB when JinjaTurtle is enabled; Ansible and Salt use Jinja2.
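The flag-to-mode mapping can be sketched with argparse as below. The function name and the shared jt destination are assumptions, and making the two flags mutually exclusive is a choice made for this sketch; cli.py may wire them differently.

```python
import argparse


def jinjaturtle_mode_sketch(argv: list) -> str:
    """Map --jinjaturtle / --no-jinjaturtle / no flag to "on" / "off" / "auto"."""
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--jinjaturtle", dest="jt", action="store_const", const="on")
    group.add_argument("--no-jinjaturtle", dest="jt", action="store_const", const="off")
    # Neither flag given: let the renderer decide based on PATH.
    parser.set_defaults(jt="auto")
    return parser.parse_args(argv).jt
```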
23. Tests and how to navigate them
Run tests with:
poetry install
poetry run pytest
or the repository helper when appropriate:
./tests.sh
Important test files:
| Test file | What it covers |
|---|---|
| test_cli.py | argparse dispatch, remote flags, manifest target forwarding, single-shot flow. |
| test_cli_config_and_sops.py, test_cli_helpers.py | config-file injection and SOPS output helpers. |
| test_harvest.py, test_harvest_helpers.py | harvest orchestration, sysctl/firewall helpers, role naming. |
| test_harvest_collectors.py | runtime and container image collectors. |
| test_harvest_cron_logrotate.py | cron/logrotate unification. |
| test_harvest_symlinks.py | nginx/apache enabled symlink capture. |
| test_accounts.py | users, Flatpak, Snap parsing/discovery. |
| test_ignore.py, test_ignore_dir.py | secret/noise policy. |
| test_pathfilter.py | include/exclude matching and expansion. |
| test_platform.py, test_platform_backends.py | platform detection and backend behaviour. |
| test_debian.py, test_rpm.py, test_rpm_run.py | package manager helpers. |
| test_manifest.py, test_manifest_ansible.py | Ansible rendering and role behaviour. |
| test_manifest_puppet.py | Puppet rendering, Hiera mode, reserved names, firewall/container/Flatpak/Snap/JinjaTurtle support. |
| test_manifest_salt.py | Salt rendering, pillar mode, JinjaTurtle, firewall/container/Flatpak/Snap support. |
| test_manifest_symlinks.py | symlink manifest output. |
| test_jinjaturtle.py | shared template generation and fallback safety. |
| test_diff_bundle.py, test_diff_ignore_versions_exclude_enforce.py, test_diff_notifications.py | diff, bundle resolution, target-selected enforcement, notifications. |
| test_remote.py | remote harvest, SSH/sudo prompts, safe tar extraction. |
| test_explain.py | harvest explanation output. |
| test_validate.py | schema/artifact validation. |
| test_cm.py | CMModule conflict resolution and service-package helpers. |
| test_fsutil.py, test_fsutil_extra.py | file hashing and stat metadata helpers. |
When changing behaviour, extend the closest specific tests rather than relying only on broad integration tests.
24. Common maintenance tasks
24.1 Add a new thing to harvest
- Add or extend a dataclass in harvest_types.py if existing snapshots cannot represent it.
- Add a collector under harvest_collectors/ if it is a distinct feature.
- Add the collector to the sequence in harvest.harvest().
- Add the snapshot to the state = {...} object in harvest.harvest().
- Update schema/state.schema.json.
- Update renderers that should emit the new resource.
- Update explain.py and validate.py if users need visibility or artifact checks.
- Add tests for harvest and each renderer.
24.2 Add a new renderer target
- Create <target>.py with manifest_from_bundle_dir().
- Load state via CMModule.load_state() or state.load_state().
- Consume roles_from_state() and inventory_packages_from_state().
- Convert snapshots into renderer-specific role/module/state objects.
- Reuse CMModule.package_service_entries() for package/service grouping.
- Run conflict resolution if the target compiles a global catalog.
- Write target output and README.
- Add the target to manifest.manifest() validation and dispatch.
- Add CLI choices in _add_common_manifest_args() and diff enforcement if applicable.
- Add tests.
24.3 Add a new CLI flag
For harvest-affecting flags:
- add the flag to cli.py for harvest and possibly single-shot,
- forward it to harvest.harvest() or remote.remote_harvest(),
- forward it through remote command construction if remote mode needs it,
- check whether config-file injection handles it,
- add tests in test_cli.py and feature-specific tests.
For manifest-affecting flags:
- add it to _add_common_manifest_args() if all manifest-like commands need it,
- forward it through manifest.manifest(),
- forward it to target renderers,
- add tests for forwarding and output.
For diff enforcement flags:
- add argparse support under the diff subparser,
- pass values to compare_harvests() or enforce_old_harvest(),
- update report formatting if new fields appear,
- add tests in test_diff_ignore_versions_exclude_enforce.py or test_diff_notifications.py.
24.4 Change file safety rules
Modify ignore.py and add tests in test_ignore.py / test_ignore_dir.py.
Be careful:
- relaxing safety affects secret exposure risk,
- tightening safety can make expected config disappear,
- binary allowance matters for APT/RPM keyrings,
- --dangerous must remain explicit for risky harvesting.
24.5 Change service/package attribution
Most logic is in:
- harvest_collectors/services.py,
- package_hints.py,
- system_paths.py,
- package backend modified_paths() implementations.
Preserve these invariants:
- cron/logrotate should stay unified when installed,
- shared directories should not be attributed too broadly,
- package-manager config belongs in apt_config/dnf_config,
- captured_global should prevent duplicates,
- stopped services should not receive broad restart notifications.
24.6 Change manifest role grouping
Common grouping uses:
- CMModule.package_service_entries(),
- package_section_label(),
- section_label_for_packages().
Remember:
- default output (without --fqdn) groups package/service roles unless --no-common-roles is set,
- --fqdn implies per-role output,
- Ansible, Puppet, and Salt grouping should stay conceptually aligned,
- Puppet/Salt need resolve_catalog_conflicts() after grouping.
24.7 Change JinjaTurtle support
Shared path support and safety checks belong in jinjaturtle.py.
Renderer-specific behaviour belongs in the renderer:
- Ansible: variables in defaults or host vars, templates under role templates/.
- Puppet: ERB templates, class params or Hiera values.
- Salt: file.managed context and Salt-safe Jinja rewrites.
Fallback-to-raw-copy is part of the product contract unless JinjaTurtle was explicitly required and missing.
24.8 Change diff enforcement
diff --enforce now has a target dimension.
When changing it, keep these distinctions clear:
- has_enforceable_drift() decides whether enforcement should run.
- _enforcement_plan() finds relevant baseline roles.
- Ansible uses role tags from the plan.
- Puppet and Salt currently run a full manifest/state apply.
- _enforcement_command() is the source of truth for local apply commands.
- cli.py attaches enforcement metadata to the report and formats it.
Do not make enforcement delete newly added packages/users/files/services unless the safety model is explicitly redesigned.
25. Important maintenance hazards
25.1 Renderer output is downstream of harvest state
If a renderer needs information, first ask whether that information belongs in state.json. Avoid papering over missing harvest facts inside a renderer.
25.2 --fqdn mode is not cosmetic
--fqdn changes where variables and artifacts live and how target inclusion works.
A change that works in default mode can still break:
- Ansible host vars,
- Puppet Hiera node data,
- Salt pillar node data.
25.3 Puppet and Salt are stricter about duplicates
Ansible often tolerates repeated packages or tasks. Puppet and Salt compile catalogs where duplicate resources can fail. Keep resolve_catalog_conflicts() in mind whenever adding resources.
25.4 Secret avoidance is part of the product contract
Default harvest should avoid likely secrets. --dangerous exists because useful files may contain secrets. Do not silently make risky harvesting the default.
25.5 Runtime state should not override persistent config
Firewall runtime capture is skipped when persistent firewall config exists. Preserve this principle for future runtime snapshots.
25.6 JinjaTurtle is best-effort except when explicitly required
auto mode should not make manifest generation fail merely because templating failed. on should require the executable; unsupported or unsafe individual files should still fall back to raw copy unless code explicitly changes that contract.
25.7 Role names must be sanitised
Raw package/service names can be invalid or reserved in Ansible roles, Puppet classes, or Salt SLS names. Use role-name helpers and singleton collision protection.
25.8 Tests encode edge cases
Many behaviours exist because of previously found edge cases:
- non-root/no-sudo harvests,
- Puppet reserved words,
- Salt Docker module availability limitations,
- symlink capture,
- JinjaTurtle missing variables,
- Salt JSON filter compatibility,
- file caps,
- SOPS secure temp files,
- tar path traversal,
- target-selected diff enforcement.
Before simplifying logic, search the tests.
26. Troubleshooting guide
26.1 Generated manifest references a missing artifact
Likely causes:
- managed_files[*].src_rel was added without copying into artifacts/,
- a renderer used the generated role/module name instead of the artifact role,
- a role was renamed after harvest but before artifact lookup,
- --fqdn file prefixes are wrong.
Start with:
enroll validate /path/to/harvest
Then inspect:
state.json roles.*.managed_files[*]
artifacts/<role>/<src_rel>
26.2 Puppet fails with duplicate resources
Check:
- _collect_puppet_roles(),
- resolve_catalog_conflicts(),
- role_order_key(),
- whether a new resource type needs conflict resolution,
- whether a directory resource conflicts with a file/link of the same path.
26.3 Salt fails with duplicate IDs or missing modules
Check:
- _state_id() naming,
- _collect_salt_roles() grouping,
- resolve_catalog_conflicts(),
- guarded cmd.run fallbacks for Docker/Podman/Snap/Flatpak.
Salt uses guarded shell commands for some resources because native states/modules are not consistently available across Salt installations.
26.4 Ansible check mode reports unexpected changes
Check:
- role ordering,
- grouped mode versus --fqdn/--no-common-roles,
- handler notifications,
- whether runtime roles were emitted without runtime artifacts,
- harvested directory/file mode normalisation.
Grouped and per-role output can legitimately produce different numbers of reported changes.
26.5 A file was not harvested
Check, in order:
- Was it excluded by --exclude-path?
- Was it denied by IgnorePolicy?
- Was it too large?
- Did it look binary?
- Did it contain sensitive-looking content?
- Was it already captured by another role via captured_global?
- Is it outside known scanned locations?
- Would --include-path collect it?
- Does it require --dangerous?
enroll explain can show notes and exclusion reasons.
26.6 diff --enforce fails
Check:
- whether the selected --target tool is on PATH (ansible-playbook for Ansible, puppet for Puppet, salt-call for Salt),
- whether the generated temp manifest has the expected target entrypoint,
- whether the report contains enforceable drift,
- whether package drift is only version changes or additions, which enforcement skips.
26.7 Remote harvest fails with sudo or SSH key prompts
Relevant flags:
- --ask-become-pass,
- --ask-key-passphrase,
- --ssh-key-passphrase-env,
- --no-sudo,
- --remote-ssh-config.
Interactive sessions can prompt and retry. Non-interactive sessions should pass explicit flags or environment variables.
27. Practical code-reading map
| Feature/question | Start with | Then read |
|---|---|---|
| CLI option behaviour | cli.py | called module for args.cmd |
| Local harvest ordering | harvest.py:harvest() | harvest_collectors/ |
| Why a file was skipped | capture.py, ignore.py, pathfilter.py | explain.py |
| File metadata/hash helpers | fsutil.py | debian.py, capture.py |
| Service/package attribution | harvest_collectors/services.py | package_hints.py, platform.py |
| APT/DNF config capture | harvest_collectors/package_manager.py | system_paths.py |
| Users and SSH keys | harvest_collectors/users.py | accounts.py |
| Flatpak/Snap parsing | accounts.py | renderer Flatpak/Snap helpers |
| Docker/Podman images | harvest_collectors/container_images.py | renderer container image helpers |
| Runtime firewall | harvest_collectors/runtime.py, harvest.py | renderer firewall helpers |
| Sysctl | harvest.py sysctl helpers | renderer sysctl role functions |
| Ansible output | ansible.py:AnsibleManifestRenderer.render() | _render_* helpers |
| Puppet output | puppet.py:PuppetManifestRenderer.render() | _collect_puppet_roles() |
| Salt output | salt.py:SaltManifestRenderer.render() | _collect_salt_roles() |
| Grouping/common roles | cm.py | renderer collection functions |
| JinjaTurtle | jinjaturtle.py | renderer managed-content code |
| Diff/enforce | diff.py | manifest.py, target renderer |
| Validation | validate.py | schema file and state.json |
| Remote mode | remote.py | cli.py remote branches |
| SOPS | sopsutil.py | cli.py, manifest.py, diff.py |
28. Glossary
Harvest bundle
A directory or encrypted tarball containing state.json and artifacts/.
Snapshot
A structured object under roles in state.json, such as a ServiceSnapshot or PackageSnapshot.
Managed file
A file Enroll intends generated CM code to recreate. It has a destination path and a matching artifact file.
Managed link
A symlink Enroll intends generated CM code to recreate.
Managed dir
A directory Enroll intends generated CM code to ensure exists with recorded metadata.
Role
The Enroll logical group for related resources. In Ansible it usually maps to an Ansible role. In Puppet it maps to a module/class. In Salt it maps to an SLS role.
Artifact role
The role directory under artifacts/ that contains a harvested file. This can differ from the generated renderer role when grouping is enabled.
Common/grouped role
A generated role/module/state that merges multiple package/service snapshots by Debian Section or RPM Group.
Site mode / --fqdn mode
Host-specific output mode. Ansible uses host vars, Puppet uses Hiera node data, and Salt uses pillar node data.
Dangerous mode
Explicit opt-in mode that relaxes safety checks and enables risky capture such as user shell dotfiles.
JinjaTurtle
Optional external tool used to convert recognised config files into Jinja2 or ERB templates plus variable defaults/context.
Enforcement target
The config manager chosen for diff --enforce with --target ansible|puppet|salt.
29. Final maintenance model
Most changes should preserve this pipeline:
Collect facts and files safely
-> represent them in target-neutral state.json
-> keep artifact references consistent
-> let each renderer translate the same state into its own idioms
-> validate the bundle and test each target
Before changing code, ask:
- Is this a harvest concern or renderer concern?
- Does state.json or the schema need to change?
- Does this affect --fqdn mode?
- Does this introduce duplicate ownership of a path/resource?
- Does this weaken default secret avoidance?
- Do Puppet and Salt need conflict handling?
- Does JinjaTurtle fallback still behave safely?
- Does diff --enforce --target ... still do the conservative thing?
- Do existing tests explain why the current behaviour exists?
Keeping those boundaries clear is the main way to maintain Enroll without creating subtle cross-target regressions.