# Enroll Development Guide

Interested in the internals of Enroll? This guide describes the current `enroll` codebase for maintainers. It focuses on how the project is organised, what calls what, how harvest state flows into generated configuration-management output, and which invariants matter when changing the code.

---

## 1. What Enroll does

`enroll` is a Linux host inspection and configuration-management generation tool. Its core pipeline is:

```text
Running Linux host
  |
  | enroll harvest
  v
Harvest bundle
  state.json
  artifacts//
  |
  | enroll manifest --target ansible|puppet|salt
  v
Generated configuration-management output
  Ansible roles/playbook
  Puppet modules/site.pp/Hiera data
  Salt states/pillar data
```

The harvest bundle is deliberately target-neutral. Ansible, Puppet, and Salt renderers all consume the same `state.json` shape and the same harvested artifacts. Renderer code should translate harvest state into the target's idioms; it should not invent source facts that belong in the harvest.

`enroll diff` is also built around harvest bundles. It compares two harvests and, when `--enforce` is requested, can generate a temporary manifest from the old harvest and apply it locally with the selected target:

```bash
enroll diff --old ./baseline --new ./current --enforce --target ansible
enroll diff --old ./baseline --new ./current --enforce --target puppet
enroll diff --old ./baseline --new ./current --enforce --target salt
```

For enforcement, the user is responsible for having the chosen local apply tool on `PATH`: `ansible-playbook`, `puppet`, or `salt-call`.

---

## 2. Repository layout

The project is a single Python package under `enroll/` with tests under `tests/`.
```text
enroll/
  __main__.py               python -m enroll entry point
  cli.py                    argparse CLI and subcommand dispatcher
  version.py                package version lookup
  harvest.py                top-level local harvest orchestration and runtime helpers
  harvest_types.py          dataclasses persisted into state.json
  harvest_collectors/       feature-specific collectors used by harvest.py
    context.py              HarvestContext and HarvestCollector base
    runtime.py              root-only runtime state collector wrapper
    cron_logrotate.py       cron/logrotate unification collector
    services.py             systemd service + manual package collector
    users.py                users, SSH public files, Flatpak, Snap collector
    package_manager.py      apt/dnf/yum config collectors
    container_images.py     Docker/Podman image collector
    paths.py                /usr/local and --include-path collectors
  manifest.py               target router and SOPS manifest wrapper
  ansible.py                Ansible renderer
  puppet.py                 Puppet renderer
  salt.py                   Salt renderer
  cm.py                     renderer-neutral CMModule model and grouping helpers
  role_names.py             reserved singleton role-name protection
  accounts.py               users, SSH public files, Flatpak and Snap discovery
  platform.py               OS/package-backend abstraction
  debian.py                 dpkg/apt helpers
  rpm.py                    rpm/dnf/yum helpers
  systemd.py                systemctl wrappers and parsers
  system_paths.py           known config paths and filesystem scanners
  package_hints.py          service/package name and config attribution helpers
  capture.py                safe file/symlink capture into artifacts/
  fsutil.py                 file md5 + owner/group/mode helpers
  ignore.py                 secret/noise avoidance policy
  pathfilter.py             --include-path / --exclude-path matching and expansion
  state.py                  state.json load/write helpers
  yamlutil.py               YAML helpers used by renderers/JinjaTurtle
  jinjaturtle.py            optional config-file templating integration
  diff.py                   harvest comparison, notifications, and target-selected enforcement
  explain.py                human/JSON explanation of harvest contents
  validate.py               schema and artifact consistency validation
  remote.py                 Paramiko remote harvest implementation
  cache.py                  secure local cache directories for harvests
  sopsutil.py               SOPS binary encryption/decryption helpers
  schema/state.schema.json  JSON Schema for harvest state
tests/
  test_*.py                 unit tests grouped mostly by module/feature
```

The installed command is configured in `pyproject.toml`:

```toml
[tool.poetry.scripts]
enroll = "enroll.cli:main"
```

`python -m enroll` calls the same CLI through `enroll/__main__.py`.

---

## 3. Main runtime flows

### 3.1 CLI entry flow

All user-facing commands enter through `enroll.cli.main()`.

```text
enroll command
  -> enroll.cli.main()
     -> builds argparse parser and subparsers
     -> discovers optional INI config file
     -> injects config-derived argv defaults before user argv
     -> parses final argv
     -> dispatches by args.cmd
```

The supported subcommands are:

```text
harvest      collect a harvest bundle from a local or remote host
manifest     generate Ansible/Puppet/Salt output from a harvest bundle
single-shot  run harvest and manifest in one command
diff         compare two harvest bundles and optionally enforce old state
explain      produce a human/JSON explanation of a harvest
validate     validate state.json and referenced artifacts
```

`cli.py` should stay orchestration-heavy, not domain-heavy. It should parse flags, handle config/SOPS/remote branching, and then call the relevant module. It should not contain the meaning of a service, package, user, file, renderer resource, or harvest snapshot.
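The "inject config-derived argv defaults before user argv" step in the flow above can be sketched as follows. This is a minimal illustration, not the code in `cli.py`: the `config_to_argv()` helper, the INI section-per-subcommand layout, and the toy parser with its `--out` flag are assumptions made for the example. The point it shows is that placing config tokens first lets `argparse` treat a later user-supplied flag as the winner.

```python
import argparse
import configparser


def config_to_argv(ini_text: str, section: str) -> list[str]:
    """Turn one INI section into argv-style tokens (hypothetical helper)."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    tokens: list[str] = []
    if not cfg.has_section(section):
        return tokens
    for key, value in cfg.items(section):
        flag = "--" + key.replace("_", "-")
        if value.lower() in ("true", "yes", "1"):
            tokens.append(flag)            # boolean switch turned on
        elif value.lower() in ("false", "no", "0"):
            continue                       # boolean switch left off
        else:
            tokens.extend([flag, value])   # option that takes a value
    return tokens


def build_parser() -> argparse.ArgumentParser:
    """Toy parser; the real subcommands and flags live in cli.py."""
    parser = argparse.ArgumentParser(prog="enroll-sketch")
    sub = parser.add_subparsers(dest="cmd", required=True)
    harvest = sub.add_parser("harvest")
    harvest.add_argument("--out")          # assumed flag, for illustration
    harvest.add_argument("--dangerous", action="store_true")
    return parser


def parse_with_config(user_argv: list[str], ini_text: str) -> argparse.Namespace:
    cmd, rest = user_argv[0], user_argv[1:]
    # Config tokens go first so that a later user-supplied flag overrides them.
    final_argv = [cmd, *config_to_argv(ini_text, cmd), *rest]
    return build_parser().parse_args(final_argv)


INI = "[harvest]\nout = /tmp/from-config\ndangerous = true\n"
args = parse_with_config(["harvest", "--out", "/tmp/from-cli"], INI)
print(args.cmd, args.out, args.dangerous)
```

Here the config file supplies `--dangerous` and a default `--out`, while the explicit `--out` on the command line takes precedence because it is parsed last.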
### 3.2 Subcommand call graph

```mermaid
flowchart TD
    A[enroll.cli.main] --> B{args.cmd}
    B -->|harvest local| C[harvest.harvest]
    B -->|harvest remote| D[remote.remote_harvest]
    B -->|manifest| E[manifest.manifest]
    B -->|single-shot local| C
    B -->|single-shot remote| D
    C --> E
    D --> E
    B -->|diff| F[diff.compare_harvests]
    F --> G[diff.format_report]
    F --> H{--enforce?}
    H -->|yes| I[diff.enforce_old_harvest]
    I --> J[manifest.manifest target=ansible|puppet|salt]
    J --> K[ansible-playbook or puppet apply or salt-call]
    B -->|explain| L[explain.explain_state]
    B -->|validate| M[validate.validate_harvest]
```

Important dependency direction:

```text
cli.py
  depends on harvest.py, manifest.py, diff.py, explain.py, validate.py, remote.py

harvest.py
  depends on harvest_collectors, platform backends, capture policy, system scanners

manifest.py
  depends on ansible.py, puppet.py, salt.py

ansible.py / puppet.py / salt.py
  depend on state.py, cm.py, harvested artifacts, and target-specific helpers
```

---

## 4. Harvest bundles

A plaintext harvest bundle is a directory:

```text
/
  state.json
  artifacts/
    /
      etc/...
      usr/local/...
      sysctl/...
      firewall/...
```

`state.json` is written by `enroll.state.write_state()` and loaded by `enroll.state.load_state()`.

The renderer relies on this invariant:

```text
state.json roles.*.managed_files[*].src_rel
must correspond to
artifacts//
```

For example, a captured `/etc/nginx/nginx.conf` in role `nginx` normally becomes:

```json
{
  "path": "/etc/nginx/nginx.conf",
  "src_rel": "etc/nginx/nginx.conf",
  "owner": "root",
  "group": "root",
  "mode": "0644",
  "reason": "modified_conffile"
}
```

and the artifact is copied to:

```text
artifacts/nginx/etc/nginx/nginx.conf
```

Renderer role/module names can differ from artifact roles, especially when common grouping is enabled. Copy helpers must therefore pass the original artifact role, not blindly use the generated renderer module name.

---

## 5. `state.json` shape and snapshot dataclasses

The top-level state assembled by `harvest.harvest()` is:

```json
{
  "enroll": {
    "version": "...",
    "harvest_time": 123456789
  },
  "host": {
    "hostname": "...",
    "os": "debian|redhat|unknown",
    "pkg_backend": "dpkg|rpm|unknown",
    "os_release": {}
  },
  "inventory": {
    "packages": {}
  },
  "roles": {
    "users": {},
    "flatpak": {},
    "snap": {},
    "container_images": {},
    "services": [],
    "packages": [],
    "apt_config": {},
    "dnf_config": {},
    "firewall_runtime": {},
    "sysctl": {},
    "etc_custom": {},
    "usr_local_custom": {},
    "extra_paths": {}
  }
}
```

The persisted in-memory shapes live in `enroll/harvest_types.py`.

| Dataclass | Purpose |
|---|---|
| `ManagedFile` | A file to recreate, with destination path, artifact path, owner, group, mode, and reason. |
| `ManagedLink` | A symlink to recreate, such as `sites-enabled` entries. |
| `ManagedDir` | A directory to ensure exists, with owner/group/mode. |
| `ExcludedFile` | A path that was considered but skipped, with a reason. |
| `ServiceSnapshot` | One enabled systemd service and its packages/config/state. |
| `PackageSnapshot` | One manual package and related config. `has_config=False` is used when the package should still be installed but no config was found. |
| `UsersSnapshot` | Human users, groups, managed SSH/dotfiles, and per-user Flatpak data. |
| `FlatpakSnapshot` | System Flatpaks and system Flatpak remotes. |
| `SnapSnapshot` | System Snap installs. |
| `ContainerImagesSnapshot` | Docker/Podman image metadata. |
| `AptConfigSnapshot` / `DnfConfigSnapshot` | Package-manager configuration. |
| `EtcCustomSnapshot` | Unowned/custom `/etc` config not attributed elsewhere. |
| `UsrLocalCustomSnapshot` | Selected `/usr/local/etc` files and executable `/usr/local/bin` files. |
| `ExtraPathsSnapshot` | User-requested `--include-path` files/directories. |
| `FirewallRuntimeSnapshot` | Generated artifacts from live ipset/iptables state. |
| `SysctlSnapshot` | Generated `/etc/sysctl.d/99-enroll.conf` from live writable sysctls. |

The JSON Schema in `enroll/schema/state.schema.json` is the validation contract for persisted harvests.

---

## 6. Harvest orchestration

The local harvest entry point is:

```python
enroll.harvest.harvest(
    bundle_dir,
    policy=None,
    dangerous=False,
    include_paths=None,
    exclude_paths=None,
)
```

It returns the path to the written `state.json`.

### 6.1 High-level harvest order

The order matters because harvest maintains a global set of captured destination paths. Once a path is captured into one role, later collectors normally skip it.

```mermaid
flowchart TD
    A[harvest.harvest] --> B[Build IgnorePolicy and PathFilter]
    B --> C[detect_platform + get_backend]
    C --> D[backend.build_etc_index]
    D --> E[RuntimeStateCollector]
    E --> F[CronLogrotateCollector]
    F --> G[ServicePackageCollector]
    G --> H[UsersCollector]
    H --> I[ContainerImagesCollector]
    I --> J[PackageManagerConfigCollector]
    J --> K[etc_custom scan inside harvest.py]
    K --> L[UsrLocalCustomCollector]
    L --> M[ExtraPathsCollector]
    M --> N[Build inventory.packages]
    N --> O[Add parent ManagedDir entries]
    O --> P[state.write_state]
```

### 6.2 `HarvestContext`

`HarvestContext` lives in `harvest_collectors/context.py`. It is passed to collectors instead of passing many individual dependencies.

```python
@dataclass
class HarvestContext:
    bundle_dir: str
    policy: IgnorePolicy
    path_filter: PathFilter
    platform: Dict[str, Any]
    backend: Any
    installed_pkgs: Dict[str, Any]
    installed_names: Set[str]
    owned_etc: Set[str]
    etc_owner_map: Dict[str, str]
    topdir_to_pkgs: Dict[str, Set[str]]
    pkg_to_etc_paths: Dict[str, List[str]]
    captured_global: Set[str]
```

New collectors should generally accept a `HarvestContext` and return dataclass snapshots from `harvest_types.py`.

### 6.3 Global de-duplication

The harvester tries to avoid two generated roles owning the same destination path. This avoids duplicate config-manager resources and confusing diffs.
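The "first role to capture a path owns it" idea can be sketched with a shared set. This is an illustration only: `RoleFiles` and `try_capture()` are hypothetical stand-ins, not Enroll types, and the real capture helpers also apply policy, filtering, and copying.

```python
from dataclasses import dataclass, field


@dataclass
class RoleFiles:
    """Toy stand-in for a snapshot's managed-file list (not an Enroll type)."""
    name: str
    paths: list[str] = field(default_factory=list)


def try_capture(path: str, role: RoleFiles, captured_global: set[str]) -> bool:
    """Record path in role unless some role has already captured it."""
    if path in captured_global:
        return False              # an earlier collector owns this destination
    role.paths.append(path)
    captured_global.add(path)
    return True


captured: set[str] = set()
cron = RoleFiles("cron")
etc_custom = RoleFiles("etc_custom")

# Collectors run in a fixed order, so the first claimant wins.
try_capture("/etc/cron.d/backup", cron, captured)
try_capture("/etc/cron.d/backup", etc_custom, captured)   # skipped
try_capture("/etc/motd", etc_custom, captured)

print(cron.paths, etc_custom.paths)
```

Because ownership depends on who runs first, reordering collectors changes which role a shared path lands in, which is why the harvest order in section 6.1 matters.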
`captured_global` is passed into `capture.capture_file()` and `capture.capture_link()`. If a destination path has already been seen, later collection attempts return without capturing it again.

This is one of the most important invariants in the project:

> A destination path should normally appear in only one generated role.

Puppet and Salt also run `cm.resolve_catalog_conflicts()` after renderer role collection because they compile a single global catalog and duplicate resources are hard failures.

---

## 7. File capture and safety policy

### 7.1 `capture_file()`

`capture.capture_file()` decides whether to copy a file into `artifacts/` and record it in a snapshot.

```text
capture_file(abs_path, role_name, reason, policy, path_filter, ...)
  -> skip if already seen globally or in this role
  -> skip if --exclude-path matches
  -> ask IgnorePolicy.deny_reason(abs_path)
  -> stat owner/group/mode with fsutil.stat_triplet()
  -> copy to artifacts//
  -> append ManagedFile
  -> mark seen in role/global
```

`fsutil.stat_triplet()` returns owner, group, and a zero-padded octal mode string. It falls back to numeric uid/gid strings if user/group names cannot be resolved.

### 7.2 `capture_link()`

`capture.capture_link()` records symlinks as `ManagedLink` entries rather than copying their targets. It is used for meaningful enablement symlinks, especially in nginx/apache-style trees such as:

```text
/etc/nginx/sites-enabled/*
/etc/nginx/modules-enabled/*
/etc/apache2/conf-enabled/*
/etc/apache2/mods-enabled/*
/etc/apache2/sites-enabled/*
```

### 7.3 User shell dotfiles

`capture.capture_user_shell_dotfiles()` is called by `UsersCollector`, but only enabled when the harvest policy is dangerous.

In dangerous mode:

- `.bashrc`, `.profile`, and `.bash_logout` are captured only if they differ from `/etc/skel` baselines.
- `.bash_aliases` is captured if present because there may be no skel baseline.
Outside dangerous mode, Enroll records a note explaining that shell dotfiles were not auto-harvested. Users can still include specific files via `--include-path`, but the normal `IgnorePolicy` still applies unless `--dangerous` is also used.

### 7.4 `IgnorePolicy`

`ignore.IgnorePolicy` is the default secret/noise avoidance layer.

By default it skips likely sensitive or low-value files such as:

- `/etc/shadow`, `/etc/gshadow`, and backup variants,
- SSH host private keys,
- private SSL/Let's Encrypt material,
- log files and editor backups,
- files larger than `max_file_bytes` (`256_000` by default),
- binary-like files except known keyring formats,
- sampled non-comment content that looks sensitive, such as private keys, `password=`, `token`, `secret`, or `api_key`.

`--dangerous` sets `policy.dangerous = True`, disabling deny-globs and content sniffing. This is intentional and should remain explicit.

The policy has separate methods for different filesystem object types:

- `deny_reason(path)` for regular files,
- `deny_reason_dir(path)` for directories,
- `deny_reason_link(path)` for symlinks.

### 7.5 `PathFilter`

`pathfilter.PathFilter` implements user-supplied path controls:

- `--include-path` adds extra files/directories to the `extra_paths` role.
- `--exclude-path` removes matching paths from all harvesting.
- Excludes always win over includes.

Pattern styles:

```text
/plain/path          exact path or directory-prefix match
glob:/path/**/*.x    forced glob
/path/**/*.x         inferred glob because it contains glob characters
re:^/path/...$       regex
regex:^/path/...$    regex
```

`expand_includes()` is conservative: it ignores symlinks, respects excludes, caps file counts, and returns notes for unmatched patterns or caps.

---

## 8. Platform and package backends

`platform.py` abstracts distribution-specific package behaviour.
```text
platform.detect_platform()
  -> reads /etc/os-release
  -> returns PlatformInfo(os_family, pkg_backend, os_release)

platform.get_backend(info)
  -> DpkgBackend for Debian-like systems
  -> RpmBackend for RedHat/Fedora-like systems
```

The backend interface is `PackageBackend`:

```python
owner_of_path(path)
list_manual_packages()
installed_packages()
build_etc_index()
specific_paths_for_hints()
is_pkg_config_path(path)
modified_paths(pkg, paths)
```

### 8.1 Debian backend

`DpkgBackend` delegates to `debian.py`. It uses dpkg/apt data to provide package ownership, manual package lists, installed package inventory, `/etc` indexes, conffile hashes, and packaged-file md5 baselines.

`DpkgBackend.modified_paths()` identifies:

- `modified_conffile` when a dpkg conffile hash differs,
- `modified_packaged_file` when a packaged file md5 differs.

It deliberately leaves `/etc/apt`-style package-manager configuration for the `apt_config` role.

### 8.2 RPM backend

`RpmBackend` delegates to `rpm.py`. It provides package ownership, manual package lists, installed package inventory, `/etc` indexes, RPM config file lists, and `rpm -V` style modified-file detection.

RPM-family package-manager config paths such as `/etc/dnf`, `/etc/yum`, `/etc/yum.conf`, `/etc/yum.repos.d`, and `/etc/pki/rpm-gpg` are collected into `dnf_config`, not arbitrary package roles.

### 8.3 Adding a new package backend

To support another package system:

1. implement a `PackageBackend` subclass,
2. route it from `platform.get_backend()`,
3. provide ownership lookup, manual package listing, installed package inventory, `/etc` indexing, modified config detection, and package-manager config exclusion,
4. add backend tests comparable to `test_debian.py`, `test_rpm.py`, and `test_platform.py`.

---

## 9. Harvest collectors in detail

Collectors live under `enroll/harvest_collectors/`.
### 9.1 `RuntimeStateCollector`

File: `harvest_collectors/runtime.py`

This wrapper collects root-only live runtime state:

- writable sysctl state,
- live ipset state,
- live IPv4 iptables state,
- live IPv6 iptables state.

The actual helper implementations currently live in `harvest.py`:

- `_collect_sysctl_snapshot()`,
- `_collect_firewall_runtime_snapshot()`,
- `_parse_sysctl_a_output()`,
- `_iptables_save_has_state()`,
- `_ipset_save_has_state()`.

If the process is not root, runtime capture returns empty snapshots with explanatory notes.

#### Sysctl capture

Sysctl capture runs `sysctl -a`, filters to writable/persistable single-line keys, and writes a generated artifact:

```text
artifacts/sysctl/sysctl/99-enroll.conf
```

The destination managed by renderers is:

```text
/etc/sysctl.d/99-enroll.conf
```

The filter skips volatile/action/identity keys and inactive mutually-exclusive zero values. This avoids generating config that fails or is noisy on replay.

#### Firewall runtime capture

Runtime firewall capture is a fallback. Enroll first checks for persistent firewall config such as:

```text
/etc/iptables/rules.v4
/etc/iptables/rules.v6
/etc/sysconfig/iptables
/etc/sysconfig/ip6tables
/etc/ipset.conf
/etc/ipset/*
```

If persistent files exist for a family, live runtime capture for that family is skipped. If no persistent file exists and live state is meaningful, Enroll writes generated artifacts such as:

```text
artifacts/firewall_runtime/firewall/ipset.save
artifacts/firewall_runtime/firewall/iptables.v4
artifacts/firewall_runtime/firewall/iptables.v6
```

Renderers should only create a firewall runtime role when at least one runtime artifact exists.

When firewall runtime is rendered, Ansible/Puppet/Salt also create an `enroll_runtime` role/module/state to own `/etc/enroll` before `/etc/enroll/firewall`.
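The per-family fallback rule for firewall capture can be sketched as a small decision function. This is a sketch under assumptions: the family names, the grouping of paths into `PERSISTENT_GLOBS`, and the `root` parameter (used so the example can run against a temporary directory) are illustrative and are not taken from `harvest.py` or `system_paths.py`.

```python
import glob
import os
import tempfile

# Persistent locations named in this section, grouped by family (assumed grouping).
PERSISTENT_GLOBS = {
    "ipset": ["etc/ipset.conf", "etc/ipset/*"],
    "iptables_v4": ["etc/iptables/rules.v4", "etc/sysconfig/iptables"],
    "iptables_v6": ["etc/iptables/rules.v6", "etc/sysconfig/ip6tables"],
}


def families_needing_runtime_capture(root: str, live_state: dict[str, bool]) -> list[str]:
    """Return families that have meaningful live state and no persistent config."""
    needed = []
    for family, patterns in PERSISTENT_GLOBS.items():
        has_persistent = any(
            glob.glob(os.path.join(root, pattern)) for pattern in patterns
        )
        if not has_persistent and live_state.get(family, False):
            needed.append(family)
    return needed


with tempfile.TemporaryDirectory() as fake_root:
    # Simulate a host that persists IPv4 rules but nothing else.
    os.makedirs(os.path.join(fake_root, "etc", "iptables"))
    with open(os.path.join(fake_root, "etc", "iptables", "rules.v4"), "w") as handle:
        handle.write("*filter\nCOMMIT\n")
    live = {"ipset": False, "iptables_v4": True, "iptables_v6": True}
    decision = families_needing_runtime_capture(fake_root, live)

print(decision)
```

In this simulated host only IPv6 iptables qualifies: IPv4 already has a persistent file, and ipset has no meaningful live state.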
### 9.2 `CronLogrotateCollector`

File: `harvest_collectors/cron_logrotate.py`

This collector runs before service/package collection to prevent cron and logrotate snippets from being scattered across unrelated roles.

It detects cron packages such as `cron`, `cronie`, `cronie-anacron`, `vixie-cron`, and `fcron`, and detects `logrotate` separately.

It captures cron-related paths such as:

```text
/etc/crontab
/etc/cron.d/*
/etc/cron.hourly/*
/etc/cron.daily/*
/var/spool/cron/*
/var/spool/crontabs/*
/var/spool/anacron/*
```

It captures logrotate paths such as:

```text
/etc/logrotate.conf
/etc/logrotate.d/*
```

It returns `PackageSnapshot` objects for `cron` and `logrotate` when those packages exist.

### 9.3 `ServicePackageCollector`

File: `harvest_collectors/services.py`

This collector produces:

- `ServiceSnapshot` objects for enabled systemd services,
- `PackageSnapshot` objects for manual packages not already covered by services,
- alias maps used by later `/etc` attribution,
- `seen_by_role` state reused by later collectors.

For each enabled service it:

1. derives a safe role name from the unit,
2. queries systemd metadata,
3. infers packages from the unit fragment owner, `ExecStart`, and related `/etc` topdirs,
4. collects unit drop-ins, environment files, distro-specific likely config files, and modified package-owned config,
5. collects related unowned `/etc/` and `/etc/.d` files,
6. captures candidates with `capture_file()`,
7. builds a `ServiceSnapshot`.

It also collects timer override files. If a timer triggers a known service, timer files are attached to that service snapshot. Otherwise, the timer is associated with inferred packages.

Manual packages are processed after services. Packages already covered by service snapshots are not duplicated as standalone package roles. Packages with no detected config are still represented with `has_config=False` so renderers can install them.
Known enablement symlinks for nginx/apache are captured as `ManagedLink` entries at the end of the collector.

### 9.4 `UsersCollector`

File: `harvest_collectors/users.py`

This collector returns a `UsersCollection` containing:

- `UsersSnapshot`,
- `FlatpakSnapshot`,
- `SnapSnapshot`.

User discovery is in `accounts.collect_non_system_users()`. It reads `/etc/login.defs`, `/etc/passwd`, `/etc/group`, home directories, and user Flatpak installs. It filters out users below `UID_MIN`, `root`, `nobody`, and non-login shells such as `nologin` and `/bin/false`.

Default user file capture is intentionally narrow:

- `authorized_keys`,
- safe public SSH material where supported by helpers.

Automatic shell dotfile capture only runs in dangerous mode.

The same collector discovers:

- system Flatpaks,
- system Flatpak remotes,
- per-user Flatpaks,
- per-user Flatpak remotes,
- system Snaps.

### 9.5 `ContainerImagesCollector`

File: `harvest_collectors/container_images.py`

This collector inspects Docker and Podman image caches when the relevant engine exists.

For each engine it:

1. runs ` image ls -q --no-trunc`,
2. inspects images in chunks with ` image inspect ...`,
3. normalises image IDs, tags, digests, OS/architecture/platform fields, and tag aliases,
4. prefers digest-pinned pull refs from `RepoDigests`.

Renderers only enforce exact pull state for images with a usable digest. Images with only local tags and no digest are represented with notes rather than fake reproducibility.

### 9.6 `PackageManagerConfigCollector`

File: `harvest_collectors/package_manager.py`

This collector emits a dedicated package-manager config snapshot:

- `apt_config` on dpkg systems,
- `dnf_config` on rpm systems.

APT capture includes `/etc/apt`, sources, `.sources` files, trusted keyrings, and keyrings referenced through `signed-by` / `Signed-By`.

DNF/YUM capture includes `/etc/dnf`, `/etc/yum`, `/etc/yum.conf`, `/etc/yum.repos.d/*.repo`, and `/etc/pki/rpm-gpg/*`.
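To illustrate what "keyrings referenced through `signed-by` / `Signed-By`" involves, the sketch below pulls keyring paths out of both APT source styles. It is not the collector's implementation: `signed_by_paths()` is a hypothetical helper, the parsing is deliberately simplified, and it assumes that multiple values are comma-separated in one-line style and whitespace-separated in deb822 style. Values that are not absolute paths (for example key fingerprints) are ignored.

```python
import re

# Options block of a one-line-style entry: deb [opt=val ...] uri suite component
_ONE_LINE_OPTS = re.compile(r"^\s*deb(?:-src)?\s+\[([^\]]*)\]")
# deb822-style field, matched case-insensitively
_DEB822_FIELD = re.compile(r"^signed-by:\s*(.*)$", re.IGNORECASE)


def signed_by_paths(text: str) -> list[str]:
    """Return absolute keyring paths referenced by signed-by (simplified sketch)."""
    paths: list[str] = []
    for line in text.splitlines():
        candidates: list[str] = []
        one_line = _ONE_LINE_OPTS.match(line)
        if one_line:
            for option in one_line.group(1).split():
                key, _, value = option.partition("=")
                if key.lower() == "signed-by":
                    candidates.extend(value.split(","))
        field_match = _DEB822_FIELD.match(line.strip())
        if field_match:
            candidates.extend(field_match.group(1).split())
        for candidate in candidates:
            # Keep file references only; skip fingerprints and duplicates.
            if candidate.startswith("/") and candidate not in paths:
                paths.append(candidate)
    return paths


ONE_LINE = (
    "deb [arch=amd64 signed-by=/usr/share/keyrings/example.gpg] "
    "https://example.org/apt stable main\n"
)
DEB822 = (
    "Types: deb\n"
    "URIs: https://example.org/apt\n"
    "Suites: stable\n"
    "Components: main\n"
    "Signed-By: /etc/apt/keyrings/example.asc\n"
)
keyrings = signed_by_paths(ONE_LINE + DEB822)
print(keyrings)
```

A keyring found this way can live outside `/etc/apt`, so a collector that follows these references needs to capture the referenced file explicitly.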
### 9.7 `etc_custom` scan

`etc_custom` is still assembled inside `harvest.harvest()` rather than in its own collector.

It captures:

1. essential system config from `system_paths.iter_system_capture_paths()`,
2. remaining unowned config-like files found by walking `/etc`.

Before adding shared snippets such as `/etc/logrotate.d/*` or `/etc/cron.d/*` to `etc_custom`, `_target_role_for_shared_snippet()` tries to attach them to a more meaningful service/package role.

### 9.8 `UsrLocalCustomCollector`

File: `harvest_collectors/paths.py`

This collector creates `usr_local_custom` from:

- files under `/usr/local/etc`,
- executable files under `/usr/local/bin`.

It respects `IgnorePolicy`, `PathFilter`, and global de-duplication.

### 9.9 `ExtraPathsCollector`

File: `harvest_collectors/paths.py`

This collector handles `--include-path` and `--exclude-path` and creates `extra_paths`.

For included directories, it records directory metadata as `ManagedDir` entries while walking. For included files, it relies on `expand_includes()` and then `capture_file()`.

---

## 10. Path scanners and package hints

`system_paths.py` contains known path lists and filesystem scanners.

Important functions and constants:

- `ALLOWED_UNOWNED_EXTS` decides which unowned `/etc` files look config-like.
- `MAX_FILES_CAP` and `MAX_UNOWNED_FILES_PER_ROLE` cap broad scans.
- `is_confish()` checks whether a path looks like configuration.
- `scan_unowned_under_roots()` finds unowned files under candidate roots.
- `iter_matching_files()` expands glob specs and walks directory hits.
- `iter_apt_capture_paths()` and `iter_dnf_capture_paths()` collect package-manager config.
- `iter_system_capture_paths()` returns fixed essential system config candidates.
- `persistent_ipset_globs()`, `persistent_iptables_v4_globs()`, and `persistent_iptables_v6_globs()` support runtime firewall fallback decisions.

`package_hints.py` turns package/unit names into stable role names and attempts to infer relationships.
Important helpers:

- `safe_name()`,
- `role_id()`,
- `role_name_from_unit()`,
- `role_name_from_pkg()`,
- `package_section_from_installations()`,
- `hint_names()`,
- `add_pkgs_from_etc_topdirs()`,
- `maybe_add_specific_paths()`.

`SHARED_ETC_TOPDIRS` in `package_hints.py` prevents shared directories such as `/etc/default`, `/etc/pam.d`, `/etc/systemd`, `/etc/ssh`, `/etc/apt`, and `/etc/dnf` from being attributed too broadly to one package.

`role_names.py` protects singleton role names such as `users`, `flatpak`, `snap`, `container_images`, `apt_config`, `dnf_config`, `firewall_runtime`, `sysctl`, `etc_custom`, `usr_local_custom`, and `extra_paths` from collisions with package/service-derived roles.

---

## 11. Manifest orchestration

`manifest.py` is a target router and SOPS wrapper. It does not render target resources itself.

Entry point:

```python
manifest(
    bundle_dir,
    out,
    fqdn=None,
    jinjaturtle="auto",
    sops_fingerprints=None,
    no_common_roles=False,
    target="ansible",
)
```

Plain mode dispatches to:

```text
target=ansible -> ansible.manifest_from_bundle_dir(..., jinjaturtle=..., no_common_roles=...)
target=puppet  -> puppet.manifest_from_bundle_dir(..., jinjaturtle=..., no_common_roles=...)
target=salt    -> salt.manifest_from_bundle_dir(..., jinjaturtle=..., no_common_roles=...)
```

SOPS mode:

1. accepts an already-decrypted bundle directory or a SOPS-encrypted harvest tarball,
2. decrypts/extracts with safe tar extraction when needed,
3. renders target output into a secure temp directory,
4. tars the manifest directory under a `manifest/` prefix,
5. encrypts the tarball with SOPS,
6. returns the encrypted output path.

The renderers do not know about SOPS.

---

## 12. The renderer-neutral `CMModule` model

File: `cm.py`

`CMModule` is the shared resource model used heavily by Puppet and Salt and partially by Ansible.
```python
@dataclass
class CMModule:
    role_name: str
    module_name: str
    packages: Set[str]
    groups: Set[str]
    users: Dict[str, Dict[str, Any]]
    dirs: Dict[str, Dict[str, Any]]
    files: Dict[str, Dict[str, Any]]
    links: Dict[str, Dict[str, Any]]
    services: Dict[str, Dict[str, Any]]
    firewall_runtime: Dict[str, Any]
    notes: List[str]
```

Important methods and helpers include:

- `add_managed_dir()`, `add_managed_file()`, `add_managed_link()`,
- `add_package_snapshot()`,
- `add_service_snapshot_state()`,
- `user_records_from_snapshot()`,
- `add_flatpak_snapshot()`, `add_snap_snapshot()`,
- `add_firewall_runtime_snapshot()`,
- `package_service_entries()`,
- `active_service_units_by_package()`,
- `active_service_units_for_package_snapshot()`,
- `remove_directory_resource_conflicts()`.

### 12.1 Common role grouping

`CMModule.package_service_entries()` is the shared grouping mechanism for package and service snapshots.

`use_common_roles=True` groups package/service snapshots into section/group roles such as Debian Section or RPM Group labels. `use_common_roles=False` preserves one generated role/module/state per package or service snapshot.

Default behaviour:

```text
normal manifest, no --no-common-roles: group package/service roles
--fqdn mode:                           no common grouping
--no-common-roles:                     no common grouping
```

`--fqdn` implies no common roles because host-specific output should preserve per-host state rather than merging unrelated resources into shared roles.

### 12.2 Catalog conflict resolution

`resolve_catalog_conflicts()` runs for Puppet and Salt. It removes duplicates across generated modules/states for:

- packages,
- groups,
- users,
- directories,
- files,
- symlinks,
- services.

It also removes directory resources that conflict with a file or link at the same path.

This matters because Puppet and Salt compile a single catalog; duplicates that Ansible might tolerate can fail hard there.

---

## 13. Ansible renderer

File: `ansible.py`

Entry point:

```python
ansible.manifest_from_bundle_dir(
    bundle_dir,
    out_dir,
    fqdn=None,
    jinjaturtle="auto",
    no_common_roles=False,
)
```

It instantiates `AnsibleManifestRenderer(...).render()`.

### 13.1 Ansible render flow

```mermaid
flowchart TD
    A[AnsibleManifestRenderer.render] --> B[AnsibleRole.load_state]
    B --> C[roles_from_state + inventory_packages_from_state]
    C --> D[_prepare_ansible_context]
    D --> E[_write_site_scaffold]
    E --> F[_collect_ansible_roles]
    F --> G[_render_managed_file_roles]
    F --> H[_render_users_role]
    F --> I[_render_flatpak_role]
    F --> J[_render_snap_role]
    F --> K[_render_container_images_role]
    F --> L[_render_sysctl_role]
    F --> M[_render_firewall_runtime_role]
    M --> N[_render_enroll_runtime_role if firewall runtime exists]
    F --> O[_render_service_roles]
    F --> P[_render_common_ansible_roles]
    F --> Q[_render_package_roles]
    Q --> R[_write_manifest_playbook]
    R --> S[README.md]
```

### 13.2 Output layout

Default single-site output:

```text
/
  ansible.cfg
  playbook.yml
  README.md
  requirements.yml
  roles/
    /
      tasks/main.yml
      handlers/main.yml
      defaults/main.yml
      meta/main.yml
      files/...
      templates/...
```

`--fqdn` site-mode output adds inventory and host vars:

```text
/
  inventory/
    hosts.yml
    host_vars///
      main.yml
      .files/...
  roles//...
```

In default mode, variables normally live in `roles//defaults/main.yml` and raw files live under `roles//files/`.

In `--fqdn` mode, host-specific values and artifacts live under `inventory/host_vars///`, while reusable role scaffolding remains under `roles/`.

### 13.3 Role ordering

Ansible playbook roles are ordered intentionally:

1. package-manager config roles (`apt_config`, `dnf_config`),
2. common grouped roles,
3. standalone package roles,
4. service roles,
5. custom file roles (`etc_custom`, `usr_local_custom`, `extra_paths`),
6. Flatpak, Snap, container images, users,
7. cron/logrotate moved toward the end when present,
8. runtime roles (`enroll_runtime`, `sysctl`, `firewall_runtime`).

`enroll_runtime` is rendered only when firewall runtime is rendered.

### 13.4 Role tags

Generated playbooks tag roles with `role_`.

`diff --enforce --target ansible` uses these tags to narrow enforcement to roles relevant to the drift report when it can. Puppet and Salt enforcement do not currently narrow to per-role tags; they run the full generated local manifest/state tree.

### 13.5 Ansible and JinjaTurtle

Ansible uses `jinjaturtle.jinjify_managed_files()`.

When JinjaTurtle is enabled and supports a harvested config file, the renderer can write:

- a Jinja2 template under `templates/`,
- variables in `defaults/main.yml` or `inventory/host_vars///main.yml`.

If JinjaTurtle is unavailable in `auto` mode, fails, emits missing variables, or does not support the path, Ansible falls back to copying the raw harvested file.

---

## 14. Puppet renderer

File: `puppet.py`

Entry point:

```python
puppet.manifest_from_bundle_dir(
    bundle_dir,
    out_dir,
    fqdn=None,
    no_common_roles=False,
    jinjaturtle="auto",
)
```

It instantiates `PuppetManifestRenderer(...).render()`.

### 14.1 Puppet render flow

```mermaid
flowchart TD
    A[PuppetManifestRenderer.render] --> B[PuppetRole.load_state]
    B --> C[resolve_jinjaturtle_mode]
    C --> D[_collect_puppet_roles]
    D --> E[resolve_catalog_conflicts]
    E --> F[_sync_service_notifications]
    F --> G[write modules//manifests/init.pp]
    G --> H[write metadata.json]
    H --> I{fqdn?}
    I -->|no| J[write manifests/site.pp with node default]
    I -->|yes| K[write hiera.yaml]
    K --> L[write data/nodes/.yaml]
    L --> M[write Hiera-driven site.pp]
    J --> N[README.md]
    M --> N
```

### 14.2 `PuppetRole`

`PuppetRole` extends `CMModule` and converts snapshots into Puppet-friendly resources.
It handles:

- packages,
- users and groups,
- managed dirs/files/symlinks,
- services,
- sysctl apply execs,
- Flatpak remotes/apps via guarded `exec`,
- Snap installs via guarded `exec`,
- Docker/Podman images by digest via guarded `exec`,
- firewall runtime files and refresh-only restore execs,
- JinjaTurtle ERB templates and class/Hiera parameter values.

`_puppet_name()` sanitises module names and avoids Puppet reserved words such as `default`, `class`, `node`, `site`, and `init`.

### 14.3 Output layout

Default mode:

```text
/
  manifests/site.pp
  README.md
  modules/
    /
      metadata.json
      manifests/init.pp
      files/...
      templates/...
```

Default `site.pp` includes generated classes in manifest order under a `node default` or named node block.

### 14.4 Puppet `--fqdn` / Hiera mode

When `--fqdn` is supplied, Puppet output switches to Hiera-style node data:

```text
/
  hiera.yaml
  manifests/site.pp
  data/
    common.yaml
    nodes/.yaml
  modules/
    /
      metadata.json
      manifests/init.pp
      files/nodes//...
      templates/...
```

In this mode:

- `site.pp` includes classes from Hiera key `enroll::classes`,
- `data/nodes/.yaml` contains class list and parameter data,
- module classes are data-driven via Automatic Parameter Lookup,
- node-specific raw file artifacts live under `modules//files/nodes//...`,
- JinjaTurtle ERB template values are written into node Hiera data.

Re-running Enroll with another `--fqdn` into the same output directory is intended to add or replace that node's YAML without deleting existing node data.

### 14.5 Puppet and JinjaTurtle

Puppet now participates in the shared JinjaTurtle integration.
When enabled, Puppet calls `jinjaturtle` with ERB-specific options: ```text --template-engine erb --puppet-class ``` The resulting template is written under: ```text modules//templates/.erb ``` Static single-node mode renders class parameters with defaults and uses: ```puppet content => template('/.erb') ``` Hiera mode writes template parameter values into `data/nodes/.yaml` and renders data-driven file resources. `jinjaturtle.missing_erb_template_vars()` checks that ERB instance variables such as `@main_key` have matching context/Hiera data. If variables are missing, Enroll falls back to raw file copying rather than emitting a broken Puppet template. --- ## 15. Salt renderer File: `salt.py` Entry point: ```python salt.manifest_from_bundle_dir( bundle_dir, out_dir, fqdn=None, no_common_roles=False, jinjaturtle="auto", ) ``` It instantiates `SaltManifestRenderer(...).render()`. ### 15.1 Salt render flow ```mermaid flowchart TD A[SaltManifestRenderer.render] --> B[SaltRole.load_state] B --> C[resolve_jinjaturtle_mode] C --> D[_collect_salt_roles] D --> E[resolve_catalog_conflicts] E --> F[write states/roles//init.sls] F --> G{fqdn?} G -->|no| H[write states/top.sls target '*'] G -->|yes| I[write pillar node data] I --> J[write states/top.sls and pillar/top.sls] H --> K[write config/master.d/enroll.conf] J --> K K --> L[README.md] ``` ### 15.2 `SaltRole` `SaltRole` extends `CMModule` and changes `managed_owner_attr` to `user`, because Salt `file.managed` uses `user` rather than `owner`. It prepares: - packages as `pkg.installed`, - groups as `group.present`, - users as `user.present`, - dirs/files/symlinks as Salt `file.*` states, - services as `service.running` or `service.dead`, - Flatpaks/Snaps via guarded `cmd.run`, - Docker/Podman images via guarded `cmd.run`, - firewall runtime restore commands, - optional Jinja templates for managed files. 
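
The `managed_owner_attr` switch described above can be pictured with a small sketch. The classes below are hypothetical stand-ins, not the real `CMModule` or `SaltRole`; only the attribute name `managed_owner_attr` and the `owner` versus `user` distinction come from this guide.

```python
from dataclasses import dataclass


@dataclass
class ManagedFile:
    """Hypothetical, simplified stand-in for a harvested managed file."""
    path: str
    owner: str
    group: str
    mode: str


class SketchModule:
    """Hypothetical base class: names the key used for the file owner."""
    managed_owner_attr = "owner"

    def file_attrs(self, mf: ManagedFile) -> dict:
        # The owner key is read from the class, so a subclass only has to
        # override one attribute to change the emitted key.
        return {
            self.managed_owner_attr: mf.owner,
            "group": mf.group,
            "mode": mf.mode,
        }


class SketchSaltModule(SketchModule):
    # Salt's file.managed takes `user`, not `owner`.
    managed_owner_attr = "user"


mf = ManagedFile("/etc/motd", "root", "root", "0644")
print(SketchModule().file_attrs(mf))
print(SketchSaltModule().file_attrs(mf))
```

Keeping the difference in a single class attribute means shared helpers can stay target-neutral, which is presumably why the real code exposes it on the role class rather than branching inside each helper.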
### 15.3 Output layout

Default mode:

```text
/
  README.md
  config/master.d/enroll.conf
  states/
    top.sls
    roles//
      init.sls
      files/...
      templates/...
```

`--fqdn` mode:

```text
/
  states/
    top.sls
    roles//init.sls
  pillar/
    top.sls
    nodes/_.sls
```

The Salt renderer can accumulate node data in `--fqdn` mode and preserves existing top data where appropriate.

### 15.4 Salt and JinjaTurtle

Salt uses `jinjaturtle.jinjify_artifact()` directly. When successful, a managed file becomes a Salt `file.managed` with:

```yaml
source: salt://roles//templates/.j2
template: jinja
context: {...}
```

Salt has one additional compatibility step: `_saltify_jinjaturtle_template()` rewrites Ansible-oriented `to_json(...)` filters emitted by JinjaTurtle into Salt-safe context variables or `tojson` filters.

If templating fails or is unsupported, the renderer falls back to a literal file copy under `files/`.

---

## 16. Shared JinjaTurtle integration

File: `jinjaturtle.py`

JinjaTurtle mode is resolved by:

```python
resolve_jinjaturtle_mode("auto" | "on" | "off")
```

Semantics:

- `auto`: use `jinjaturtle` when it exists on `PATH`; otherwise copy raw files.
- `on`: require `jinjaturtle`; error if missing.
- `off`: never use it.

Supported path types include structured config suffixes:

```text
.ini .cfg .json .toml .yaml .yml .xml .repo
```

and systemd unit-like suffixes:

```text
.service .socket .target .timer .path .mount .automount .slice .swap .scope .link .netdev .network
```

Special format forcing is used for:

- `main.cf` -> `postfix`,
- systemd unit files -> `systemd`,
- `sshd_config`, `ssh_config`, and matching `*.conf` snippets under `sshd_config.d` / `ssh_config.d` -> `ssh`.

The central helper is:

```python
jinjify_artifact(
    bundle_dir,
    artifact_role,
    src_rel,
    dest_path,
    template_root,
    jt_exe=...,
    jt_enabled=...,
    template_engine="jinja2" | "erb",
    puppet_class=...,  # Puppet only
)
```

Ansible uses `jinjify_managed_files()` because it merges variables into role defaults or host vars. Salt uses `jinjify_artifact()` directly because context lives with each `file.managed`. Puppet uses `jinjify_artifact(..., template_engine="erb", puppet_class=)` so variables line up with Puppet class/Hiera names.

Safety checks:

- `missing_jinja_template_vars()` rejects Jinja2 templates that reference absent variables.
- `missing_erb_template_vars()` rejects ERB templates that reference absent Puppet/Hiera variables.

When checks fail, Enroll deletes obsolete generated templates when appropriate and falls back to raw file copying.

---

## 17. Diff, notifications, and enforcement

File: `diff.py`

### 17.1 Inputs

`compare_harvests()` accepts:

- bundle directories,
- direct `state.json` paths,
- plain `.tar.gz` / `.tgz` bundles,
- SOPS-encrypted bundles when `sops_mode=True` or the name ends with `.sops`.

Bundle resolution is handled by `_bundle_from_input()`, which reuses `remote._safe_extract_tar()` for tarball extraction.

### 17.2 What diff compares

`compare_harvests()` compares:

- package add/remove/version changes,
- enabled systemd unit add/remove/state/package changes,
- user add/remove/field changes,
- managed file add/remove/content/metadata changes.

File content changes are detected by hashing artifacts.

`--exclude-path` filtering applies only to file drift reporting, not package/service/user diffs.

`--ignore-package-versions` suppresses package version-only drift from both the report and `has_changes`, but package additions/removals are still reported.

Reports are formatted by:

```python
format_report(report, fmt="text" | "markdown" | "json")
```

### 17.3 Enforcement decision

`has_enforceable_drift()` is intentionally conservative.

Enforceable drift includes:

- packages that were removed from the current host but existed in the baseline,
- baseline services that were removed or changed in meaningful non-package fields,
- baseline users that were removed or changed,
- baseline files that were removed or changed.

Not enforceable:

- newly installed packages,
- package version changes alone,
- newly enabled services,
- newly added users,
- newly added managed files.

This keeps `--enforce` focused on restoring baseline state rather than deleting unknown current state or downgrading packages.

### 17.4 Target-selected enforcement

`enforce_old_harvest()` now accepts `target="ansible" | "puppet" | "salt"`.

It performs:

1. resolve the old/baseline harvest,
2. build a best-effort enforcement plan from the diff report,
3. generate a temporary manifest from the old harvest using the selected target,
4. run the matching local apply tool,
5. attach enforcement metadata to the diff report.

Target commands:

```text
ansible -> ansible-playbook -i localhost, -c local playbook.yml
puppet  -> puppet apply --modulepath ./modules [--hiera_config ./hiera.yaml] manifests/site.pp
salt    -> salt-call --local --file-root ./states [--pillar-root ./pillar] state.apply
```

Only Ansible uses generated per-role tags to narrow the apply scope. Puppet and Salt enforcement deliberately run the full generated local manifest/state tree for now.

The JSON report keeps target-specific compatibility fields such as `ansible_playbook`, `puppet`, or `salt_call`.

### 17.5 Notifications

`diff.py` also supports webhooks and email notifications:

- `post_webhook()` sends JSON/text/markdown payloads with optional extra headers.
- `send_email()` uses SMTP when configured or local sendmail when SMTP is omitted.

Notifications requested through the CLI are sent only when differences exist, unless `--notify-always` is set.

---

## 18. Explanation and validation

### 18.1 `explain.py`

`explain_state()` reads a harvest and produces text or JSON explaining:

- host metadata,
- role summaries,
- users,
- services,
- package snapshots,
- runtime firewall,
- sysctl,
- custom files,
- inventory packages,
- notes and exclusion reasons.

This is intended to answer “what did Enroll collect and why?”

### 18.2 `validate.py`

`validate_harvest()` checks:

1. `state.json` exists,
2. it parses as JSON,
3. it validates against the vendored schema unless `--no-schema` is set,
4. every `managed_file.src_rel` points to an artifact file,
5. firewall runtime generated artifacts exist,
6. there are no unreferenced artifact files, reported as warnings.

It returns a `ValidationResult` with `errors`, `warnings`, `ok()`, `to_dict()`, and `to_text()`.

The CLI supports local schema override with `--schema`, warning failure with `--fail-on-warnings`, JSON/text output, and `--out`.

---

## 19. Remote harvesting

File: `remote.py`

Remote mode is called from `cli.py` when `--remote-host` is supplied.

Public entry point:

```python
remote_harvest(...)
```

It wraps `_remote_harvest()` and handles:

- optional sudo password prompting,
- optional SSH key passphrase prompting or environment variable lookup,
- retrying when remote sudo requires a password,
- retrying when an encrypted SSH private key needs a passphrase.

### 19.1 Remote harvest flow

```mermaid
flowchart TD
    A[remote_harvest] --> B[resolve sudo password]
    B --> C[resolve SSH key passphrase]
    C --> D[_remote_harvest]
    D --> E[build local enroll.pyz zipapp]
    E --> F[connect with Paramiko]
    F --> G[upload zipapp]
    G --> H[run remote enroll harvest]
    H --> I[tar/gzip remote bundle]
    I --> J[download tarball]
    J --> K[_safe_extract_tar locally]
    K --> L[return local state.json path]
```

`_build_enroll_pyz()` packages the local `enroll` Python package into a zipapp and uses `enroll.cli:main` as its entry point.

### 19.2 SSH config support

`--remote-ssh-config` enables Paramiko `SSHConfig` support for settings such as:

- `HostName`,
- `Port`,
- `User`,
- `IdentityFile`,
- `ConnectTimeout`,
- `ProxyCommand`,
- `AddressFamily`,
- `HostKeyAlias` where supported by the connection logic.

Unknown host keys are rejected by default through Paramiko's reject policy, so the remote host's key must already be present in the user's known hosts before a remote harvest can connect.
### 19.3 Safe tar extraction

`_safe_extract_tar()` validates tar members before extraction and rejects:

- absolute paths,
- `..` traversal,
- symlinks,
- hardlinks,
- device nodes,
- anything resolving outside the destination.

This helper is reused by remote harvest, manifest SOPS extraction, and diff bundle resolution.

---

## 20. SOPS support

File: `sopsutil.py`

SOPS support is binary tarball encryption, not field-level YAML encryption.

### 20.1 Harvest SOPS mode

`enroll harvest --sops `:

1. harvests into a secure temp directory,
2. tars the bundle,
3. encrypts it with SOPS binary mode,
4. writes `harvest.tar.gz.sops` or the requested output file.

### 20.2 Manifest SOPS mode

`enroll manifest --sops `:

1. decrypts/extracts the harvest if needed,
2. generates the chosen target manifest in a temp directory,
3. tars the generated output,
4. encrypts it as a single SOPS file.

### 20.3 Helpers

`sopsutil.py` provides:

- `find_sops_cmd()`,
- `require_sops_cmd()`,
- `encrypt_file_binary()`,
- `decrypt_file_binary_to()`.

Encryption/decryption helpers write via temp files and default to mode `0600`.

---

## 21. Configuration file support

`cli.py` supports optional INI config files.

Discovery order:

1. `--no-config` disables config loading,
2. `--config PATH` or `-c PATH`,
3. `$ENROLL_CONFIG`,
4. `./enroll.ini`,
5. `./.enroll.ini`,
6. `$XDG_CONFIG_HOME/enroll/enroll.ini`,
7. `~/.config/enroll/enroll.ini`.

Config sections are translated into argv tokens by `_inject_config_argv()`:

- `[enroll]` for global options,
- `[harvest]`, `[manifest]`, `[single-shot]`, `[diff]`, `[explain]`, `[validate]` for subcommand options,
- `[single_shot]` is accepted as an alias for `[single-shot]`.

CLI flags win because config-derived tokens are inserted before user-supplied argv tokens.

The translation is argparse-driven, so new flags often gain config-file support automatically as long as they are represented by normal argparse actions.

---

## 22. CLI flags that affect multiple layers

### 22.1 `--target`

`--target ansible|puppet|salt` exists for:

- `enroll manifest`,
- `enroll single-shot`,
- `enroll diff --enforce`.

For `manifest` and `single-shot`, it chooses the output renderer. For `diff --enforce`, it chooses both the temporary manifest target and the local apply tool.

### 22.2 `--fqdn`

`--fqdn` changes output semantics, not just filenames:

- Ansible: uses inventory/host_vars and host-specific artifacts.
- Puppet: uses Hiera node data and Hiera-driven classes.
- Salt: uses pillar node data and minion-targeted top files.

`--fqdn` implies no common role grouping.

### 22.3 `--no-common-roles`

Disables the default grouping of package/service snapshots by Debian Section or RPM Group. This preserves one generated role/module/state per package or unit snapshot.

### 22.4 `--jinjaturtle` / `--no-jinjaturtle`

The CLI maps these to renderer mode strings:

```text
no flag          -> auto
--jinjaturtle    -> on
--no-jinjaturtle -> off
```

All three manifest targets receive this mode. Puppet uses ERB when JinjaTurtle is enabled; Ansible and Salt use Jinja2.

---

## 23. Tests and how to navigate them

Run tests with:

```bash
poetry install
poetry run pytest
```

or the repository helper when appropriate:

```bash
./tests.sh
```

Important test files:

| Test file | What it covers |
|---|---|
| `test_cli.py` | argparse dispatch, remote flags, manifest target forwarding, single-shot flow. |
| `test_cli_config_and_sops.py`, `test_cli_helpers.py` | config-file injection and SOPS output helpers. |
| `test_harvest.py`, `test_harvest_helpers.py` | harvest orchestration, sysctl/firewall helpers, role naming. |
| `test_harvest_collectors.py` | runtime and container image collectors. |
| `test_harvest_cron_logrotate.py` | cron/logrotate unification. |
| `test_harvest_symlinks.py` | nginx/apache enabled symlink capture. |
| `test_accounts.py` | users, Flatpak, Snap parsing/discovery. |
| `test_ignore.py`, `test_ignore_dir.py` | secret/noise policy. |
| `test_pathfilter.py` | include/exclude matching and expansion. |
| `test_platform.py`, `test_platform_backends.py` | platform detection and backend behaviour. |
| `test_debian.py`, `test_rpm.py`, `test_rpm_run.py` | package manager helpers. |
| `test_manifest.py`, `test_manifest_ansible.py` | Ansible rendering and role behaviour. |
| `test_manifest_puppet.py` | Puppet rendering, Hiera mode, reserved names, firewall/container/Flatpak/Snap/JinjaTurtle support. |
| `test_manifest_salt.py` | Salt rendering, pillar mode, JinjaTurtle, firewall/container/Flatpak/Snap support. |
| `test_manifest_symlinks.py` | symlink manifest output. |
| `test_jinjaturtle.py` | shared template generation and fallback safety. |
| `test_diff_bundle.py`, `test_diff_ignore_versions_exclude_enforce.py`, `test_diff_notifications.py` | diff, bundle resolution, target-selected enforcement, notifications. |
| `test_remote.py` | remote harvest, SSH/sudo prompts, safe tar extraction. |
| `test_explain.py` | harvest explanation output. |
| `test_validate.py` | schema/artifact validation. |
| `test_cm.py` | `CMModule` conflict resolution and service-package helpers. |
| `test_fsutil.py`, `test_fsutil_extra.py` | file hashing and stat metadata helpers. |

When changing behaviour, extend the closest specific tests rather than relying only on broad integration tests.

---

## 24. Common maintenance tasks

### 24.1 Add a new thing to harvest

1. Add or extend a dataclass in `harvest_types.py` if existing snapshots cannot represent it.
2. Add a collector under `harvest_collectors/` if it is a distinct feature.
3. Add the collector to the sequence in `harvest.harvest()`.
4. Add the snapshot to the `state = {...}` object in `harvest.harvest()`.
5. Update `schema/state.schema.json`.
6. Update renderers that should emit the new resource.
7. Update `explain.py` and `validate.py` if users need visibility or artifact checks.
8. Add tests for harvest and each renderer.

### 24.2 Add a new renderer target

1. Create `.py` with `manifest_from_bundle_dir()`.
2. Load state via `CMModule.load_state()` or `state.load_state()`.
3. Consume `roles_from_state()` and `inventory_packages_from_state()`.
4. Convert snapshots into renderer-specific role/module/state objects.
5. Reuse `CMModule.package_service_entries()` for package/service grouping.
6. Run conflict resolution if the target compiles a global catalog.
7. Write target output and README.
8. Add the target to `manifest.manifest()` validation and dispatch.
9. Add CLI choices in `_add_common_manifest_args()` and diff enforcement if applicable.
10. Add tests.

### 24.3 Add a new CLI flag

For harvest-affecting flags:

1. add the flag to `cli.py` for `harvest` and possibly `single-shot`,
2. forward it to `harvest.harvest()` or `remote.remote_harvest()`,
3. forward it through remote command construction if remote mode needs it,
4. check whether config-file injection handles it,
5. add tests in `test_cli.py` and feature-specific tests.

For manifest-affecting flags:

1. add it to `_add_common_manifest_args()` if all manifest-like commands need it,
2. forward it through `manifest.manifest()`,
3. forward it to target renderers,
4. add tests for forwarding and output.

For diff enforcement flags:

1. add argparse support under the `diff` subparser,
2. pass values to `compare_harvests()` or `enforce_old_harvest()`,
3. update report formatting if new fields appear,
4. add tests in `test_diff_ignore_versions_exclude_enforce.py` or `test_diff_notifications.py`.

### 24.4 Change file safety rules

Modify `ignore.py` and add tests in `test_ignore.py` / `test_ignore_dir.py`.

Be careful:

- relaxing safety affects secret exposure risk,
- tightening safety can make expected config disappear,
- binary allowance matters for APT/RPM keyrings,
- `--dangerous` must remain explicit for risky harvesting.
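
The trade-offs in that list are easier to reason about with a toy policy in hand. The sketch below is not the contents of `ignore.py`: the function name, the size cap, the sensitive-content pattern, and the binary allow-list are all hypothetical and exist only to show how each rule would be pinned down by a test.

```python
import re
from typing import Optional

# Hypothetical values for illustration only; the real policy lives in ignore.py.
MAX_BYTES = 256 * 1024
SENSITIVE_RE = re.compile(rb"(?i)(password\s*=|BEGIN [A-Z ]*PRIVATE KEY)")
BINARY_ALLOWED_SUFFIXES = (".gpg",)  # e.g. package-manager keyrings


def deny_reason(path: str, data: bytes, dangerous: bool = False) -> Optional[str]:
    """Return a short reason if the file should be skipped, else None."""
    if len(data) > MAX_BYTES:
        return "too_large"
    looks_binary = b"\x00" in data
    if looks_binary and not path.endswith(BINARY_ALLOWED_SUFFIXES):
        return "binary"
    # Content scanning is only relaxed when the caller opts in explicitly.
    if not dangerous and not looks_binary and SENSITIVE_RE.search(data):
        return "sensitive_content"
    return None


print(deny_reason("/etc/app.conf", b"password = hunter2\n"))
print(deny_reason("/etc/apt/trusted.gpg.d/example.gpg", b"\x00\x01"))
```

A test per bullet (one that proves a secret-looking file is still skipped, one that proves a keyring is still captured, one that proves the opt-in stays explicit) is a reasonable shape for additions to `test_ignore.py`.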
### 24.5 Change service/package attribution

Most logic is in:

- `harvest_collectors/services.py`,
- `package_hints.py`,
- `system_paths.py`,
- package backend `modified_paths()` implementations.

Preserve these invariants:

- cron/logrotate should stay unified when installed,
- shared directories should not be attributed too broadly,
- package-manager config belongs in `apt_config`/`dnf_config`,
- `captured_global` should prevent duplicates,
- stopped services should not receive broad restart notifications.

### 24.6 Change manifest role grouping

Common grouping uses:

- `CMModule.package_service_entries()`,
- `package_section_label()`,
- `section_label_for_packages()`.

Remember:

- default non-`--fqdn` output groups package/service roles unless `--no-common-roles` is set,
- `--fqdn` implies per-role output,
- Ansible, Puppet, and Salt grouping should stay conceptually aligned,
- Puppet/Salt need `resolve_catalog_conflicts()` after grouping.

### 24.7 Change JinjaTurtle support

Shared path support and safety checks belong in `jinjaturtle.py`.

Renderer-specific behaviour belongs in the renderer:

- Ansible: variables in defaults or host vars, templates under role `templates/`.
- Puppet: ERB templates, class params or Hiera values.
- Salt: `file.managed` context and Salt-safe Jinja rewrites.

Fallback-to-raw-copy is part of the product contract unless JinjaTurtle was explicitly required and missing.

### 24.8 Change diff enforcement

`diff --enforce` now has a target dimension.

When changing it, keep these distinctions clear:

- `has_enforceable_drift()` decides whether enforcement should run.
- `_enforcement_plan()` finds relevant baseline roles.
- Ansible uses role tags from the plan.
- Puppet and Salt currently run a full manifest/state apply.
- `_enforcement_command()` is the source of truth for local apply commands.
- `cli.py` attaches enforcement metadata to the report and formats it.
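
The first distinction in that list, the conservative enforcement decision, can be sketched as follows. The rules mirror section 17.3, but the report keys (`packages`, `removed`, `changed`, and so on) and the function name are assumptions made for illustration; the real report produced by `compare_harvests()` may be shaped differently.

```python
from typing import Any, Dict


def sketch_has_enforceable_drift(report: Dict[str, Any]) -> bool:
    """Hypothetical sketch: True only for drift that restores baseline state."""
    packages = report.get("packages", {})
    services = report.get("services", {})
    users = report.get("users", {})
    files = report.get("files", {})
    return bool(
        # Only baseline packages that went missing count; additions and
        # version-only changes are deliberately ignored.
        packages.get("removed")
        # This sketch assumes `changed` already excludes package-only changes.
        or services.get("removed") or services.get("changed")
        or users.get("removed") or users.get("changed")
        or files.get("removed") or files.get("changed")
    )


print(sketch_has_enforceable_drift({"packages": {"added": ["htop"]}}))
print(sketch_has_enforceable_drift({"files": {"changed": ["/etc/motd"]}}))
```

Note that nothing under an `added` key ever contributes to the result, which is the property any change to the real function needs to preserve.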

Do not make enforcement delete newly added packages/users/files/services unless the safety model is explicitly redesigned.

---

## 25. Important maintenance hazards

### 25.1 Renderer output is downstream of harvest state

If a renderer needs information, first ask whether that information belongs in `state.json`. Avoid papering over missing harvest facts inside a renderer.

### 25.2 `--fqdn` mode is not cosmetic

`--fqdn` changes where variables and artifacts live and how target inclusion works. A change that works in default mode can still break:

- Ansible host vars,
- Puppet Hiera node data,
- Salt pillar node data.

### 25.3 Puppet and Salt are stricter about duplicates

Ansible often tolerates repeated packages or tasks. Puppet and Salt compile catalogs where duplicate resources can fail. Keep `resolve_catalog_conflicts()` in mind whenever adding resources.

### 25.4 Secret avoidance is part of the product contract

Default harvest should avoid likely secrets. `--dangerous` exists because useful files may contain secrets. Do not silently make risky harvesting the default.

### 25.5 Runtime state should not override persistent config

Firewall runtime capture is skipped when persistent firewall config exists. Preserve this principle for future runtime snapshots.

### 25.6 JinjaTurtle is best-effort except when explicitly required

`auto` mode should not make manifest generation fail merely because templating failed. `on` should require the executable; unsupported or unsafe individual files should still fall back to raw copy unless code explicitly changes that contract.

### 25.7 Role names must be sanitised

Raw package/service names can be invalid or reserved in Ansible roles, Puppet classes, or Salt SLS names. Use role-name helpers and singleton collision protection.
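
As a rough illustration of why raw names cannot be used directly, here is a minimal sanitiser. The reserved words are the Puppet ones listed in section 14.2; the function name, the `enroll_` prefix, and the exact rewriting rules are hypothetical and do not describe `role_names.py` or `_puppet_name()`.

```python
import re

# Reserved words quoted from the Puppet section of this guide.
PUPPET_RESERVED = frozenset({"default", "class", "node", "site", "init"})


def sketch_safe_name(raw: str, reserved=PUPPET_RESERVED, prefix: str = "enroll_") -> str:
    """Hypothetical sketch: turn a package/unit name into a safe identifier."""
    # Collapse every run of disallowed characters into one underscore.
    name = re.sub(r"[^a-z0-9_]+", "_", raw.lower()).strip("_")
    if not name:
        name = "unnamed"
    # Prefix names that start with a digit or collide with a reserved word.
    if name[0].isdigit() or name in reserved:
        name = prefix + name
    return name


for raw in ("nginx", "libssl1.1", "default", "7zip", "systemd-resolved.service"):
    print(raw, "->", sketch_safe_name(raw))
```

A sanitiser like this can map two different raw names to the same identifier, which is one reason collision protection is needed in addition to character rewriting.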
### 25.8 Tests encode edge cases

Many behaviours exist because of previously found edge cases:

- non-root/no-sudo harvests,
- Puppet reserved words,
- Salt Docker module availability limitations,
- symlink capture,
- JinjaTurtle missing variables,
- Salt JSON filter compatibility,
- file caps,
- SOPS secure temp files,
- tar path traversal,
- target-selected diff enforcement.

Before simplifying logic, search the tests.

---

## 26. Troubleshooting guide

### 26.1 Generated manifest references a missing artifact

Likely causes:

- `managed_files[*].src_rel` was added without copying into `artifacts/`,
- a renderer used the generated role/module name instead of the artifact role,
- a role was renamed after harvest but before artifact lookup,
- `--fqdn` file prefixes are wrong.

Start with:

```bash
enroll validate /path/to/harvest
```

Then inspect:

```text
state.json roles.*.managed_files[*]
artifacts//
```

### 26.2 Puppet fails with duplicate resources

Check:

- `_collect_puppet_roles()`,
- `resolve_catalog_conflicts()`,
- `role_order_key()`,
- whether a new resource type needs conflict resolution,
- whether a directory resource conflicts with a file/link of the same path.

### 26.3 Salt fails with duplicate IDs or missing modules

Check:

- `_state_id()` naming,
- `_collect_salt_roles()` grouping,
- `resolve_catalog_conflicts()`,
- guarded `cmd.run` fallbacks for Docker/Podman/Snap/Flatpak.

Salt uses guarded shell commands for some resources because native states/modules are not consistently available across Salt installations.

### 26.4 Ansible check mode reports unexpected changes

Check:

- role ordering,
- grouped mode versus `--fqdn` / `--no-common-roles`,
- handler notifications,
- whether runtime roles were emitted without runtime artifacts,
- harvested directory/file mode normalisation.

Grouped and per-role output can legitimately produce different numbers of reported changes.

### 26.5 A file was not harvested

Check, in order:

1. Was it excluded by `--exclude-path`?
2. Was it denied by `IgnorePolicy`?
3. Was it too large?
4. Did it look binary?
5. Did it contain sensitive-looking content?
6. Was it already captured by another role via `captured_global`?
7. Is it outside known scanned locations?
8. Would `--include-path` collect it?
9. Does it require `--dangerous`?

`enroll explain` can show notes and exclusion reasons.

### 26.6 `diff --enforce` fails

Check:

- whether the selected `--target` tool is on `PATH`,
  - `ansible-playbook` for Ansible,
  - `puppet` for Puppet,
  - `salt-call` for Salt,
- whether the generated temp manifest has the expected target entrypoint,
- whether the report contains enforceable drift,
- whether package drift is only version changes or additions, which enforcement skips.

### 26.7 Remote harvest fails with sudo or SSH key prompts

Relevant flags:

- `--ask-become-pass`,
- `--ask-key-passphrase`,
- `--ssh-key-passphrase-env`,
- `--no-sudo`,
- `--remote-ssh-config`.

Interactive sessions can prompt and retry. Non-interactive sessions should pass explicit flags or environment variables.

---

## 27. Practical code-reading map

| Feature/question | Start with | Then read |
|---|---|---|
| CLI option behaviour | `cli.py` | called module for `args.cmd` |
| Local harvest ordering | `harvest.py:harvest()` | `harvest_collectors/` |
| Why a file was skipped | `capture.py`, `ignore.py`, `pathfilter.py` | `explain.py` |
| File metadata/hash helpers | `fsutil.py` | `debian.py`, `capture.py` |
| Service/package attribution | `harvest_collectors/services.py` | `package_hints.py`, `platform.py` |
| APT/DNF config capture | `harvest_collectors/package_manager.py` | `system_paths.py` |
| Users and SSH keys | `harvest_collectors/users.py` | `accounts.py` |
| Flatpak/Snap parsing | `accounts.py` | renderer Flatpak/Snap helpers |
| Docker/Podman images | `harvest_collectors/container_images.py` | renderer container image helpers |
| Runtime firewall | `harvest_collectors/runtime.py`, `harvest.py` | renderer firewall helpers |
| Sysctl | `harvest.py` sysctl helpers | renderer sysctl role functions |
| Ansible output | `ansible.py:AnsibleManifestRenderer.render()` | `_render_*` helpers |
| Puppet output | `puppet.py:PuppetManifestRenderer.render()` | `_collect_puppet_roles()` |
| Salt output | `salt.py:SaltManifestRenderer.render()` | `_collect_salt_roles()` |
| Grouping/common roles | `cm.py` | renderer collection functions |
| JinjaTurtle | `jinjaturtle.py` | renderer managed-content code |
| Diff/enforce | `diff.py` | `manifest.py`, target renderer |
| Validation | `validate.py` | schema file and `state.json` |
| Remote mode | `remote.py` | `cli.py` remote branches |
| SOPS | `sopsutil.py` | `cli.py`, `manifest.py`, `diff.py` |

---

## 28. Glossary

**Harvest bundle**
A directory or encrypted tarball containing `state.json` and `artifacts/`.

**Snapshot**
A structured object under `roles` in `state.json`, such as a `ServiceSnapshot` or `PackageSnapshot`.

**Managed file**
A file Enroll intends generated CM code to recreate. It has a destination path and a matching artifact file.

**Managed link**
A symlink Enroll intends generated CM code to recreate.

**Managed dir**
A directory Enroll intends generated CM code to ensure exists with recorded metadata.

**Role**
The Enroll logical group for related resources. In Ansible it usually maps to an Ansible role. In Puppet it maps to a module/class. In Salt it maps to an SLS role.

**Artifact role**
The role directory under `artifacts/` that contains a harvested file. This can differ from the generated renderer role when grouping is enabled.

**Common/grouped role**
A generated role/module/state that merges multiple package/service snapshots by Debian Section or RPM Group.

**Site mode / `--fqdn` mode**
Host-specific output mode. Ansible uses host vars, Puppet uses Hiera node data, and Salt uses pillar node data.

**Dangerous mode**
Explicit opt-in mode that relaxes safety checks and enables risky capture such as user shell dotfiles.

**JinjaTurtle**
Optional external tool used to convert recognised config files into Jinja2 or ERB templates plus variable defaults/context.

**Enforcement target**
The config manager chosen for `diff --enforce` with `--target ansible|puppet|salt`.

---

## 29. Final maintenance model

Most changes should preserve this pipeline:

```text
Collect facts and files safely
  -> represent them in target-neutral state.json
  -> keep artifact references consistent
  -> let each renderer translate the same state into its own idioms
  -> validate the bundle and test each target
```

Before changing code, ask:

1. Is this a harvest concern or renderer concern?
2. Does `state.json` or the schema need to change?
3. Does this affect `--fqdn` mode?
4. Does this introduce duplicate ownership of a path/resource?
5. Does this weaken default secret avoidance?
6. Do Puppet and Salt need conflict handling?
7. Does JinjaTurtle fallback still behave safely?
8. Does `diff --enforce --target ...` still do the conservative thing?
9. Do existing tests explain why the current behaviour exists?

Keeping those boundaries clear is the main way to maintain Enroll without creating subtle cross-target regressions.