Page:
Technical_Decomp_Harvest
enroll/harvest.py
harvest.harvest() is the producer:
- uses UnitInfo / TimerInfo (systemd introspection)
- uses IgnorePolicy + PathFilter/CompiledPathPattern to decide what files are safe to copy
- emits ServiceSnapshot, PackageSnapshot, UsersSnapshot, etc.
- emits ManagedFile and ExcludedFile entries everywhere
- writes everything into state.json, and copies files into artifacts/<role>/...
ManagedFile (dataclass)
Purpose: describes one file that harvest successfully copied into the bundle.
Fields:
- path: absolute original path on host
- src_rel: relative path used inside artifacts/<role>/... (almost always path.lstrip("/"))
- owner, group, mode: captured from stat_triplet()
- reason: classification string explaining why it was captured (examples):
- systemd_dropin, systemd_envfile
- modified_conffile, modified_packaged_file
- custom_unowned, custom_specific_path
- authorized_keys, ssh_public_key
- usr_local_bin_script, usr_local_etc_custom
- user_include (from --include-path)
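The fields above map naturally onto a small dataclass. The following is a minimal sketch, not the code in enroll/harvest.py: the field types, the octal-string format of mode, and the helpers stat_triplet_sketch() and describe() are assumptions made for illustration.

```python
from __future__ import annotations

import grp
import os
import pwd
from dataclasses import dataclass


@dataclass
class ManagedFile:
    path: str     # absolute original path on the host
    src_rel: str  # path relative to artifacts/<role>/
    owner: str
    group: str
    mode: str     # octal string such as "0644" (format is an assumption)
    reason: str   # e.g. "modified_conffile", "user_include"


def stat_triplet_sketch(path: str) -> tuple[str, str, str]:
    """Hypothetical stand-in for stat_triplet(): (owner, group, mode)."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # no passwd entry: fall back to the numeric id
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    mode = format(st.st_mode & 0o7777, "04o")  # permission bits only
    return owner, group, mode


def describe(path: str, reason: str) -> ManagedFile:
    """Build a ManagedFile record for an absolute path (hypothetical helper)."""
    owner, group, mode = stat_triplet_sketch(path)
    return ManagedFile(
        path=path,
        src_rel=path.lstrip("/"),  # "/etc/nginx/nginx.conf" -> "etc/nginx/nginx.conf"
        owner=owner,
        group=group,
        mode=mode,
        reason=reason,
    )
```

Stripping the leading slash is what lets the host path be re-rooted under artifacts/<role>/ without escaping the bundle directory.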
Where it’s used:
Written into snapshots in state.json.
manifest.py reads these to generate Ansible tasks (copy/template actions).
diff.py reads these to detect changes and to locate the artifact content.
ExcludedFile (dataclass)
Purpose: records a file that was considered but not included, plus why.
Fields:
- path
- reason: a concise reason code, typically:
- user_excluded (PathFilter)
- ignore policy reasons like denied_path, binary_like, sensitive_content, too_large, unreadable, etc.
Where it’s used:
Stored in each snapshot’s excluded list in state.json.
Mostly informational (helps explain why something didn’t get harvested).
ServiceSnapshot (dataclass)
Purpose: captures everything enroll learned about one enabled systemd service unit.
Fields:
- unit: e.g. nginx.service
- role_name: derived role name (sanitized service-ish identifier)
- packages: Debian package names inferred as belonging to the service
- active_state, sub_state, unit_file_state, condition_result:
- copied from systemctl show fields via systemd.get_unit_info()
- managed_files: list of ManagedFile harvested for this role
- excluded: list of ExcludedFile not harvested
- notes: warnings or anomalies (e.g. failure to query unit info)
How it’s “computed” in harvest:
- Enumerate enabled services: systemd.list_enabled_services().
- For each unit:
- gather unit metadata (fragment file, dropins, env files, exec paths)
- infer owning packages via dpkg_owner() on:
- the unit fragment
- ExecStart paths
- consider candidate /etc files from:
- systemd dropins/envfiles (only under /etc)
- modified dpkg conffiles or packaged files under /etc (by md5 compare)
- service-specific “unowned” files under /etc/<hint> trees
- filter each candidate through:
- user exclude patterns (PathFilter.is_excluded)
- IgnorePolicy.deny_reason
- readability + regular-file checks
- copy accepted files into artifacts/<role>/<src_rel>
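The filtering step can be pictured as a chain that returns the first reason to skip a candidate. This is a sketch under stated assumptions: first_rejection() is a hypothetical helper, the two callables stand in for PathFilter.is_excluded and IgnorePolicy.deny_reason (whose real signatures are not shown on this page), and the "not_regular_file" reason code is invented for the example.

```python
from __future__ import annotations

import os
import stat
from typing import Callable, Optional


def first_rejection(
    path: str,
    is_excluded: Callable[[str], bool],
    deny_reason: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return a reason code if the candidate should be skipped, else None."""
    # 1. User exclude patterns are checked first.
    if is_excluded(path):
        return "user_excluded"
    # 2. The ignore policy may veto the path (denied_path, too_large, ...).
    reason = deny_reason(path)
    if reason:
        return reason
    # 3. Only readable regular files are copied.
    try:
        st = os.stat(path, follow_symlinks=False)
    except OSError:
        return "unreadable"
    if not stat.S_ISREG(st.st_mode):
        return "not_regular_file"  # hypothetical reason code
    if not os.access(path, os.R_OK):
        return "unreadable"
    return None
```

A None result would lead to a copy and a ManagedFile entry; any other result would be recorded as an ExcludedFile with that reason.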
Why this class matters:
It is the core unit of “role inference” for running services.
PackageSnapshot (dataclass)
Purpose: captures “manual packages” (from apt-mark showmanual) that weren’t already covered by any service snapshot.
Fields:
- package: package name
- role_name: computed role name (e.g. pkg_postfix)
- managed_files, excluded, notes
How it’s computed:
- list_manual_packages() returns the packages marked as manually installed.
- Anything already mentioned in any ServiceSnapshot.packages is skipped (recorded in manual_packages_skipped in state.json).
- For remaining packages:
- detect modified conffiles / modified packaged files under /etc via hashes
- capture associated timer overrides if the timer is attributable to that package
- scan for custom/unowned files under /etc/<topdir> trees for the package
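Detecting a modified file by hash amounts to comparing the on-disk digest with the one recorded at install time. The sketch below assumes the recorded MD5 sums are already available as a dict; how enroll obtains them is not described on this page, and file_md5() and modified_files() are hypothetical helpers.

```python
from __future__ import annotations

import hashlib


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Hex MD5 of a file, read in chunks so large files are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def modified_files(recorded: dict[str, str]) -> list[str]:
    """Return the paths whose current MD5 differs from the recorded one.

    `recorded` maps absolute path -> MD5 recorded by the package manager
    (assumed input format).
    """
    changed = []
    for path, expected in sorted(recorded.items()):
        try:
            if file_md5(path) != expected:
                changed.append(path)
        except OSError:
            continue  # missing or unreadable files are skipped in this sketch
    return changed
```

MD5 is used here only as a change detector against the package database, not as a security measure.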
UsersSnapshot (dataclass)
Purpose: captures non-system users and safe SSH public artifacts.
Fields:
- role_name: always "users" in current code
- users: list of dicts derived from UserRecord
- managed_files: copied ssh public material (as ManagedFile)
- excluded: skipped ssh files (as ExcludedFile)
- notes: errors (e.g. couldn’t enumerate users)
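How enroll decides that a user is "non-system" is not spelled out here. One common convention, used purely as an assumption in this sketch, is a UID range such as the Debian defaults UID_MIN=1000 and UID_MAX=60000. The function name and the dict keys are hypothetical and may not match UserRecord.

```python
from __future__ import annotations


def non_system_users(
    passwd_text: str, uid_min: int = 1000, uid_max: int = 60000
) -> list[dict]:
    """Parse passwd-format text and keep accounts inside the UID range."""
    users = []
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            continue  # malformed entry
        name, _pw, uid, gid, _gecos, home, shell = fields
        if not (uid.isdigit() and gid.isdigit()):
            continue
        if uid_min <= int(uid) <= uid_max:
            users.append(
                {"name": name, "uid": int(uid), "gid": int(gid),
                 "home": home, "shell": shell}
            )
    return users
```

Taking the passwd content as a string keeps the sketch testable without touching the real /etc/passwd.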
AptConfigSnapshot (dataclass)
Purpose: captures APT configuration and key material.
Fields:
- role_name: "apt_config"
- managed_files, excluded, notes
How it’s populated:
- Uses _iter_apt_capture_paths() (in harvest.py) to yield the specific APT paths to capture (sources lists, keyrings, etc.).
- Each candidate is filtered via PathFilter + IgnorePolicy, then copied.
EtcCustomSnapshot (dataclass)
Purpose: “catch-all” role for remaining config-ish files under /etc that weren’t already attributed to a service/package/users/apt.
Fields:
- role_name: "etc_custom"
- managed_files, excluded, notes
How it’s populated:
- Build a set of “already captured” files from other roles.
- Add certain “system essentials” even if package-owned (_iter_system_capture_paths()).
- Walk /etc and include unowned files that look “config-ish” (_is_confish()), subject to caps.
- Extra logic: if a file is in a shared snippet dir like /etc/cron.d/ or /etc/logrotate.d/, it attempts to re-attach it to an existing role by filename matching (so it doesn’t pollute etc_custom).
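The re-attachment idea can be sketched as follows. The exact matching rule in harvest.py is not documented on this page, so this example assumes a simple one: the part of the filename before the first dot must equal an existing role name. reattach_role() and SHARED_SNIPPET_DIRS are hypothetical names.

```python
from __future__ import annotations

import os

# Shared snippet directories named in the text; the real list may be longer.
SHARED_SNIPPET_DIRS = ("/etc/cron.d", "/etc/logrotate.d")


def reattach_role(path: str, role_names: list[str]) -> str:
    """Pick the role that should own `path`, defaulting to etc_custom."""
    parent, name = os.path.split(path)
    if parent in SHARED_SNIPPET_DIRS:
        stem = name.split(".", 1)[0]  # "nginx.conf" -> "nginx"
        if stem in role_names:
            return stem
    return "etc_custom"
```

For example, /etc/logrotate.d/nginx would be attached to an existing nginx role, while /etc/cron.d/backup would stay in etc_custom if no role named backup exists.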
UsrLocalCustomSnapshot (dataclass)
Purpose: captures custom local admin content from /usr/local.
Fields:
- role_name: "usr_local_custom"
- managed_files, excluded, notes
How it’s populated:
- Scans /usr/local/etc (collect regular files, subject to IgnorePolicy)
- Scans /usr/local/bin but only collects executable files (checks mode has any execute bit)
- Each scan is capped to avoid collecting an unbounded number of files.
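The "any execute bit" check and the per-scan cap can be illustrated with a short walk. This is a sketch, not the harvest.py implementation: executable_files() is a hypothetical helper and the default cap of 100 is an arbitrary value chosen for the example.

```python
from __future__ import annotations

import os
import stat


def executable_files(root: str, cap: int = 100) -> list[str]:
    """Collect regular files under `root` that have any execute bit set."""
    found: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue
            # 0o111 covers the user, group and other execute bits.
            if stat.S_ISREG(st.st_mode) and st.st_mode & 0o111:
                found.append(path)
                if len(found) >= cap:
                    return found  # stop once the cap is reached
    return found
```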
ExtraPathsSnapshot (dataclass)
Purpose: captures user-requested extra files from --include-path (and records include/exclude patterns used).
Fields:
- role_name: "extra_paths"
- include_patterns, exclude_patterns: as provided on CLI
- managed_files, excluded, notes
How it’s populated:
- Uses PathFilter.iter_include_patterns() + expand_includes() to turn patterns into concrete file paths.
- For each included file not already captured elsewhere:
- filter via exclude + IgnorePolicy
- copy into artifacts/extra_paths/...
- record ManagedFile(reason="user_include")
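Turning include patterns into concrete paths might look like the sketch below. It assumes shell-style glob patterns, which may not match the pattern syntax PathFilter actually accepts; expand_includes_sketch() is a hypothetical stand-in for expand_includes(), and the exclude and IgnorePolicy checks are left out.

```python
from __future__ import annotations

import glob
import os


def expand_includes_sketch(
    patterns: list[str], already_captured: set[str]
) -> list[str]:
    """Expand glob patterns to files not captured by another role."""
    seen = set(already_captured)
    result: list[str] = []
    for pattern in patterns:
        # recursive=True lets "**" match any number of directories.
        for match in sorted(glob.glob(pattern, recursive=True)):
            if os.path.isfile(match) and match not in seen:
                seen.add(match)  # also de-duplicates across patterns
                result.append(match)
    return result
```

Each returned path would then go through the exclude and IgnorePolicy checks before being copied and recorded with reason "user_include".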