Add Technical_Decomp_Harvest
parent
81e29bf75a
commit
3a21e25d27
1 changed files with 191 additions and 0 deletions
191
Technical_Decomp_Harvest.md
Normal file
@@ -0,0 +1,191 @@
## enroll/harvest.py

The classes below are dataclasses that together define the schema of state.json. harvest.harvest() creates them, then serializes them with asdict().

### ManagedFile (dataclass)

#### Purpose: describes one file that harvest successfully copied into the bundle.

#### Fields:

- path: absolute path of the file on the harvested host
- src_rel: relative path used inside artifacts/<role>/... (almost always path.lstrip("/"))
- owner, group, mode: captured from stat_triplet()
- reason: classification string explaining why the file was captured, for example:
  - systemd_dropin, systemd_envfile
  - modified_conffile, modified_packaged_file
  - custom_unowned, custom_specific_path
  - authorized_keys, ssh_public_key
  - usr_local_bin_script, usr_local_etc_custom
  - user_include (from --include-path)
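
As an illustration of the fields above, here is a minimal sketch of how such a record could be built and serialized. The field types, the octal-string form of mode, and the helper make_managed_file() are assumptions made for this example; only the field names, path.lstrip("/"), and asdict() come from the description above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ManagedFile:
    path: str     # absolute path on the harvested host
    src_rel: str  # relative path inside artifacts/<role>/
    owner: str
    group: str
    mode: str     # assumed here to be an octal string such as "0644"
    reason: str   # classification code, e.g. "modified_conffile"


def make_managed_file(path, owner, group, mode, reason):
    """Hypothetical helper: derive src_rel by stripping the leading slash."""
    return ManagedFile(
        path=path,
        src_rel=path.lstrip("/"),
        owner=owner,
        group=group,
        mode=mode,
        reason=reason,
    )


mf = make_managed_file("/etc/nginx/nginx.conf", "root", "root", "0644", "modified_conffile")
record = asdict(mf)  # plain dict, ready to be written into state.json
print(json.dumps(record, indent=2))
```

Because asdict() returns plain dicts, the result can be passed to json.dumps() without a custom encoder.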

#### Where it’s used:

Written into snapshots in state.json.

manifest.py reads these to generate Ansible tasks (copy/template actions).

diff.py reads these to detect changes and to locate the artifact content.

___________________

### ExcludedFile (dataclass)

#### Purpose: records a file that was considered but not included, plus why.

#### Fields:

- path
- reason: a concise reason code, typically:
  - user_excluded (PathFilter)
  - an ignore-policy reason such as denied_path, binary_like, sensitive_content, too_large, or unreadable

#### Where it’s used:

Stored in each snapshot’s excluded list in state.json.

Mostly informational (helps explain why something didn’t get harvested).

_____________________

### ServiceSnapshot (dataclass)

#### Purpose: captures everything enroll learned about one enabled systemd service unit.

#### Fields:

- unit: e.g. nginx.service
- role_name: derived role name (a sanitized identifier based on the service)
- packages: Debian package names inferred as belonging to the service
- active_state, sub_state, unit_file_state, condition_result:
  - copied from systemctl show fields via systemd.get_unit_info()
- managed_files: list of ManagedFile harvested for this role
- excluded: list of ExcludedFile not harvested
- notes: warnings or anomalies (e.g. failure to query unit info)
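
To show how the nested fields above end up in state.json, here is a small sketch. The field types and defaults are assumptions, and ManagedFile is trimmed to three fields for brevity; the point is only that asdict() recurses into dataclasses held in lists.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ManagedFile:
    # Trimmed to three fields for this example.
    path: str
    src_rel: str
    reason: str


@dataclass
class ExcludedFile:
    path: str
    reason: str


@dataclass
class ServiceSnapshot:
    unit: str
    role_name: str
    packages: List[str] = field(default_factory=list)
    active_state: Optional[str] = None
    sub_state: Optional[str] = None
    unit_file_state: Optional[str] = None
    condition_result: Optional[str] = None
    managed_files: List[ManagedFile] = field(default_factory=list)
    excluded: List[ExcludedFile] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


snap = ServiceSnapshot(
    unit="nginx.service",
    role_name="nginx",
    packages=["nginx"],
    active_state="active",
    managed_files=[
        ManagedFile("/etc/nginx/nginx.conf", "etc/nginx/nginx.conf", "modified_conffile")
    ],
    excluded=[ExcludedFile("/etc/nginx/ssl/site.key", "sensitive_content")],
)

# asdict() converts the nested ManagedFile / ExcludedFile objects to dicts too.
state_entry = asdict(snap)
print(json.dumps(state_entry, indent=2))
```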

#### How it’s “computed” in harvest:

- Enumerate enabled services: systemd.list_enabled_services().
- For each unit:
  - gather unit metadata (fragment file, dropins, env files, exec paths)
  - infer owning packages via dpkg_owner() on:
    - the unit fragment
    - ExecStart paths
  - consider candidate /etc files from:
    - systemd dropins/envfiles (only under /etc)
    - modified dpkg conffiles or packaged files under /etc (by md5 compare)
    - service-specific “unowned” files under /etc/<hint> trees
  - filter each candidate through:
    - user exclude patterns (PathFilter.is_excluded)
    - IgnorePolicy.deny_reason
    - readability + regular-file checks
  - copy accepted files into artifacts/<role>/<src_rel>
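
The filtering step in the list above can be sketched roughly as follows. This is not the code in harvest.py: the two callables stand in for PathFilter.is_excluded and IgnorePolicy.deny_reason, whose real signatures are not shown here, and the reason code not_regular_file is made up for the example.

```python
import os
import stat
import tempfile


def classify_candidate(path, is_excluded, deny_reason):
    """Return an exclusion reason code, or None if the file may be harvested."""
    if is_excluded(path):            # user exclude patterns come first
        return "user_excluded"
    reason = deny_reason(path)       # then the ignore policy
    if reason:
        return reason
    try:
        st = os.lstat(path)
    except OSError:
        return "unreadable"
    if not stat.S_ISREG(st.st_mode):
        return "not_regular_file"    # hypothetical reason code
    if not os.access(path, os.R_OK):
        return "unreadable"
    return None


def partition_candidates(paths, is_excluded, deny_reason):
    """Split candidates into accepted paths and (path, reason) exclusions."""
    accepted, excluded = [], []
    for path in paths:
        reason = classify_candidate(path, is_excluded, deny_reason)
        if reason is None:
            accepted.append(path)
        else:
            excluded.append((path, reason))
    return accepted, excluded


with tempfile.TemporaryDirectory() as root:
    conf = os.path.join(root, "app.conf")
    secret = os.path.join(root, "secret.key")
    subdir = os.path.join(root, "conf.d")
    missing = os.path.join(root, "missing.conf")
    for p in (conf, secret):
        with open(p, "w") as fh:
            fh.write("x\n")
    os.mkdir(subdir)

    accepted, excluded = partition_candidates(
        [conf, secret, subdir, missing],
        is_excluded=lambda p: p.endswith(".key"),
        deny_reason=lambda p: None,
    )
    accepted_names = [os.path.basename(p) for p in accepted]
    excluded_names = [(os.path.basename(p), r) for p, r in excluded]

print(accepted_names, excluded_names)
```

In this sketch the accepted paths would become ManagedFile entries and the excluded pairs would become ExcludedFile entries.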

#### Why this class matters:

It is the core unit of “role inference” for enabled services.

______________________

### PackageSnapshot (dataclass)

#### Purpose: captures “manual packages” (from apt-mark showmanual) that weren’t already covered by any service snapshot.

#### Fields:

- package: package name
- role_name: computed role name (e.g. pkg_postfix)
- managed_files, excluded, notes: same meaning as in ServiceSnapshot

#### How it’s computed:

- list_manual_packages() returns the packages marked as manually installed.
- Anything already mentioned in any ServiceSnapshot.packages is skipped (recorded in manual_packages_skipped in state.json).
- For remaining packages:
  - detect modified conffiles / modified packaged files under /etc via hashes
  - capture associated timer overrides if the timer is attributable to that package
  - scan for custom/unowned files under /etc/<topdir> trees for the package
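
Two of the steps above lend themselves to a short sketch: skipping packages that a service snapshot already covers, and detecting a modified file by comparing its MD5 with the packaged one. The function names below are hypothetical and the packaged hash is passed in directly; how harvest.py obtains it from dpkg is not shown here.

```python
import hashlib
import os
import tempfile


def split_manual_packages(manual, service_packages):
    """Return (remaining, skipped), preserving the order of `manual`."""
    covered = set(service_packages)
    remaining = [p for p in manual if p not in covered]
    skipped = [p for p in manual if p in covered]
    return remaining, skipped


def md5_of(path):
    """MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def is_modified(path, packaged_md5):
    """True if the on-disk content no longer matches the packaged hash."""
    return md5_of(path) != packaged_md5


remaining, skipped = split_manual_packages(
    ["nginx", "postfix", "htop"],   # e.g. output of apt-mark showmanual
    ["nginx", "nginx-common"],      # union of ServiceSnapshot.packages
)

with tempfile.TemporaryDirectory() as tmp:
    conf = os.path.join(tmp, "main.cf")
    with open(conf, "wb") as fh:
        fh.write(b"inet_interfaces = all\n")
    packaged_md5 = hashlib.md5(b"inet_interfaces = all\n").hexdigest()
    unchanged = is_modified(conf, packaged_md5)
    with open(conf, "ab") as fh:
        fh.write(b"myhostname = example.test\n")
    changed = is_modified(conf, packaged_md5)

print(remaining, skipped, unchanged, changed)
```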

______________

### UsersSnapshot (dataclass)

#### Purpose: captures non-system users and the SSH public material that is safe to copy (authorized_keys files and public keys).

#### Fields:

- role_name: always "users" in current code
- users: list of dicts derived from UserRecord
- managed_files: copied SSH public material (as ManagedFile)
- excluded: skipped SSH files (as ExcludedFile)
- notes: errors (e.g. couldn’t enumerate users)

__________________

### AptConfigSnapshot (dataclass)

#### Purpose: captures APT configuration and key material.

#### Fields:

- role_name: "apt_config"
- managed_files, excluded, notes

#### How it’s populated:

- Uses _iter_apt_capture_paths() (in harvest.py) to yield the specific APT paths to capture (sources lists, keyrings, etc.).
- Each candidate is filtered via PathFilter + IgnorePolicy, then copied.

__________________

### EtcCustomSnapshot (dataclass)

#### Purpose: “catch-all” role for remaining config-ish files under /etc that weren’t already attributed to a service/package/users/apt.

#### Fields:

- role_name: "etc_custom"
- managed_files, excluded, notes

#### How it’s populated:

- Build a set of “already captured” files from other roles.
- Add certain “system essentials” even if package-owned (_iter_system_capture_paths()).
- Walk /etc and include unowned files that look “config-ish” (_is_confish()), subject to caps.
- Extra logic: if a file lives in a shared snippet dir like /etc/cron.d/ or /etc/logrotate.d/, harvest tries to re-attach it to an existing role by filename matching (so it doesn’t pollute etc_custom).
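
The re-attachment idea in the last bullet might look something like the sketch below. The exact matching rule used by harvest.py is not described here, so the candidate order (file name, name up to the first dot, then the same two with a pkg_ prefix) and the function name are assumptions made for the example.

```python
import posixpath

# The two shared snippet dirs named above; the real list may be longer.
SHARED_SNIPPET_DIRS = ("/etc/cron.d", "/etc/logrotate.d")


def reattach_role(path, known_roles, fallback="etc_custom"):
    """Pick an existing role for a shared-dir snippet, else the fallback."""
    directory, name = posixpath.split(path)
    if directory not in SHARED_SNIPPET_DIRS:
        return fallback
    stem = name.split(".", 1)[0]
    # Hypothetical matching order: exact name, stem, then pkg_-prefixed forms.
    for candidate in (name, stem, "pkg_" + name, "pkg_" + stem):
        if candidate in known_roles:
            return candidate
    return fallback


roles = {"nginx", "pkg_postfix"}
print(reattach_role("/etc/logrotate.d/nginx", roles))
print(reattach_role("/etc/cron.d/postfix", roles))
print(reattach_role("/etc/cron.d/backup", roles))
```

A snippet with no matching role stays in etc_custom, which keeps the fallback behaviour described above.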

______________

### UsrLocalCustomSnapshot (dataclass)

#### Purpose: captures custom local admin content from /usr/local.

#### Fields:

- role_name: "usr_local_custom"
- managed_files, excluded, notes

#### How it’s populated:

- Scans /usr/local/etc (collects regular files, subject to IgnorePolicy)
- Scans /usr/local/bin but only collects executable files (checks that the mode has any execute bit set)
- Each scan is capped so the number of captured files stays bounded.
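
A capped scan with an execute-bit check could be sketched as below. The function name, the default cap of 2000, and the sorted traversal are assumptions for illustration; IgnorePolicy filtering is left out, and the demo assumes a POSIX system where chmod sets execute bits.

```python
import os
import stat
import tempfile


def scan_tree(root, only_executable=False, cap=2000):
    """Collect regular files under root, stopping once `cap` is reached."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            if not stat.S_ISREG(st.st_mode):
                continue
            # 0o111 masks the user, group and other execute bits.
            if only_executable and not (st.st_mode & 0o111):
                continue
            found.append(path)
            if len(found) >= cap:
                return found
    return found


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "backup.sh")
    readme = os.path.join(tmp, "README")
    for p in (script, readme):
        with open(p, "w") as fh:
            fh.write("#!/bin/sh\n")
    os.chmod(script, 0o755)
    os.chmod(readme, 0o644)

    executables = [os.path.basename(p) for p in scan_tree(tmp, only_executable=True)]
    everything = [os.path.basename(p) for p in scan_tree(tmp)]
    capped = scan_tree(tmp, cap=1)

print(executables, everything, len(capped))
```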

_______________

### ExtraPathsSnapshot (dataclass)

#### Purpose: captures user-requested extra files from --include-path (and records include/exclude patterns used).

#### Fields:

- role_name: "extra_paths"
- include_patterns, exclude_patterns: as provided on CLI
- managed_files, excluded, notes

#### How it’s populated:

- Uses PathFilter.iter_include_patterns() + expand_includes() to turn patterns into concrete file paths.
- For each included file not already captured elsewhere:
  - filter via exclude + IgnorePolicy
  - copy into artifacts/extra_paths/...
  - record ManagedFile(reason="user_include")
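
As a rough picture of the expansion step, the sketch below turns patterns into concrete files and skips anything already captured. It assumes the patterns are shell-style globs and uses the standard glob module as a stand-in; the real PathFilter and expand_includes() may support other pattern forms, and the function name here is hypothetical.

```python
import glob
import os
import tempfile


def expand_globs_sketch(patterns, already_captured):
    """Expand glob patterns to regular files not captured by another role."""
    seen = set(already_captured)
    result = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern, recursive=True)):
            if path in seen or not os.path.isfile(path):
                continue
            seen.add(path)  # also de-duplicates overlapping patterns
            result.append(path)
    return result


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.conf", "b.conf", "c.txt"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("x\n")
    captured_elsewhere = {os.path.join(tmp, "a.conf")}
    patterns = [os.path.join(tmp, "*.conf"), os.path.join(tmp, "b.*")]
    extra = [
        os.path.basename(p)
        for p in expand_globs_sketch(patterns, captured_elsewhere)
    ]

print(extra)
```

Each path returned this way would still go through the exclude and IgnorePolicy checks before being copied and recorded with reason="user_include".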