Technical_Decomp_Ignore
enroll/ignore.py
IgnorePolicy (dataclass)
Purpose: the “don’t accidentally harvest secrets” gatekeeper.
Fields:
- deny_globs: list of fnmatch patterns that are denied unless dangerous=True
  - defaults include /etc/shadow, /etc/ssl/private/*, SSH host keys, letsencrypt, etc.
- allow_binary_globs: explicit allowlist of binary-ish config artifacts (APT keyrings etc.)
- max_file_bytes: hard cap; default 256 KB
- sample_bytes: how many bytes to inspect for content heuristics; default 64 KB
- dangerous: if True, skips the deny-glob and sensitive-content checks; the .log, size, file-type, and binary checks still apply
Methods:
__post_init__
If deny_globs or allow_binary_globs weren’t passed, it fills them with the defaults.
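The fields and the default-filling behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in enroll/ignore.py: the class name IgnorePolicySketch, the two DEFAULT_* constants, the APT keyring glob, and the exact byte values chosen for "256 KB" and "64 KB" are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative defaults only; the real lists are longer (SSH host keys, letsencrypt, ...).
DEFAULT_DENY_GLOBS = ["/etc/shadow", "/etc/ssl/private/*"]
# Hypothetical example of an allowlisted binary artifact (an APT keyring).
DEFAULT_ALLOW_BINARY_GLOBS = ["/etc/apt/trusted.gpg.d/*.gpg"]


@dataclass
class IgnorePolicySketch:
    deny_globs: Optional[List[str]] = None
    allow_binary_globs: Optional[List[str]] = None
    max_file_bytes: int = 256_000  # "256 KB"; the exact value is an assumption
    sample_bytes: int = 64_000     # "64 KB"; the exact value is an assumption
    dangerous: bool = False

    def __post_init__(self) -> None:
        # Fill in defaults only when the caller passed nothing.
        # Copy the lists so instances never share mutable state.
        if self.deny_globs is None:
            self.deny_globs = list(DEFAULT_DENY_GLOBS)
        if self.allow_binary_globs is None:
            self.allow_binary_globs = list(DEFAULT_ALLOW_BINARY_GLOBS)
```

Using None as the sentinel, rather than a list as the default value, avoids the mutable-default problem that dataclasses reject outright.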
iter_effective_lines(content: bytes)
Yields “meaningful” lines from a bytes blob by skipping:
- empty lines
- line comments starting with #, ;, //, or *
- C-style block comments /* ... */ (best-effort state machine)
This is used so secret scanning doesn’t trigger on commented-out examples.
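One way such a filter could look is sketched below. The function name iter_effective_lines_sketch is hypothetical, and the sketch only recognises block comments that open at the start of a line, which may be narrower or broader than what the real state machine handles.

```python
from typing import Iterator


def iter_effective_lines_sketch(content: bytes) -> Iterator[bytes]:
    in_block = False  # True while inside an unterminated /* ... */ comment
    for raw in content.splitlines():
        line = raw.strip()
        if in_block:
            # Skip everything up to and including the closing */.
            if b"*/" not in line:
                continue
            line = line.split(b"*/", 1)[1].strip()
            in_block = False
        if line.startswith(b"/*"):
            rest = line[2:]
            if b"*/" in rest:
                # Comment opens and closes on this line; keep what follows it.
                line = rest.split(b"*/", 1)[1].strip()
            else:
                in_block = True
                continue
        # Drop empty lines and single-line comments.
        if not line or line.startswith((b"#", b";", b"//", b"*")):
            continue
        yield line
```

For example, feeding it b"# password = x\nport = 22\n" yields only b"port = 22", so the commented-out password never reaches the secret scanner.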
deny_reason(path: str) -> Optional[str]
Returns a short deny code if the file should not be harvested; otherwise None.
The decision pipeline is:
- If path.endswith(".log") → "log_file" (always denied).
- If not dangerous:
  - if path matches any deny glob → "denied_path"
- os.stat() (follow symlinks):
  - if stat fails → "unreadable"
  - if size > max_file_bytes → "too_large"
  - if not a regular file or is symlink → "not_regular_file"
- Read up to sample_bytes:
  - if read fails → "unreadable"
- Binary-like detection:
  - if the sample contains NUL (b"\x00"):
    - if path matches allow-binary globs → allowed
    - else → "binary_like"
  - Note: this binary check still applies even when dangerous=True.
- If not dangerous:
  - scan “effective lines” against regex patterns like:
    - PEM private key headers
    - password = ...
    - keywords (token, secret, api_key, etc.)
  - if matched → "sensitive_content"
If nothing triggers, return None (allowed).
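The pipeline above can be sketched as a standalone function. This is a simplified illustration under stated assumptions, not the actual method: deny_reason_sketch and SENSITIVE_RE are hypothetical names, the regex is a stand-in for the real pattern list, comment skipping is reduced to single-line comments, and an allowlisted binary file is assumed to return None immediately.

```python
import fnmatch
import os
import re
import stat
from typing import List, Optional

# Hypothetical stand-in for the real regex list.
SENSITIVE_RE = re.compile(
    rb"-----BEGIN [A-Z ]*PRIVATE KEY-----|password\s*=|token|secret|api_key",
    re.IGNORECASE,
)


def deny_reason_sketch(
    path: str,
    deny_globs: List[str],
    allow_binary_globs: List[str],
    max_file_bytes: int = 256_000,  # assumed value for "256 KB"
    sample_bytes: int = 64_000,     # assumed value for "64 KB"
    dangerous: bool = False,
) -> Optional[str]:
    if path.endswith(".log"):
        return "log_file"
    if not dangerous and any(fnmatch.fnmatch(path, g) for g in deny_globs):
        return "denied_path"
    try:
        st = os.stat(path)  # follows symlinks
    except OSError:
        return "unreadable"
    if st.st_size > max_file_bytes:
        return "too_large"
    if not stat.S_ISREG(st.st_mode) or os.path.islink(path):
        return "not_regular_file"
    try:
        with open(path, "rb") as fh:
            sample = fh.read(sample_bytes)
    except OSError:
        return "unreadable"
    if b"\x00" in sample:
        # Applies even when dangerous=True.
        if any(fnmatch.fnmatch(path, g) for g in allow_binary_globs):
            return None  # assumption: allowlisted binaries skip the content scan
        return "binary_like"
    if not dangerous:
        for line in sample.splitlines():
            line = line.strip()
            # Simplified; the source scans the output of iter_effective_lines.
            if not line or line.startswith((b"#", b";", b"//", b"*")):
                continue
            if SENSITIVE_RE.search(line):
                return "sensitive_content"
    return None
```

A caller would treat any non-None return value as "skip this file and record why", for example deny_reason_sketch("/var/log/syslog.log", [], []) returns "log_file" without touching the filesystem.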