Add Technical_Decomp_Ignore

Miguel Jacq 2025-12-27 20:48:47 -06:00
parent 3a21e25d27
commit 5fa9cc8339

@@ -0,0 +1,59 @@
## enroll/ignore.py
### IgnorePolicy (dataclass)
#### Purpose: the “don't accidentally harvest secrets” gatekeeper.
#### Fields:
- deny_globs: list of fnmatch patterns that are always denied (unless dangerous=True)
  - defaults include /etc/shadow, /etc/ssl/private/*, SSH host keys, letsencrypt, etc.
- allow_binary_globs: explicit allowlist of binary-ish config artifacts (APT keyrings etc.)
- max_file_bytes: hard cap; default 256 KB
- sample_bytes: how many bytes to inspect for content heuristics; default 64 KB
- dangerous: if True, skips the deny-glob and sensitive-content checks; the log-file, size, regular-file, and binary checks still apply
#### Methods:
##### __post_init__
If deny_globs or allow_binary_globs weren't passed, it fills them with the defaults.
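
A minimal sketch of how the fields and the default-filling step could fit together. The class name `IgnorePolicySketch`, the two `DEFAULT_*` constants, and the APT keyring pattern are assumptions made for illustration, and "256 KB" / "64 KB" are assumed to mean multiples of 1024 bytes; the real defaults live in enroll/ignore.py.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed defaults: only /etc/shadow and /etc/ssl/private/* are named in this
# document; the keyring glob below is a hypothetical placeholder.
DEFAULT_DENY_GLOBS = ["/etc/shadow", "/etc/ssl/private/*"]
DEFAULT_ALLOW_BINARY_GLOBS = ["/etc/apt/trusted.gpg.d/*"]


@dataclass
class IgnorePolicySketch:
    deny_globs: Optional[List[str]] = None
    allow_binary_globs: Optional[List[str]] = None
    max_file_bytes: int = 256 * 1024  # assumed reading of "256 KB"
    sample_bytes: int = 64 * 1024     # assumed reading of "64 KB"
    dangerous: bool = False

    def __post_init__(self) -> None:
        # Copy the defaults so instances never share one mutable list.
        if self.deny_globs is None:
            self.deny_globs = list(DEFAULT_DENY_GLOBS)
        if self.allow_binary_globs is None:
            self.allow_binary_globs = list(DEFAULT_ALLOW_BINARY_GLOBS)
```

Using `None` as the "not passed" sentinel avoids a mutable default argument while still letting a caller supply an explicit list, including an empty one.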
##### iter_effective_lines(content: bytes)
Yields “meaningful” lines from a bytes blob by skipping:
- empty lines
- line comments starting with #, ;, //, or *
- C-style block comments /* ... */ (best-effort state machine)
This is used so secret scanning doesn't trigger on commented-out examples.
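
The filtering described above can be sketched as a small generator. This is an illustrative approximation, not the code from enroll/ignore.py: it assumes lines are whitespace-stripped before the prefix test, that block comments are only recognised when `/*` starts a line, and that anything after a closing `*/` on the same line is dropped.

```python
from typing import Iterator

# Line-comment prefixes listed in this document.
LINE_COMMENT_PREFIXES = (b"#", b";", b"//", b"*")


def iter_effective_lines(content: bytes) -> Iterator[bytes]:
    in_block = False  # True while inside an unterminated /* ... */ comment
    for raw in content.splitlines():
        line = raw.strip()
        if in_block:
            # Drop everything up to and including the line holding the closer.
            if b"*/" in line:
                in_block = False
            continue
        if not line:
            continue
        if line.startswith(b"/*"):
            # The comment may open and close on the same line.
            if b"*/" not in line[2:]:
                in_block = True
            continue
        if line.startswith(LINE_COMMENT_PREFIXES):
            continue
        yield line
```

For example, a blob containing a `#` comment, a multi-line `/* ... */` block with `password = x` inside it, and two ordinary assignments yields only the two assignments.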
##### deny_reason(path: str) -> Optional[str]
Returns a short deny code if the file should not be harvested; otherwise None.
The decision pipeline is:
- If path.endswith(".log") → "log_file" (always denied).
- If not dangerous:
  - if path matches any deny glob → "denied_path"
- os.stat() (follows symlinks):
  - if stat fails → "unreadable"
  - if size > max_file_bytes → "too_large"
  - if not a regular file or is symlink → "not_regular_file"
- Read up to sample_bytes:
  - if read fails → "unreadable"
- Binary-like detection:
  - if the sample contains NUL (b"\x00"):
    - if path matches allow-binary globs → allowed
    - else → "binary_like"
  - Note: this binary check still applies even when dangerous=True.
- If not dangerous:
  - scan “effective lines” against regex patterns like:
    - PEM private key headers
    - password = ...
    - keywords (token, secret, api_key, etc.)
  - if matched → "sensitive_content"
If nothing triggers, return None (allowed).
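
The pipeline above can be sketched as a standalone function. Everything here is an assumption made for illustration: the name `deny_reason_sketch`, passing the policy fields as arguments, the contents of `SENSITIVE_RE` (the real pattern list is not shown in this document), and the reading of "allowed" for binary files as an immediate `None`. The comment filter is a simplified stand-in for iter_effective_lines.

```python
import fnmatch
import os
import re
import stat
from typing import List, Optional

# Illustrative patterns only; not the project's actual regex list.
SENSITIVE_RE = re.compile(
    rb"-----BEGIN [A-Z ]*PRIVATE KEY-----|password\s*=|token|secret|api_key",
    re.IGNORECASE,
)


def deny_reason_sketch(
    path: str,
    deny_globs: List[str],
    allow_binary_globs: List[str],
    max_file_bytes: int = 256 * 1024,
    sample_bytes: int = 64 * 1024,
    dangerous: bool = False,
) -> Optional[str]:
    if path.endswith(".log"):
        return "log_file"
    if not dangerous and any(fnmatch.fnmatch(path, g) for g in deny_globs):
        return "denied_path"
    try:
        st = os.stat(path)  # follows symlinks
    except OSError:
        return "unreadable"
    if st.st_size > max_file_bytes:
        return "too_large"
    if not stat.S_ISREG(st.st_mode) or os.path.islink(path):
        return "not_regular_file"
    try:
        with open(path, "rb") as fh:
            sample = fh.read(sample_bytes)
    except OSError:
        return "unreadable"
    if b"\x00" in sample:
        # Checked even when dangerous=True.
        if any(fnmatch.fnmatch(path, g) for g in allow_binary_globs):
            return None  # assumed meaning of "allowed"
        return "binary_like"
    if not dangerous:
        for line in sample.splitlines():
            stripped = line.strip()
            # Simplified filter: skips blank lines and "#" comments only.
            if not stripped or stripped.startswith(b"#"):
                continue
            if SENSITIVE_RE.search(stripped):
                return "sensitive_content"
    return None
```

Ordering matters in this sketch: the cheap path-based checks run before any filesystem access, and the size check runs before the file is opened, so an oversized or denied file is never read.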