Add Technical_Decomp_Ignore
parent
3a21e25d27
commit
5fa9cc8339
1 changed files with 59 additions and 0 deletions
59
Technical_Decomp_Ignore.md
Normal file
59
Technical_Decomp_Ignore.md
Normal file
|
|
@ -0,0 +1,59 @@
|
|||
## enroll/ignore.py
|
||||
|
||||
### IgnorePolicy (dataclass)
|
||||
|
||||
#### Purpose: the “don’t accidentally harvest secrets” gatekeeper.
|
||||
|
||||
#### Fields:
|
||||
|
||||
- deny_globs: list of fnmatch patterns that are always denied (unless dangerous=True)
|
||||
- defaults include /etc/shadow, /etc/ssl/private/*, SSH host keys, letsencrypt, etc.
|
||||
- allow_binary_globs: explicit allowlist of binary-ish config artifacts (APT keyrings etc.)
|
||||
- max_file_bytes: hard cap; default 256 KB
|
||||
- sample_bytes: how many bytes to inspect for content heuristics; default 64 KB
|
||||
- dangerous: if True, relaxes some safety checks
|
||||
|
||||
#### Methods:
|
||||
|
||||
##### __post_init__
|
||||
|
||||
If deny_globs or allow_binary_globs weren’t passed, it fills them with the defaults.
|
||||
|
||||
##### iter_effective_lines(content: bytes)
|
||||
|
||||
Yields “meaningful” lines from a bytes blob by skipping:
|
||||
|
||||
- empty lines
|
||||
- line comments starting with #, ;, //, or *
|
||||
- C-style block comments /* ... */ (best-effort state machine)
|
||||
|
||||
This is used so secret scanning doesn’t trigger on commented-out examples.
|
||||
|
||||
##### deny_reason(path: str) -> Optional[str]
|
||||
|
||||
Returns a short deny code if the file should not be harvested; otherwise None.
|
||||
|
||||
The decision pipeline is:
|
||||
|
||||
- If path.endswith(".log") → "log_file" always denied.
|
||||
- If not dangerous:
|
||||
- if path matches any deny glob → "denied_path"
|
||||
- os.stat() (follow symlinks):
|
||||
- if stat fails → "unreadable"
|
||||
- if size > max_file_bytes → "too_large"
|
||||
- if not a regular file or is symlink → "not_regular_file"
|
||||
- Read up to sample_bytes:
|
||||
- if read fails → "unreadable"
|
||||
- Binary-like detection:
|
||||
- if the sample contains NUL (b"\x00"):
|
||||
- if path matches allow-binary globs → allowed
|
||||
- else → "binary_like"
|
||||
- Note: this binary check still applies even in dangerous=True.
|
||||
- If not dangerous:
|
||||
- scan “effective lines” against regex patterns like:
|
||||
- PEM private key headers
|
||||
- password = ...
|
||||
- keywords (token, secret, api_key, etc.)
|
||||
- if matched → "sensitive_content"
|
||||
|
||||
If nothing triggers, return None (allowed).
|
||||
Loading…
Add table
Add a link
Reference in a new issue