Technical_Decomp_Ignore

Table of Contents

enroll/ignore.py

IgnorePolicy (dataclass)

Purpose: the “don’t accidentally harvest secrets” gatekeeper.
Fields:
Methods:

__post_init__
iter_effective_lines(content: bytes)
deny_reason(path: str) -> Optional[str]


enroll/ignore.py

IgnorePolicy (dataclass)

Purpose: the “don’t accidentally harvest secrets” gatekeeper.

Fields:

deny_globs: list of fnmatch patterns that are denied unless dangerous=True
defaults include /etc/shadow, /etc/ssl/private/*, SSH host keys, letsencrypt, etc.
allow_binary_globs: explicit allowlist of binary-ish config artifacts (APT keyrings etc.)
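max_file_bytes: hard cap on file size; larger files are denied; default 256 KB
sample_bytes: how many bytes to read from the start of the file for content heuristics; default 64 KB
dangerous: if True, skips the deny-glob and sensitive-content checks; the .log, readability, size, file-type, and binary checks still apply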
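To illustrate how fnmatch-style deny patterns behave, here is a minimal sketch. The three patterns are an illustrative subset (the SSH host key pattern in particular is a guess at the form, not copied from the source), and matches_any is a hypothetical helper. One detail worth knowing: in fnmatch, * also matches the / separator, so a pattern such as /etc/ssl/private/* covers nested paths too.

```python
import fnmatch
from typing import List

# Illustrative subset of deny patterns; the real defaults are longer.
deny_globs = ["/etc/shadow", "/etc/ssl/private/*", "/etc/ssh/ssh_host_*"]


def matches_any(path: str, globs: List[str]) -> bool:
    """Return True if path matches at least one fnmatch pattern."""
    return any(fnmatch.fnmatch(path, g) for g in globs)


print(matches_any("/etc/shadow", deny_globs))
# '*' is not path-aware, so it also spans directory separators.
print(matches_any("/etc/ssl/private/site/key.pem", deny_globs))
print(matches_any("/etc/hosts", deny_globs))
```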

Methods:

__post_init__

If deny_globs or allow_binary_globs weren’t passed, it fills them with the defaults.
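The fields and the default-filling hook could look roughly like the sketch below. This is not the actual source: IgnorePolicySketch, DEFAULT_DENY_GLOBS, and DEFAULT_ALLOW_BINARY_GLOBS are hypothetical names, the default lists are placeholders, and the exact byte values behind "256 KB" and "64 KB" are assumed.

```python
from dataclasses import dataclass
from typing import List, Optional

# Placeholder defaults for illustration; the real lists are longer.
DEFAULT_DENY_GLOBS = ["/etc/shadow", "/etc/ssl/private/*"]
DEFAULT_ALLOW_BINARY_GLOBS = ["/etc/apt/trusted.gpg.d/*.gpg"]  # assumed path


@dataclass
class IgnorePolicySketch:
    deny_globs: Optional[List[str]] = None
    allow_binary_globs: Optional[List[str]] = None
    max_file_bytes: int = 256 * 1024  # assumed meaning of "256 KB"
    sample_bytes: int = 64 * 1024     # assumed meaning of "64 KB"
    dangerous: bool = False

    def __post_init__(self) -> None:
        # None means "not passed"; copy the defaults so that instances
        # never share (and accidentally mutate) the module-level lists.
        if self.deny_globs is None:
            self.deny_globs = list(DEFAULT_DENY_GLOBS)
        if self.allow_binary_globs is None:
            self.allow_binary_globs = list(DEFAULT_ALLOW_BINARY_GLOBS)
```

Using None as the "not passed" sentinel avoids mutable default arguments, which dataclasses reject for list fields anyway. It also lets a caller pass an empty list to disable a glob set explicitly.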

iter_effective_lines(content: bytes)

Yields “meaningful” lines from a bytes blob by skipping:

empty lines
line comments starting with #, ;, //, or *
C-style block comments /* ... */ (best-effort state machine)

This is used so secret scanning doesn’t trigger on commented-out examples.
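The skipping rules above can be sketched as a small generator. This is a best-effort approximation written from the description, not the actual implementation. In particular, dropping any text that follows a closing */ on the same line is an assumption.

```python
from typing import Iterator


def iter_effective_lines(content: bytes) -> Iterator[bytes]:
    """Yield stripped, non-blank, non-comment lines from a bytes blob."""
    in_block = False  # True while inside a /* ... */ block comment
    for raw in content.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_block:
            # Best effort: the whole closing line is treated as comment.
            if b"*/" in line:
                in_block = False
            continue
        if line.startswith(b"/*"):
            # Stay in block mode only if the comment does not close
            # on this same line (search after the opening marker).
            if b"*/" not in line[2:]:
                in_block = True
            continue
        if line.startswith((b"#", b";", b"//", b"*")):
            continue
        yield line


sample = b"# password = example\n\nname = demo\n/* block\nsecret = x\n*/\nport = 22\n"
print(list(iter_effective_lines(sample)))
```

Here the commented-out password line and the secret inside the block comment are both skipped, so only name = demo and port = 22 reach the scanner.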

deny_reason(path: str) -> Optional[str]

Returns a short deny code if the file should not be harvested; otherwise None.

The decision pipeline is:

If path.endswith(".log") → "log_file" (always denied, even when dangerous=True).
If not dangerous:
- if path matches any deny glob → "denied_path"
os.stat() (follow symlinks):
- if stat fails → "unreadable"
- if size > max_file_bytes → "too_large"
- if it is not a regular file, or is a symlink → "not_regular_file"
Read up to sample_bytes:
- if read fails → "unreadable"
Binary-like detection:
- if the sample contains NUL (b"\x00"):
  - if path matches allow-binary globs → allowed
  - else → "binary_like"
- Note: this binary check still applies even when dangerous=True.
If not dangerous:
- scan “effective lines” against regex patterns like:
  - PEM private key headers
  - password = ...
  - keywords (token, secret, api_key, etc.)
- if matched → "sensitive_content"

If nothing triggers, return None (allowed).
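The pipeline above can be sketched as a standalone function. This is a simplified reading of the steps, not the real method. The regexes in SENSITIVE_PATTERNS are illustrative guesses, _effective_lines is a reduced stand-in that skips only blank lines and # comments, and the policy fields are passed as plain arguments to keep the example self-contained.

```python
import fnmatch
import os
import re
from typing import Iterator, List, Optional

# Illustrative patterns only; the real regexes are not listed in this document.
SENSITIVE_PATTERNS = [
    re.compile(rb"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(rb"(?i)\bpassword\s*="),
    re.compile(rb"(?i)\b(token|secret|api_key)\b"),
]


def _effective_lines(content: bytes) -> Iterator[bytes]:
    """Reduced stand-in: skips only blank lines and '#' comments."""
    for raw in content.splitlines():
        line = raw.strip()
        if line and not line.startswith(b"#"):
            yield line


def deny_reason(
    path: str,
    deny_globs: List[str],
    allow_binary_globs: List[str],
    max_file_bytes: int = 256 * 1024,
    sample_bytes: int = 64 * 1024,
    dangerous: bool = False,
) -> Optional[str]:
    if path.endswith(".log"):
        return "log_file"
    if not dangerous and any(fnmatch.fnmatch(path, g) for g in deny_globs):
        return "denied_path"
    try:
        st = os.stat(path)  # follows symlinks by default
    except OSError:
        return "unreadable"
    if st.st_size > max_file_bytes:
        return "too_large"
    if not os.path.isfile(path) or os.path.islink(path):
        return "not_regular_file"
    try:
        with open(path, "rb") as fh:
            sample = fh.read(sample_bytes)
    except OSError:
        return "unreadable"
    if b"\x00" in sample:
        # The binary check runs regardless of the dangerous flag.
        if any(fnmatch.fnmatch(path, g) for g in allow_binary_globs):
            return None
        return "binary_like"
    if not dangerous:
        for line in _effective_lines(sample):
            if any(p.search(line) for p in SENSITIVE_PATTERNS):
                return "sensitive_content"
    return None
```

A caller would typically treat None as "safe to harvest" and record any other return value as the reason the file was skipped, for example deny_reason("/var/log/app.log", [], []) returns "log_file" without touching the filesystem.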