1 Technical_Decomp_Ignore
Miguel Jacq edited this page 2025-12-27 20:48:47 -06:00

enroll/ignore.py

IgnorePolicy (dataclass)

Purpose: the “don't accidentally harvest secrets” gatekeeper.

Fields:

  • deny_globs: list of fnmatch patterns whose matching paths are denied unless dangerous=True
    • defaults include /etc/shadow, /etc/ssl/private/*, SSH host keys, letsencrypt, etc.
  • allow_binary_globs: explicit allowlist of binary-ish config artifacts (APT keyrings etc.)
  • max_file_bytes: hard cap; default 256 KB
  • sample_bytes: how many bytes to inspect for content heuristics; default 64 KB
  • dangerous: if True, relaxes some safety checks

Methods:

__post_init__

If deny_globs or allow_binary_globs weren't passed, it fills them with the defaults.
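
The fields and the default-filling step can be sketched as follows. This is a minimal illustration, not the code in enroll/ignore.py: the class name IgnorePolicySketch, the two DEFAULT_* constants, and the APT keyring pattern are assumptions, the default lists are deliberately shortened, and “256 KB” / “64 KB” are assumed to mean multiples of 1024 bytes.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative defaults only; the real lists are longer (SSH host keys,
# letsencrypt, and so on).
DEFAULT_DENY_GLOBS = ["/etc/shadow", "/etc/ssl/private/*"]
DEFAULT_ALLOW_BINARY_GLOBS = ["/etc/apt/trusted.gpg.d/*.gpg"]  # assumed pattern


@dataclass
class IgnorePolicySketch:
    deny_globs: Optional[List[str]] = None
    allow_binary_globs: Optional[List[str]] = None
    max_file_bytes: int = 256 * 1024  # assumes KB means 1024 bytes
    sample_bytes: int = 64 * 1024
    dangerous: bool = False

    def __post_init__(self) -> None:
        # Copy the defaults so instances never share a mutable list.
        if self.deny_globs is None:
            self.deny_globs = list(DEFAULT_DENY_GLOBS)
        if self.allow_binary_globs is None:
            self.allow_binary_globs = list(DEFAULT_ALLOW_BINARY_GLOBS)
```

Using None as the “not passed” marker keeps the constructor signature simple and avoids the shared-mutable-default pitfall; dataclasses.field(default_factory=...) would be an equally reasonable way to get the same effect.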

iter_effective_lines(content: bytes)

Yields “meaningful” lines from a bytes blob by skipping:

  • empty lines
  • line comments starting with #, ;, //, or *
  • C-style block comments /* ... */ (best-effort state machine)

This is used so secret scanning doesn't trigger on commented-out examples.
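
A rough sketch of such a filter, written as a standalone function for illustration. The actual state machine may handle more cases; this version assumes a block comment always starts at the beginning of a line and drops anything that follows a closing */ on the same line.

```python
from typing import Iterator


def iter_effective_lines(content: bytes) -> Iterator[bytes]:
    """Yield stripped lines that are neither blank nor comments (best effort)."""
    in_block = False  # True while inside an unterminated /* ... */ comment
    for raw in content.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_block:
            # Stay in block-comment state until a line contains the closer.
            if b"*/" in line:
                in_block = False
            continue
        if line.startswith(b"/*"):
            # Enter block-comment state unless the comment closes on this line.
            if b"*/" not in line[2:]:
                in_block = True
            continue
        if line.startswith((b"#", b";", b"//", b"*")):
            continue
        yield line
```

For example, a file whose only mention of a password sits inside a /* ... */ block or behind a # yields no lines containing it, so a later secret scan has nothing to match.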

deny_reason(path: str) -> Optional[str]

Returns a short deny code if the file should not be harvested; otherwise None.

The decision pipeline is:

  • If path.endswith(".log") → "log_file" (always denied).
  • If not dangerous:
    • if path matches any deny glob → "denied_path"
  • os.stat() (follow symlinks):
    • if stat fails → "unreadable"
    • if size > max_file_bytes → "too_large"
    • if not a regular file or is symlink → "not_regular_file"
  • Read up to sample_bytes:
    • if read fails → "unreadable"
  • Binary-like detection:
    • if the sample contains NUL (b"\x00"):
      • if path matches allow-binary globs → allowed
      • else → "binary_like"
    • Note: this binary check still applies even when dangerous=True.
  • If not dangerous:
    • scan “effective lines” against regex patterns like:
      • PEM private key headers
      • password = ...
      • keywords (token, secret, api_key, etc.)
    • if matched → "sensitive_content"

If nothing triggers, return None (allowed).
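
The pipeline above can be sketched as a standalone function. This is an approximation under stated assumptions, not the module's code: the regexes in SENSITIVE_PATTERNS are guesses at the kind of pattern described, _effective_lines is a simplified stand-in that skips only blank lines and line comments, and an allow-listed binary file is treated as an immediate allow.

```python
import fnmatch
import os
import re
import stat
from typing import Iterator, Optional, Sequence

# Assumed patterns; the real regexes may differ.
SENSITIVE_PATTERNS = [
    re.compile(rb"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(rb"(?i)\bpassword\s*="),
    re.compile(rb"(?i)\b(token|secret|api[_-]?key)\b"),
]


def _effective_lines(content: bytes) -> Iterator[bytes]:
    # Simplified stand-in: skips blank lines and line comments only.
    for raw in content.splitlines():
        line = raw.strip()
        if line and not line.startswith((b"#", b";", b"//", b"*")):
            yield line


def _matches(path: str, globs: Sequence[str]) -> bool:
    return any(fnmatch.fnmatch(path, g) for g in globs)


def deny_reason(path: str, *, deny_globs: Sequence[str],
                allow_binary_globs: Sequence[str],
                max_file_bytes: int = 256 * 1024,
                sample_bytes: int = 64 * 1024,
                dangerous: bool = False) -> Optional[str]:
    if path.endswith(".log"):
        return "log_file"
    if not dangerous and _matches(path, deny_globs):
        return "denied_path"
    try:
        st = os.stat(path)  # follows symlinks
    except OSError:
        return "unreadable"
    if st.st_size > max_file_bytes:
        return "too_large"
    if not stat.S_ISREG(st.st_mode) or os.path.islink(path):
        return "not_regular_file"
    try:
        with open(path, "rb") as fh:
            sample = fh.read(sample_bytes)
    except OSError:
        return "unreadable"
    if b"\x00" in sample:
        # This check runs even when dangerous=True.
        # Assumption: an allow-listed binary is allowed without further scans.
        return None if _matches(path, allow_binary_globs) else "binary_like"
    if not dangerous:
        for line in _effective_lines(sample):
            if any(p.search(line) for p in SENSITIVE_PATTERNS):
                return "sensitive_content"
    return None
```

Note that fnmatch's * also matches path separators, so a pattern such as /etc/ssl/private/* covers files in nested subdirectories as well.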