cspresso/README.md
Miguel Jacq bfa16a145a
2026-01-02 10:41:57 +11:00

cspresso

Crawl up to N pages of a site using a headless Chromium (via Playwright), observe what assets are loaded, and emit a draft Content Security Policy (CSP).

This is meant as a starting point. Review and tighten the resulting policy before enforcing it.

Why "draft"?

  • A crawl rarely covers all user flows (auth-only pages, A/B tests, conditional loads, etc.).
  • Inline script/style handling is tricky:
    • If your pages use nonces, you must generate a new nonce per HTML response and insert it both in the CSP header and in the HTML tags.
    • Hashes work only if the inline content is stable byte-for-byte.
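The nonce and hash rules above can be illustrated with a minimal sketch that uses only the Python standard library. The function names make_nonce and csp_hash are hypothetical and are not part of cspresso; how the nonce reaches your templates depends on your web framework.

```python
import base64
import hashlib
import secrets


def make_nonce() -> str:
    """Return a fresh random nonce. Call this once per HTML response."""
    return base64.b64encode(secrets.token_bytes(16)).decode("ascii")


def csp_hash(inline_content: str) -> str:
    """Return the CSP source expression for one exact inline script/style body."""
    digest = hashlib.sha256(inline_content.encode("utf-8")).digest()
    return "'sha256-" + base64.b64encode(digest).decode("ascii") + "'"


if __name__ == "__main__":
    nonce = make_nonce()
    # The same nonce value must appear in the header and in the tag.
    print(f"Content-Security-Policy: script-src 'nonce-{nonce}'")
    print(f'<script nonce="{nonce}">console.log("hi")</script>')

    # One extra space changes the hash, so the old hash would stop matching.
    print(csp_hash('console.log("hi")'))
    print(csp_hash('console.log("hi") '))
```

The hash covers the text between the opening and closing tags exactly, which is why templated or minified inline content that varies between responses is usually a better fit for nonces.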

Requirements

  • Python 3.10+
  • Poetry
  • Playwright's Chromium browser binaries (auto-installed by this tool if missing)

Install

Poetry

poetry install

pip/pipx

pip install cspresso

AppImage

Download the CSPresso.AppImage from the releases page, make it executable with chmod +x, and run it.

Run

poetry run cspresso https://example.com --max-pages 10

The tool will:

  1. attempt to launch Chromium headless
  2. if Chromium isn't installed, run: python -m playwright install chromium
  3. crawl same-origin links up to the page limit
  4. print the visited URLs and a CSP header
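To give a feel for the last step, here is a rough sketch of how observed requests could be folded into a draft policy. It is not cspresso's actual implementation: DIRECTIVE_FOR_TYPE, origin_of and draft_csp are hypothetical names, the mapping is deliberately incomplete, and the resource type strings are assumed to follow the values Playwright reports for a request's resource type.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical, incomplete mapping from resource type to CSP directive.
DIRECTIVE_FOR_TYPE = {
    "script": "script-src",
    "stylesheet": "style-src",
    "image": "img-src",
    "font": "font-src",
    "xhr": "connect-src",
    "fetch": "connect-src",
}


def origin_of(url: str) -> str:
    """Return scheme://host[:port] for a URL."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def draft_csp(page_url: str, observed: list[tuple[str, str]]) -> str:
    """Build a draft CSP string from (resource_type, url) pairs."""
    page_origin = origin_of(page_url)
    sources: dict[str, set[str]] = defaultdict(set)
    for resource_type, url in observed:
        directive = DIRECTIVE_FOR_TYPE.get(resource_type)
        if directive is None:
            continue  # types outside the mapping are ignored in this sketch
        origin = origin_of(url)
        sources[directive].add("'self'" if origin == page_origin else origin)

    parts = ["default-src 'self'"]
    for directive in sorted(sources):
        # Sorting keeps the output stable; 'self' sorts ahead of URLs.
        parts.append(directive + " " + " ".join(sorted(sources[directive])))
    return "; ".join(parts)


if __name__ == "__main__":
    seen = [
        ("script", "https://example.com/app.js"),
        ("script", "https://cdn.example.net/lib.js"),
        ("image", "https://img.example.org/logo.png"),
    ]
    print("Content-Security-Policy: " + draft_csp("https://example.com/", seen))
```

Collapsing each URL to its origin keeps the draft short, at the cost of allowing every path on that origin; tightening to specific paths is part of the manual review.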

Where Playwright installs browsers

By default, this project installs Playwright browsers into a local folder: ./.pw-browsers. This makes installs deterministic and easy to cache in CI.

You can override this location with --browsers-path or by setting the PLAYWRIGHT_BROWSERS_PATH environment variable yourself.

Linux notes

If Chromium fails to start due to missing system libraries, try:

poetry run cspresso https://example.com --with-deps

That runs python -m playwright install --with-deps chromium (may require sudo depending on your environment).

Output

Default output is a single CSP header line.

For JSON:

poetry run cspresso https://example.com --json
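JSON output is convenient when another program consumes the result. The sketch below shows one way to capture and parse it from Python. run_json is a hypothetical helper, and it assumes that standard output contains only the JSON document; the structure of that document is not described in this README, so inspect it before relying on any particular field.

```python
import json
import subprocess
import sys


def run_json(command: list[str]) -> object:
    """Run a command and parse its standard output as JSON."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def describe(value: object) -> str:
    """Summarize the top level of a parsed JSON value without assuming a schema."""
    if isinstance(value, dict):
        return "object with keys: " + ", ".join(sorted(value))
    if isinstance(value, list):
        return f"array with {len(value)} items"
    return type(value).__name__
```

For example, describe(run_json(["cspresso", "https://example.com", "--json"])) would list the top-level keys of the report, assuming cspresso is on PATH. With check=True, a non-zero exit status raises subprocess.CalledProcessError instead of passing partial output to the parser.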

Full usage info

usage: cspresso [-h] [--max-pages MAX_PAGES] [--timeout-ms TIMEOUT_MS] [--settle-ms SETTLE_MS] [--headed] [--no-install] [--with-deps] [--browsers-path BROWSERS_PATH] [--allow-blob] [--unsafe-eval]
                [--upgrade-insecure-requests] [--include-sourcemaps] [--ignore-non-html] [--json]
                url

Crawl up to N pages (same-origin) with Playwright and generate a draft CSP.

positional arguments:
  url                   Start URL (e.g. https://example.com)

options:
  -h, --help            show this help message and exit
  --max-pages MAX_PAGES
                        Maximum number of pages to visit (default: 10)
  --timeout-ms TIMEOUT_MS
                        Navigation timeout in ms (default: 20000)
  --settle-ms SETTLE_MS
                        Extra time after networkidle to allow hydration/delayed requests (default: 1500)
  --headed              Run with a visible browser window (not headless)
  --no-install          Do not auto-install Chromium if missing
  --with-deps           When installing, include Playwright OS deps (Linux). May require elevated privileges.
  --browsers-path BROWSERS_PATH
                        Directory to install/playwright browsers (default: ./.pw-browsers).
  --allow-blob          Include blob: in common directives (drafty)
  --unsafe-eval         Include 'unsafe-eval' in script-src (not recommended)
  --upgrade-insecure-requests
                        Add upgrade-insecure-requests directive
  --include-sourcemaps  Analyze JS/CSS for sourceMappingURL and add map origins to connect-src
  --ignore-non-html     Ignore non-HTML pages that get crawled (which might trigger Chromium's word-wrap hash: https://stackoverflow.com/a/69838710)
  --json                Output JSON instead of a header line