A CLI tool to crawl a website and automatically generate a Content Security Policy (CSP) for it. https://cspresso.cafe

Latest commit: 16cd1e4b40 by Miguel Jacq, "Update README", 2026-01-02 11:04:39 +11:00
.forgejo/workflows	Initial commit	2026-01-02 09:59:52 +11:00
src/cspresso	Update README	2026-01-02 11:04:39 +11:00
tests	black	2026-01-02 10:03:39 +11:00
.gitignore	Initial commit	2026-01-02 09:59:52 +11:00
.pre-commit-config.yaml	Initial commit	2026-01-02 09:59:52 +11:00
CHANGELOG.md	Fix detection of Python for AppImage if it needs to install browsers via playwright	2026-01-02 10:50:53 +11:00
cspresso.svg	Initial commit	2026-01-02 09:59:52 +11:00
LICENSE	Initial commit	2026-01-02 09:59:52 +11:00
poetry.lock	Initial commit	2026-01-02 09:59:52 +11:00
pyproject.toml	0.1.1	2026-01-02 10:56:05 +11:00
README.md	Update README	2026-01-02 11:04:39 +11:00
release.sh	Initial commit	2026-01-02 09:59:52 +11:00
tests.sh	with deps?	2026-01-02 10:03:15 +11:00

README.md

cspresso

Crawl up to N pages of a site using a headless Chromium (via Playwright), observe what assets are loaded, and emit a draft Content Security Policy (CSP).

This is meant as a starting point. Review and tighten the resulting policy before enforcing it.
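
As a rough illustration of the "observe assets, emit a policy" idea, the sketch below groups observed requests by type and turns their origins into directives. It is not cspresso's actual implementation: the `RESOURCE_TYPE_TO_DIRECTIVE` mapping, the `draft_csp` function and the `default-src 'self'` baseline are assumptions made for this example. The resource type names are the ones Playwright reports for a request.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed mapping from Playwright resource types to CSP directives.
# Illustrative only; the real tool may map things differently.
RESOURCE_TYPE_TO_DIRECTIVE = {
    "script": "script-src",
    "stylesheet": "style-src",
    "image": "img-src",
    "font": "font-src",
    "media": "media-src",
    "xhr": "connect-src",
    "fetch": "connect-src",
    "websocket": "connect-src",
    "eventsource": "connect-src",
    "manifest": "manifest-src",
}


def origin_of(url: str) -> str:
    """Return scheme://host[:port] for a URL."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def draft_csp(page_url: str, observed: list[tuple[str, str]]) -> str:
    """Build a draft CSP value from (resource_type, url) pairs."""
    self_origin = origin_of(page_url)
    sources: dict[str, set[str]] = defaultdict(set)
    for resource_type, url in observed:
        directive = RESOURCE_TYPE_TO_DIRECTIVE.get(resource_type)
        if directive is None:
            continue  # e.g. "document" or "other": not handled in this sketch
        origin = origin_of(url)
        # Same-origin assets collapse to 'self'; others are listed by origin.
        sources[directive].add("'self'" if origin == self_origin else origin)

    parts = ["default-src 'self'"]  # assumed baseline for the draft
    for directive in sorted(sources):
        parts.append(f"{directive} {' '.join(sorted(sources[directive]))}")
    return "; ".join(parts)


if __name__ == "__main__":
    seen = [
        ("script", "https://example.com/app.js"),
        ("script", "https://cdn.example.net/lib.js"),
        ("image", "https://img.example.org/a.png"),
    ]
    print(draft_csp("https://example.com/", seen))
```

A policy built this way only reflects what the crawl happened to load, which is why the result needs review before enforcement.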

Why "draft"?

- A crawl rarely covers all user flows (auth-only pages, A/B tests, conditional loads, etc.).
- Inline script/style handling is tricky:
  - If your pages use nonces, you must generate a new nonce per HTML response and insert it both in the CSP header and in the HTML tags.
  - Hashes work only if the inline content is stable byte-for-byte.
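
The two inline-content options above can be sketched as follows. This is a minimal example rather than code from cspresso: `new_nonce`, `csp_hash` and `render_with_nonce` are hypothetical helper names, and a real application would do this inside its web framework's response handling.

```python
import base64
import hashlib
import secrets


def new_nonce() -> str:
    """Return a fresh, unpredictable nonce. Call once per HTML response."""
    return base64.b64encode(secrets.token_bytes(16)).decode("ascii")


def csp_hash(inline_content: str) -> str:
    """Return a CSP hash source for the exact bytes of an inline block."""
    digest = hashlib.sha256(inline_content.encode("utf-8")).digest()
    return "'sha256-" + base64.b64encode(digest).decode("ascii") + "'"


def render_with_nonce(inline_script: str) -> tuple[str, str]:
    """Put the same nonce in both the CSP value and the script tag."""
    nonce = new_nonce()
    csp_value = f"script-src 'self' 'nonce-{nonce}'"
    html = f'<script nonce="{nonce}">{inline_script}</script>'
    return csp_value, html


if __name__ == "__main__":
    print(render_with_nonce("console.log('hi');"))
    # A single extra space changes the hash, so the source must be stable.
    print(csp_hash("console.log('hi');"))
    print(csp_hash("console.log('hi'); "))
```

Because the hash covers the exact bytes between the tags, any templating that changes whitespace or injects per-request values into the inline block will invalidate it. In that case a nonce is the more practical choice.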

Requirements

- Python 3.10+
- Playwright's Chromium browser binaries (auto-installed by this tool if missing)

Install

If you use my artifacts from the Releases page, you may wish to verify their GPG signatures.

The signing key is available at https://mig5.net/static/mig5.asc. Its fingerprint is 00AE817C24A10C2540461A9C1D7CDE0234DB458D.

Poetry

poetry install

pip/pipx

pip install cspresso

AppImage

Download the CSPresso.AppImage from the releases page, make it executable with chmod +x, and run it.

Run

cspresso https://example.com --max-pages 10

The tool will:

- attempt to launch headless Chromium
- if Chromium isn't installed, run: python -m playwright install chromium
- crawl same-origin links up to the page limit
- print the visited URLs and a CSP header
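
The same-origin crawl with a page limit can be pictured as a breadth-first traversal. The sketch below is an assumption about the general shape, not the tool's code: `crawl`, `same_origin` and the injected `get_links` callable are hypothetical names, and the link extraction (done with Playwright in the real tool) is replaced here by a plain function so the example runs without a browser.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlsplit


def same_origin(a: str, b: str) -> bool:
    """Compare scheme, host and port. Default ports are not normalised here."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.hostname, pa.port) == (pb.scheme, pb.hostname, pb.port)


def crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 10,
) -> list[str]:
    """Visit up to max_pages same-origin pages, breadth-first."""
    queue = deque([start_url])
    seen = {start_url}
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in get_links(url):
            # Resolve relative links and drop #fragments before comparing.
            target = urldefrag(urljoin(url, href)).url
            if target not in seen and same_origin(start_url, target):
                seen.add(target)
                queue.append(target)
    return visited


if __name__ == "__main__":
    fake_site = {
        "https://example.com/": ["/a", "/b", "https://other.example/x"],
        "https://example.com/a": ["/c"],
    }
    print(crawl("https://example.com/", lambda u: fake_site.get(u, []), max_pages=3))
```

Pages reachable only after login or through scripted interactions never enter the queue, which is one reason the resulting policy is a draft.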

Where Playwright installs browsers

By default, this project installs Playwright browsers into a local folder: ./.pw-browsers. This makes installs deterministic and easy to cache in CI.

You can override this with --browsers-path or by setting the PLAYWRIGHT_BROWSERS_PATH environment variable yourself.

Linux notes

If Chromium fails to start due to missing system libraries, try:

poetry run cspresso https://example.com --with-deps

That runs python -m playwright install --with-deps chromium (may require sudo depending on your environment).

Output

Default output is a single CSP header line.
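
When reviewing the emitted header, it can help to split it into directives. The sketch below is a small standalone parser, not part of cspresso: `parse_csp` is a hypothetical helper, and because the exact shape of the tool's output line is not specified here, it accepts the value with or without a leading `Content-Security-Policy:` name.

```python
def parse_csp(header_line: str) -> dict[str, list[str]]:
    """Split a CSP header line into {directive: [sources...]}."""
    prefix = "content-security-policy:"
    value = header_line.strip()
    if value.lower().startswith(prefix):
        value = value[len(prefix):]

    policy: dict[str, list[str]] = {}
    for chunk in value.split(";"):
        tokens = chunk.split()
        if not tokens:
            continue  # skip empty segments such as a trailing ';'
        name = tokens[0].lower()
        # Browsers ignore repeated directives, so keep the first one seen.
        policy.setdefault(name, tokens[1:])
    return policy


if __name__ == "__main__":
    line = "Content-Security-Policy: default-src 'self'; img-src 'self' https://img.example.org"
    for directive, sources in parse_csp(line).items():
        print(directive, sources)
```

Listing the sources per directive makes it easier to spot origins that should be removed or narrowed before the policy is enforced.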

For JSON:

poetry run cspresso https://example.com --json

Full usage info

usage: cspresso [-h] [--max-pages MAX_PAGES] [--timeout-ms TIMEOUT_MS] [--settle-ms SETTLE_MS] [--headed] [--no-install] [--with-deps] [--browsers-path BROWSERS_PATH] [--allow-blob] [--unsafe-eval]
                [--upgrade-insecure-requests] [--include-sourcemaps] [--ignore-non-html] [--json]
                url

Crawl up to N pages (same-origin) with Playwright and generate a draft CSP.

positional arguments:
  url                   Start URL (e.g. https://example.com)

options:
  -h, --help            show this help message and exit
  --max-pages MAX_PAGES
                        Maximum number of pages to visit (default: 10)
  --timeout-ms TIMEOUT_MS
                        Navigation timeout in ms (default: 20000)
  --settle-ms SETTLE_MS
                        Extra time after networkidle to allow hydration/delayed requests (default: 1500)
  --headed              Run with a visible browser window (not headless)
  --no-install          Do not auto-install Chromium if missing
  --with-deps           When installing, include Playwright OS deps (Linux). May require elevated privileges.
  --browsers-path BROWSERS_PATH
                        Directory to install/playwright browsers (default: ./.pw-browsers).
  --allow-blob          Include blob: in common directives (drafty)
  --unsafe-eval         Include 'unsafe-eval' in script-src (not recommended)
  --upgrade-insecure-requests
                        Add upgrade-insecure-requests directive
  --include-sourcemaps  Analyze JS/CSS for sourceMappingURL and add map origins to connect-src
  --ignore-non-html     Ignore non-HTML pages that get crawled (which might trigger Chromium's word-wrap hash: https://stackoverflow.com/a/69838710)
  --json                Output JSON instead of a header line