cspresso
Crawl up to N pages of a site using a headless Chromium (via Playwright), observe what assets are loaded, and emit a draft Content Security Policy (CSP).
This is meant as a starting point. Review and tighten the resulting policy before enforcing it.
Why "draft"?
- A crawl rarely covers all user flows (auth-only pages, A/B tests, conditional loads, etc.).
- Inline script/style handling is tricky:
  - If your pages use nonces, you must generate a new nonce per HTML response and insert it both in the CSP header and in the HTML tags.
  - Hashes work only if the inline content is stable byte-for-byte.
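Both mechanisms can be produced with the standard library alone. The sketch below generates a fresh per-response nonce and computes a `'sha256-…'` source for a stable inline script; the inline script content is a made-up example:

```python
import base64
import hashlib
import secrets

# Per-response nonce: generate a fresh value for every HTML response and
# use the same value in the CSP header and the <script nonce="..."> attribute.
nonce = secrets.token_urlsafe(16)
header_nonce = f"script-src 'nonce-{nonce}'"

# Hash: only valid while the inline content stays byte-for-byte identical.
inline_script = "console.log('hello');"  # example inline content
digest = hashlib.sha256(inline_script.encode("utf-8")).digest()
header_hash = f"script-src 'sha256-{base64.b64encode(digest).decode('ascii')}'"

print(header_nonce)
print(header_hash)
```

Any whitespace or encoding change to the inline content invalidates the hash, which is why a crawl can only propose hashes, not guarantee them.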
Requirements
- Python 3.10+
- Poetry
- Playwright's Chromium browser binaries (auto-installed by this tool if missing)
Install
Poetry
poetry install
pip/pipx
pip install cspresso
AppImage
Download the CSPresso.AppImage from the releases page, make it executable with chmod +x, and run it.
Run
poetry run cspresso https://example.com --max-pages 10
The tool will:
- attempt to launch Chromium headless
- if Chromium isn't installed, it will run: python -m playwright install chromium
- crawl same-origin links up to the page limit
- print the visited URLs and a CSP header
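The same-origin restriction in the crawl step can be sketched with the standard library (a simplified illustration, not the tool's actual code):

```python
from urllib.parse import urljoin, urlsplit

def same_origin_links(base_url: str, hrefs: list[str]) -> list[str]:
    """Resolve hrefs against the page URL and keep only same-origin ones."""
    base = urlsplit(base_url)
    keep = []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        target = urlsplit(absolute)
        # Same origin = same scheme and same host:port.
        if (target.scheme, target.netloc) == (base.scheme, base.netloc):
            keep.append(absolute)
    return keep

links = same_origin_links(
    "https://example.com/start",
    ["/about", "https://example.com/docs", "https://cdn.example.net/app.js"],
)
print(links)  # the cross-origin CDN URL is filtered out
```

Cross-origin URLs are still *observed* as loaded assets for the policy; they are just never navigated to.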
Where Playwright installs browsers
By default, this project installs Playwright browsers into a local folder: ./.pw-browsers.
This makes installs deterministic and easy to cache in CI.
You can override with --browsers-path or by setting PLAYWRIGHT_BROWSERS_PATH yourself.
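For example, to share one browser cache across projects (the cache path here is just an illustration):

```shell
# Override via environment variable (picked up by all Playwright-based commands):
export PLAYWRIGHT_BROWSERS_PATH="$HOME/.cache/pw-browsers"
poetry run cspresso https://example.com

# ...or per invocation via the CLI flag:
poetry run cspresso https://example.com --browsers-path "$HOME/.cache/pw-browsers"
```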
Linux notes
If Chromium fails to start due to missing system libraries, try:
poetry run cspresso https://example.com --with-deps
That runs python -m playwright install --with-deps chromium (may require sudo depending on your environment).
Output
Default output is a single CSP header line.
For JSON:
poetry run cspresso https://example.com --json
Full usage info
usage: csp-crawl [-h] [--max-pages MAX_PAGES] [--timeout-ms TIMEOUT_MS] [--settle-ms SETTLE_MS] [--headed] [--no-install] [--with-deps] [--browsers-path BROWSERS_PATH] [--allow-blob] [--unsafe-eval]
[--upgrade-insecure-requests] [--include-sourcemaps] [--json]
url
Crawl up to N pages (same-origin) with Playwright and generate a draft CSP.
positional arguments:
url Start URL (e.g. https://example.com)
options:
-h, --help show this help message and exit
--max-pages MAX_PAGES
Maximum number of pages to visit (default: 10)
--timeout-ms TIMEOUT_MS
Navigation timeout in ms (default: 20000)
--settle-ms SETTLE_MS
Extra time after networkidle to allow hydration/delayed requests (default: 1500)
--headed Run with a visible browser window (not headless)
--no-install Do not auto-install Chromium if missing
--with-deps When installing, include Playwright OS deps (Linux). May require elevated privileges.
--browsers-path BROWSERS_PATH
Directory to install Playwright browsers into (default: ./.pw-browsers).
--allow-blob Include blob: in common directives (loosens the draft policy)
--unsafe-eval Include 'unsafe-eval' in script-src (not recommended)
--upgrade-insecure-requests
Add upgrade-insecure-requests directive
--include-sourcemaps Analyze JS/CSS for sourceMappingURL and add map origins to connect-src
--json Output JSON instead of a header line