Compare commits
No commits in common. "main" and "0.1.0" have entirely different histories.
6 changed files with 34 additions and 400 deletions
14
CHANGELOG.md
|
|
@ -1,14 +0,0 @@
|
||||||
## 0.1.2
|
|
||||||
|
|
||||||
* Add `--bypass-csp` option to ignore an existing enforcing CSP to avoid it skewing results
|
|
||||||
* Add `--evaluate` option to test a proposed CSP without needing to install it (best used in conjunction with `--bypass-csp`)
|
|
||||||
|
|
||||||
## 0.1.1
|
|
||||||
|
|
||||||
* Fix prog name
|
|
||||||
* Add `--ignore-non-html` option to skip pages that weren't HTML (which might trigger Chromium's 'sha256-4Su6mBWzEIFnH4pAGMOuaeBrstwJN4Z3pq/s1Kn4/KQ=' hash)
|
|
||||||
* Fix detection of Python for AppImage if it needs to install browsers via playwright
|
|
||||||
|
|
||||||
## 0.1.0
|
|
||||||
|
|
||||||
* Initial release
|
|
||||||
78
README.md
|
|
@ -18,14 +18,11 @@ This is meant as a **starting point**. Review and tighten the resulting policy b
|
||||||
## Requirements
|
## Requirements
|
||||||
|
|
||||||
- Python 3.10+
|
- Python 3.10+
|
||||||
|
- Poetry
|
||||||
- Playwright's Chromium browser binaries (auto-installed by this tool if missing)
|
- Playwright's Chromium browser binaries (auto-installed by this tool if missing)
|
||||||
|
|
||||||
## Install
|
## Install
|
||||||
|
|
||||||
If using my artifacts from the Releases page, you may wish to verify the GPG signatures with the key.
|
|
||||||
|
|
||||||
It can be found at https://mig5.net/static/mig5.asc . The fingerprint is `00AE817C24A10C2540461A9C1D7CDE0234DB458D`.
|
|
||||||
|
|
||||||
### Poetry
|
### Poetry
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
|
@ -45,7 +42,7 @@ Download the CSPresso.AppImage from the releases page, make it executable with `
|
||||||
## Run
|
## Run
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cspresso https://example.com --max-pages 10
|
poetry run cspresso https://example.com --max-pages 10
|
||||||
```
|
```
|
||||||
|
|
||||||
The tool will:
|
The tool will:
|
||||||
|
|
@ -54,15 +51,6 @@ The tool will:
|
||||||
3) crawl same-origin links up to the page limit
|
3) crawl same-origin links up to the page limit
|
||||||
4) print the visited URLs and a CSP header
|
4) print the visited URLs and a CSP header
|
||||||
|
|
||||||
### Avoiding an existing enforcing CSP header during analysis
|
|
||||||
|
|
||||||
**NOTE**: If you have an existing CSP header in place on your site, this could negatively influence
|
|
||||||
`cspresso`'s ability to evaluate what's on the page. Consider adding `--bypass-csp` to ignore the
|
|
||||||
current CSP (noting that if your site is compromised, doing so could put your machine at risk if
|
|
||||||
it evaluates malicious JavaScript, CSS, etc.).
|
|
||||||
|
|
||||||
See also the `--evaluate` option below.
|
|
||||||
|
|
||||||
## Where Playwright installs browsers
|
## Where Playwright installs browsers
|
||||||
|
|
||||||
By default, this project installs Playwright browsers into a local folder: `./.pw-browsers`.
|
By default, this project installs Playwright browsers into a local folder: `./.pw-browsers`.
|
||||||
|
|
@ -75,7 +63,7 @@ You can override with `--browsers-path` or by setting `PLAYWRIGHT_BROWSERS_PATH`
|
||||||
If Chromium fails to start due to missing system libraries, try:
|
If Chromium fails to start due to missing system libraries, try:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cspresso https://example.com --with-deps
|
poetry run cspresso https://example.com --with-deps
|
||||||
```
|
```
|
||||||
|
|
||||||
That runs `python -m playwright install --with-deps chromium` (may require sudo depending on your environment).
|
That runs `python -m playwright install --with-deps chromium` (may require sudo depending on your environment).
|
||||||
|
|
@ -87,65 +75,14 @@ Default output is a single CSP header line.
|
||||||
For JSON:
|
For JSON:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cspresso https://example.com --json
|
poetry run cspresso https://example.com --json
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
## Evaluate a proposed CSP without installing it
|
|
||||||
|
|
||||||
You can use `cspresso` to evaluate a *proposed* CSP against a site. When you do this, cspresso converts
|
|
||||||
the response from the website to implant `Content-Security-Policy-Report-Only` headers using the CSP
|
|
||||||
you supplied to `--evaluate`. If it detects any violations, it will report them and exit with code 1,
|
|
||||||
which may be useful for CI.
|
|
||||||
|
|
||||||
**NOTE**: It is highly recommended to use `--bypass-csp` in addition to `--evaluate`, so that your
|
|
||||||
results are not influenced by any existing CSP's enforcement.
|
|
||||||
|
|
||||||
**Example:**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
❯ poetry run cspresso https://mig5.net --evaluate "default-src 'none'" --bypass-csp --json
|
|
||||||
{
|
|
||||||
"csp": "base-uri 'self'; default-src 'self'; form-action 'self'; frame-ancestors 'self'; object-src 'none'; style-src 'self' 'sha256-4Su6mBWzEIFnH4pAGMOuaeBrstwJN4Z3pq/s1Kn4/KQ=' 'unsafe-hashes'; style-src-attr 'sha256-4Su6mBWzEIFnH4pAGMOuaeBrstwJN4Z3pq/s1Kn4/KQ=' 'unsafe-hashes';",
|
|
||||||
"directives": {},
|
|
||||||
"evaluated_policy": "default-src 'none'",
|
|
||||||
"nonce_detected": false,
|
|
||||||
"notes": [
|
|
||||||
"Detected inline attribute code (style=\"...\" and/or on*=\"...\"). Hashes for these require 'unsafe-hashes' (and modern browsers may use style-src-attr/script-src-attr)."
|
|
||||||
],
|
|
||||||
"violations": [
|
|
||||||
{
|
|
||||||
"console": true,
|
|
||||||
"disposition": "report",
|
|
||||||
"documentURI": "https://mig5.net/",
|
|
||||||
"text": "Loading the stylesheet 'https://mig5.net/style.css' violates the following Content Security Policy directive: \"default-src 'none'\". Note that 'style-src-elem' was not explicitly set, so 'default-src' is used as a fallback. The policy is report-only, so the violation has been logged but no further action has been taken.",
|
|
||||||
"type": "info"
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"console": true,
|
|
||||||
"disposition": "report",
|
|
||||||
"documentURI": "https://mig5.net/static/mig5.asc",
|
|
||||||
"text": "Applying inline style violates the following Content Security Policy directive 'default-src 'none''. Either the 'unsafe-inline' keyword, a hash ('sha256-4Su6mBWzEIFnH4pAGMOuaeBrstwJN4Z3pq/s1Kn4/KQ='), or a nonce ('nonce-...') is required to enable inline execution. Note that hashes do not apply to event handlers, style attributes and javascript: navigations unless the 'unsafe-hashes' keyword is present. Note also that 'style-src' was not explicitly set, so 'default-src' is used as a fallback. The policy is report-only, so the violation has been logged but no further action has been taken.",
|
|
||||||
"type": "info"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"visited": [
|
|
||||||
"https://mig5.net",
|
|
||||||
"https://mig5.net/",
|
|
||||||
"https://mig5.net/static/mig5.asc"
|
|
||||||
]
|
|
||||||
}
|
|
||||||
|
|
||||||
cspresso on main [!] via 🐍 v3.13.5 took 18s
|
|
||||||
❯ echo $?
|
|
||||||
1
|
|
||||||
```
|
```
|
||||||
|
|
||||||
## Full usage info
|
## Full usage info
|
||||||
|
|
||||||
```
|
```
|
||||||
usage: cspresso [-h] [--max-pages MAX_PAGES] [--timeout-ms TIMEOUT_MS] [--settle-ms SETTLE_MS] [--headed] [--no-install] [--with-deps] [--browsers-path BROWSERS_PATH] [--allow-blob] [--unsafe-eval]
|
usage: csp-crawl [-h] [--max-pages MAX_PAGES] [--timeout-ms TIMEOUT_MS] [--settle-ms SETTLE_MS] [--headed] [--no-install] [--with-deps] [--browsers-path BROWSERS_PATH] [--allow-blob] [--unsafe-eval]
|
||||||
[--upgrade-insecure-requests] [--include-sourcemaps] [--bypass-csp] [--evaluate CSP] [--ignore-non-html] [--json]
|
[--upgrade-insecure-requests] [--include-sourcemaps] [--json]
|
||||||
url
|
url
|
||||||
|
|
||||||
Crawl up to N pages (same-origin) with Playwright and generate a draft CSP.
|
Crawl up to N pages (same-origin) with Playwright and generate a draft CSP.
|
||||||
|
|
@ -171,8 +108,5 @@ options:
|
||||||
--upgrade-insecure-requests
|
--upgrade-insecure-requests
|
||||||
Add upgrade-insecure-requests directive
|
Add upgrade-insecure-requests directive
|
||||||
--include-sourcemaps Analyze JS/CSS for sourceMappingURL and add map origins to connect-src
|
--include-sourcemaps Analyze JS/CSS for sourceMappingURL and add map origins to connect-src
|
||||||
--bypass-csp Strip any existing CSP/CSP-Report-Only response headers from HTML documents (useful for discovery or evaluation).
|
|
||||||
--evaluate CSP Inject the provided CSP string as Content-Security-Policy-Report-Only on HTML documents and exit 1 if any Report-Only violations are detected. Quote the value.
|
|
||||||
--ignore-non-html Ignore non-HTML pages that get crawled (which might trigger Chromium's word-wrap hash: https://stackoverflow.com/a/69838710)
|
|
||||||
--json Output JSON instead of a header line
|
--json Output JSON instead of a header line
|
||||||
```
|
```
|
||||||
|
|
|
||||||
|
|
@ -1,12 +1,11 @@
|
||||||
[tool.poetry]
|
[tool.poetry]
|
||||||
name = "cspresso"
|
name = "cspresso"
|
||||||
version = "0.1.2"
|
version = "0.1.0"
|
||||||
description = "Crawl a website with a headless browser and generate a draft Content-Security-Policy (CSP)."
|
description = "Crawl a website with a headless browser and generate a draft Content-Security-Policy (CSP)."
|
||||||
authors = ["Miguel Jacq <mig@mig5.net>"]
|
authors = ["Miguel Jacq <mig@mig5.net>"]
|
||||||
readme = "README.md"
|
readme = "README.md"
|
||||||
packages = [{ include = "cspresso", from = "src" }]
|
packages = [{ include = "cspresso", from = "src" }]
|
||||||
license = "GPL-3.0-or-later"
|
license = "GPL-3.0-or-later"
|
||||||
homepage = "https://cspresso.cafe"
|
|
||||||
repository = "https://git.mig5.net/mig5/cspresso"
|
repository = "https://git.mig5.net/mig5/cspresso"
|
||||||
|
|
||||||
[tool.poetry.dependencies]
|
[tool.poetry.dependencies]
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,4 @@
|
||||||
import sys
|
|
||||||
from .crawl import main
|
from .crawl import main
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
sys.exit(main())
|
main()
|
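The two sides of this `__main__.py` hunk differ only in whether the return value of `main()` is handed to `sys.exit`. A minimal sketch of why that matters for the exit status follows. The `main` functions inside the snippets are stand-ins that always return 1, not the real cspresso entry point, and `run` is a hypothetical helper name.

```python
import subprocess
import sys

# Stand-in for an entry point whose return value is passed to sys.exit().
WITH_SYS_EXIT = "import sys\ndef main():\n    return 1\nsys.exit(main())\n"

# Stand-in for an entry point whose return value is silently discarded.
WITHOUT_SYS_EXIT = "def main():\n    return 1\nmain()\n"


def run(code: str) -> int:
    """Run a snippet in a child interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", code]).returncode


if __name__ == "__main__":
    print("with sys.exit:   ", run(WITH_SYS_EXIT))
    print("without sys.exit:", run(WITHOUT_SYS_EXIT))
```

Only the first form lets a shell or CI job see a non-zero status when `main()` returns one.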
||||||
|
|
|
||||||
|
|
@ -48,13 +48,6 @@ def sha256_base64(s: str) -> str:
|
||||||
return base64.b64encode(h).decode("ascii")
|
return base64.b64encode(h).decode("ascii")
|
||||||
|
|
||||||
|
|
||||||
def normalize_csp_string(csp: str) -> str:
|
|
||||||
s = (csp or "").strip()
|
|
||||||
if not s:
|
|
||||||
return s
|
|
||||||
return s if s.endswith(";") else s + ";"
|
|
||||||
|
|
||||||
|
|
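The `normalize_csp_string` helper on the left-hand side is small enough to exercise in isolation. The sketch below restates it as shown in the hunk (with `None` allowed in the annotation, which is an assumption about how callers use it) and prints a few sample inputs chosen for illustration.

```python
from __future__ import annotations


def normalize_csp_string(csp: str | None) -> str:
    """Trim whitespace and make sure a non-empty policy ends with ';'."""
    s = (csp or "").strip()
    if not s:
        return s
    return s if s.endswith(";") else s + ";"


if __name__ == "__main__":
    # Sample inputs are illustrative only.
    for raw in ("default-src 'none'", "  default-src 'self';  ", "", None):
        print(repr(raw), "->", repr(normalize_csp_string(raw)))
```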
||||||
async def collect_inline(page, *, max_attr_hashes: int = 2000):
|
async def collect_inline(page, *, max_attr_hashes: int = 2000):
|
||||||
"""
|
"""
|
||||||
Collect inline <script> (no src), <style> blocks, plus:
|
Collect inline <script> (no src), <style> blocks, plus:
|
||||||
|
|
@ -298,7 +291,6 @@ class CrawlResult:
|
||||||
nonce_detected: bool
|
nonce_detected: bool
|
||||||
directives: dict[str, list[str]]
|
directives: dict[str, list[str]]
|
||||||
notes: list[str]
|
notes: list[str]
|
||||||
violations: list[dict]
|
|
||||||
|
|
||||||
|
|
||||||
async def crawl_and_generate_csp(
|
async def crawl_and_generate_csp(
|
||||||
|
|
@ -315,9 +307,6 @@ async def crawl_and_generate_csp(
|
||||||
allow_unsafe_eval: bool = False,
|
allow_unsafe_eval: bool = False,
|
||||||
upgrade_insecure_requests: bool = False,
|
upgrade_insecure_requests: bool = False,
|
||||||
include_sourcemaps: bool = False,
|
include_sourcemaps: bool = False,
|
||||||
ignore_non_html: bool = False,
|
|
||||||
bypass_csp: bool = False,
|
|
||||||
evaluate: str | None = None, # CSP string to inject as Report-Only and evaluate
|
|
||||||
) -> CrawlResult:
|
) -> CrawlResult:
|
||||||
start_url, _ = urldefrag(start_url)
|
start_url, _ = urldefrag(start_url)
|
||||||
base_origin = origin_of(start_url)
|
base_origin = origin_of(start_url)
|
||||||
|
|
@ -345,48 +334,10 @@ async def crawl_and_generate_csp(
|
||||||
allow_data_font = False
|
allow_data_font = False
|
||||||
notes: list[str] = []
|
notes: list[str] = []
|
||||||
|
|
||||||
evaluate_policy = normalize_csp_string(evaluate) if evaluate else None
|
|
||||||
# Captured CSP violations (Report-Only) when --evaluate is used.
|
|
||||||
violations: list[dict] = []
|
|
||||||
|
|
||||||
async with async_playwright() as p:
|
async with async_playwright() as p:
|
||||||
browser = await p.chromium.launch(headless=headless)
|
browser = await p.chromium.launch(headless=headless)
|
||||||
context = await browser.new_context()
|
context = await browser.new_context()
|
||||||
|
|
||||||
# Optionally strip any existing CSP headers, and/or inject a Report-Only CSP for evaluation.
|
|
||||||
# NOTE: This operates on *document response headers* only.
|
|
||||||
if bypass_csp or evaluate_policy:
|
|
||||||
|
|
||||||
async def _route_handler(route, request):
|
|
||||||
try:
|
|
||||||
if request.resource_type != "document":
|
|
||||||
return await route.continue_()
|
|
||||||
|
|
||||||
resp = await route.fetch()
|
|
||||||
hdrs = {k.lower(): v for k, v in (resp.headers or {}).items()}
|
|
||||||
|
|
||||||
if bypass_csp:
|
|
||||||
hdrs.pop("content-security-policy", None)
|
|
||||||
hdrs.pop("content-security-policy-report-only", None)
|
|
||||||
|
|
||||||
if evaluate_policy:
|
|
||||||
hdrs["content-security-policy-report-only"] = evaluate_policy
|
|
||||||
|
|
||||||
try:
|
|
||||||
return await route.fulfill(response=resp, headers=hdrs)
|
|
||||||
except TypeError:
|
|
||||||
body = await resp.body()
|
|
||||||
return await route.fulfill(
|
|
||||||
status=resp.status, headers=hdrs, body=body
|
|
||||||
)
|
|
||||||
except Exception:
|
|
||||||
try:
|
|
||||||
return await route.continue_()
|
|
||||||
except Exception:
|
|
||||||
return
|
|
||||||
|
|
||||||
await context.route("**/*", _route_handler)
|
|
||||||
|
|
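The header handling inside `_route_handler` can be read as a pure transformation on a header mapping. The sketch below isolates that step under the assumption that headers arrive as a plain `dict`; `rewrite_csp_headers` is a hypothetical name, and no Playwright calls are made here.

```python
from __future__ import annotations

CSP_HEADERS = ("content-security-policy", "content-security-policy-report-only")


def rewrite_csp_headers(
    headers: dict[str, str] | None,
    *,
    bypass_csp: bool = False,
    evaluate_policy: str | None = None,
) -> dict[str, str]:
    """Lower-case header names, optionally drop CSP headers, optionally add a Report-Only policy."""
    hdrs = {k.lower(): v for k, v in (headers or {}).items()}
    if bypass_csp:
        for name in CSP_HEADERS:
            hdrs.pop(name, None)
    if evaluate_policy:
        # Any Report-Only header from the site is replaced by the candidate policy.
        hdrs["content-security-policy-report-only"] = evaluate_policy
    return hdrs


if __name__ == "__main__":
    original = {"Content-Security-Policy": "default-src 'self'", "Content-Type": "text/html"}
    print(rewrite_csp_headers(original, bypass_csp=True, evaluate_policy="default-src 'none';"))
```

Without `bypass_csp`, the site's enforcing `content-security-policy` header is left in place next to the injected Report-Only one, which is presumably why the README recommends combining the two flags.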
||||||
def on_request(req):
|
def on_request(req):
|
||||||
"""
|
"""
|
||||||
Playwright sometimes classifies "connect-like" activity as resource_type == "other".
|
Playwright sometimes classifies "connect-like" activity as resource_type == "other".
|
||||||
|
|
@ -428,59 +379,6 @@ async def crawl_and_generate_csp(
|
||||||
|
|
||||||
page = await context.new_page()
|
page = await context.new_page()
|
||||||
|
|
||||||
# If evaluating a candidate CSP, capture Report-Only violations.
|
|
||||||
if evaluate_policy:
|
|
||||||
|
|
||||||
def _record_violation(_source, payload):
|
|
||||||
try:
|
|
||||||
if (
|
|
||||||
isinstance(payload, dict)
|
|
||||||
and payload.get("disposition") == "report"
|
|
||||||
):
|
|
||||||
violations.append(payload)
|
|
||||||
except Exception:
|
|
||||||
return
|
|
||||||
|
|
||||||
try:
|
|
||||||
await page.expose_binding("__cspresso_violation", _record_violation)
|
|
||||||
await page.add_init_script(
|
|
||||||
"() => { try { window.addEventListener('securitypolicyviolation', (e) => { "
|
|
||||||
"const payload = {documentURI:e.documentURI, referrer:e.referrer, blockedURI:e.blockedURI, "
|
|
||||||
"violatedDirective:e.violatedDirective, effectiveDirective:e.effectiveDirective, originalPolicy:e.originalPolicy, "
|
|
||||||
"disposition:e.disposition, sourceFile:e.sourceFile, lineNumber:e.lineNumber, columnNumber:e.columnNumber, "
|
|
||||||
"statusCode:e.statusCode, sample:e.sample}; "
|
|
||||||
"if (typeof window.__cspresso_violation === 'function') { window.__cspresso_violation(payload); }"
|
|
||||||
"}, true); } catch(_){} }"
|
|
||||||
)
|
|
||||||
except Exception:
|
|
||||||
pass # nosec
|
|
||||||
|
|
||||||
def _on_console(msg):
|
|
||||||
try:
|
|
||||||
t = msg.text or ""
|
|
||||||
tl = t.lower()
|
|
||||||
if (
|
|
||||||
"content security policy" in tl
|
|
||||||
or "content-security-policy" in tl
|
|
||||||
) and (
|
|
||||||
"would violate" in tl
|
|
||||||
or "report-only" in tl
|
|
||||||
or "report only" in tl
|
|
||||||
):
|
|
||||||
violations.append(
|
|
||||||
{
|
|
||||||
"console": True,
|
|
||||||
"type": msg.type,
|
|
||||||
"text": t,
|
|
||||||
"documentURI": page.url,
|
|
||||||
"disposition": "report",
|
|
||||||
}
|
|
||||||
)
|
|
||||||
except Exception:
|
|
||||||
return
|
|
||||||
|
|
||||||
page.on("console", _on_console)
|
|
||||||
|
|
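The `_on_console` handler decides by substring matching whether a console message describes a Report-Only violation. A minimal sketch of that predicate, pulled out as a standalone function, might look like this. `looks_like_report_only_violation` is a hypothetical name, and the sample messages are abbreviated for illustration.

```python
def looks_like_report_only_violation(text: str) -> bool:
    """Return True if a console message mentions CSP and a report-only style violation."""
    tl = (text or "").lower()
    mentions_csp = "content security policy" in tl or "content-security-policy" in tl
    mentions_report_only = (
        "would violate" in tl or "report-only" in tl or "report only" in tl
    )
    return mentions_csp and mentions_report_only


if __name__ == "__main__":
    report_only = (
        "Loading the stylesheet 'https://mig5.net/style.css' violates the following "
        "Content Security Policy directive: \"default-src 'none'\". "
        "The policy is report-only, so the violation has been logged."
    )
    enforced = (
        "Refused to load the stylesheet because it violates the following "
        "Content Security Policy directive: \"default-src 'none'\"."
    )
    print(looks_like_report_only_violation(report_only))
    print(looks_like_report_only_violation(enforced))
```

Because the check relies on the wording of browser console messages, it should be treated as a heuristic that may need adjusting if that wording changes.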
||||||
pending: set[asyncio.Task] = set()
|
pending: set[asyncio.Task] = set()
|
||||||
|
|
||||||
if include_sourcemaps:
|
if include_sourcemaps:
|
||||||
|
|
@ -504,6 +402,7 @@ async def crawl_and_generate_csp(
|
||||||
directives.setdefault("connect-src", set()).add(o)
|
directives.setdefault("connect-src", set()).add(o)
|
||||||
|
|
||||||
except Exception:
|
except Exception:
|
||||||
|
# If you want to debug failures, print(traceback.format_exc())
|
||||||
return
|
return
|
||||||
|
|
||||||
def on_response(resp):
|
def on_response(resp):
|
||||||
|
|
@ -514,18 +413,7 @@ async def crawl_and_generate_csp(
|
||||||
page.on("response", on_response)
|
page.on("response", on_response)
|
||||||
|
|
||||||
try:
|
try:
|
||||||
resp = await page.goto(
|
await page.goto(url, wait_until="networkidle", timeout=timeout_ms)
|
||||||
url, wait_until="networkidle", timeout=timeout_ms
|
|
||||||
)
|
|
||||||
|
|
||||||
ct = ""
|
|
||||||
if resp is not None:
|
|
||||||
ct = (await resp.header_value("content-type") or "").lower()
|
|
||||||
|
|
||||||
is_html = ("text/html" in ct) or ("application/xhtml+xml" in ct)
|
|
||||||
if not is_html and ignore_non_html:
|
|
||||||
# Still count as visited, but don't hash inline attrs / don't extract links.
|
|
||||||
continue
|
|
||||||
|
|
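The `--ignore-non-html` branch hinges on a substring check against the `Content-Type` header. The sketch below isolates that check; `is_html_content_type` is a hypothetical helper name, and the sample header values are illustrative.

```python
from __future__ import annotations


def is_html_content_type(content_type: str | None) -> bool:
    """Treat text/html and application/xhtml+xml responses as HTML documents."""
    ct = (content_type or "").lower()
    return ("text/html" in ct) or ("application/xhtml+xml" in ct)


if __name__ == "__main__":
    for value in ("text/html; charset=utf-8", "application/pgp-keys", None):
        print(repr(value), "->", is_html_content_type(value))
```

A missing header is treated as non-HTML here, which matches the `ct = ""` default in the hunk.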
||||||
# Give the page a moment to run hydration / delayed fetches.
|
# Give the page a moment to run hydration / delayed fetches.
|
||||||
if settle_ms > 0:
|
if settle_ms > 0:
|
||||||
|
|
@ -600,41 +488,18 @@ async def crawl_and_generate_csp(
|
||||||
)
|
)
|
||||||
|
|
||||||
directives_out = {k: sorted(v) for k, v in directives.items() if v}
|
directives_out = {k: sorted(v) for k, v in directives.items() if v}
|
||||||
|
|
||||||
# De-duplicate violations (same doc+directive+blocked URI) to keep output stable.
|
|
||||||
if violations:
|
|
||||||
seen = set()
|
|
||||||
uniq: list[dict] = []
|
|
||||||
for v in violations:
|
|
||||||
if not isinstance(v, dict):
|
|
||||||
continue
|
|
||||||
key = (
|
|
||||||
v.get("documentURI"),
|
|
||||||
v.get("effectiveDirective") or v.get("violatedDirective"),
|
|
||||||
v.get("blockedURI"),
|
|
||||||
v.get("sourceFile"),
|
|
||||||
v.get("lineNumber"),
|
|
||||||
v.get("columnNumber"),
|
|
||||||
)
|
|
||||||
if key in seen:
|
|
||||||
continue
|
|
||||||
seen.add(key)
|
|
||||||
uniq.append(v)
|
|
||||||
violations = uniq
|
|
||||||
|
|
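The de-duplication step keys each violation on a tuple of six fields. The sketch below restates it as a standalone function (`dedupe_violations` is a hypothetical name) so the effect of that key can be checked on sample data; the sample URLs are made up.

```python
from __future__ import annotations


def dedupe_violations(violations: list) -> list[dict]:
    """Keep the first violation for each (document, directive, blocked URI, source position) key."""
    seen: set[tuple] = set()
    uniq: list[dict] = []
    for v in violations:
        if not isinstance(v, dict):
            continue
        key = (
            v.get("documentURI"),
            v.get("effectiveDirective") or v.get("violatedDirective"),
            v.get("blockedURI"),
            v.get("sourceFile"),
            v.get("lineNumber"),
            v.get("columnNumber"),
        )
        if key in seen:
            continue
        seen.add(key)
        uniq.append(v)
    return uniq


if __name__ == "__main__":
    a = {"documentURI": "https://example.com/", "effectiveDirective": "img-src",
         "blockedURI": "https://example.com/x.png"}
    print(dedupe_violations([a, dict(a)]))
```

One consequence worth noting: entries built from console messages carry only `documentURI` among the key fields, so with this key two different console messages from the same page collapse into a single entry.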
||||||
return CrawlResult(
|
return CrawlResult(
|
||||||
visited=sorted(visited),
|
visited=sorted(visited),
|
||||||
csp=csp,
|
csp=csp,
|
||||||
nonce_detected=nonce_detected,
|
nonce_detected=nonce_detected,
|
||||||
directives=directives_out,
|
directives=directives_out,
|
||||||
notes=notes,
|
notes=notes,
|
||||||
violations=violations,
|
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
def _parse_args(argv: list[str] | None = None) -> argparse.Namespace:
|
def _parse_args(argv: list[str] | None = None) -> argparse.Namespace:
|
||||||
ap = argparse.ArgumentParser(
|
ap = argparse.ArgumentParser(
|
||||||
prog="cspresso",
|
prog="csp-crawl",
|
||||||
description="Crawl up to N pages (same-origin) with Playwright and generate a draft CSP.",
|
description="Crawl up to N pages (same-origin) with Playwright and generate a draft CSP.",
|
||||||
)
|
)
|
||||||
ap.add_argument("url", help="Start URL (e.g. https://example.com)")
|
ap.add_argument("url", help="Start URL (e.g. https://example.com)")
|
||||||
|
|
@ -700,31 +565,13 @@ def _parse_args(argv: list[str] | None = None) -> argparse.Namespace:
|
||||||
default=False,
|
default=False,
|
||||||
help="Analyze JS/CSS for sourceMappingURL and add map origins to connect-src",
|
help="Analyze JS/CSS for sourceMappingURL and add map origins to connect-src",
|
||||||
)
|
)
|
||||||
|
|
||||||
ap.add_argument(
|
|
||||||
"--bypass-csp",
|
|
||||||
action="store_true",
|
|
||||||
help="Strip any existing CSP/CSP-Report-Only response headers from HTML documents (useful for discovery or evaluation).",
|
|
||||||
)
|
|
||||||
ap.add_argument(
|
|
||||||
"--evaluate",
|
|
||||||
metavar="CSP",
|
|
||||||
default=None,
|
|
||||||
help="Inject the provided CSP string as Content-Security-Policy-Report-Only on HTML documents and exit 1 if any Report-Only violations are detected. Quote the value.",
|
|
||||||
)
|
|
||||||
ap.add_argument(
|
|
||||||
"--ignore-non-html",
|
|
||||||
action="store_true",
|
|
||||||
default=False,
|
|
||||||
help="Ignore non-HTML pages that get crawled (which might trigger Chromium's word-wrap hash: https://stackoverflow.com/a/69838710)",
|
|
||||||
)
|
|
||||||
ap.add_argument(
|
ap.add_argument(
|
||||||
"--json", action="store_true", help="Output JSON instead of a header line"
|
"--json", action="store_true", help="Output JSON instead of a header line"
|
||||||
)
|
)
|
||||||
return ap.parse_args(argv)
|
return ap.parse_args(argv)
|
||||||
|
|
||||||
|
|
||||||
def main(argv: list[str] | None = None) -> int:
|
def main(argv: list[str] | None = None) -> None:
|
||||||
args = _parse_args(argv)
|
args = _parse_args(argv)
|
||||||
browsers_path = Path(args.browsers_path).resolve() if args.browsers_path else None
|
browsers_path = Path(args.browsers_path).resolve() if args.browsers_path else None
|
||||||
|
|
||||||
|
|
@ -742,9 +589,6 @@ def main(argv: list[str] | None = None) -> int:
|
||||||
allow_unsafe_eval=args.unsafe_eval,
|
allow_unsafe_eval=args.unsafe_eval,
|
||||||
upgrade_insecure_requests=args.upgrade_insecure_requests,
|
upgrade_insecure_requests=args.upgrade_insecure_requests,
|
||||||
include_sourcemaps=args.include_sourcemaps,
|
include_sourcemaps=args.include_sourcemaps,
|
||||||
bypass_csp=args.bypass_csp,
|
|
||||||
evaluate=args.evaluate,
|
|
||||||
ignore_non_html=args.ignore_non_html,
|
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
@ -757,14 +601,12 @@ def main(argv: list[str] | None = None) -> int:
|
||||||
"csp": result.csp,
|
"csp": result.csp,
|
||||||
"directives": result.directives,
|
"directives": result.directives,
|
||||||
"notes": result.notes,
|
"notes": result.notes,
|
||||||
"violations": result.violations,
|
|
||||||
"evaluated_policy": args.evaluate,
|
|
||||||
},
|
},
|
||||||
indent=2,
|
indent=2,
|
||||||
sort_keys=True,
|
sort_keys=True,
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
return 1 if (args.evaluate and result.violations) else 0
|
return
|
||||||
|
|
||||||
# Default: print header + visited pages as comments.
|
# Default: print header + visited pages as comments.
|
||||||
for u in result.visited:
|
for u in result.visited:
|
||||||
|
|
@ -773,24 +615,6 @@ def main(argv: list[str] | None = None) -> int:
|
||||||
print(f"# NOTE: {n}")
|
print(f"# NOTE: {n}")
|
||||||
print("Content-Security-Policy:", result.csp)
|
print("Content-Security-Policy:", result.csp)
|
||||||
|
|
||||||
if args.evaluate:
|
|
||||||
if result.violations:
|
|
||||||
print("# CSP Report-Only violations detected:")
|
|
||||||
for v in result.violations:
|
|
||||||
try:
|
|
||||||
blocked = v.get("blockedURI")
|
|
||||||
eff = v.get("effectiveDirective") or v.get("violatedDirective")
|
|
||||||
doc = v.get("documentURI")
|
|
||||||
print(f"# - {eff} blocked={blocked} on {doc}")
|
|
||||||
except Exception:
|
|
||||||
print(f"# - {v}")
|
|
||||||
return 1
|
|
||||||
return 0
|
|
||||||
|
|
||||||
return 0
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
import sys
|
main()
|
||||||
|
|
||||||
sys.exit(main())
|
|
||||||
|
|
|
||||||
|
|
@ -1,18 +1,14 @@
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
import os
|
import os
|
||||||
import shutil
|
|
||||||
import subprocess # nosec
|
|
||||||
import sys
|
import sys
|
||||||
import tempfile
|
|
||||||
import time
|
import time
|
||||||
|
import subprocess # nosec
|
||||||
from dataclasses import dataclass
|
from dataclasses import dataclass
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
from playwright.async_api import async_playwright, Error as PlaywrightError
|
from playwright.async_api import async_playwright, Error as PlaywrightError
|
||||||
|
|
||||||
__all__ = ["EnsureResult", "ensure_chromium_installed"]
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
@dataclass(frozen=True)
|
||||||
class EnsureResult:
|
class EnsureResult:
|
||||||
|
|
@ -20,93 +16,9 @@ class EnsureResult:
|
||||||
installed: bool
|
installed: bool
|
||||||
|
|
||||||
|
|
||||||
def _user_cache_dir() -> Path:
|
|
||||||
"""
|
|
||||||
Cross-platform cache dir without extra deps.
|
|
||||||
Linux: $XDG_CACHE_HOME or ~/.cache
|
|
||||||
macOS: ~/Library/Caches
|
|
||||||
Windows: %LOCALAPPDATA%
|
|
||||||
"""
|
|
||||||
if os.name == "nt":
|
|
||||||
base = os.environ.get("LOCALAPPDATA") or str(Path.home() / "AppData" / "Local")
|
|
||||||
return Path(base)
|
|
||||||
|
|
||||||
if sys.platform == "darwin":
|
|
||||||
return Path.home() / "Library" / "Caches"
|
|
||||||
|
|
||||||
return Path(os.environ.get("XDG_CACHE_HOME", str(Path.home() / ".cache")))
|
|
||||||
|
|
||||||
|
|
||||||
def _default_browsers_path() -> Path:
|
def _default_browsers_path() -> Path:
|
||||||
"""
|
# Project-local by default. Override with PLAYWRIGHT_BROWSERS_PATH or CLI flag.
|
||||||
If PLAYWRIGHT_BROWSERS_PATH is set, honor it (Playwright-standard).
|
return Path(__file__).resolve().parents[2] / ".pw-browsers"
|
||||||
Otherwise use a user-writable cache path (safe for AppImage/pip installs).
|
|
||||||
"""
|
|
||||||
env = os.environ.get("PLAYWRIGHT_BROWSERS_PATH")
|
|
||||||
if env and env.strip() and env.strip() != "0":
|
|
||||||
return Path(env).expanduser()
|
|
||||||
|
|
||||||
return _user_cache_dir() / "cspresso" / "pw-browsers"
|
|
||||||
|
|
||||||
|
|
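The path resolution in `_user_cache_dir` and the left-hand `_default_browsers_path` reads from `os.name`, `sys.platform` and the environment. To make the precedence easier to follow, the sketch below passes those in as arguments. The parameterised signatures are an assumption for illustration and differ from the functions in the hunk.

```python
from __future__ import annotations

from pathlib import Path


def user_cache_dir(os_name: str, platform: str, env: dict[str, str], home: Path) -> Path:
    """Pick a per-user cache directory for Windows, macOS, or other (XDG) platforms."""
    if os_name == "nt":
        return Path(env.get("LOCALAPPDATA") or str(home / "AppData" / "Local"))
    if platform == "darwin":
        return home / "Library" / "Caches"
    return Path(env.get("XDG_CACHE_HOME", str(home / ".cache")))


def default_browsers_path(os_name: str, platform: str, env: dict[str, str], home: Path) -> Path:
    """Honor PLAYWRIGHT_BROWSERS_PATH unless it is empty or '0', else use the cache dir."""
    override = env.get("PLAYWRIGHT_BROWSERS_PATH")
    if override and override.strip() and override.strip() != "0":
        return Path(override).expanduser()
    return user_cache_dir(os_name, platform, env, home) / "cspresso" / "pw-browsers"


if __name__ == "__main__":
    home = Path("/home/u")
    print(default_browsers_path("posix", "linux", {}, home))
    print(default_browsers_path("posix", "linux", {"PLAYWRIGHT_BROWSERS_PATH": "/opt/pw"}, home))
```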
||||||
def _looks_like_python(path: str) -> bool:
|
|
||||||
p = Path(path)
|
|
||||||
name = p.name.lower()
|
|
||||||
return (
|
|
||||||
p.exists()
|
|
||||||
and os.access(str(p), os.X_OK)
|
|
||||||
and (
|
|
||||||
name == "python" or name.startswith("python3") or name.startswith("python")
|
|
||||||
)
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def _find_python_executable() -> str:
|
|
||||||
"""
|
|
||||||
In AppImage bundles, sys.executable may be the AppImage itself.
|
|
||||||
We need the embedded python binary so we can run: python -m playwright install chromium
|
|
||||||
"""
|
|
||||||
# 1) Normal venv/system case
|
|
||||||
if _looks_like_python(sys.executable):
|
|
||||||
return sys.executable
|
|
||||||
|
|
||||||
# 2) Sometimes present
|
|
||||||
base = getattr(sys, "_base_executable", None)
|
|
||||||
if base and _looks_like_python(base):
|
|
||||||
return base
|
|
||||||
|
|
||||||
# 3) Embedded python typically lives under sys.prefix/bin
|
|
||||||
bindir = "Scripts" if os.name == "nt" else "bin"
|
|
||||||
candidates = [
|
|
||||||
Path(sys.prefix)
|
|
||||||
/ bindir
|
|
||||||
/ f"python{sys.version_info.major}.{sys.version_info.minor}",
|
|
||||||
Path(sys.prefix) / bindir / f"python{sys.version_info.major}",
|
|
||||||
Path(sys.prefix) / bindir / "python3",
|
|
||||||
Path(sys.prefix) / bindir / "python",
|
|
||||||
Path(sys.base_prefix)
|
|
||||||
/ bindir
|
|
||||||
/ f"python{sys.version_info.major}.{sys.version_info.minor}",
|
|
||||||
Path(sys.base_prefix) / bindir / f"python{sys.version_info.major}",
|
|
||||||
Path(sys.base_prefix) / bindir / "python3",
|
|
||||||
Path(sys.base_prefix) / bindir / "python",
|
|
||||||
]
|
|
||||||
for c in candidates:
|
|
||||||
if _looks_like_python(str(c)):
|
|
||||||
return str(c)
|
|
||||||
|
|
||||||
# 4) Last resort: host python on PATH
|
|
||||||
for name in (
|
|
||||||
f"python{sys.version_info.major}.{sys.version_info.minor}",
|
|
||||||
"python3",
|
|
||||||
"python",
|
|
||||||
):
|
|
||||||
p = shutil.which(name)
|
|
||||||
if p and _looks_like_python(p):
|
|
||||||
return p
|
|
||||||
|
|
||||||
# Fallback (won't fix AppImage, but avoids crashing)
|
|
||||||
return sys.executable
|
|
||||||
|
|
||||||
|
|
||||||
def _env_with_browsers_path(browsers_path: Path) -> dict[str, str]:
|
def _env_with_browsers_path(browsers_path: Path) -> dict[str, str]:
|
||||||
|
|
@ -115,20 +27,14 @@ def _env_with_browsers_path(browsers_path: Path) -> dict[str, str]:
|
||||||
return env
|
return env
|
||||||
|
|
||||||
|
|
||||||
def _is_writable_dir(path: Path) -> bool:
|
|
||||||
try:
|
|
||||||
path.mkdir(parents=True, exist_ok=True)
|
|
||||||
probe = path / ".write_probe"
|
|
||||||
probe.write_text("x", encoding="utf-8")
|
|
||||||
probe.unlink(missing_ok=True)
|
|
||||||
return True
|
|
||||||
except OSError:
|
|
||||||
return False
|
|
||||||
|
|
||||||
|
|
||||||
def _acquire_install_lock(
|
def _acquire_install_lock(
|
||||||
lock_path: Path, timeout_s: float = 120.0, poll_s: float = 0.2
|
lock_path: Path, timeout_s: float = 120.0, poll_s: float = 0.2
|
||||||
) -> None:
|
) -> None:
|
||||||
|
"""Very small cross-platform lock using atomic file creation.
|
||||||
|
Avoids concurrent Playwright installs when multiple processes start at once.
|
||||||
|
|
||||||
|
Not perfect, but good enough for most CLI usage.
|
||||||
|
"""
|
||||||
start = time.time()
|
start = time.time()
|
||||||
while True:
|
while True:
|
||||||
try:
|
try:
|
||||||
|
|
@ -143,16 +49,14 @@ def _acquire_install_lock(
|
||||||
|
|
||||||
def _release_install_lock(lock_path: Path) -> None:
|
def _release_install_lock(lock_path: Path) -> None:
|
||||||
try:
|
try:
|
||||||
lock_path.unlink(missing_ok=True)
|
lock_path.unlink(missing_ok=True) # Python 3.8+
|
||||||
except Exception:
|
except Exception:
|
||||||
pass # nosec
|
pass # nosec
|
||||||
|
|
||||||
|
|
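The right-hand docstring describes `_acquire_install_lock` as a small lock built on atomic file creation, but most of its body falls outside the hunks shown. The following is a minimal sketch of that general technique, not the project's implementation: `acquire_lock` and `release_lock` are hypothetical names, and using `os.open` with `O_CREAT | O_EXCL` is an assumption about how such a lock is commonly written.

```python
import os
import time
from pathlib import Path


def acquire_lock(lock_path: Path, timeout_s: float = 5.0, poll_s: float = 0.05) -> None:
    """Create lock_path atomically, polling until it is free or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"Timed out waiting for lock: {lock_path}")
            time.sleep(poll_s)
            continue
        with os.fdopen(fd, "w") as fh:
            fh.write(str(os.getpid()))  # record the owner for debugging
        return


def release_lock(lock_path: Path) -> None:
    """Remove the lock file; a missing file is not an error."""
    try:
        lock_path.unlink(missing_ok=True)
    except OSError:
        pass
```

A lock like this is not cleaned up if the owning process is killed, so a stale file can block later runs until it is removed by hand or the timeout is handled by the caller.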
||||||
def _install_chromium(browsers_path: Path, with_deps: bool = False) -> None:
|
def _install_chromium(browsers_path: Path, with_deps: bool = False) -> None:
|
||||||
env = _env_with_browsers_path(browsers_path)
|
env = _env_with_browsers_path(browsers_path)
|
||||||
py = _find_python_executable()
|
cmd = [sys.executable, "-m", "playwright", "install"]
|
||||||
|
|
||||||
cmd = [py, "-m", "playwright", "install"]
|
|
||||||
if with_deps:
|
if with_deps:
|
||||||
cmd.append("--with-deps")
|
cmd.append("--with-deps")
|
||||||
cmd.append("chromium")
|
cmd.append("chromium")
|
||||||
|
|
@ -161,6 +65,7 @@ def _install_chromium(browsers_path: Path, with_deps: bool = False) -> None:
|
||||||
|
|
||||||
|
|
||||||
async def _can_launch_chromium(browsers_path: Path) -> bool:
|
async def _can_launch_chromium(browsers_path: Path) -> bool:
|
||||||
|
# Ensure this process uses the same path too.
|
||||||
os.environ["PLAYWRIGHT_BROWSERS_PATH"] = str(browsers_path)
|
os.environ["PLAYWRIGHT_BROWSERS_PATH"] = str(browsers_path)
|
||||||
try:
|
try:
|
||||||
async with async_playwright() as p:
|
async with async_playwright() as p:
|
||||||
|
|
@ -177,36 +82,23 @@ async def ensure_chromium_installed(
|
||||||
with_deps: bool = False,
|
with_deps: bool = False,
|
||||||
lock_timeout_s: float = 120.0,
|
lock_timeout_s: float = 120.0,
|
||||||
) -> EnsureResult:
|
) -> EnsureResult:
|
||||||
"""
|
"""Ensure Playwright's Chromium is installed and launchable.
|
||||||
Ensure Playwright Chromium is installed and launchable.
|
|
||||||
|
|
||||||
- Honors PLAYWRIGHT_BROWSERS_PATH if set.
|
Strategy:
|
||||||
- Defaults to a user cache dir (safe for AppImage readonly mounts).
|
- Attempt a tiny headless launch.
|
||||||
- Uses embedded python to run playwright installer when sys.executable is the AppImage.
|
- If it fails, acquire a lock and run `python -m playwright install chromium` (optionally --with-deps).
|
||||||
|
- Retry launch once.
|
||||||
"""
|
"""
|
||||||
explicit = browsers_path is not None
|
|
||||||
bp = browsers_path or _default_browsers_path()
|
bp = browsers_path or _default_browsers_path()
|
||||||
|
|
||||||
# If it already works, do nothing.
|
|
||||||
if await _can_launch_chromium(bp):
|
|
||||||
return EnsureResult(browsers_path=bp, installed=False)
|
|
||||||
|
|
||||||
# If we need to install and the chosen dir isn't writable, fall back (unless explicit).
|
|
||||||
if not explicit and not _is_writable_dir(bp):
|
|
||||||
bp = _user_cache_dir() / "cspresso" / "pw-browsers"
|
|
||||||
if not _is_writable_dir(bp):
|
|
||||||
bp = Path(tempfile.gettempdir()) / "cspresso" / "pw-browsers"
|
|
||||||
bp.mkdir(parents=True, exist_ok=True)
|
bp.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
if explicit and not _is_writable_dir(bp):
|
if await _can_launch_chromium(bp):
|
||||||
raise OSError(
|
return EnsureResult(browsers_path=bp, installed=False)
|
||||||
f"Browsers path is not writable: {bp}\n"
|
|
||||||
"Choose a writable directory via --browsers-path or set PLAYWRIGHT_BROWSERS_PATH."
|
|
||||||
)
|
|
||||||
|
|
||||||
lock_path = bp / ".install.lock"
|
lock_path = bp / ".install.lock"
|
||||||
_acquire_install_lock(lock_path, timeout_s=lock_timeout_s)
|
_acquire_install_lock(lock_path, timeout_s=lock_timeout_s)
|
||||||
try:
|
try:
|
||||||
|
# Another process might have installed while we waited; check again.
|
||||||
if await _can_launch_chromium(bp):
|
if await _can_launch_chromium(bp):
|
||||||
return EnsureResult(browsers_path=bp, installed=False)
|
return EnsureResult(browsers_path=bp, installed=False)
|
||||||
|
|
||||||
|
|
@ -214,7 +106,7 @@ async def ensure_chromium_installed(
|
||||||
|
|
||||||
if not await _can_launch_chromium(bp):
|
if not await _can_launch_chromium(bp):
|
||||||
raise RuntimeError(
|
raise RuntimeError(
|
||||||
"Chromium install completed, but Chromium still failed to launch. "
|
"Playwright Chromium install completed, but Chromium still failed to launch. "
|
||||||
"On Linux, you may need additional system dependencies."
|
"On Linux, you may need additional system dependencies."
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
|
||||||