# rewget — full content for LLM ingestion > rewget: wget, but it works everywhere. A drop-in wget replacement that automatically bypasses bot protection. Built in Rust by neul-labs. Dual-licensed under MIT or Apache-2.0. ## What rewget is rewget is a thin CLI shim around wget (and a sidecar daemon, rewgetd) that intercepts the cases where wget alone fails: 403 Forbidden from bot detection, CAPTCHA challenge pages, 429 rate limits, and "works in a browser but not in wget" pages. When stage 1 (plain wget) fails, stage 2 retries the same URL with browser-grade TLS and HTTP/2 fingerprints via rquest. If that still fails, stage 3 launches headless Chromium to solve JavaScript challenges. Domain-level results are cached for seven days. All standard wget options work unchanged. The new behaviour is exposed under `--rewget-*` flags so existing scripts don't break. ## Install - Homebrew: `brew install neul-labs/tap/rewget` (macOS and Linux) - npm: `npm install -g rewget` (macOS and Linux) - PyPI: `uv tool install rewget` or `pip install rewget` (macOS and Linux) - Cargo: `cargo install rewget` (from source) - Install script: `curl -fsSL https://raw.githubusercontent.com/neul-labs/rewget/main/scripts/install.sh | sh` Platform support today: Linux and macOS. See the README badges for the canonical platform list. ## How it works Stage 1 — wget. Fast, zero overhead. If the download succeeds, you get the exact same output you would have got from wget itself. Stage 2 — Impersonate. rewgetd retries the same URL through rquest, using a Chrome, Firefox, Safari, or Edge TLS and HTTP/2 fingerprint. Profiles are versioned and Ed25519-verified; `--rewget-update-profiles` fetches new ones. Stage 3 — JS Preflight. rewgetd launches headless Chromium via chromiumoxide. The browser solves the challenge, exports cookies and headers, and stage 2 retries with that session. Cache: per-domain, 7-day TTL. Subsequent requests skip straight to the working stage. ## Flags - `--rewget-no-fallback` — disable automatic retry on block - `--rewget-js` — force JavaScript preflight (Stage 3) - `--rewget-js-wait=EVENT` — wait condition: load, domcontentloaded, networkidle - `--rewget-profile=NAME` — use a specific browser profile - `--rewget-fallback-codes=N,N` — only retry on these HTTP status codes - `--rewget-engine=wget|wget2` — choose wget engine - `--rewget-list-profiles` — list available browser profiles - `--rewget-update-profiles` — fetch latest profiles (Ed25519 verified) - `--rewget-version` — show rewget version Profiles currently shipping: Chrome 131/130, Firefox 136/133, Safari 18, Edge 131. ## Architecture - `rewget` — thin CLI shim. Parses `--rewget-*` flags, forwards everything else to wget. - `rewgetd` — daemon, handles stage 2 and stage 3. Manages the browser pool and TLS sessions. - `rewget-core` — shared library: detection, caching, profile logic. Stack: Rust with Tokio, rquest for TLS/HTTP2 impersonation, chromiumoxide for headless browser control, nng for CLI-to-daemon IPC, mimalloc for allocation. ## Why rewget The wget-vs-curl debate is decades old. We don't pick a side. We use both — wget for resumable, recursive downloads, curl for one-shot requests with bodies and headers — and we both still hit the same wall: sites that work in a browser but throw 403s at any HTTP client because the TLS fingerprint or HTTP/2 frame ordering is "wrong". rewget keeps the wget surface (so your scripts, your aliases, your muscle memory all keep working) and adds the fallback you would otherwise script by hand: detect the block, swap to a fingerprint that matches a real browser, and only escalate to a full browser when the page actually needs JavaScript. The three-stage design means the common case stays fast. If a site doesn't block, you pay zero overhead — you get wget. If it blocks once, rewget records the working stage and the next download from that domain goes straight to it. ## License & repository - License: MIT or Apache-2.0 (dual) - Repo: https://github.com/neul-labs/rewget - Docs: https://docs.neullabs.com/rewget/ - Issues, contributions: GitHub ## Comparison summary vs bare wget — same CLI surface, same behaviour on sites that aren't blocking; transparent fallback on the cases wget can't handle. vs bare curl — different CLI shape. curl is one-shot; wget (and rewget) is download-oriented with resume and recursion. Pick rewget if you already prefer the wget shape and want it to keep working. ## What rewget does not claim - We do not claim Windows-native support today. Platform badges in the README list Linux and macOS. - We do not claim to bypass authenticated bot protection. rewget handles fingerprint-based and JavaScript-challenge blocks, not policy-driven access control. - We do not claim 100% success against every WAF. Domain caching exposes which stage works for which sites.