
wget vs curl, finally settled

neul-labs
wget · curl · tooling

This is the post we did not want to write. The wget-versus-curl argument is the closest thing systems engineers have to a religious war, and most of the time the argument is being had between two people who already agree on the answer and just want to relitigate the trade-offs. We are going to do it anyway, because the conclusion turns out to actually matter for how rewget is shaped.

The short version: the question is wrong. wget and curl were designed for different problems. The fact that their feature sets overlap on roughly seventy percent of “fetch a URL” use cases is a coincidence of HTTP being the lingua franca of network transfer, not evidence that one of them is obsolete. If you treat the question as “which tool replaces the other,” you end up with bad scripts. If you treat the question as “what shape is the problem I am solving,” you end up reaching for the right tool the first time.

The shapes

wget is download-shaped. The default behaviour is to save the response body to a file on disk, with a path derived from the URL. The options for resumability (-c), recursion (-r), mirroring (-m), and rate-limiting (--limit-rate) all assume you are going to leave the command running and come back to a folder of files. The flags compose around “I want a copy of that thing.” Skim the man page and it reads like a download manager that grew a CLI.

curl is request-shaped. The default behaviour is to print the response body to stdout, with the response headers suppressed unless you ask for them (-i). The options for request bodies (--data), authentication (--user, --anyauth), headers (-H), and verbose protocol inspection (-v, --trace) all assume you are making a single HTTP request and want to see what the server said. The flags compose around “I want to interact with that thing.” Skim the man page and it reads like a network protocol Swiss army knife.

Both speak HTTP. Both can do a GET. That is roughly where the overlap stops mattering.
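wget's resume flag (-c) is a good example of the download shape. A minimal Python sketch of the mechanism behind it, assuming a server that honours HTTP Range requests: measure the partial file, ask only for the remaining bytes, and append them. The function names are hypothetical, this is not wget's code, and error handling (for example a 416 response when the file is already complete) is left out.

```python
import os
import urllib.request


def build_resume_request(url: str, partial_path: str) -> urllib.request.Request:
    """Build a GET request that asks only for the bytes not yet on disk."""
    req = urllib.request.Request(url)
    # Bytes already downloaded; 0 if the partial file does not exist yet.
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    if offset > 0:
        # "bytes=N-" means "from byte N to the end of the resource".
        req.add_header("Range", f"bytes={offset}-")
    return req


def resume_download(url: str, path: str, chunk_size: int = 64 * 1024) -> None:
    """Append the missing bytes to `path`, or rewrite it if Range is ignored."""
    req = build_resume_request(url, path)
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content: the server honoured Range, so append.
        # 200 OK: the server sent the whole body, so start the file over.
        mode = "ab" if resp.status == 206 else "wb"
        with open(path, mode) as out:
            while chunk := resp.read(chunk_size):
                out.write(chunk)
```

`build_resume_request` does no network I/O, so it can be checked locally: with a 10-byte partial file on disk it produces a request carrying `Range: bytes=10-`, and with no partial file it sends no Range header at all.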

The default that gives the game away

Here is the cleanest test for which tool fits a job: ask yourself what you want to happen by default, with no flags.

If you want a file on disk named like the URL, you want wget. The command wget https://example.com/file.tar.gz writes file.tar.gz to the current directory and exits. No flags required. The behaviour you want is the default.

If you want the response body printed to your terminal so you can pipe it into jq or grep, you want curl. The command curl https://example.com/api/users dumps the response to stdout. The behaviour you want is the default.

Most of the gymnastics that show up in script reviews — curl -o filename, wget -qO -, the constellation of flags people add to make one tool behave like the other — are evidence that someone reached for the wrong shape and is now fighting it.
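The “file on disk named like the URL” default is easy to see concretely. A simplified Python sketch of how a download-shaped tool can derive the output name from the URL. It approximates wget's behaviour rather than reproducing its rules: the query string is dropped here, and name clashes are not handled (by default, real wget saves a second copy as file.tar.gz.1 instead of overwriting).

```python
import posixpath
from urllib.parse import urlsplit


def local_filename(url: str) -> str:
    """Approximate the default output name: the last path segment of the URL.

    Simplified: the query string is dropped and existing files are not checked.
    """
    path = urlsplit(url).path            # e.g. "/dl/file.tar.gz"
    name = posixpath.basename(path)      # e.g. "file.tar.gz"
    # A path that ends in "/" (or is empty) names a directory, not a file,
    # so fall back to index.html the way wget does.
    return name or "index.html"
```

For example, `local_filename("https://example.com/file.tar.gz")` returns `"file.tar.gz"`, and `local_filename("https://example.com/")` falls back to `"index.html"`.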

Protocol breadth

curl wins on protocols. wget does HTTP, HTTPS, and FTP. curl does all of those plus SCP, SFTP, IMAP, IMAPS, SMTP, SMTPS, POP3, POP3S, LDAP, LDAPS, MQTT, RTSP, TELNET, and a few others people stopped using a decade ago. If your problem is “I need to talk to this protocol from a script and I do not want to write the protocol implementation,” curl is almost always the answer, and the only question is whether calling libcurl through your language’s bindings fits better than shelling out to the CLI.

wget does not lose this argument because it is not in it. wget never tried to be protocol-comprehensive.

Recursive mirroring

wget wins on recursion. The -m mode, with its --convert-links, --no-parent, and --page-requisites siblings, is the single best tool in the ecosystem for “give me a local copy of this site.” curl does not have a recursive mode. People have written wrappers; they are all worse than wget’s native implementation.

If your job is “mirror this onto local disk,” reach for wget. If you reach for curl and start writing a recursive script around it, stop and reach for wget.
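Part of what makes wget's mirroring dependable is how tightly it scopes the crawl. A simplified Python model of the --no-parent rule, assuming only same-host links are followed: a link qualifies if its path sits at or below the starting directory. The function name is hypothetical, and wget's other scoping options (--span-hosts, -I/-X, robots.txt) are not modelled.

```python
import posixpath
from urllib.parse import urlsplit


def allowed_by_no_parent(start_url: str, link: str) -> bool:
    """Simplified --no-parent check: same host, path at or below the start directory."""
    start, target = urlsplit(start_url), urlsplit(link)
    # Assumption for this sketch: links to other hosts are never followed.
    if start.netloc != target.netloc:
        return False
    start_path = start.path or "/"
    if start_path.endswith("/"):
        # "/docs/" is already a directory.
        start_dir = start_path
    else:
        # "/docs/index.html" -> "/docs/"; "/docs" is treated as a file in "/".
        start_dir = posixpath.dirname(start_path).rstrip("/") + "/"
    return (target.path or "/").startswith(start_dir)
```

Starting from `https://example.com/docs/`, a link to `/docs/a.html` is followed and a link to `/blog/` is not. Starting from `https://example.com/docs` (no trailing slash), `/blog/` is allowed, because the start URL is read as a file in `/`. The wget manual makes the same point: for HTTP, the trailing slash matters to --no-parent.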

Modern HTTP

For decades curl led wget on modern HTTP behaviour. HTTP/2 landed in curl first; HTTP/3 landed in curl first; the breadth of TLS backends is wider in curl. wget2 has closed the gap on HTTP/2, and rewget’s stage 2 path uses rquest, which inherits modern HTTP support and adds TLS fingerprint impersonation that neither upstream tool natively offers. The actual frontier on HTTP is no longer “wget versus curl” — it is “what does the WAF expect a real client to look like,” which is a different problem.

That said, if you are doing one-shot requests against APIs and you care about TLS settings, ALPN selection, connection reuse, or HTTP/3, curl is what you reach for and you should not feel bad about it.

Output and scripting

curl prints to stdout. wget prints progress to stderr and writes the response body to a file. These two defaults are not a matter of taste; they are why pipelines are easier with curl and downloads are easier with wget.

Want to pipe a response into a tool? curl. The output is the body. No file ever lands on disk.

Want a file on disk because the next step is a checksum, a tar, or a mv? wget. The file is on disk. No -o to remember.

Want to do both? Use both. Together they take up a few megabytes of disk.

The rewget answer

We built rewget because the wget shape is the right shape for downloads — file on disk, resumable, recursive when needed — and because the wget shape stopped working on a growing share of the internet. The right answer was never “use curl instead.” curl gets blocked too, for the same fingerprint reasons. The right answer was “keep the wget shape and fix the blocking.”

So we did. rewget runs wget as stage 1. On a block, it retries with browser TLS fingerprints via rquest. If that does not work, it spins up headless Chromium for the cases that need it. The CLI is wget. The behaviour is wget. The success rate is higher because three stages have more cards to play than one.

Which means our answer to “wget versus curl” is: use both. Alias wget=rewget so your downloads get the fallback path. Keep curl for one-shot API requests. Stop trying to fold both shapes into one binary. The two-tool answer was always the right one; the only thing that changed is that wget now needs help to keep doing its job, and rewget is that help.
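The fallback chain can be expressed as an ordered list of fetchers, where each stage runs only if the previous one was blocked. A minimal Python sketch: the `Attempt` type, the stage callables, and the status codes treated as a block (403, 429, 503) are hypothetical stand-ins, not rewget's actual detection logic or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Status codes this sketch treats as "the server blocked us".
# Hypothetical choice; real block detection can also look at the response body.
BLOCK_STATUSES = {403, 429, 503}


@dataclass
class Attempt:
    stage: str          # which fetcher produced this result
    status: int         # HTTP status code it got back
    body: bytes = b""   # response body, if any


# A stage is any callable that takes a URL and returns an Attempt.
Stage = Callable[[str], Attempt]


def fetch_with_fallback(
    url: str, stages: Sequence[Stage]
) -> Tuple[Optional[Attempt], List[Attempt]]:
    """Run stages in order and stop at the first response that is not a block.

    Returns (result, history). `result` is None if every stage was blocked.
    A non-block failure such as 404 is returned as-is: escalating to a browser
    would not make a missing file appear.
    """
    history: List[Attempt] = []
    for stage in stages:
        attempt = stage(url)
        history.append(attempt)
        if attempt.status not in BLOCK_STATUSES:
            return attempt, history
    return None, history
```

Passing three stand-in callables for plain wget, a TLS-impersonating client, and headless Chromium shows the intended behaviour: if the first returns 403 and the second succeeds, the third never runs, and `history` records both attempts.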

What does not matter

A few arguments that come up in the wget-versus-curl discussion that, in our experience, do not actually affect the answer:

  • License differences. wget is GPLv3, curl uses an MIT-style license, and rewget is MIT or Apache-2.0. Unless you are statically linking into a proprietary product, none of these has ever changed a CI script.
  • Memory footprint. Both tools fit in tens of megabytes of resident memory under realistic workloads. If you are optimizing this number, you have other things to optimize first.
  • Pretty output. wget’s progress bar is friendlier; curl’s is more compact. We have never seen this decide a deployment.
  • “But it is built in” arguments. Both usually are. Pick whichever one the next box you SSH into will already have, and accept that the next box will probably have both.

A working policy

Here is the policy we landed on after far too many of these conversations:

  1. If the job is “fetch a URL and put the response in a file,” use wget — or, if the URL might be guarded, rewget.
  2. If the job is “make an HTTP request, look at the response, maybe pipe it somewhere,” use curl.
  3. If the job is “mirror a site,” use wget. (rewget will keep working if the site starts blocking.)
  4. If the job is “talk to an obscure protocol,” use curl.
  5. If the job is “extract a single field from an API response in a shell script,” use curl with -s and pipe to jq.
  6. Do not refactor working scripts to “use the other tool” without a reason.

That is the settled answer. The two tools coexist because the two shapes coexist, and the only update we want to make to the canonical advice is point 1 — these days, wget by itself is not enough on guarded sites, and rewget is what closes that gap.