Proxies - Everything Curl
Proxies
A proxy is a machine or software that does something on behalf of you, the client.
You can also see it as a middle man that sits between you and the server you want to work with,
a middle man that you connect to instead of the actual remote server. You ask the proxy to
perform your desired operation for you and then it will run off and do that and then return the
data to you.
There are several different types of proxies and we shall list and discuss them further down in
this section.
Some networks are set up to require a proxy in order for you to reach the Internet or perhaps that
special network you are interested in. The use of proxies is introduced on your network by the
people and management that run your network for policy or technical reasons.
In the networking space there are a few methods for the automatic detection of proxies and how
to connect to them, but none of those methods are truly universal and curl supports none of
them. Furthermore, when you communicate with the outside world through a proxy, that often
means that you have to put a lot of trust in the proxy, as it will be able to see and modify all the
non-secure network traffic you send or get through it. That trust is not easy to assume
automatically.
If you check your browser's network settings, sometimes under an advanced settings tab, you
can learn what proxy or proxies your browser is configured to use. Chances are good that you
should use the same one or ones when you use curl.
As an example, you can find the proxy settings for the Firefox browser in Preferences => General =>
Network Settings.
https://ec.haxx.se/usingcurl/usingcurl-proxies 1/10
21.8.2020 Proxies - Everything curl
PAC
Some network environments provide several different proxies that should be used in different
situations, and a customizable way to handle that is supported by the browsers. This is called
"proxy auto-config", or PAC.
A PAC file contains a JavaScript function that decides which proxy a given network connection
(URL) should use, and even if it should not use a proxy at all. Browsers most typically read the
PAC file off a URL on the local network.
Since curl has no JavaScript capabilities, curl does not support PAC files. If your browser and
network use PAC files, the easiest route forward is usually to read the PAC file manually and
figure out the proxy you need to specify to run curl successfully.
Captive portals
These are not proxies but they're blocking the way between you and the server you want to
access.
A "captive portal" is one of these systems that are popular to use in hotels, airports and for other
sorts of network access to a larger audience. The portal will "capture" all network traffic and
redirect you to a login web page until you've either clicked OK and verified that you've read their
conditions or perhaps even made sure that you've paid plenty of money for the right to use the
network.
curl's traffic will of course also be captured by such portals, and often the best way is to use a
browser to accept the conditions and "get rid of" the portal, since from then on they often allow all
other traffic originating from that same machine (MAC address) for a period of time.
Most often you can use curl too to submit that "ok" affirmation, if you just figure out how to
submit the form and what fields to include in it. If this is something you end up doing many times,
it may be worth exploring.
Proxy type
The default proxy type is HTTP, so if you specify a proxy host name (or IP address) without a
scheme part (the part that is often written as "http://"), curl assumes it is an HTTP
proxy.
curl also allows a number of different options to set the proxy type instead of using the scheme
prefix. See the SOCKS section below.
HTTP
An HTTP proxy is a proxy that the client speaks HTTP with to get the transfer done. curl will, by
default, assume that a host you point out with -x or --proxy is an HTTP proxy, and unless
you also specify a port number it will default to port 1080 (and the reason for that particular port
number is purely historical).
If you want to request the example.com web page using a proxy on 192.168.0.1 port 8080, a
command line could look like:
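A minimal sketch of such a command, wrapped in a shell function so that nothing is sent until you call it. The function name fetch_via_proxy is made up for this illustration; the proxy address and port are the ones named above.

```shell
# Hypothetical helper: fetch the given URL(s) through the HTTP proxy
# at 192.168.0.1, port 8080. All arguments are passed on to curl.
fetch_via_proxy() {
    curl -x 192.168.0.1:8080 "$@"
}
```

Calling fetch_via_proxy http://example.com/ then runs curl -x 192.168.0.1:8080 http://example.com/.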
Recall that the proxy receives your request, forwards it to the real server, then reads the response
from the server and then hands that back to the client.
If you enable verbose mode with -v when talking to a proxy, you will see that curl connects to
the proxy instead of the remote server, and you will see that it uses a slightly different request
line.
HTTPS was designed to allow and provide secure and safe end-to-end privacy from the client to
the server (and back). In order to provide that when speaking to an HTTP proxy, the HTTP
protocol has a special request that curl uses to set up a tunnel through the proxy that it then can
encrypt and verify. This HTTP method is known as CONNECT.
When the proxy tunnels encrypted data through to the remote server after a CONNECT method
sets it up, the proxy can neither see nor modify the traffic without breaking the encryption.
MITM-proxies
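MITM means man-in-the-middle. MITM proxies are deployed where the owners of the network want to
inspect even TLS-encrypted traffic.
To do this, they require users to install a custom "trust root" (Certificate Authority (CA) certificate)
in the client, and then the proxy terminates all TLS traffic from the client, impersonates the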
remote server and acts like a proxy. The proxy then sends back a generated certificate signed by
the custom CA. Such proxy setups usually transparently capture all traffic from clients to TCP
port 443 on a remote machine. Running curl in such a network would also get its HTTPS traffic
captured.
This practice, of course, allows the middle man to decrypt and snoop on all TLS traffic.
Non-HTTP protocols over an HTTP proxy
An "HTTP proxy" means the proxy itself speaks HTTP. HTTP proxies are primarily used to proxy
HTTP, but it is fairly common that they support other protocols as well. In particular, FTP is
fairly commonly supported.
When talking FTP "over" an HTTP proxy, it is usually done by more or less pretending the other
protocol works like HTTP and asking the proxy to "get this URL" even if the URL is not using HTTP.
This distinction is important because it means that when sent over an HTTP proxy like this, curl
does not really speak FTP even though given an FTP URL; thus FTP-specific features will not
work:
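As a rough sketch, this is what handing an FTP URL to an HTTP proxy could look like. The proxy host proxy.example.com:80 and the function name are assumptions made for the example.

```shell
# Hypothetical helper: ask an HTTP proxy to fetch a URL on our behalf.
# When given an ftp:// URL, curl still speaks HTTP to the proxy, so
# FTP-specific options have no effect on this transfer.
ftp_over_http_proxy() {
    curl -x http://proxy.example.com:80 "$@"
}
```

A call such as ftp_over_http_proxy ftp://ftp.example.com/file.txt asks the proxy to "get this URL" rather than opening an FTP session.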
What you can do instead then, is to "tunnel through" the HTTP proxy!
Most HTTP proxies allow clients to "tunnel through" them to a server on the other side. That's exactly
what's done every time you use HTTPS through the HTTP proxy.
You tunnel through an HTTP proxy with curl using -p or --proxytunnel .
When you do HTTPS through a proxy you normally connect through to the default HTTPS remote
TCP port number 443, so you will find that most HTTP proxies allow
connections only to hosts on that port number and perhaps a few others. Most proxies will deny
clients from connecting to just any random port (for reasons only the proxy administrators
know).
Still, assuming that the HTTP proxy allows it, you can ask it to tunnel through to a remote server
on any port number so you can do other protocols "normally" even when tunneling. You can do
FTP tunneling like this:
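A minimal sketch, assuming a proxy at proxy.example.com:80 that permits CONNECT to the FTP server's port. The host names and the function name are placeholders.

```shell
# Hypothetical helper: tunnel through the HTTP proxy with -p
# (--proxytunnel), so that curl speaks real FTP to the remote server.
ftp_through_tunnel() {
    curl -p -x http://proxy.example.com:80 "$@"
}
```

With ftp_through_tunnel ftp://ftp.example.com/file.txt the proxy only relays bytes, so FTP-specific features are available again, provided the proxy accepts the tunnel.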
You can tell curl to use HTTP/1.0 in its CONNECT request issued to the HTTP proxy by using
--proxy1.0 [proxy] instead of -x .
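For illustration, a sketch of the HTTP/1.0 variant. The proxy host and the function name are assumed; only the option itself comes from the text above.

```shell
# Hypothetical helper: same idea as -x, but any CONNECT request sent
# to the proxy uses HTTP/1.0 instead of the default HTTP/1.1.
connect_with_http10() {
    curl --proxy1.0 proxy.example.com:80 "$@"
}
```

For example, connect_with_http10 https://example.com/ would set up the HTTPS tunnel with an HTTP/1.0 CONNECT.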
SOCKS types
SOCKS is a protocol used for proxies and curl supports it. curl supports both SOCKS version 4 as
well as version 5, and both versions come in two flavors.
You can select the specific SOCKS version to use by using the correct scheme part for the given
proxy host with -x , or you can specify it with a separate option instead of -x .
SOCKS4 is for the version 4 and SOCKS4a is for the version 4 without resolving the host name
locally:
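The two ways of selecting each flavor can be sketched as follows. The proxy host proxy.example.com and the function names are invented for the example.

```shell
# SOCKS4: curl resolves the host name itself before asking the proxy.
socks4_by_scheme()  { curl -x socks4://proxy.example.com "$@"; }
socks4_by_option()  { curl --socks4 proxy.example.com "$@"; }

# SOCKS4a: the host name is passed to the proxy, which resolves it.
socks4a_by_scheme() { curl -x socks4a://proxy.example.com "$@"; }
socks4a_by_option() { curl --socks4a proxy.example.com "$@"; }
```

Each pair should behave the same way; which form to use is a matter of taste, for example socks4a_by_scheme http://www.example.com/.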
SOCKS5 is for the version 5 and SOCKS5-hostname is for the version 5 without resolving the host
name locally. With the SOCKS5-hostname variant, curl sends the host name to the proxy, so no name
resolving is done locally.
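A sketch of the SOCKS5 variants, again with an assumed proxy host and made-up function names. The socks5h:// scheme is the prefix form of SOCKS5-hostname.

```shell
# SOCKS5: curl resolves the host name locally.
socks5_by_scheme()  { curl -x socks5://proxy.example.com "$@"; }
socks5_by_option()  { curl --socks5 proxy.example.com "$@"; }

# SOCKS5-hostname: the host name is sent to the proxy unresolved.
socks5h_by_scheme() { curl -x socks5h://proxy.example.com "$@"; }
socks5h_by_option() { curl --socks5-hostname proxy.example.com "$@"; }
```

For instance, socks5h_by_scheme http://www.example.com/ leaves the name lookup to the proxy.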
Proxy authentication
HTTP proxies can require authentication, so curl then needs to provide the proper credentials to
the proxy to be allowed to use it, and failing to do so will only make the proxy return HTTP responses
using code 407.
Authentication for proxies is similar to "normal" HTTP authentication. It is separate from the
server authentication to allow clients to independently use both normal host authentication as
well as proxy authentication.
With curl, you set the user name and password for the proxy authentication with the
-U user:password or --proxy-user user:password option:
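A minimal sketch of passing proxy credentials. The user name alice, the password s3cret, the proxy host and the function name are all placeholders invented for this example.

```shell
# Hypothetical helper: authenticate to the proxy with -U (--proxy-user).
# In real use, avoid hard-coding credentials in scripts.
fetch_with_proxy_auth() {
    curl -U alice:s3cret -x proxy.example.com:80 "$@"
}
```

It would be called as fetch_with_proxy_auth http://example.com/.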
This example will default to using the Basic authentication scheme. Some proxies will require
another authentication scheme (and the headers that are returned when you get a 407 response
will tell you which) and then you can ask for a specific method with --proxy-digest,
--proxy-negotiate or --proxy-ntlm. The same command can be repeated with one of these options
added to request that scheme.
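As one possible illustration, here is a proxy authentication command with --proxy-ntlm added. The credentials, the proxy host and the function name are placeholders invented for this sketch, and NTLM is picked only as an example of the three options.

```shell
# Hypothetical helper: authenticate to the proxy using NTLM instead of
# the default Basic scheme. Swap in --proxy-digest or --proxy-negotiate
# if that is what the proxy's 407 response asks for.
fetch_with_proxy_ntlm() {
    curl -U alice:s3cret -x proxy.example.com:80 --proxy-ntlm "$@"
}
```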
There's also the option that asks curl to figure out which method the proxy wants and supports
and then go with that (at the possible expense of extra roundtrips) using --proxy-anyauth.
Asking curl to use any method the proxy wants is then like this:
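A sketch under the same assumptions as before: made-up credentials, an assumed proxy host and a hypothetical function name.

```shell
# Hypothetical helper: let curl negotiate whichever authentication
# scheme the proxy offers. This may cost an extra request/response.
fetch_with_proxy_anyauth() {
    curl -U alice:s3cret -x proxy.example.com:80 --proxy-anyauth "$@"
}
```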
HTTPS to proxy
All the previously mentioned protocols to speak with the proxy are clear text protocols, HTTP and
the SOCKS versions. Using these methods could allow someone to eavesdrop on your traffic on the
local network where you or the proxy reside.
One solution for that is to use HTTPS to the proxy, which then establishes a secure and
encrypted connection that is safe from easy surveillance.
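A minimal sketch, assuming a curl build with HTTPS-proxy support and a proxy that accepts TLS on proxy.example.com port 443. Both the host and the function name are invented for the example.

```shell
# Hypothetical helper: the https:// scheme in the -x argument makes
# curl speak TLS to the proxy itself, whatever the target URL is.
fetch_via_https_proxy() {
    curl -x https://proxy.example.com:443 "$@"
}
```

A call like fetch_via_https_proxy http://example.com/ encrypts the hop between you and the proxy even though the target URL is plain HTTP.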
Proxy environment variables
curl checks for the existence of specially named environment variables before it runs to see if a
proxy is requested.
You specify the proxy by setting a variable named [scheme]_proxy to hold the proxy host name
(the same way you would specify the host with -x). So if you want to tell curl to use a proxy
when accessing an HTTP server, you set the http_proxy environment variable. Like this:
    export http_proxy=http://proxy.example.com:80
    curl -v www.example.com
While the above example shows HTTP, you can, of course, also set ftp_proxy, https_proxy, and so
on. All these proxy environment variable names except http_proxy can also be specified in
uppercase, like HTTPS_PROXY.
To set a single variable that controls all protocols, the ALL_PROXY variable exists. If a
protocol-specific variable exists, it takes precedence.
When using environment variables to set a proxy, you could easily end up in a situation where one
or a few host names should be excluded from going through the proxy. This is then done with the
NO_PROXY variable. Set that to a comma-separated list of host names that should not use a
proxy when being accessed. You can set NO_PROXY to be a single asterisk ('*') to match all
hosts.
As an alternative to the NO_PROXY variable, there's also a --noproxy command line option that
serves the same purpose and works the same way.
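The two approaches can be sketched side by side. The proxy host, the excluded host names and the function names are assumptions made for this example.

```shell
# Environment form: the variables apply only to this one curl call.
fetch_with_no_proxy_env() {
    http_proxy=http://proxy.example.com:80 \
    NO_PROXY=localhost,intranet.example.com \
    curl "$@"
}

# Command line form: --noproxy takes the same comma-separated list.
fetch_with_noproxy_option() {
    curl -x http://proxy.example.com:80 \
         --noproxy localhost,intranet.example.com "$@"
}
```

In both cases a request for http://intranet.example.com/ should bypass the proxy, while other hosts go through it.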
The HTTP version of the proxy environment variables is treated differently than the others. It is
only accepted in its lower case version because of the CGI protocol, which lets users run scripts
in a server when invoked by an HTTP server. When a CGI script is invoked by a server, it
automatically creates environment variables for the script based on the incoming headers in the
request. Those environment variables are prefixed with uppercase HTTP_ !
An incoming request to an HTTP server using a request header like Proxy: yada will therefore
create the environment variable HTTP_PROXY set to contain yada before the CGI script is
started. If that CGI script runs curl, the sender of that header would get to pick the proxy curl uses.
Accepting the upper case version of this environment variable has been the source of many
security problems in lots of software over the years.
Proxy headers
When you want to add HTTP headers meant specifically for a proxy and not for the remote server,
the --header option falls short.
For example, if you issue an HTTPS request through an HTTP proxy, it will be done by first issuing a
CONNECT to the proxy that establishes a tunnel to the remote server and then it sends the
request to that server. That first CONNECT is only issued to the proxy and you may want to make
sure only that receives your special header, and send another set of custom headers to the
remote server.
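curl offers the --proxy-header option for this purpose. A minimal sketch, where the header names and values, the proxy host and the function name are all invented for illustration:

```shell
# Hypothetical helper: --proxy-header is meant for the proxy (sent with
# the CONNECT request), while --header is meant for the remote server.
fetch_with_split_headers() {
    curl --proxy-header "X-Proxy-Note: for-the-proxy" \
         --header "X-Server-Note: for-the-server" \
         -x http://proxy.example.com:80 "$@"
}
```

It would be used as fetch_with_split_headers https://example.com/. Adding -v lets you check which request carries which header.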