Right now the proxying of cloudflared -> unix socket is a bit of
a no man's land, where we do not have the ability to specify the
actual protocol since the user just configures "unix:/path/"
In practice, we proxy using an HTTP client.
But it could be that the origin expects HTTP or HTTPS. However,
we have no way of knowing.
So how are we proxying to it? We are configuring the http.Request
in ways that depend on the transport and edge implementation, and
it so happens that for h2mux and http2 we are using a http.Request
whose Scheme is HTTP, whereas for quic we are generating a http.Request
whose scheme is HTTPS.
Since it does not make sense to have different behaviours depending
on the transport, we are making a (hopefully temporary) change so
that proxied requests to Unix sockets are systematically HTTP.
In practice we should do https://github.com/cloudflare/cloudflared/issues/502
to make this configurable.
Until this PR, we were naively closing the quic.Stream whenever
the callstack for handling the request (HTTP or TCP) finished.
However, our proxy handler may still be reading or writing from
the quic.Stream at that point, because we return the callstack if
either side finishes, but not necessarily both.
This is a problem for quic-go library because quic.Stream#Close
cannot be called concurrently with quic.Stream#Write
Furthermore, we also noticed that quic.Stream#Close does nothing
to do receiving stream (since, underneath, quic.Stream has 2 streams,
1 for each direction), thus leaking memory, as explained in:
https://github.com/lucas-clemente/quic-go/issues/3322
This PR addresses both problems by wrapping the quic.Stream that
is passed down to the proxying logic and handle all these concerns.
We have made 2 changes in the past that caused an unexpected edge case:
1. when faced with QUIC "no network activity", give up re-attempts and fall-back
2. when a protocol is chosen explicitly, rather than using auto (the default), do not fallback
The reasoning for 1. was to fallback quickly in situations where the user may not
have chosen QUIC, and simply got it because we auto-chose it (with the TXT DNS record),
but the users' environment does not allow egress via UDP.
The reasoning for 2. was to avoid falling back if the user explicitly chooses a
protocol. E.g., if the user chooses QUIC, she may want to do UDP proxying, so if
we fallback to HTTP2 protocol that will be unexpected since it does not support
UDP (and same applies for HTTP2 falling back to h2mux and TCP proxying).
This PR fixes the edge case that happens when both those changes 1. and 2. are
put together: when faced with a QUIC "no network activity", we should only try
to fallback if there is a possible fallback. Otherwise, we should exhaust the
retries as normal.
This parameterizes relevant component tests by transport protocol
where applicable.
The motivation is to have coverage for (graceful or not) shutdown
that was broken in QUIC. That logic (as well as reconnect) is
different depending on the transport, so we should have it
parameterized. In fact, the test is failing for QUIC (and passing
for others) right now, which is expected until we roll out some
edge fixes for QUIC. So we could have caught this earlier on.
This adds various bug fixes when investigating why QUIC transports were
not being unregistered when they should (and only when the graceful shutdown
started).
Most of these bug fixes are making the QUIC transport implementation closer
to its HTTP2 counterpart:
- ServeControlStream is now a blocking function (it's up to the transport to handle that)
- QUIC transport then handles the control plane as part of its Serve, thus waiting for it on shutdown
- QUIC transport now returns "non recoverable" for connections with similar semantics to HTTP2 and H2mux
- QUIC transport no longer has a loop around its Serve logic that retries connections on its own (that logic is upstream)
This does a few fixes to make sure that the QUICConnection returns from
Serve when the context is cancelled.
QUIC transport now behaves like other transports: closes as soon as there
is no traffic, or at most by grace-period. Note that we do not wait for
UDP traffic since that's connectionless by design.
This way we will force the adoption of FIPS compliant cloudflared without having
to handle the transition for systems that already have it installed (since we
were previously using new artifacts with fips suffix) nor without having to
segregate the resulting binary name (since we were always generating a binary
just called cloudflared from the unpacked debian archive to avoid having to change
any automation that assumes the binary to be called just that).
This changes existing Makefile targets to make it obvious that they are
used to publish debian packages for internal Cloudflare usage. Those are
now FIPS compliant, with no alternative provided. This only affects amd64
builds (and we only publish internally for Linux).
This new Makefile target is used by all internal builds (including nightly
that is used for e2e tests).
Note that this Makefile target renames the artifact to be just `cloudflared`
so that this is used "as is" internally, without expecting people to opt-in
to the new `cloudflared-fips` package (as we are giving them no alternative).
This is a cherry-pick of 157f5d1412
followed by build/CI changes so that amd64/linux FIPS compliance is
provided by new/separate binaries/artifacts/packages.
The reasoning being that FIPS compliance places excessive requirements
in the encryption algorithms used for regular users that do not care
about that. This can cause cloudflared to reject HTTPS origins that
would otherwise be accepted without FIPS checks.
This way, by having separate binaries, existing ones remain as they
were, and only FIPS-needy users will opt-in to the new FIPS binaries.
This reverts commit 157f5d1412.
FIPS compliant binaries (for linux/amd64) are causing HTTPS origins to not
be reachable by cloudflared in certain cases (e.g. with Let's Encrypt certificates).
Origins that are not HTTPS for cloudflared are not affected.