In a previous commit, we fixed a bug where the client roundtrip code
could close the request body, which in fact would be the quic.Stream,
thus closing the write-side.
The way that was fixed, prevented the client roundtrip code from closing
also read-side (the body).
This fixes that, by allowing close to only close the read side, which
will guarantee that any subsquent will fail with an error or EOF it
occurred before the close.
cloudflared falls back aggressively to HTTP/2 protocol if a connection
attempt with QUIC failed. This was done to ensure that machines with UDP
egress disabled did not stop clients from connecting to the cloudlfare
edge. This PR improves on that experience by having cloudflared remember
if a QUIC connection was successful which implies UDP egress works. In
this case, cloudflared does not fallback to HTTP/2 and keeps trying to
connect to the edge with QUIC.
cloudflared falls back aggressively to HTTP/2 protocol if a connection
attempt with QUIC failed. This was done to ensure that machines with UDP
egress disabled did not stop clients from connecting to the cloudlfare
edge. This PR improves on that experience by having cloudflared remember
if a QUIC connection was successful which implies UDP egress works. In
this case, cloudflared does not fallback to HTTP/2 and keeps trying to
connect to the edge with QUIC.
This commit guarantees that stream is only closed once the are finished
handling the stream. Without it, we were seeing closes being triggered
by the code that proxies to the origin, which was resulting in failures
to actually send downstream the status code of the proxy request to the
eyeball.
This was then subsequently triggering unexpected retries to cloudflared
in situations such as cloudflared being unable to reach the origin.
For Google's managed prometheus, it seems they reserved certain
labels for their internal service regions/locations. This causes
customers to run into issues using our metrics since our
metric: `cloudflared_tunnel_server_locations` has a `location`
label. Renaming this to `edge_location` should unblock the
conflict and usage.
The idle period is set to 5sec.
We now also ping every second since last activity.
This makes the quic.Connection less prone to being closed with
no network activity, since we send multiple pings per idle
period, and thus a single packet loss cannot cause the problem.
Setting the body to nil was rendering cloudflared to crashing with
a SIGSEGV in the odd case where the hostname accessed maps to a
TCP origin (e.g. SSH/RDP/...) but the eyeball sends a plain HTTP
request that does not go through cloudflared access (thus not wrapped
in websocket as it should).
Instead, QUIC transport now sets http.noBody in that condition, which
deals with the situation gracefully.
Until this PR, we were naively closing the quic.Stream whenever
the callstack for handling the request (HTTP or TCP) finished.
However, our proxy handler may still be reading or writing from
the quic.Stream at that point, because we return the callstack if
either side finishes, but not necessarily both.
This is a problem for quic-go library because quic.Stream#Close
cannot be called concurrently with quic.Stream#Write
Furthermore, we also noticed that quic.Stream#Close does nothing
to do receiving stream (since, underneath, quic.Stream has 2 streams,
1 for each direction), thus leaking memory, as explained in:
https://github.com/lucas-clemente/quic-go/issues/3322
This PR addresses both problems by wrapping the quic.Stream that
is passed down to the proxying logic and handle all these concerns.
This adds various bug fixes when investigating why QUIC transports were
not being unregistered when they should (and only when the graceful shutdown
started).
Most of these bug fixes are making the QUIC transport implementation closer
to its HTTP2 counterpart:
- ServeControlStream is now a blocking function (it's up to the transport to handle that)
- QUIC transport then handles the control plane as part of its Serve, thus waiting for it on shutdown
- QUIC transport now returns "non recoverable" for connections with similar semantics to HTTP2 and H2mux
- QUIC transport no longer has a loop around its Serve logic that retries connections on its own (that logic is upstream)
This does a few fixes to make sure that the QUICConnection returns from
Serve when the context is cancelled.
QUIC transport now behaves like other transports: closes as soon as there
is no traffic, or at most by grace-period. Note that we do not wait for
UDP traffic since that's connectionless by design.
Creates an abstraction over UDP Conn for origin "connection" which can
be useful for future support of complex protocols that may require
changing ports during protocol negotiation (eg. SIP, TFTP)
In addition, it removes a dependency from ingress on connection package.
- Refactors some h2mux specific logic from connection/header.go to connection/h2mux_header.go
- Do the same for the unit tests
- Add a non-h2mux "is control response header" function (we don't need one for the request flow)
- In that new function, do not consider "content-length" as a control header
- Use that function in the non-h2mux flow for response (and it will be used also in origintunneld)
The default max streams value of 100 is rather small when subject to
high load in terms of connecting QUIC with streams faster than it can
create new ones. This high value allows for more throughput.
Go's client defaults to chunked encoding after a 200ms delay if the following cases are true:
* the request body blocks
* the content length is not set (or set to -1)
* the method doesn't usually have a body (GET, HEAD, DELETE, ...)
* there is no transfer-encoding=chunked already set.
So for non websocket requests, if transfer-encoding isn't chunked and content length is 0, we dont set a request body.
ServeControlStream accidentally became non-blocking in the last quic
change causing stream to not be returned until a SIGTERM was received.
This change makes ServeControlStream be non-blocking for QUIC streams.
This maximum grace period will be honored by Cloudflare edge such that
either side will close the connection after unregistration at most
by this time (3min as of this commit):
- If the connection is unused, it is already closed as soon as possible.
- If the connection is still used, it is closed on the cloudflared configured grace-period.
Even if cloudflared does not close the connection by the grace-period time,
the edge will do so.
time.Tick() does not get garbage collected because the channel
underneath never gets deleted and the underlying Ticker can never be
recovered by the garbage collector. We replace this with NewTicker() to
avoid this.
All header transformation code from h2mux has been consolidated in the connection package since it's used by both h2mux and http2 logic.
Exported headers used by proxying between edge and cloudflared so then can be shared by tunnel service on the edge.
Moved access-related headers to corresponding packages that have the code that sets/uses these headers.
Removed tunnel hostname tracking from h2mux since it wasn't used by anything. We will continue to set the tunnel hostname header from the edge for backward compatibilty, but it's no longer used by cloudflared.
Move bastion-related logic into carrier package, untangled dependencies between carrier, origin, and websocket packages.
added ingress.DefaultStreamHandler and a basic test for tcp stream proxy
moved websocket.Stream to ingress
cloudflared no longer picks tcpstream host from header