Whenever cloudflared receives a SIGTERM or SIGINT it goes into graceful shutdown mode, which unregisters the connection and closes the control stream. Unregistering makes it so we no longer receive any new requests and makes the edge close the connection, allowing in-flight requests to finish (within a 3 minute period).
This was working fine for http2 connections, but the quic proxy was cancelling the context as soon as the controls stream ended, forcing the process to stop immediately.
This commit changes the behavior so that we wait the full grace period before cancelling the request
Revert "TUN-8621: Fix cloudflared version in change notes."
Revert "PPIP-2310: Update quick tunnel disclaimer"
Revert "TUN-8621: Prevent QUIC connection from closing before grace period after unregistering"
Revert "TUN-8484: Print response when QuickTunnel can't be unmarshalled"
Revert "TUN-8592: Use metadata from the edge to determine if request body is empty for QUIC transport"
Whenever cloudflared receives a SIGTERM or SIGINT it goes into graceful shutdown mode, which unregisters the connection and closes the control stream. Unregistering makes it so we no longer receive any new requests and makes the edge close the connection, allowing in-flight requests to finish (within a 3 minute period).
This was working fine for http2 connections, but the quic proxy was cancelling the context as soon as the controls stream ended, forcing the process to stop immediately.
This commit changes the behavior so that we wait the full grace period before cancelling the request
Since legacy tunnels have been removed for a while now, we can remove
many of the capnp rpc interfaces that are no longer leveraged by the
legacy tunnel registration and authentication mechanisms.
A clock structure was used to help support unit testing timetravel
but it is a globally shared object and is likely unsafe to share
across tests. Reordering of the tests seemed to have intermittent
failures for the TestWaitForBackoffFallback specifically on windows
builds.
Adjusting this to be a shim inside the BackoffHandler struct should
resolve shared object overrides in unit testing.
Additionally, added the reset retries functionality to be inline with
the ResetNow function of the BackoffHandler to align better with
expected functionality of the method.
Removes unused reconnectCredentialManager.
Combines the tunnelrpc and quic/schema capnp files into the same module.
To help reduce future issues with capnp id generation, capnpids are
provided in the capnp files from the existing capnp struct ids generated
in the go files.
Reduces the overall interface of the Capnp methods to the rest of
the code by providing an interface that will handle the quic protocol
selection.
Introduces a new `rpc-timeout` config that will allow all of the
SessionManager and ConfigurationManager RPC requests to have a timeout.
The timeout for these values is set to 5 seconds as non of these operations
for the managers should take a long time to complete.
Removed the RPC-specific logger as it never provided good debugging value
as the RPC method names were not visible in the logs.
## Summary
To prevent bad eyeballs and severs to be able to exhaust the quic
control flows we are adding the possibility of having a timeout
for a write operation to be acknowledged. This will prevent hanging
connections from exhausting the quic control flows, creating a DDoS.
Also update golang.org/x/net and google.golang.org/grpc to fix vulnerabilities,
although cloudflared is using them in a way that is not exposed to those risks
This commit implements the option to disable PTMU discovery for QUIC
connections.
QUIC finds the PMTU during startup by increasing Ping packet frames
until Ping responses are not received anymore, and it seems to stick
with that PMTU forever.
This is no problem if the PTMU doesn't change over time, but if it does
it may case packet drops.
We add this hidden flag for debugging purposes in such situations as a
quick way to validate if problems that are being seen can be solved by
reducing the packet size to the edge.
Note however, that this option may impact UDP proxying since we expect
being able to send UDP packets of 1280 bytes over QUIC.
So, this option should not be used when tunnel is being used for UDP
proxying.
I deliberately kept this as an unregistertimeout because that was the
intent. In the future we could change this to a UDPConnConfig if we want
to pass multiple values here.
The idea of this PR is simply to add a configurable unregister UDP
timeout.
The lucas-clemente/quic-go package moved namespaces and our branch
went stale, this new fork provides support for the new quic-go repo
and applies the max datagram frame size change.
Until the max datagram frame size support gets upstreamed into quic-go,
this can be used to unblock go 1.20 support as the old
lucas-clemente/quic-go will not get go 1.20 support.
This PR adds ApplicationError as one of the "try_again" error types for
startfirstTunnel. This ensures that these kind of errors (which we've
seen occur when a tunnel gets rate-limited) are retried.
Going forward, the only protocols supported will be QUIC and HTTP2,
defaulting to QUIC for "auto". Selecting h2mux protocol will be forcibly
upgraded to http2 internally.
This PR does two things:
It changes how we fallback to a lower protocol: The current state
is to try connecting with a protocol. If it fails, fall back to a
lower protocol. And try connecting with that and so on. With this PR,
if we fail to connect with a protocol, we will try to connect to other
edge addresses first. Only if we fail to connect to those will we
fall back to a lower protocol.
It fixes a behaviour where if we fail to connect to an edge addr,
we keep re-trying the same address over and over again.
This PR now switches between edge addresses on subsequent connecton attempts.
Note that through these switches, it still respects the backoff time.
(We are connecting to a different edge, but this helps to not bombard an edge
address with connect requests if a particular edge addresses stops working).
This PR changes protocol initialization of the other N connections to be
the same as the one we know the initial tunnel connected with. This is
so we homogenize connections and not lead to some connections being
QUIC-able and the others not.
There's also an improvement to the connection registered log so we know
what protocol every individual connection connected with from the
cloudflared side.
Remove send and return methods from Funnel interface. Users of Funnel can provide their own send and return methods without wrapper to comply with the interface.
Move packet router to ingress package to avoid circular dependency
printing `seconds` is superfluous since time.Duration already adds the `s` suffix
Invalid log message would be
```
Retrying connection in up to 1s seconds
```
Co-authored-by: João Oliveirinha <joliveirinha@cloudflare.com>
Previously allowing the reconnect signal forcibly close the connection
caused a race condition on which error was returned by the errgroup
in the tunnel connection. Allowing the signal to return and provide
a context cancel to the connection provides a safer shutdown of the
tunnel for this test-only scenario.