Recently python.org started blocking our requests. We've asked the Devtools team to upgrade the default python installation to 3.10 so that we can use it in our tests
cloudflared_udp_total_sessions was incorrectly a gauge when it
represents the total since the cloudflared process started and will
only ever increase.
Additionally adds new ICMP metrics for requests and replies.
Adds new suite of metrics to capture the following for capnp rpcs operations:
- Method calls
- Method call failures
- Method call latencies
Each of the operations is labeled by the handler that serves the method and
the method of operation invoked. Additionally, each of these are split
between if the operation was called by a client or served.
Since legacy tunnels have been removed for a while now, we can remove
many of the capnp rpc interfaces that are no longer leveraged by the
legacy tunnel registration and authentication mechanisms.
A clock structure was used to help support unit testing timetravel
but it is a globally shared object and is likely unsafe to share
across tests. Reordering of the tests seemed to have intermittent
failures for the TestWaitForBackoffFallback specifically on windows
builds.
Adjusting this to be a shim inside the BackoffHandler struct should
resolve shared object overrides in unit testing.
Additionally, added the reset retries functionality to be inline with
the ResetNow function of the BackoffHandler to align better with
expected functionality of the method.
Removes unused reconnectCredentialManager.
To help support temporary errors that can occur in the capnp rpc
calls, a wrapper is introduced to inspect the error conditions and
allow for retrying within a short window.
Combines the tunnelrpc and quic/schema capnp files into the same module.
To help reduce future issues with capnp id generation, capnpids are
provided in the capnp files from the existing capnp struct ids generated
in the go files.
Reduces the overall interface of the Capnp methods to the rest of
the code by providing an interface that will handle the quic protocol
selection.
Introduces a new `rpc-timeout` config that will allow all of the
SessionManager and ConfigurationManager RPC requests to have a timeout.
The timeout for these values is set to 5 seconds as non of these operations
for the managers should take a long time to complete.
Removed the RPC-specific logger as it never provided good debugging value
as the RPC method names were not visible in the logs.
If cloudflared was unable to register the UDP session with the
edge, the socket would be left open to be eventually closed by the
OS, or garbage collected by the runtime. Considering that either of
these closes happened significantly after some delay, it was causing
cloudflared to hold open file descriptors longer than usual if continuously
unable to register sessions.
## Summary
We discovered that we were being impacted by a bug in quic-go,
that could create deadlocks and not close connections.
This commit bumps quic-go to the version that contains the fix
to prevent that from happening.
Before this commit the commands that listed tunnels and tunnel routes would be limited to 1000 results by the server.
Now, the commands will call the endpoints until the result set is exhausted. This can take a long time if there are
thousands of pages available, since each request is executed synchronously.
From a user's perspective, nothing changes.
## Summary:
In order to properly monitor what is happening with the new write timeouts that we introduced
in TUN-8244 we need proper logging. Right now we were logging write timeouts when the safe
stream was being closed which didn't make sense because it was miss leading, so this commit
prevents that by adding a flag that allows us to know whether we are closing the stream or not.
## Summary
To avoid having to verbose logs we need to only log when an
actual issue occurred. Therefore, we will be skipping any error
logging if the write timeout is caused by no network activity
which just means that nothing is being sent through the stream.
This commit makes the remote diagnostics enabled by default, which is
a useful feature when debugging cloudflared issues without manual intervention from users.
Users can still opt-out by disabling the feature flag.
Propagates the logger context into further locations to help provide more context for certain errors. For instance, upstream and downstream copying errors will properly have the assigned flow id attached and destination address.