Jitter is important to avoid every cloudflared in the world trying to
reconnect at t=1, 2, 4, etc. seconds. That could overwhelm the backend.
But if each cloudflared instead waits a random duration of up to 2,
then up to 4, then up to 8 seconds, etc., the retries get spread out
evenly across time.
On average, the wait times stay the same (e.g. instead of waiting for
exactly 1 second, cloudflared will wait between 0 and 2 seconds).
This is the "Full Jitter" algorithm from https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
Also changed the socks test code so that it binds to localhost, so that
we don't get popups saying "would you like to allow socks.test to use
the network"
Unless I'm mistaken, when there is no existing token for an app, the `login` command needs to be run to obtain a token (not the `token` command, which itself doesn't generate a token).
- Don't rely on edge to close connection on graceful shutdown in h2mux, start muxer shutdown from cloudflared.
- Don't retry failed connections after graceful shutdown has started.
- After graceful shutdown channel is closed we stop waiting for retry timer and don't try to restart tunnel loop.
- Use a read-only channel for graceful shutdown in functions that only consume the signal (see the sketch below).
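A minimal sketch of the read-only channel pattern from the last bullet;
function names are illustrative, not the actual cloudflared API:

```go
package main

import (
	"fmt"
	"time"
)

// serveConnection only consumes the shutdown signal, so it accepts a
// receive-only channel and cannot accidentally close or send on it.
func serveConnection(gracefulShutdown <-chan struct{}) {
	for {
		select {
		case <-gracefulShutdown:
			fmt.Println("graceful shutdown signalled; not retrying this connection")
			return
		case <-time.After(100 * time.Millisecond):
			// ... normal connection work ...
		}
	}
}

func main() {
	gracefulShutdown := make(chan struct{})
	// A bidirectional channel converts implicitly to <-chan struct{}.
	go serveConnection(gracefulShutdown)
	time.Sleep(300 * time.Millisecond)
	close(gracefulShutdown) // only the owner of the channel closes it
	time.Sleep(50 * time.Millisecond)
}
```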
The classic tunnels flow was triggering a RegisteringTunnel event for
every connection that was about to be established, and then a Connected
event for every connection established.
However, the RegisteringTunnel event had no connection ID, so it always
unset/disconnected the 0th connection, making the /ready endpoint
report incorrect numbers for classic tunnels.
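A hypothetical sketch (not the actual cloudflared types) of why an event
without a connection ID skews the count: if readiness is tracked per
connection index, a missing ID defaults to 0 and always clears that slot:

```go
package main

import "fmt"

type readyTracker struct {
	connected map[uint8]bool
}

func (r *readyTracker) setRegistering(connIndex uint8) {
	// The connection is (re)registering, so mark it as not ready.
	delete(r.connected, connIndex)
}

func (r *readyTracker) setConnected(connIndex uint8) {
	r.connected[connIndex] = true
}

func main() {
	tracker := &readyTracker{connected: map[uint8]bool{0: true, 1: true, 2: true, 3: true}}
	// A RegisteringTunnel event that carries no connection ID defaults to the
	// zero value, so it always unsets connection 0 even if it is still healthy.
	tracker.setRegistering(0)
	fmt.Println(len(tracker.connected)) // reports 3 ready connections instead of 4
}
```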
This change fixes rotating loggers on Windows, where rotation was
failing because Windows file semantics disallow rotating a file that
is still open (and we had multiple lumberjacks pointing to the same
file).
This is fixed by ensuring the initialization happens only once.
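A minimal sketch of the "initialize only once" idea using sync.Once; the
names are illustrative and a plain file stands in for the lumberjack
writer here:

```go
package main

import (
	"io"
	"os"
	"sync"
)

var (
	rotatingWriterInit sync.Once
	rotatingWriter     io.Writer
)

// getRotatingWriter always returns the same writer, so two callers can never
// end up holding the same log file open and blocking rotation on Windows.
func getRotatingWriter(path string) io.Writer {
	rotatingWriterInit.Do(func() {
		f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
		if err != nil {
			rotatingWriter = io.Discard
			return
		}
		rotatingWriter = f
	})
	return rotatingWriter
}

func main() {
	w1 := getRotatingWriter("cloudflared.log")
	w2 := getRotatingWriter("cloudflared.log")
	_ = w1 == w2 // always true: both calls return the single shared writer
}
```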
This addresses a bug where logging would not be output when
cloudflared was run as a Windows Service.
That was happening because Windows Services have no stderr/stdout,
so the ConsoleWriter was failing inside zerolog, which would then
not proceed to the next writer (the file logger).
We now overcome that by using our own multi-writer that is resilient
to errors.
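A minimal sketch of such an error-resilient multi-writer (names are
illustrative, not necessarily the actual cloudflared code): unlike
io.MultiWriter, it keeps writing to the remaining destinations when one
of them fails:

```go
package main

import (
	"io"
	"os"
)

type resilientMultiWriter struct {
	writers []io.Writer
}

// Write sends p to every writer and ignores individual failures, so a broken
// stderr (as in a Windows Service) cannot prevent the file logger from running.
func (m resilientMultiWriter) Write(p []byte) (int, error) {
	for _, w := range m.writers {
		_, _ = w.Write(p) // deliberately ignore per-writer errors
	}
	return len(p), nil
}

func main() {
	logFile, _ := os.OpenFile("cloudflared.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	writer := resilientMultiWriter{writers: []io.Writer{os.Stderr, logFile}}
	_, _ = writer.Write([]byte("logging continues even if stderr is unusable\n"))
}
```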
The following UInt flags:
* Uint64 - heartbeat-count, compression-quality
* Uint - retries, port, proxy-port
were not being correctly picked up from the configuration YAML
since the multi-origin refactor.
This is due to a limitation in the urfave/cli library, which we
work around for now by handling those as Int flags.
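A hedged sketch of the workaround, assuming urfave/cli v2: the flags are
declared as Int (which the YAML source handles correctly) and converted
to the unsigned type where they are used; the surrounding code is
illustrative:

```go
package main

import (
	"log"
	"os"

	"github.com/urfave/cli/v2"
)

func main() {
	app := &cli.App{
		Flags: []cli.Flag{
			// Declared as Int rather than Uint/Uint64 so the value is
			// read correctly from the configuration YAML.
			&cli.IntFlag{Name: "retries", Value: 5},
			&cli.IntFlag{Name: "heartbeat-count", Value: 5},
		},
		Action: func(c *cli.Context) error {
			retries := uint(c.Int("retries"))              // convert back to uint
			heartbeats := uint64(c.Int("heartbeat-count")) // convert back to uint64
			log.Printf("retries=%d heartbeat-count=%d", retries, heartbeats)
			return nil
		},
	}
	if err := app.Run(os.Args); err != nil {
		log.Fatal(err)
	}
}
```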
Not doing so was causing cloudflared to become stuck after
some time. This would happen because the Observer pattern
was sending events to the UI channel (which has 16 slots) but
nothing was consuming them when the UI is not enabled (which
is the default case).
Hence, events (such as connection disconnects/reconnects) would
eventually fill that buffer and leave cloudflared apparently
stuck, in the sense that the connections would not be
reconnected.
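A minimal sketch of one way to avoid this: drain the UI event channel
when the UI is disabled, so the 16-slot buffer can never fill up and
block the sender (names are illustrative, not the actual cloudflared
code):

```go
package main

import "fmt"

type event struct{ description string }

func main() {
	uiChan := make(chan event, 16) // same small buffer as described above
	uiEnabled := false             // the default case: no UI

	if !uiEnabled {
		// No UI to render events: consume and discard them so the Observer's
		// sends never block once 16 events have accumulated.
		go func() {
			for range uiChan {
			}
		}()
	}

	// The Observer side keeps sending connect/disconnect events as before.
	for i := 0; i < 100; i++ {
		uiChan <- event{description: fmt.Sprintf("connection event %d", i)}
	}
	fmt.Println("sender never blocked")
}
```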