rfc: reducing "runtime not found" confusion

Context

Tokio's resource types (TcpStream, time::delay, ...) require a runtime to function. Users interact with the resource types by awaiting on results. The runtime receives events from the operating system related to these resources and dispatches the events to the appropriate waiting task.

Tokio does not implicitly spawn runtimes. It is the user's responsibility to ensure a runtime is running. This is usually done using #[tokio::main] but creating a runtime manually is also possible.

Multiple runtime flavors are provided. There is a multi-threaded, work-stealing runtime, which spawns multiple threads. It is the recommended runtime for a number of cases, including network server applications. There is also a single-threaded "in-place" runtime, which spawns no threads and is useful for cases like implementing a blocking interface for an async client. This is how the reqwest crate's blocking module works.

Additionally, there are cases in which it is useful to have a single process start multiple runtimes of the same flavor. For example, linkerd currently uses a pair of single-threaded runtimes, one for data-plane forwarding work and the other for control-plane tasks (serving metrics, driving service discovery lookups, and so on). This helps keep the two workloads isolated so neither can starve the other, and allows the use of !Sync futures that avoid the overhead of synchronization, without making the entire application single-threaded. The ability to run multiple runtimes in the same process is an important feature for many use-cases.

The problem

When using a TcpStream, the resource type must reference a runtime in order to function. Given that Tokio may have any number of runtimes running in the current process, the resource type must have some strategy by which it can select the correct runtime.

Currently, this is done by using a thread-local to track the current runtime. In many cases, a process only includes a single runtime, so the lookup is unambiguous. A problem arises when attempting to use a resource from outside of a runtime. In this case, the thread-local is not set, it is unclear which runtime the resource type should use, and Tokio panics.

use tokio::net::TcpStream;

fn main() {
    // Boom, no runtime: nothing has set the thread-local context on this thread.
    let future = TcpStream::connect("www.example.com:80");
}

The strategy for fixing this is to enter a runtime context using a runtime::Handle.

use tokio::net::TcpStream;
use tokio::runtime::Handle;

fn do_some_work(handle: &Handle) {
    // Entering the handle sets the thread-local context for the closure.
    handle.enter(|| {
        let future = TcpStream::connect("www.example.com:80");
    });
}

This panic is a source of confusion for users who are not aware of how Tokio searches for the runtime.
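
To make the lookup concrete, here is a minimal model of the thread-local strategy using only the standard library. The `Handle` type, the `CURRENT` thread-local, and the `try_current`/`current` functions are stand-ins invented for this sketch, not Tokio's real types or internals. The sketch only assumes that entering a handle sets a per-thread value and that resources read it back.

```rust
use std::cell::RefCell;

/// Stand-in for a runtime handle (hypothetical; not Tokio's `runtime::Handle`).
#[derive(Clone, Debug, PartialEq)]
pub struct Handle {
    pub name: &'static str,
}

thread_local! {
    // The runtime context entered on this thread, if any.
    static CURRENT: RefCell<Option<Handle>> = RefCell::new(None);
}

/// Restores the previous context when dropped, even if the closure panics.
struct Restore(Option<Handle>);

impl Drop for Restore {
    fn drop(&mut self) {
        let previous = self.0.take();
        CURRENT.with(|c| *c.borrow_mut() = previous);
    }
}

impl Handle {
    /// Run `f` with this handle installed as the current context.
    pub fn enter<R>(&self, f: impl FnOnce() -> R) -> R {
        let previous = CURRENT.with(|c| c.replace(Some(self.clone())));
        let _restore = Restore(previous);
        f()
    }
}

/// What a resource constructor would call: `None` means "outside a runtime".
pub fn try_current() -> Option<Handle> {
    CURRENT.with(|c| c.borrow().clone())
}

/// Panicking lookup, mirroring the behavior described above.
pub fn current() -> Handle {
    try_current().expect("no runtime context set on this thread")
}
```

In this model, `current()` succeeds only inside `handle.enter(...)`. Called anywhere else on the thread, it panics, which is the situation users run into.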

History

In the very early days, tokio-core always required an explicit Handle. There was no context, thread-local or global, that stored the "current reactor". This resulted in Handle being a field on virtually every single type or an argument to every single function. This was tedious given that, in most cases, there was only ever a single tokio-core reactor in the process. In some cases, it resulted in measurable performance degradation as the Handle field increased struct size.

Because of this, tokio-core started providing a static runtime. Resource types would default to using this static runtime. Resource types also included method variants that took an explicit &Handle, allowing the user to specify a custom runtime.

The primary problem with a static runtime is that it cannot easily be configured. Users who ended up configuring their runtime were forced to use the more verbose APIs with an explicit &Handle argument. Additionally, some libraries did not provide methods with an explicit &Handle argument, preventing them from being used with custom runtimes.

To solve these problems, Tokio added a thread-local tracking the "current" runtime. Now, resources would first check the thread-local and, if it was not set, they would use the global runtime. This introduced a new problem. Users that intended to use a custom runtime would accidentally use their resource types from outside of their custom runtime, which would start the global runtime and shift parts of their application to the global runtime. The worst part of this is that everything "seemed to work" but was not doing what the user intended. Half the application ran on a static runtime with default configuration and the other half on the configured runtime. Usually, nothing was noticed until poor performance showed up in production.

The final iteration, resulting in the Tokio of today, was to remove the concept of the global runtime in favor of the thread-local context. This prevents users from being accidentally shifted to the global runtime, where things appear to work but run in a degraded state. The consequence of this change is that attempting to use Tokio's resource types from outside of a runtime results in a panic.

Options

There are a few ways forward from here. These options are not mutually exclusive. This issue is to discuss ways forward. Feel free to propose alternate strategies as well.

Change no behavior and improve the panic message

The thread-local context logic can remain unchanged. Instead, the panic message is improved to include more context about the problem and some options for fixing it.
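
As a rough illustration of what "more context" could mean, the sketch below builds a longer message that names the resource, explains the lookup, and lists the fixes already mentioned in this issue. The function name `no_runtime_message` and the exact wording are assumptions made up for this example. They are not Tokio's current or planned panic text.

```rust
/// Build a hypothetical, more helpful message for the "no runtime" panic.
/// `resource` names the type the user tried to use, e.g. "TcpStream".
pub fn no_runtime_message(resource: &str) -> String {
    [
        format!("`{}` was used outside of a Tokio runtime.", resource),
        "Tokio resources find their runtime through a thread-local context, \
         and none is set on this thread."
            .to_string(),
        "Possible fixes:".to_string(),
        "  - run this code inside a runtime, for example via #[tokio::main]".to_string(),
        "  - enter a runtime context first using a runtime::Handle".to_string(),
    ]
    .join("\n")
}
```

A message along these lines would be passed to `panic!` at the point where the thread-local lookup fails.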

Re-introduce a static runtime using a feature flag

In this case, a static runtime is re-introduced. However, it is guarded by a feature flag: rt-global. rt-global would also be included in the full meta feature.

When rt-global is enabled, Tokio resources would first check the thread-local context. If it is set, the current runtime is used. If there isn't one set, then the global runtime is used.
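
The lookup order can be modeled with the standard library alone. This is a sketch under assumptions: `Handle`, `CURRENT`, `GLOBAL`, `enter`, and `current_or_global` are names invented here, and the global runtime is represented by a lazily initialized static rather than a real runtime.

```rust
use std::cell::RefCell;
use std::sync::OnceLock;

/// Stand-in for a runtime handle (hypothetical; not Tokio's type).
#[derive(Clone, Debug, PartialEq)]
pub struct Handle {
    pub name: &'static str,
}

thread_local! {
    // The runtime context entered on this thread, if any.
    static CURRENT: RefCell<Option<Handle>> = RefCell::new(None);
}

// Process-wide fallback, created lazily with default configuration.
static GLOBAL: OnceLock<Handle> = OnceLock::new();

/// Run `f` with `handle` as the thread-local context, then restore the old one.
/// A panic inside `f` would skip the restore; a real implementation would
/// use a drop guard instead.
pub fn enter<R>(handle: &Handle, f: impl FnOnce() -> R) -> R {
    let previous = CURRENT.with(|c| c.replace(Some(handle.clone())));
    let result = f();
    CURRENT.with(|c| *c.borrow_mut() = previous);
    result
}

/// Lookup order under the proposed feature flag:
/// thread-local context first, global runtime second.
pub fn current_or_global() -> Handle {
    CURRENT
        .with(|c| c.borrow().clone())
        .unwrap_or_else(|| GLOBAL.get_or_init(|| Handle { name: "global" }).clone())
}
```

Note that calling `current_or_global()` outside of `enter` does not fail. It quietly starts and returns the global handle, so a misplaced call goes unnoticed.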

The primary danger here is silently ending up with the "split application". Since feature flags are additive, a library or component may enable the rt-global feature flag without the application knowing that it is accidentally using the global runtime.

Provide a separate `tokio-global-runtime` crate

The specific name of the crate would need to be massaged. The idea is to have a separate crate define the static variable. In this case, users who wish to use a statically defined runtime would depend on this crate.

The main downside, as I see it, is that this makes the global runtime less discoverable.

Re-introduce `&Handle` method variants

In this case, all async methods include a variant that takes an explicit &runtime::Handle. Users who want to ensure a runtime exists and it is the correct runtime may opt to be explicit about the runtime used by specifying it.

This doesn't really solve the problem that users are confused when calling TcpStream::connect panics, because it requires them to know that they must call the &runtime::Handle variant. However, improvements to the panic message would include mention of this strategy.
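
The pairing of implicit and explicit variants might look like the following std-only model. Everything here is hypothetical: `Stream` stands in for a resource type, `connect_with_handle` is a made-up name for the explicit variant, and `set_current` is a simplified substitute for entering a runtime context.

```rust
use std::cell::RefCell;

/// Stand-in for a runtime handle (hypothetical; not Tokio's type).
#[derive(Clone, Debug, PartialEq)]
pub struct Handle {
    pub name: &'static str,
}

thread_local! {
    // The runtime context set on this thread, if any.
    static CURRENT: RefCell<Option<Handle>> = RefCell::new(None);
}

/// Simplified replacement for entering a runtime context.
pub fn set_current(handle: Option<Handle>) {
    CURRENT.with(|c| *c.borrow_mut() = handle);
}

/// Stand-in for a resource type such as a TCP stream.
#[derive(Debug)]
pub struct Stream {
    pub addr: String,
    pub runtime: Handle,
}

impl Stream {
    /// Implicit variant: uses the thread-local context and panics if none is set.
    pub fn connect(addr: &str) -> Stream {
        let handle = CURRENT
            .with(|c| c.borrow().clone())
            .expect("no runtime context set on this thread");
        Stream::connect_with_handle(addr, &handle)
    }

    /// Explicit variant: the caller names the runtime, so the lookup cannot fail.
    pub fn connect_with_handle(addr: &str, handle: &Handle) -> Stream {
        Stream {
            addr: addr.to_string(),
            runtime: handle.clone(),
        }
    }
}
```

The implicit variant delegates to the explicit one after the lookup, so the two cannot drift apart. The cost is an extra method per async constructor.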
