Stack switching: Infrastructure and runtime support #10388

Open

wants to merge 2 commits into base: main
Conversation

frank-emrich
Contributor

This PR is part of a series that adds support for the Wasm stack switching proposal. The explainer document for the proposal is here. There's a tracking issue describing the overall progress and limitations here: #10248
The draft PR #10177 contains the whole (initial) implementation of the proposal.

This PR contains all of the necessary infrastructure and runtime support; in other words, it contains the entire implementation except for codegen. Tests will be added at the end of the series, once codegen has landed in a follow-up PR.

This was developed together with @dhil.

General implementation notes

In Wasm, continuations are represented by values of type (ref $ct), where $ct is a new composite type/heap type for continuations.
In the implementation, these are represented by values of type VMContObj. These are fat pointers, consisting of a sequence counter and a pointer to a VMContRef; the latter type is the actual representation of a continuation.
The sequence counter part of a VMContObj is used to enforce that every continuation value is used at most once.
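The one-shot check via the sequence counter can be sketched as follows. This is a minimal illustration with hypothetical names and a plain Vec standing in for the store; the real VMContObj/VMContRef types in the PR differ:

```rust
// Simplified stand-in for the actual continuation representation.
struct VMContRef {
    // Bumped every time the continuation is consumed, so stale
    // handles can be detected.
    revision: u64,
}

// Fat pointer: sequence counter + reference (here: index) to a VMContRef.
#[derive(Clone, Copy)]
struct VMContObj {
    seqno: u64,
    index: usize, // index into the store's continuation list
}

fn resume(store: &mut Vec<VMContRef>, obj: VMContObj) -> Result<(), &'static str> {
    let contref = &mut store[obj.index];
    // The handle is only valid if its counter matches the VMContRef's.
    if obj.seqno != contref.revision {
        return Err("continuation used more than once");
    }
    contref.revision += 1; // invalidate all existing handles
    Ok(())
}

fn main() {
    let mut store = vec![VMContRef { revision: 0 }];
    let obj = VMContObj { seqno: 0, index: 0 };
    assert!(resume(&mut store, obj).is_ok());
    // A second use of the same handle is rejected by the counter check.
    assert!(resume(&mut store, obj).is_err());
}
```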

The VMStoreContext is extended to contain a "stack chain": It indicates what stack we are currently executing on. Logically, this is a linked list of stacks, since each continuation has a parent field. The chain stored in the VMStoreContext always ends with a value representing the initial stack. This is the stack we were on when entering Wasm, which will usually be the main stack.
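The stack chain described above behaves like a parent-linked list that always terminates in the initial stack. A minimal sketch, with illustrative types rather than the PR's actual representation:

```rust
// Hedged sketch of the "stack chain": each continuation links to its
// parent, and the chain ends at the initial (usually main) stack.
enum StackChain {
    /// The stack we were on when entering Wasm.
    InitialStack,
    /// A continuation whose `parent` field links to the next stack.
    Continuation { parent: Box<StackChain> },
}

/// Number of continuations between the active stack and the initial stack.
fn chain_depth(chain: &StackChain) -> usize {
    match chain {
        StackChain::InitialStack => 0,
        StackChain::Continuation { parent } => 1 + chain_depth(parent),
    }
}

fn main() {
    // main stack <- continuation A <- continuation B (currently running)
    let chain = StackChain::Continuation {
        parent: Box::new(StackChain::Continuation {
            parent: Box::new(StackChain::InitialStack),
        }),
    };
    assert_eq!(chain_depth(&chain), 2);
}
```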

Memory Management

Currently, memory management is very basic: The StoreOpaque provides a new method for allocating a new VMContRef, and keeps track of all continuations created this way. Continuations are never deallocated at the moment, meaning that they live until the store itself is deallocated.

The stack memory backing each continuation (represented by the type ContinuationStack) is always mmap-ed, with a guard page at the end. There is currently no support for growing stacks or segmented stacks.
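The guard-page layout can be illustrated with simple page arithmetic. This assumes 4 KiB pages and a hypothetical helper; it is not the actual ContinuationStack code:

```rust
const PAGE_SIZE: usize = 4096; // assumed page size for illustration

/// Round the requested stack size up to whole pages and add one
/// inaccessible guard page at the end of the mapping.
fn mapping_size(requested_stack_size: usize) -> usize {
    let pages = (requested_stack_size + PAGE_SIZE - 1) / PAGE_SIZE;
    (pages + 1) * PAGE_SIZE // +1 page: the guard page
}

fn main() {
    // A 64 KiB stack needs 16 usable pages plus 1 guard page.
    assert_eq!(mapping_size(64 * 1024), 17 * 4096);
}
```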

Backtrace generation

The existing backtrace generation mechanism is extended to be fully aware of continuations: when generating backtraces, the entire chain of continuations is traversed, not just the stack frames of the currently active stack/continuation.

Entering/Exiting Wasm

Prior to this PR, there were two separate mechanisms that save and restore some state of the runtime when entering/exiting Wasm:

  1. The functions enter_wasm and exit_wasm in func.rs
  2. CallThreadState saves and restores (on drop) parts of the VMStoreContext

This PR consolidates these two mechanisms, because it requires some additional state to be updated and restored on enter/exit:
the type wasmtime::runtime::func::EntryStoreContext now stores all of the required runtime state and ensures that it is restored when exiting Wasm.
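The consolidation follows a save-on-entry/restore-on-drop pattern, which can be sketched as follows. Field names and types are illustrative stand-ins, not the real EntryStoreContext or VMStoreContext:

```rust
// Stand-in for the runtime state touched on Wasm entry/exit.
struct VMStoreContext {
    stack_chain: u64, // placeholder for the real stack chain
}

// Guard that captures the relevant state on entry and restores it on
// drop, so the restore also runs when unwinding out of Wasm.
struct EntryStoreContext<'a> {
    saved_stack_chain: u64,
    ctx: &'a mut VMStoreContext,
}

impl<'a> EntryStoreContext<'a> {
    fn enter(ctx: &'a mut VMStoreContext, new_chain: u64) -> Self {
        let saved = ctx.stack_chain;
        ctx.stack_chain = new_chain;
        EntryStoreContext { saved_stack_chain: saved, ctx }
    }
}

impl Drop for EntryStoreContext<'_> {
    fn drop(&mut self) {
        // Restore the state captured on entry when exiting Wasm.
        self.ctx.stack_chain = self.saved_stack_chain;
    }
}

fn main() {
    let mut ctx = VMStoreContext { stack_chain: 1 };
    {
        let _guard = EntryStoreContext::enter(&mut ctx, 42);
        // "Wasm execution" happens here with the new stack chain installed.
    }
    assert_eq!(ctx.stack_chain, 1); // restored on exit
}
```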

Tables

  • The runtime part for resizing and filling tables containing continuations is added.
  • Note that there is a potentially controversial change to how the pooling allocator works for tables: Since a VMContObj is 2 pointers wide, the maximum size of an entry in a table doubles. As a result, this PR doubles the amount of virtual memory space occupied by the table pool (but the amount of pages actually used stays the same).
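The virtual-memory arithmetic behind that doubling, as a back-of-envelope sketch (illustrative numbers and a hypothetical helper, not Wasmtime's actual pooling configuration):

```rust
const POINTER_SIZE: usize = 8; // assumed 64-bit target

/// Virtual address space reserved for the table pool, sized by the
/// widest possible table element.
fn table_pool_reservation(num_tables: usize, elems_per_table: usize, elem_size: usize) -> usize {
    num_tables * elems_per_table * elem_size
}

fn main() {
    // Before: widest element is one pointer (e.g. *const VMFuncRef).
    let before = table_pool_reservation(100, 10_000, POINTER_SIZE);
    // After: VMContObj is two pointers wide, doubling the reservation.
    let after = table_pool_reservation(100, 10_000, 2 * POINTER_SIZE);
    assert_eq!(after, 2 * before);
}
```

Only the reservation doubles; pages are committed on use, so resident memory is unchanged, matching the note above.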

@frank-emrich frank-emrich requested review from a team as code owners March 12, 2025 22:44
@frank-emrich frank-emrich requested review from fitzgen and removed request for a team March 12, 2025 22:44
@github-actions github-actions bot added fuzzing Issues related to our fuzzing infrastructure wasmtime:api Related to the API of the `wasmtime` crate itself wasmtime:c-api Issues pertaining to the C API. wasmtime:config Issues related to the configuration of Wasmtime wasmtime:ref-types Issues related to reference types and GC in Wasmtime labels Mar 12, 2025

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "fuzzing", "wasmtime:api", "wasmtime:c-api", "wasmtime:config", "wasmtime:ref-types"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: fuzzing, wasmtime:ref-types

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.



Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:

  • If you added a new Config method, you wrote extensive documentation for
    it.

    Our documentation should be of the following form:

    Short, simple summary sentence.
    
    More details. These details can be multiple paragraphs. There should be
    information about not just the method, but its parameters and results as
    well.
    
    Is this method fallible? If so, when can it return an error?
    
    Can this method panic? If so, when does it panic?
    
    # Example
    
    Optional example here.
    
  • If you added a new Config method, or modified an existing one, you
    ensured that this configuration is exercised by the fuzz targets.

    For example, if you expose a new strategy for allocating the next instance
    slot inside the pooling allocator, you should ensure that at least one of our
    fuzz targets exercises that new strategy.

    Often, all that is required of you is to ensure that there is a knob for this
    configuration option in wasmtime_fuzzing::Config (or one
    of its nested structs).

    Rarely, this may require authoring a new fuzz target to specifically test this
    configuration. See our docs on fuzzing for more details.

  • If you are enabling a configuration option by default, make sure that it
    has been fuzzed for at least two weeks before turning it on by default.


To modify this label's message, edit the .github/label-messager/wasmtime-config.md file.

To add new label messages or remove existing label messages, edit the
.github/label-messager.json configuration file.


@fitzgen
Member

fitzgen commented Mar 13, 2025

Haven't dug into the details of the PR yet, but I have some initial questions after reading the OP.

The stack memory used by each allocation (represented by type ContinuationStack) is always mmap-ed, with a guard page at the end. There is currently no support for growing stacks, or segmented stacks.

Is there a particular reason this does not use the InstanceAllocatorImpl::[de]allocate_fiber_stack methods via store.engine().allocator().[de]allocate_fiber_stack()? That would make it so that this feature automatically integrates with the wasmtime::Config, its various knobs for stack sizes and such, and whether the pooling or on-demand allocator is in use.

Note that there is a potentially controversial change to how the pooling allocator works for tables: Since a VMContObj is 2 pointers wide, the maximum size of an entry in a table doubles. As a result, this PR doubles the amount of virtual memory space occupied by the table pool (but the amount of pages actually used stays the same).

Is this because (ref cont)/VMContObj tables are being shoe-horned into the same underlying representation as anyref/externref/VMGcRef tables? Because VMContObj has a different representation from VMGcRef, I think it would be best to add a new wasmtime::runtime::vm::Table variant for VMContObj, the same way that we have different variants for VMGcRef tables versus *const VMFuncRef tables.

Member

@fitzgen fitzgen left a comment


(review is still in progress, but I need to context switch for a little bit and might as well share what I have so far)

Comment on lines +75 to +77
WASM_API_EXTERN void
wasmtime_linker_allow_unknown_exports(wasmtime_linker_t *linker,
                                      bool allow_unknown_exports);
Member

I don't see a corresponding function definition in crates/c-api, I think it is missing?

Contributor


I think this is a stray declaration from some of our downstream experiments. It was needed to capture the C shadow stack pointer that clang generates, which must be kept in sync with the stack switching done by the engine. To be clear, this is just due to the fact that clang doesn't know about stack switching.

Member

FWIW, exposing these things the C API would be good in general, so if you want to do it here or in a new PR, you're more than welcome to!

Contributor

Yes, sure! I am leaning towards adding this in a follow-up PR focused on the C API. What do you think, @frank-emrich?

Comment on lines +253 to +259
/**
* \brief Configures whether the WebAssembly function references
* proposal is enabled.
*
* This setting is `false` by default.
*/
WASMTIME_CONFIG_PROP(void, wasm_function_references, bool)
Member

Similarly, I don't see a function definition for this one either.

Contributor

@dhil dhil Mar 14, 2025

Same reason as above. Should just be deleted for now, I think.

Comment on lines +334 to +335
/// Size of stacks created with cont.new instructions
pub stack_switching_stack_size: Option<usize>,
Member

At least for now, I'm not sure it is worth having separate config knobs for stacks created via cont.new versus API-level fibers. This effectively creates two different kinds of stacks in the system, and I'd prefer to try and keep things simple for as long as we can by having a single kind of stack that we config and allocate and handle in the system.

Contributor

This is an interesting question, which Frank and I have discussed in the past. If my memory serves me right, then our layout is slightly different. However perhaps we can unify the layouts (or we can perhaps simply adopt the fibers' layout). I am not too sure about the implications, I think @frank-emrich has the key knowledge to best answer this question.

Member

The automatic pooling allocator integration would also get you fast allocation of new stacks from a pool, which would look nice on your benchmarks and what have you ;)

Contributor

Yes, agreed! I totally see the appeal!

Contributor Author

From a technical perspective, there is no need for us to have our own config setting for the stack size.
I guess the reason we didn't just re-use the existing stack_size option is that the latter always felt like a soft limit to me: AFAIK you can still occupy stack space beyond that limit, for example with host functions.
But I have no strong feelings about re-using the existing stack_size config option, or merging our stack_switching_stack_size with async_stack_size.

Member

AFAIK you can still occupy stack space beyond that limit, for example with host functions.

We are generally a little loosey-goosey with our nomenclature here, but to make things precise for a moment:

  • "Host functions" = functions defined in native code by the Wasmtime embedder (e.g. via wasmtime::Func::new) and imported by Wasm
  • "libcalls" = various bits of Wasm functionality (e.g. memory.grow) that are not implemented inline in JIT code, but instead in native functions inside of Wasmtime's runtime.

libcalls are part of Wasmtime's trusted computing base; host functions are not (modulo some hand-waving around their usage of unsafe).

Given all that: we will execute libcalls on async fibers, but will never execute host functions on async fibers. When Wasm calls an imported host function, we first switch from the async fiber to the native stack, then run the host function, and finally switch back to the Wasm's async fiber stack to pass the function results back to Wasm JIT code.

A libcall should never access stack memory beyond the async_stack_size limit (and we rely on guard pages to help enforce this). The Wasm JIT code is given max_wasm_stack bytes for its own use, and it must always be true that async_stack_size > max_wasm_stack; that leaves async_stack_size - max_wasm_stack bytes of stack space for libcalls in the worst case.

This is all a little bit of an aside and I'm mostly just explaining all this to make sure we are on the same page here. I don't think any of this needs to change for stack-switching, and I would also expect that we would call libcalls, but never host functions, on stack-switching stacks, same way we do things on our async fibers.
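The invariant above can be expressed as simple arithmetic (illustrative sizes and a hypothetical helper, not Wasmtime's actual defaults):

```rust
/// Worst-case stack headroom left for libcalls on an async fiber:
/// whatever remains of `async_stack_size` after the Wasm JIT code's
/// `max_wasm_stack` budget. Returns None if the invariant
/// async_stack_size > max_wasm_stack does not hold.
fn libcall_headroom(async_stack_size: usize, max_wasm_stack: usize) -> Option<usize> {
    async_stack_size
        .checked_sub(max_wasm_stack)
        .filter(|&headroom| headroom > 0)
}

fn main() {
    // e.g. a 2 MiB fiber stack with a 1.5 MiB Wasm stack limit leaves
    // 512 KiB for libcalls in the worst case.
    assert_eq!(
        libcall_headroom(2 * 1024 * 1024, 1536 * 1024),
        Some(512 * 1024)
    );
    // Misconfiguration: no headroom at all.
    assert_eq!(libcall_headroom(1024, 1024), None);
}
```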

Comment on lines +152 to +155
// TODO(#10248) GC integration for stack switching
return Err(wasmtime_environ::WasmError::Unsupported(
"Stack switching feature not compatbile with GC, yet".to_string(),
));
Member

Not something that needs to be handled in this PR, but there is no reason we can't enable GC and also maintain the existing memory management for continuations, where they have the same lifetime as the Store. We don't have to implement full GC of continuations before we allow enabling both GC and stack switching at the same time.

Comment on lines +68 to +69
pub const TRAP_DELETE_ME_DEBUG_ASSERTION: TrapCode =
TrapCode::unwrap_user(Trap::DeleteMeDebugAssertion as u8 + TRAP_OFFSET);
Member

FWIW, we already have TRAP_INTERNAL_ASSERT that we use for debug-only assertions inside JIT code.

Comment on lines +213 to +222
// FIXME(frank-emrich) The next three builtins are used by the debug printing mechanism.
// They are not supposed to be part of the final upstreamed code.
//
// Prints a 'static str, represented as a
// pointer and a length.
delete_me_print_str(vmctx: vmctx, s: pointer, len : u64);
// Prints integer
delete_me_print_int(vmctx: vmctx, arg : u64);
// Prints pointer, formatted as hex.
delete_me_print_pointer(vmctx: vmctx, arg : pointer);
Member

Were these going to be removed in the next PR, or should they be removed now? Maybe throw an item into the meta task list if not now.

Contributor

My take is that they shouldn't make it into the main branch -- probably a stray include?

Contributor Author

Good question! These only exist for some hacky debug printing code that should not end up in main. Initially, when we thought that there would only be a single, big PR, the plan was to keep them during reviewing, and then remove them at the last moment before merging. Now that we will land this in smaller pieces, we could either remove them now, or once all PRs from the series have landed.

The reason why we don't want to land this is explained here: on the codegen side, we just emit the addresses of Rust string literals as literals into generated code, with no relocation information. So this crashes and burns if you ever try to re-use the same generated code from a different invocation of wasmtime.

Member

The reason why we don't want to land this is explained here: on the codegen side, we just emit the addresses of Rust string literals as literals into generated code, with no relocation information. So this crashes and burns if you ever try to re-use the same generated code from a different invocation of wasmtime.

Ah okay! Yes, definitely should not land these bits in that case.

Comment on lines +4 to +6
/// FIXME(frank-emrich) Will remove in the final upstreamed version
#[allow(dead_code, reason = "Only accessed in debug builds")]
pub const ENABLE_DEBUG_PRINTING: bool = false;
Member

Ditto to the debug printing stuff in this file.

Comment on lines +8 to +18
/// FIXME(frank-emrich) Will remove in the final upstreamed version
#[macro_export]
macro_rules! debug_println {
($( $args:expr ),+ ) => {
#[cfg(debug_assertions)]
if ENABLE_DEBUG_PRINTING {
#[cfg(feature = "std")]
println!($($args),*);
}
}
}
Member

fyi, I tend to do log::trace!(...) for debug prints (both temporary ones that I will remove, and for ones that are useful enough to keep in tree).

when running the CLI, you can see them by setting the WASMTIME_LOG=trace env var, or only a specific crate/module via WASMTIME_LOG=wasmtime::runtime::vm=trace, or multiple crates/modules at different levels via WASMTIME_LOG=debug,regalloc2=info,cranelift_codegen::machinst=trace.

When running tests instead of the CLI, WASMTIME_LOG becomes RUST_LOG. Most tests initialize the env-logger, but if you aren't seeing the logs, then you can put let _ = env_logger::try_init(); at the start of the test (or ideally in a shared test helper function that a bunch of related tests already call at the beginning of their execution).


Member

@fitzgen fitzgen left a comment

(again, review still in progress, leaving intermediate notes)

Comment on lines +20 to +28
/// Runtime configuration options for stack switching that can be set
/// via the command line.
///
/// Part of wasmtime::config::Config type (which is not in scope in this crate).
#[derive(Debug, Clone)]
pub struct StackSwitchingConfig {
/// The (fixed) size of a continuation stack.
pub stack_size: usize,
}
Member

These sorts of config knobs that need to exist at the wasmtime_environ level usually go inside wasmtime_environ::Tunables. Unless there is a reason not to do that here, we should follow the existing convention.

Contributor

As I discussed with @alexcrichton (https://github.com/bytecodealliance/wasmtime/pull/10177/files#r1965733715), I'd be OK with no bespoke stack switching configuration for now.

Comment on lines +661 to +705
/// Encodes the life cycle of a `VMContRef`.
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(u32)]
pub enum VMStackState {
/// The `VMContRef` has been created, but neither `resume` or `switch` has ever been
/// called on it. During this stage, we may add arguments using `cont.bind`.
Fresh = wasmtime_environ::stack_switching::STACK_STATE_FRESH_DISCRIMINANT,
/// The continuation is running, meaning that it is the one currently
/// executing code.
Running = wasmtime_environ::stack_switching::STACK_STATE_RUNNING_DISCRIMINANT,
/// The continuation is suspended because it executed a resume instruction
/// that has not finished yet. In other words, it became the parent of
/// another continuation (which may itself be `Running`, a `Parent`, or
/// `Suspended`).
Parent = wasmtime_environ::stack_switching::STACK_STATE_PARENT_DISCRIMINANT,
/// The continuation was suspended by a `suspend` or `switch` instruction.
Suspended = wasmtime_environ::stack_switching::STACK_STATE_SUSPENDED_DISCRIMINANT,
/// The function originally passed to `cont.new` has returned normally.
/// Note that there is no guarantee that a VMContRef will ever
/// reach this status, as it may stay suspended until being dropped.
Returned = wasmtime_environ::stack_switching::STACK_STATE_RETURNED_DISCRIMINANT,
}

/// Universal control effect. This structure encodes return signal, resume
/// signal, suspension signal, and the handler to suspend to in a single variant
/// type. This instance is used at runtime. There is a codegen counterpart in
/// `cranelift/src/stack-switching/control_effect.rs`.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
#[repr(u32)]
#[allow(dead_code)]
pub enum ControlEffect {
/// Used to signal that a continuation has returned and control switches
/// back to the parent.
Return = wasmtime_environ::stack_switching::CONTROL_EFFECT_RETURN_DISCRIMINANT,
/// Used to signal to a continuation that it is being resumed.
Resume = wasmtime_environ::stack_switching::CONTROL_EFFECT_RESUME_DISCRIMINANT,
/// Used to signal that a continuation has invoked a `suspend` instruction.
Suspend {
/// The index of the handler to be used in the parent continuation to
/// switch back to.
handler_index: u32,
} = wasmtime_environ::stack_switching::CONTROL_EFFECT_SUSPEND_DISCRIMINANT,
/// Used to signal that a continuation has invoked a `suspend` instruction.
Switch = wasmtime_environ::stack_switching::CONTROL_EFFECT_SWITCH_DISCRIMINANT,
}
Member

Because (AFAICT) these don't contain any usizes or pointers or anything else that changes with the target's word size, you could move them into the wasmtime_environ crate as

#[repr(u32)]
pub enum VMControlEffect {
    Return = 0,
    // ...
}

and avoid the wordy and kind of annoying split between the definitions of the enums and their discriminants.

Contributor Author

Haha, these were in wasmtime_environ originally, but then @alexcrichton suggested moving them into the runtime. I don't mind it either way.

Member

Ha, let's keep it as-is for now then. I don't want to make you bounce back and forth.

Comment on lines +236 to +237
/// TODO
pub fn unwrap_ref_type(&self) -> WasmRefType {
Member

Reminder to fill this in.

Comment on lines +813 to +814
/// Returns the (module interned) index to the underlying function type.
pub fn unwrap_interned_type_index(self) -> ModuleInternedTypeIndex {
Member

Nitpick: both the Engine and Module variants are interned, so perhaps we can rename this method to unwrap_module_type_index?

Comment on lines +167 to +171
/// This is the size of the largest value type (i.e. a V128).
#[inline]
fn maximum_value_size(&self) -> u8 {
self.size_of_vmglobal_definition()
}
Member

This seems unused. Does it get used in follow up PRs?

Contributor

Yes, it gets used during the code translation. It should arguably be added along with it in the next patch.

Comment on lines +312 to +315
stack-switching = [
"std",
"runtime"
]
Member

Two things:

  1. Why does stack-switching need std?

  2. We should be able to enable stack-switching for a compiler-without-runtime build to be able to compile stack-switching Wasm to be run elsewhere, and similarly should be able to enable stack-switching in a runtime-without-compiler build to run elsewhere-compiled stack-switching Wasm modules. As written, enabling the stack-switching feature will always enable the runtime, making the compiler-without-runtime version impossible. That is, unless compilation of stack-switching isn't gated on any cargo feature, but that is kind of funky given that this cargo feature exists and I expect users would think "I want to compile Wasm programs that use stack-switching, therefore I will enable this feature". This is a long way of basically saying, I think this should not unconditionally enable the runtime and, once compilation is added in the next PR, we ultimately end up with something like this:

    stack-switching = [
      "wasmtime-cranelift?/stack-switching",
    ]

Neither of these things necessarily needs to be addressed now, before this PR merges, but it may end up being easier doing it now before more stuff gets entangled, making it harder to pull the cfgs apart.

Member

You'll also want to add some CI checks for these features being enabled/disabled in various combinations like this:

https://github.com/bytecodealliance/wasmtime/blob/main/.github/workflows/main.yml#L363-L365

Comment on lines +193 to +194
// TODO(#10248) Required to support stack switching in the embedder API.
unimplemented!()
Member

Can you add an item to the meta task list for the embedder API, if you haven't already?

Comment on lines +1603 to +1606
// The `enter_wasm` call below will reset the store's `stack_chain` to
// a new `InitialStack`, pointing to the stack-allocated
// `initial_stack_csi`.
let mut initial_stack_csi = VMCommonStackInformation::running_default();
Member

Can you run the function call microbenchmarks on main and on this branch? I want to make sure that we aren't regressing performance here, as this path is very hot, and if we are we may need to figure out how to cfg(...) some of this stuff so that it only happens when the stack-switching feature is enabled.

This will run the microbenchmarks and will check the results for statistical significance:

$ git checkout main
$ cargo bench --bench call -- --save-baseline main
$ git checkout stack-switching-infra
$ cargo bench --bench call -- --baseline main

I can help figure things out if we need to do something here.

Comment on lines +1616 to +1617
let result = crate::runtime::vm::catch_traps(store, &previous_runtime_state, closure);
core::mem::drop(previous_runtime_state);
Member

This mem::drop dance is kind of funky -- can we instead pass ownership into catch_traps and let it handle the dropping, if necessary? Or are there callers that need to do different things?

Contributor Author

Yeah, that drop call is definitely not pretty. We could get away with not explicitly calling drop, but managing the scopes here slightly differently. But that seems equally funky.

The main reason for all this new EntryStoreContext business is the following: Previously, there were two places that do some updates to the runtime on entering and exiting Wasm: in catch_traps (together with CallThreadState's drop) and here in func.rs (using enter_wasm/exit_wasm). I believe @alexcrichton mentioned that this split is mostly for legacy reasons due to how things were separated into different crates until recently.

For stack switching, we now require an additional update to the runtime on entering and exiting. Intuitively, it felt to me like these changes should be made in enter_wasm/exit_wasm, but that doesn't quite work: We need to undo these changes if you panic out of a Wasm execution, so I would indeed need to do all of this in catch_traps instead, so that CallThreadState's drop can do the steps needed on exiting the runtime.

But updating the stack chain in catch_traps to signal that we are entering Wasm seemed awkward to me, so I decided to just consolidate all the logic for changes that need to happen on entering and exiting Wasm into one place, which resulted in the birth of EntryStoreContext.

TL;DR Technically, all that's needed for stack switching could all be done in catch_traps, but that felt like it's morally the wrong spot to me.

Any suggestions? Should I just move it to catch_traps?

Member

The main reason for all this new EntryStoreContext business is the following: Previously, there were two places that do some updates to the runtime on entering and exiting Wasm: In catch_traps (together with CallThreadState's drop) and here in func.rs (using enter_wasm/exit_wasm).

...

Any suggestions? Should I just move it to catch_traps?

Do you mind splitting out just the EntryStoreContext bits (without even the stack chain preservation) into its own PR? I think that would help me get a better grasp of these refactorings and provide better review and feedback, especially since the wasm entry/exit paths are both very fiddly and also perf sensitive.

I do like the idea of unifying these code paths a lot, but I just want to be very careful about changes to them in general. Honestly, it is fairly likely that we won't change the mechanisms any further than what you have already done in this PR here, but it would help a lot with alleviating my nibbling paranoia that we are introducing subtle bugs or perf regressions.

And I apologize for asking you to keep splitting these PRs even finer-grained. I know it can be a pain (again, see #10503) but it is super super super helpful for reviewers and making sure that we are following along with all the changes. Thanks!

Comment on lines +1628 to +1629
// FIXME(frank-emrich) Do the fields in here need to be (Unsafe)Cells?
pub struct EntryStoreContext {

If they are written to directly by Wasm JIT code, then probably yes. This struct should also be #[repr(C)] in that case.

If it is not written to directly from Wasm, then it would only need to be unsafe cells if you are ever violating Rust's mutable/exclusive xor immutable/shared semantics.

I haven't read enough of these changes to say any more than that yet.

@frank-emrich
Contributor Author

(Sorry for the long delay, I've moved to London and started a new job, where I needed to settle in first)

@fitzgen

Is there a particular reason this does not use the InstanceAllocatorImpl::[de]allocate_fiber_stack methods via store.engine().allocator().[de]allocate_fiber_stack()? That would make it so that this feature automatically integrates with the wasmtime::Config, its various knobs for stack sizes and such, and whether the pooling or on-demand allocator is in use.

The reason we do not use these verbatim is that our stacks are not the same as wasmtime_fiber::FiberStack. Trying to make our stacks compatible with that type would be quite difficult and IMO not worth the effort.
However, our medium-term plan was to add similar allocation and deallocation functions to InstanceAllocatorImpl. This shouldn't be too much work, but it's something we haven't seen through yet. In particular, @dhil is very interested in restoring support for pooled allocation of stacks. We had implemented our own interface for that (outside of InstanceAllocatorImpl) and it greatly improves performance in those use cases where you frequently (de)allocate continuations.

Is this because (ref cont)/VMContObj tables are being shoe-horned into the same underlying representation as anyref/externref/VMGcRef tables? Because VMContObj has a different representation from VMGcRef, I think it would be best to add a new wasmtime::runtime::vm::Table variant for VMContObj, the same way that we have different variants for VMGcRef tables versus *const VMFuncRef tables.

We actually extend StaticTable and TableElement in that file with appropriate variants for our types. The problem is that in TablePool::new we need to mmap enough memory to hold limits.total_tables * limits.table_elements entries, independent of what data is stored in those tables. In order to support tables storing continuation pointers, this effectively doubles the required size of the mmapped area.

@fitzgen
Member

fitzgen commented Apr 7, 2025

(Sorry for the long delay, I've moved to London and started a new job, where I needed to settle in first)

No worries. Congrats on the move and the new gig!

The reason we do not use these in verbatim is that our stacks are not the same as wasmtime_fiber::FiberStack. Trying to make our stacks compatible with that type would be quite difficult and IMO not worth the effort.

I'd like to dive more into this because I would (perhaps naively?) assume that there is no fundamental reason they couldn't be the same. This is also one of the biggest concerns I have about the stack-switching implementation: the complexity of having multiple kinds of stacks that the runtime must manage; when there are different kinds of resources, we will eventually need different config knobs for tuning them; and so on. Basically, this feels like a bunch of cognitive overhead for both maintainers and users. Keeping things uniform keeps things simpler. For example, #9350 (which is admittedly a beast of a PR; I have been splitting it into smaller pieces and landing those incrementally) will ultimately reduce complexity in Wasmtime by unifying our handling of Wasm linear memories and GC heaps. I just point this out so that you know I'm not trying to pick on you, and that I am attempting to hold myself to these standards as well.

But perhaps I am missing some key point that makes it so that these stack-switching stacks and fiber stacks really do have different requirements that do not overlap enough that sharing an implementation makes sense?

You say it would be quite difficult to unify them; can you expand on what difficulties you foresee?

In particular, @dhil is very interested in restoring support for pooled allocation of stack. We had implemented our own interface for that (outside of InstanceAllocatorImpl) and it greatly improves performance in those use cases where you frequently (de-) allocate continuations.

Totally, that's exactly what I'd expect: a free list pop should be much faster than mmap, especially under load in a concurrent, multi-threaded embedding of Wasmtime.

We actually extend StaticTable and TableElement in that file with appropriate variants for our types. The problem is that in TablePool::new we need to mmap enough memory to hold limits.total_tables * limits.table_elements entries, independent from what data is stored in those tables. In order to support tables storing continuations pointers, this effectively doubles the required size of the mmapped area.

Ah okay, I get it now. You're right that we definitely don't want to cut the capacity of all existing tables in half in order to support a proposal that 99% of users won't be leveraging yet.

Two things:

  1. This is a pre-existing issue: the calculation of the table size should be a little more robust and use something like mem::size_of::<TableElementUnion>(), so that it is more future-proof for changes like we have here, where we are adding new kinds of table elements.
  2. We could plumb through a boolean for whether the stack-switching proposal is enabled to this method, or the whole WasmFeatures or something, and then adjust the size of a single element in the table size calculation based on that value.

In the meantime, halving the capacity just for continuation tables (rather than all other existing tables) is fine. And we can do the above suggestions in follow up PRs.
