Design: Priority: JavaScript-integrated GC or primitives for implementing GC in WASM

Created on 26 Jul 2016 · 25 Comments · Source: WebAssembly/design

My primary concern with WebAssembly right now is not anything in the current spec, but rather that the design anticipates the addition of GC reference types. I think this is an unnecessary coupling of WebAssembly to existing JavaScript engines.

I can understand the motivation for adding GC references, and that the cost doesn't seem so bad to most of you who are implementing runtimes within existing JavaScript engines. But I'm sure you can see that if GC reference types are a de facto requirement, that limits the use of WebAssembly outside of browsers.

Even if it is added as an optional extension, a WebAssembly implementation without it will become pointless if Emscripten uses it. I already have problems running Emscripten compiled binaries in WAVM due to the coupling of its C runtime to its JavaScript harness. If that interface starts to use GC references, then I will need to add a garbage collector to my standalone WebAssembly VM.

I don't want to add a garbage collector to my WebAssembly VM, not because I don't want to write a garbage collector, but because I want to write a garbage collector within WebAssembly. I'm developing a programming language that will compile to WebAssembly, and for many reasons, I think a language-specific garbage collector will be more effective than a language-independent TypedObject collector, even within the limitations of the WebAssembly abstraction.

GC.md is entirely about exposing the JavaScript object model to WebAssembly programs, with no mention of exposing primitives to WebAssembly that allow efficient implementation of a garbage collector. This worries me as much as the risk that Emscripten (or other toolchains) will make GC references a required feature; I know that folks are aware of, and not opposed to, the desire to implement a GC within WebAssembly, but that's clearly not where resources are focused.

In my opinion that focus is misplaced. The web will benefit the most from new applications that are enabled by WebAssembly performance, not from making existing applications faster. I anticipate new applications using WebAssembly will want to work with their own DOM in linear memory, with some isolated code to replicate that into the browser DOM.

I think the strongest argument for a VM-level GC is that for languages whose data maps well to TypedObjects, a VM-level GC benefits from direct access to the hardware and OS. Perhaps this is a worthwhile optimization, but I think enabling a WebAssembly application to do its own GC should be the primary goal.

managed objects

AndrewScheidecker

👍18

Most helpful comment

Certainly it makes sense to design the GC feature and toolchain such that it is optional and that plain C/C++/Rust programs don't require it. A possible counterexample would be if using GC references allowed for more efficient integration with Web APIs, but that would presumably only be if the Web API was used.

That being said, if what you want to compile _is_ a GC'd language, there are several advantages to integrating with the host environment's GC over doing it all w/in linear memory:

- cycles that go between wasm objects and host environment (e.g., DOM) objects could be collected by the host environment GC but not by a GC in linear memory; you'd end up with a strong edge that rooted the otherwise-collectable cycle
- host environment GCs are integrated with things like requestAnimationFrame and vsync, allowing them to, e.g., tailor incremental GC slices to end when it's time to start the next frame
- less internal fragmentation by allowing safe reuse of GC arenas between what would otherwise be disjoint wasm linear memories; esp. important on 32-bit with limited virtual address space
- better browser devtool integration since devtools better understand host GC objects
- smaller distributable runtime by reusing what's already in the browser

And I think these justify the work to integrate GC into wasm.

lukewagner on 27 Jul 2016

👍10

All 25 comments

For example, a lisp implementation can store a cons in two words, yet using the web browser object model is unlikely to be nearly as memory efficient. There are also a lot of different tradeoffs that can be made in object models and in garbage collectors and I support wasm being free of the constraints of a particular host implementation. For example, if periodic pauses are not a show stopper for some applications, such as a symbolic mathematics application, then they can use more performance efficient garbage collectors. Also if the garbage collector is implemented within a wasm process then it can run independent of a runtime garbage collector and not block the runtime.
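A minimal sketch of the two-word cons cell mentioned above, assuming a wasm32-style linear memory modelled as a Python `bytearray` with 32-bit words. The `LinearMemory` class, the bump allocator, and the choice of address 0 as nil are illustrative assumptions, not part of any wasm API.

```python
import struct

WORD = 4  # bytes per 32-bit word (assumed wasm32-style linear memory)
NIL = 0   # assumption: address 0 is reserved and used as the empty list


class LinearMemory:
    """Toy stand-in for a wasm linear memory with a bump allocator."""

    def __init__(self, size):
        self.bytes = bytearray(size)
        self.top = WORD  # skip address 0 so it can serve as NIL

    def alloc(self, nbytes):
        addr = self.top
        if addr + nbytes > len(self.bytes):
            raise MemoryError("linear memory exhausted")
        self.top += nbytes
        return addr

    def load(self, addr):
        return struct.unpack_from("<I", self.bytes, addr)[0]

    def store(self, addr, value):
        struct.pack_into("<I", self.bytes, addr, value)


def cons(mem, car_word, cdr_word):
    """Allocate exactly two words: one for the car, one for the cdr."""
    addr = mem.alloc(2 * WORD)
    mem.store(addr, car_word)
    mem.store(addr + WORD, cdr_word)
    return addr


def car(mem, cell):
    return mem.load(cell)


def cdr(mem, cell):
    return mem.load(cell + WORD)


mem = LinearMemory(64)
lst = cons(mem, 1, cons(mem, 2, cons(mem, 3, NIL)))
bytes_used = mem.top - WORD  # three cells, no per-object header
```

Here the three-element list occupies 24 bytes in total, since each cell carries no header beyond its two words.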

ghost on 27 Jul 2016

👍2

Yes, these are reasons why people may still want to implement their own GC using linear memory and adding GC support to wasm should not prevent them from doing so.

lukewagner on 27 Jul 2016

👍1

Yes, it's even possible to implement a garbage collector in wasm right now, but it would be a lot more practical if there was a way to scan GC references in locals and intermediates. That's a very useful feature that is a subset of the functionality needed for host-defined GC, but AFAICT isn't mentioned anywhere in the design repo.

My concern about host-defined GC is not that it would technically prevent more general GC building blocks, but that:

- It could easily become a required VM feature through adoption by popular toolchains
- And it might be seen as "enough", so efforts to support efficient GC implementations within wasm are blocked

> better browser devtool integration since devtools better understand host GC objects

As a secondary benefit, the ability to scan locals and intermediates on the stack would be a big step toward browser-independent, language-specific debug tools.

AndrewScheidecker on 27 Jul 2016

👍3

I agree that some amount of opt-in/pay-as-you-go stack inspection would be a generally useful feature for wasm and we should add something like that before long.

I think you make a good point about the toolchain but I think we can accept this as a high-level goal of avoiding a hard dependency on GC for non-inherently-GC sources. I think that's a good idea even on the web because I think it should be possible to create a wasm-without-GC-only worker that doesn't get the usual GC-heap allocated for it.

lukewagner on 27 Jul 2016

If the local variables and the values stack can be accessed from anywhere in their dynamic scope then might that invalidate some assumptions implicit in the decoding to the SSA form? Perhaps some more thought needs to be given to the options here and perhaps it even impacts the language design.

One option would be for the wasm code producer to simply not keep pointers that need to be scavenged on the values stack or in local variables at points that might trigger a garbage collection, rather in linear memory. So the local variables and values stack could continue to be well constrained to lexical scope.

Might it help if the local variables and values stack could only be read and not written outside their lexical scope? At first thought that might be enough of a restriction to avoid impacting the SSA decoder? This would support a conservative scavenger of values in the local variables and values stack, but not a precise garbage collector that wants to move objects with pointers from the local variables or values stack.

Let's say that the expressionless encoding, which avoided the values stack but still used local variables, were taken to another extreme and did not even use local variables so that every operator reads and writes to the linear memory. It would have similar problems for the decoder as allowing access to the local variables and stack from any dynamic scope, but perhaps even worse problems because the changes to linear memory persist past the end of the function and would need to be written back.

So at first thought the only practical option seems to be to keep pointers that need to be scavenged out of the local variables and off the values stack?

I would be interested in opinions on the impact of allowing only the reading of the local variables and values stack from any dynamic context, and not writing to them, and on whether that would impact the SSA decoders.

Let's say that it is ok to allow reading of any local variable and the values stack from within their dynamic context, then it seems that the get_value proposal (which is a static reference up the stack) should be fine too? The model would then be that the values stack has read-only values that are dynamically scoped, and the local variables have values that can be read in their dynamic scope but only written in their lexical scope.

Do any of these issues impact the discussed wasm-GC variant? Wouldn't it have some of these issues too, possibly frustrating SSA decoding if GC pointers in local variables or the values stack could be written from any dynamic context?

ghost on 28 Jul 2016

Sorry, giving this a little more thought it does not seem practical at all to allow access to the local variables or the values stack outside their lexical scope, as that would appear to require they be backed by memory and written back before all calls which does not look practical for performance.

That would seem to leave producers implementing garbage collection to write back any pointers to linear memory at points that can trigger a garbage collection and scavenge them there.
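A rough sketch of that fallback, assuming the producer keeps a "shadow stack" of spilled pointers that the collector treats as its root set. The heap is modelled as a plain dict and every name here (`ShadowStack`, `collect`, `allocate_something`, `caller`) is hypothetical; a real producer would emit the equivalent loads and stores against linear memory.

```python
class ShadowStack:
    """Toy shadow stack: a region reserved for spilled GC pointers."""

    def __init__(self):
        self.slots = []

    def push_frame(self, count):
        base = len(self.slots)
        self.slots.extend([0] * count)  # 0 stands for a null pointer
        return base

    def pop_frame(self, base):
        del self.slots[base:]


def collect(heap, shadow):
    """Mark from the shadow-stack roots, then drop every unmarked object."""
    live = set()
    work = [p for p in shadow.slots if p]
    while work:
        addr = work.pop()
        if addr not in live:
            live.add(addr)
            work.extend(p for p in heap[addr] if p)
    for addr in list(heap):
        if addr not in live:
            del heap[addr]


def allocate_something(heap, shadow):
    collect(heap, shadow)  # assumption: any allocation may trigger a collection


def caller(heap, shadow, ptr):
    base = shadow.push_frame(1)
    shadow.slots[base] = ptr   # spill the live pointer before the call
    allocate_something(heap, shadow)
    ptr = shadow.slots[base]   # reload it; a moving collector could have updated it
    shadow.pop_frame(base)
    return ptr


# The heap maps an object's address to the addresses it references.
heap = {8: [16], 16: [], 24: []}
shadow = ShadowStack()
kept = caller(heap, shadow, 8)
```

Object 24 is unreachable from the spilled pointer and is dropped, while 8 and the object it references survive. The cost being discussed in the thread is the store and reload around every call that might collect.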

ghost on 28 Jul 2016

> If the local variables and the values stack can be accessed from anywhere in their dynamic scope then might that invalidate some assumptions implicit in the decoding to the SSA form? Perhaps some more thought needs to be given to the options here and perhaps it even impacts the language design.

I think you're observing that there's implementation non-determinism in the lifetime of local variables and intermediate values. Is that right?

I think it's necessary to add something like the LLVM GC statepoint intrinsic: a call that additionally takes a set of value stack locations and local variables that may be mutated by the call. That allows serialization of local state mutations with the call, without pessimistically assuming everything needs to be serialized with the call.

Beyond the local state explicitly serialized through this GC-aware call, it may be useful to allow a stack inspection to observe non-determinism in the form of other live locals and intermediate values.

> Might it help if the local variables and values stack could only be read and not written outside their lexical scope? At first thought that might be enough of a restriction to avoid impacting the SSA decoder? This would support a conservative scavenger of values in the local variables and values stack, but not a precise garbage collector that wants to move objects with pointers from the local variables or values stack.

IMO the goal should be precise, compacting GC.

AndrewScheidecker on 28 Jul 2016

It's not immediately obvious what would be gained by allowing the definition of value stack and local variable locations that can be mutated at a call. Practically these would need to be written back to memory and re-loaded, and this could be done explicitly by the producer by just storing them to the linear memory, which is already supported. Why add complexity if it is not necessary?

ghost on 28 Jul 2016

> Sorry, giving this a little more thought it does not seem practical at all to allow access to the local variables or the values stack outside their lexical scope, as that would appear to require they be backed by memory and written back before all calls which does not look practical for performance.

The same information used to unwind the callstack for zero cost exception handling can reconstruct the register state at the point of each call. The same information can be used to find and mutate GC references that were in registers at the point of a call, whether they have been saved to the stack by a callee, or are still in a register when GC is triggered.

AndrewScheidecker on 28 Jul 2016

Thank you, I guess that might work, using meta information to note the live locations and where they are, but it sounds like a lot of complexity for the runtimes. There might still be issues for the SSA decoder if the state can be mutated outside the lexical scope.

ghost on 28 Jul 2016

> Thank you, I guess that might work, using meta information to note the live locations and where they are, but it sounds like a lot of complexity for the runtimes.

It's definitely tricky to implement. However, zero cost exception handling is already on the post-MVP roadmap, and a runtime that prefers simplicity can always just spill those GC references to the shadow stack across calls.

> There might still be issues for the SSA decoder if the state can be mutated outside the lexical scope.

You can encode the possibility of this mutation in SSA like this:

    %gcRef0 = ... ; The initial value of gcRef.
    %gcStatepoint = call ... gcstate=[%gcRef0...]
    %gcRef1 = gcmutate %gcStatepoint 0 ; The new, possibly mutated value of gcRef after the GC statepoint
    %result = gcresult %gcStatepoint   ; The result of the function called by the GC statepoint
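
The same data flow can be modelled in ordinary code. The sketch below is only an analogy, assuming a toy moving collector: `MovingHeap`, `compact`, and `statepoint_call` are hypothetical names, and payloads are treated as opaque (the toy collector does not trace references inside objects). The point is that the caller never reuses the old reference after the call; it takes the possibly relocated one from the statepoint's result.

```python
class MovingHeap:
    """Toy heap whose collector relocates every surviving object."""

    def __init__(self):
        self.objects = {}     # address -> payload
        self.next_addr = 100

    def alloc(self, payload):
        addr = self.next_addr
        self.next_addr += 8
        self.objects[addr] = payload
        return addr

    def compact(self, roots):
        """Copy rooted objects to fresh addresses; return the forwarded roots."""
        forwarding = {}
        new_objects = {}
        addr = 0
        for root in roots:
            if root not in forwarding:
                forwarding[root] = addr
                new_objects[addr] = self.objects[root]
                addr += 8
        self.objects = new_objects
        self.next_addr = addr
        return [forwarding[root] for root in roots]


def statepoint_call(heap, fn, args, gc_state):
    """Call fn; the call may trigger a moving GC that rewrites gc_state."""
    result = fn(*args)
    relocated = heap.compact(gc_state)  # assumption: this call always collects
    return result, relocated


heap = MovingHeap()
heap.alloc("garbage")          # never rooted, so it will not survive
gc_ref0 = heap.alloc("live")   # the initial value of the reference

result, (gc_ref1,) = statepoint_call(heap, lambda x: x + 1, (41,), [gc_ref0])
```

After the call, `gc_ref1` plays the role of `%gcRef1` and `result` the role of `%result`; `gc_ref0` is stale and must not be dereferenced.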

AndrewScheidecker on 28 Jul 2016

@lars-t-hansen The VM need only sandbox the memory, so a language implementing its own object system (and possible pointer tagging) need not always be type safe and need not protect the identity of its pointers. Whereas opaque GC pointers need to always be safe to protect the sandbox and you mention not wanting to expose the pointer value to the code running in the VM. Also memory management can often implement this sandbox at little cost, and hardware might even add features to better support a memory managed sandbox in future. Thus code implementing its own object heap within the sandbox can offer better performance and little burden to the VM.

For example, consider a list (a linked list of cons cells) in which each element is a tagged small integer (like your tagged integers) and code that adds all the list elements. The producer might take advantage of a declaration that these are small integers and that the sum does not overflow a small integer and simply load and add these in a single instruction on the x86. I don't see how a language targeting gc-heap allocated objects could approach this level of performance.
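
A small sketch of why the add needs no untagging, assuming one common scheme in which small integers are stored shifted left by two bits with the low bits `00` as the tag. The scheme and the helper names are assumptions for illustration; cons cells are modelled as Python tuples rather than words in linear memory.

```python
TAG_BITS = 2
FIXNUM_TAG = 0b00  # assumed scheme: low bits 00 mark a small integer


def tag_fixnum(n):
    return n << TAG_BITS


def untag_fixnum(word):
    return word >> TAG_BITS


def is_fixnum(word):
    return (word & 0b11) == FIXNUM_TAG


def sum_list(cell):
    """Add the tagged words directly; no untagging inside the loop."""
    total = tag_fixnum(0)
    while cell is not None:
        head, cell = cell
        total += head  # (a << 2) + (b << 2) == (a + b) << 2
    return total


# A list of three tagged small integers, as nested (car, cdr) pairs.
lst = (tag_fixnum(5), (tag_fixnum(7), (tag_fixnum(30), None)))
tagged_sum = sum_list(lst)
```

Because the tag bits are zero, the sum of the tagged words is itself the correctly tagged sum, which is what lets a native producer fold the load and the add together (overflow of the small-integer range is assumed not to occur, as in the comment above).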

For example, with pointer tagging a pointer to a cons cell can have a tagged type which allows a cons cell to consume only two words of memory. I don't see how a gc-heap allocated object system could approach this level of memory efficiency.

I would like to see symbolic maths software such as maxima written in a lisp running on the web efficiently, it seems a fair use case.

If rust is targeting wasm then perhaps the burden should be on the producer to at least try to implement their object system in wasm rather than just lean on the VM consumer to offer an object system with tagged pointers etc to be a target.

ghost on 5 Jan 2017

To bolster adaptability amid an ever-changing environment, it is important to expose only raw hardware capability and not force people to have a GC.

I believe it's best for the team to spin-off a separate independent group which has only one specific purpose: to build an open-source GC atop Wasm.

..This allows anyone [including browser-makers] to easily have GC capabilities when-if they need it. At the same time, heavy real-time programs that need the GC to step aside can have exactly that.

I like this quote from Alan Kay:

> When C++ came out, they tried to cater to C programmers, and they made a system that was neither fish nor fowl.

Do the web right, people. Do it right this time or we'll be called amateurs yet again.

Pacerier on 1 Apr 2017

@Pacerier, the availability of GC in Wasm won't affect code that doesn't use it. In particular, the GC'ed heap will be completely separate from the linear memory. Basically, all that's gonna be new is that Wasm code is given access to the existing JavaScript heap in some form. So nobody is forced into anything.

It's gonna be extremely difficult to implement a GC with competitive performance _on top_ of Wasm, because it is the sort of code that is impacted most by its safety restrictions. It also is an incredible amount of work to get correct and efficient. At the same time it is a mechanism that virtually every high-level language needs, so it would be a poor service to the web platform if everybody needed to redo the hard work that has already been done in browsers.

rossberg on 3 Apr 2017

👍3

To be fair, with GC in the platform, non-Web implementations of wasm may be forced into implementing a GC that they otherwise might not have needed. All browser-based wasm implementations will surely share their GC with JS, but it does add an incredible amount of additional work, as you observe, to non-Web implementations.

sunfishcode on 5 Apr 2017

👍2

Fair enough. Conceivably, the specification for a future Wasm with GC could identify a GC-free sublanguage that implementations can choose to restrict to.

rossberg on 5 Apr 2017

Some non-web platforms will have it easier. .NET or Java-based WASM implementations, for instance, also have garbage collection readily available.

I agree with the GC-free sub-language concept, this provides a "feature level" that WASM libraries can target to support a broader array of platforms while still giving useful capability to sophisticated ones like web browsers. This approach can be applied to various other proposals, too.

RyanLamansky on 5 Apr 2017

I'm not opposed to an implementation-defined GC extension. I'd probably even support it in WAVM, though not with "competitive performance". I'm satisfied with the high-level goal Luke proposed above to "avoiding a hard dependency on GC for non-inherently-GC sources".

What I'm worried about is that there will be no standard extension to allow a WASM program to garbage collect linear memory efficiently, and that browser vendors might not support such an extension even if they for whatever reason allowed it to be standardized.

I don't think it's productive to argue over whether a language should or should not implement their own garbage collector on top of WASM. The first goal of WASM is to make a "compilation target which can be compiled to execute at native speed by taking advantage of common hardware capabilities available on a wide range of platforms," so I think that it should be a given that WASM should try to expose the common hardware capabilities available to native code on a wide range of platforms: to inspect and mutate the stack and register state.

I don't think anybody disagrees with that in principle, but everybody has limited bandwidth to work on stuff. I would like to make a more concrete proposal for how this would work, but before I put in the effort, I'd like to know that it has a chance of being standardized and implemented by the browser vendors, and won't just be punted as being too low priority.

AndrewScheidecker on 6 Apr 2017

@sunfishcode I've always had the model in my head that WASM managed data will be like SIMD in WASM: a fairly well-contained extension that defines new values, types, and opcodes. If we properly modularize our test suite and specification documents, then it should be possible to clearly delineate what is "core WASM" and what is a standardized extension.

If we succeed at the above, then it makes total sense that some implementations of WASM will simply choose not to implement the managed data extension of WASM.

As for GC on top of WASM, since this is a layer above the execution engine, I see no reason why it cannot build on top of WASM without any explicit support. In fact, projects to do exactly that are already under way. I do not view a standardized GC-on-top-of-WASM as a goal for the core WASM project, but view it in the same category as tool conventions.

titzer on 13 Apr 2017

Agreed on keeping GC an optional component so that wasm engines could choose not to implement the GC feature and still be fully functional for a set of languages that don't require GC. This is now a bullet in the high-level approach in the GC PR.

lukewagner on 13 Apr 2017

> As for GC on top of WASM, since this is a layer above the execution engine, I see no reason why it cannot build on top of WASM without any explicit support. In fact, projects to do exactly that are already under way. I do not view a standardized GC-on-top-of-WASM as a goal for the core WASM project, but view it in the same category as tool conventions.

It's possible, but not very efficient, to implement a linear memory garbage collector in WASM; WASM lacks a way to find/mutate references on the call stack or in registers, so you would have to spill all references to linear memory across calls.

What I want is not to standardize some linear memory garbage collector, but to standardize some lower level extension that provides a safe and portable way to find, read, and write addresses on the stack. It would need extended call operators that take some additional operands, and remap them through any "stack map" operation that may occur during the call. It would also need operators for reading and writing references that have been "mapped" by those extended call operators on the current thread's stack. On its own, that's not enough to match performance of a native GC implementation, but it would close the gap quite a bit.

AndrewScheidecker on 13 Apr 2017

👍4

@AndrewScheidecker Yeah, I think something like that is a reasonable feature to add and could also be generally useful; recently we were talking about how one could implement Windows SEH filters and iirc they needed this as well.

lukewagner on 14 Apr 2017

@rossberg-chromium, re:

> the availability of GC in Wasm won't affect code that doesn't use it.

..That it's optional, means it needs to be removed. Loose coupling. What hardware can and cannot do is set out very clearly. Thus since there are no interacting concerns, a layered approach is superior.

@rossberg-chromium, re:

> So nobody is forced into anything.

..You're still missing the point. The point is: Loose coupling. Since there are no interacting concerns, a layered approach is superior.

@rossberg-chromium, re:

> It's gonna be extremely difficult to implement a GC with competitive performance on top of Wasm, because it is the sort of code that is impacted most by its safety restrictions. It also is an incredible amount of work to get correct and efficient. At the same time it is a mechanism that virtually every high-level language needs, so it would be a poor service to the web platform if everybody needed to redo the hard work that has already been done in browsers.

..It will be done "the opensource way": like how jquery is just one library yet used by millions of teams. No reason why small code shops need to write jquery themselves when big companies have dedicated teams working on perfecting that open-source project.

@rossberg-chromium, re:

> At the same time it is a mechanism that virtually every high-level language needs.

..No way. What about C?

..Indeed, game devs and devs of other time-sensitive apps who don't happen to have the luxury of using C always end up hating the GC they had had no choice but to deal with.

@AndrewScheidecker, re:

> What I'm worried about is that there will be no standard extension to allow a WASM program to garbage collect linear memory efficiently

..That's not even possible because as long as all of the hardware ability is exposed to the programmer, everything can be done. Indeed, if it happens to be not, it simply means that the exposed layer is too high-level and a smaller subset closer to hardware is the rightful layer to target.

I suppose this standard is expecting itself to be entrenched within the web for at least 2 or 3 decades? Then target the raw hardware layer and stick to just that. Everything else that is above the hardware layer should be handled by "the other teams". One team one concern.

At the end of the day, it pretty much boils down to a simple question: Is there a hardware in this century that has an instruction for garbage-collect? A resounding "Nope". When one has been created, and only then, we can extend the standard by adding the GC in core.

Pacerier on 28 Apr 2017

❤1

> Even if it is added as an optional extension, a WebAssembly implementation without it will become pointless if Emscripten uses it. I already have problems running Emscripten compiled binaries in WAVM due to the coupling of its C runtime to its JavaScript harness. If that interface starts to use GC references, then I will need to add a garbage collector to my standalone WebAssembly VM.

^ This x1000

This is exactly how Embrace, Extend, and Extinguish works. I'm not saying the GC proposal is intentionally patterned after Microsoft, I'm just pointing out the inevitable effect that comes from unwittingly following their pattern:

1. Currently all WASM can be hosted _everywhere_
2. We add a GC option for WASM programs ("but it's _only_ an extension!")
3. Runtimes without a GC now face a choice: implement a GC or be unable to host all WASM
4. End result: WASM can no longer be hosted everywhere

Do we really want WASM to no longer be viable in places?

I understand the practical side: the largest market for WASM right now is the web. I mean it's called _Web_ Assembly.

But everyone please remember it is not exclusive to internet browsers. Right now I can embed an arbitrary untrusted WASM binary in a Rust program without compromising security. And anyone can produce that WASM with LLVM. That is awesome. I'd hate to make that more difficult. And who knows what the web will look like in 20 years. And let's not forget what the co-founder of Docker said about WASM+WASI:

> A standardized system interface was the missing link

He didn't say "an extension to the language was the missing link".

If we want WASM programs to have high-quality GCs, then let's encapsulate the GCs and let the WASM programs run them themselves (just like @Pacerier's jQuery example).

And if we want tighter integration with host GCs then create a normal interface for it.

> That being said, if what you want to compile _is_ a GC'd language, there are several advantages to integrating with the host environment's GC over doing it all w/in linear memory:
>
> cycles that go between wasm objects and host environment (e.g., DOM) objects could be collected by the host environment GC but not by a GC in linear memory; you'd end up with a strong edge that rooted the otherwise-collectable cycle

I believe this would be a concrete example of such a cycle: the host and WASM program work together to create a circular linked list. One node lives in the host, the other node lives in the WASM program, and each node points forward to the other node. Neither side can GC its node without corrupting the other's reference.
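
A toy model of that two-node cycle, assuming each collector can only see its own heap and must therefore treat any reference held by the other side as a root. The object ids, the dict-based heaps, and both function names are hypothetical.

```python
def collect_separately(own, other, local_roots):
    """Collect one heap in isolation; references from the other heap act as roots."""
    external_roots = {ref for refs in other.values() for ref in refs if ref in own}
    live = set()
    work = list(local_roots) + list(external_roots)
    while work:
        obj = work.pop()
        if obj in live or obj not in own:
            continue
        live.add(obj)
        work.extend(own[obj])
    return {obj: refs for obj, refs in own.items() if obj in live}


def collect_unified(heaps, roots):
    """A single collector that can trace edges across every heap."""
    merged = {}
    for heap in heaps:
        merged.update(heap)
    live = set()
    work = list(roots)
    while work:
        obj = work.pop()
        if obj in live or obj not in merged:
            continue
        live.add(obj)
        work.extend(merged[obj])
    return [{o: r for o, r in heap.items() if o in live} for heap in heaps]


# Each heap maps an object id to the ids it references.
host = {"h": ["w"]}   # host node points at the wasm node
wasm = {"w": ["h"]}   # wasm node points back at the host node

host_after = collect_separately(host, wasm, local_roots=[])
wasm_after = collect_separately(wasm, host, local_roots=[])
unified_after = collect_unified([host, wasm], roots=[])
```

With no roots anywhere, the two isolated collectors each keep their node alive on behalf of the other, while the unified trace frees both.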

But to me this signals an issue with that implementation of references.

A reference in a GC'd language is supposed to be trackable by the GC no matter where the reference goes. You can't give a C# object reference to a C function because the C# GC can't reach into C land and muck around. You can extract a memory address from that reference and pass that number off but that doesn't guarantee the memory address will be meaningful later unless you first tell C#'s GC to pin that object. Once you do that then the C function can do meaningful things with that number in a consistent way. Both sides are happy.
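
A rough sketch of what pinning buys, using a toy compacting heap rather than the real C# runtime: `CompactingHeap` and its methods are hypothetical, and addresses are just list indices. Unpinned objects slide toward address 0 during compaction, while a pinned object keeps the address that was handed to foreign code.

```python
class CompactingHeap:
    """Toy heap: compaction slides objects down, except pinned ones."""

    def __init__(self, size):
        self.slots = [None] * size
        self.pinned = set()

    def put(self, addr, value):
        self.slots[addr] = value

    def pin(self, addr):
        self.pinned.add(addr)

    def unpin(self, addr):
        self.pinned.discard(addr)

    def compact(self):
        """Return a map from each object's old address to its new address."""
        forwarding = {}
        new_slots = [None] * len(self.slots)
        for addr in self.pinned:            # pinned objects stay where they are
            new_slots[addr] = self.slots[addr]
            forwarding[addr] = addr
        free = 0
        for addr, value in enumerate(self.slots):
            if value is None or addr in self.pinned:
                continue
            while new_slots[free] is not None:
                free += 1                   # skip slots held by pinned objects
            new_slots[free] = value
            forwarding[addr] = free
            free += 1
        self.slots = new_slots
        return forwarding


heap = CompactingHeap(6)
heap.put(2, "a")
heap.put(4, "b")

heap.pin(4)                 # address 4 has been handed to foreign code
first = heap.compact()      # "a" moves, "b" must not

heap.unpin(4)               # foreign code is done with the raw address
second = heap.compact()     # now "b" is free to move as well
```

While pinned, the raw number 4 stays meaningful to the foreign side; once unpinned, only references the collector can track remain valid.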

Does the fact that C# has a GC mean that it requires every other program it p/invokes to have a GC as well? No. Instead there are explicit controls in C# for pinning/rooting and that lets you orchestrate programs in a consistent manner.

A reference in Rust has guarantees that often cannot be maintained outside of Rust. Ownership, lifetime, and mutability all can be broken once you extract a memory address and toss that number over the wall.

Does the fact that Rust has an ownership system mean it requires every foreign function to have an ownership system? No. Instead API designers are very careful when they build a framework, and Rust wrappers for foreign functions take a lot of thought.

If a reference needs to be tracked by the host's GC then why not instead encapsulate that concept using WASM's existing constructs?

If the host wants to share a reference-counting pointer with the WASM program then make a reference-counting pointer in shared memory and make sure both sides treat it as one.
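
A minimal sketch of such a shared counter, assuming a 32-bit count stored at the start of the object in a memory region both sides can read and write. The layout and the `rc_*` helpers are hypothetical, and the updates are not atomic, so this only models single-threaded use.

```python
import struct

memory = bytearray(16)  # stand-in for memory visible to both host and module
COUNT_OFFSET = 0        # assumed layout: 32-bit count at the start of the object


def rc_init(addr):
    struct.pack_into("<I", memory, addr + COUNT_OFFSET, 1)


def rc_count(addr):
    return struct.unpack_from("<I", memory, addr + COUNT_OFFSET)[0]


def rc_retain(addr):
    struct.pack_into("<I", memory, addr + COUNT_OFFSET, rc_count(addr) + 1)


def rc_release(addr):
    """Drop one reference; return True when the last reference is gone."""
    remaining = rc_count(addr) - 1
    struct.pack_into("<I", memory, addr + COUNT_OFFSET, remaining)
    return remaining == 0


obj = 4
rc_init(obj)                        # the host creates the object, count = 1
rc_retain(obj)                      # the module takes a reference, count = 2
module_was_last = rc_release(obj)   # the module is done, count = 1
host_was_last = rc_release(obj)     # the host is done, count = 0: free it
```

Whichever side drops the last reference is the one responsible for freeing the object, which is the contract both sides have to agree on.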

If the host wants to shuffle memory around under the feet of the WASM program then make an interface (through normal means) that communicates that intention and concept.

Also, it's interesting that it's usually not a fruitful exercise to wonder how to make reference graphs in a C# backend more cooperative with reference graphs in a Flutter UI frontend. There's a process boundary that people tolerate (even need), and there are frameworks and patterns for crossing that boundary.

> host environment GCs are integrated with things like requestAnimationFrame and vsync, allowing them to, e.g., tailor incremental GC slices to end when it's time to start the next frame

I think this can be solved for WASM-hosted GCs through normal means like function calls and shared state.
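
One way this could look, sketched under the assumption that the host calls an exported function once per frame with a work budget. `gc_slice` and the budget-in-objects convention are hypothetical, and the sketch ignores the write barrier a real incremental collector needs when the program mutates the graph between slices.

```python
def gc_slice(worklist, graph, marked, budget):
    """Mark at most `budget` objects, then return control to the host.

    Returns True once marking is complete.
    """
    steps = 0
    while worklist and steps < budget:
        obj = worklist.pop()
        if obj in marked:
            continue
        marked.add(obj)
        worklist.extend(graph[obj])
        steps += 1
    return not worklist


# The object graph: each object maps to the objects it references.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
worklist = ["a"]   # roots
marked = set()

slices = 0
done = False
while not done:    # stands in for the host's per-frame callback
    done = gc_slice(worklist, graph, marked, budget=2)
    slices += 1
```

The host only needs to pass a budget derived from the time left before the next frame; the marking state lives entirely on the module's side.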

Not to mention this is making the (large) assumption that the host GC is going to do exactly what the WASM programmer wants.

> less internal fragmentation by allowing safe reuse of GC arenas between what would otherwise be disjoint wasm linear memories; esp. important on 32-bit with limited virtual address space

I think this is out of scope, similar to how it's out of scope for any other assembly language. If you want the host to be aware of the program's memory usage patterns then there needs to be a layer composed with WASM that communicates such things to the host. Build a WASM program with an exported function that yields a memory usage map or something.

> better browser devtool integration since devtools better understand host GC objects

I don't think this is a very powerful argument. Tooling necessarily lags behind languages, and also tooling isn't static. So I'm not a fan of altering language design to suit the tooling of the moment. Devtools could be made to understand anything.