Design: Proposal: Add inter-language type bindings

Created on 1 May 2019  ·  61 Comments  ·  Source: WebAssembly/design

WebAssembly is currently very good at executing code written in arbitrary languages from a given (usually JS) interpreter, but it lacks several key features when it comes to combining multiple arbitrary languages together.

One of these features is a language-agnostic type system. I would like to propose that one or several such system(s) be added to WebAssembly.

As an aside, in previous feature discussions, some contributors have expressed that language interoperability shouldn't be a design goal of WebAssembly. While I agree that it shouldn't necessarily be a high-priority goal, I think it is a goal worth striving for in the long term. So before I go into design goals, I'm going to lay out the reasons why I think language interoperability is worth the effort.

Why care about language interoperability?

The benefits of lower language-to-language barriers include:

  • More libraries for wasm users: This goes without saying, but improving language interoperability means that users can use existing libraries more often, even if the library is written in a different language than they're using.

  • Easier adoption of small languages: In the current marketplace, it's often difficult for languages without corporate support to get traction. New languages (and even languages like D with years of refinement) have to compete with languages with large ecosystems, and suffer from their own lack of libraries. Language interoperability would allow them to use existing ecosystems like Python's or Java's.

  • Better language-agnostic toolchains: Right now, most languages have their own library loading scheme and package manager (or, in the case of C/C++, several non-official ones). Writing a language-agnostic project builder is hard, because these languages often have subtle dependencies and ABI incompatibilities that require a monolithic, project-wide solution to resolve. A robust inter-language type system would make it easier to split projects into smaller modules that can be handled by an npm-like solution.

Overall, I think the first point is the most important, by a wide margin. A better type system means better access to other languages, which means more opportunities to reuse code instead of writing it from scratch. I can't overstate how important that is.

Requirements

With that in mind, I want to outline the requirements an inter-language type system would need to pass.

I'm writing under the assumption that the type system would be strictly used to annotate functions passed between modules, and would not check how languages use their own linear or managed memory in any way.

To be truly useful in a wasm setting, such a type system would need:

1 - Safety

  • Type-safe: The callee must only have access to data specified by the caller, object-capabilities-style.

    • Memory should be "forgotten" at the end of a call. A callee shouldn't be able to gain access to a caller's data, return, and then access that data again in any form.

2 - Overhead

  • Developers should be comfortable making inter-module calls regularly, eg, in a render loop.

    • Zero-copy: The type system should be expressive enough to allow interpreters to implement zero-copy strategies if they want to, and expressive enough for these implementers to know when zero-copy is optimal.

3 - Struct graphs

  • The type system should include structures, optional pointers, variable-length arrays, slices, etc.

    • Ideally, the caller should be able to send an object graph scattered in memory while respecting requirements 1 and 2.

4 - Reference types

  • Modules should be able to exchange reference types nested deep within structure graphs.

5 - Bridge between memory layouts

  • This is a very important point. Different categories of languages have different requirements. Languages relying on linear memory would want to pass slices of memory, whereas languages relying on GC would want to pass GC references.

    • An ideal type system should express semantic types, and let languages decide how to interpret them in memory. While passing data between languages with incompatible memory layouts will always incur some overhead, passing data between similar languages should ideally be cheap (eg, embedders should avoid serialization-deserialization steps if a memcpy can do the same job).

    • Additional bindings may also allow for caching and other optimization strategies.

    • The conversion work when passing data between two modules should be transparent to the developer, as long as the semantic types are compatible.

6 - Compile-time error handling

  • Any error related to invalid function call arguments should be detectable and expressible at compile-time, unlike in, eg, JS, where TypeErrors are thrown at runtime when trying to evaluate the argument.
  • Ideally, language compilers themselves should detect type errors when importing wasm modules, and output expressive, idiomatic errors to the user. What form this error-checking should take would need to be detailed in the tool-conventions repository.
  • This means that an IDL with existing converters to other languages would be a plus.

7 - Provide a Schelling point for inter-language interaction

  • This is easier said than done, but I think wasm should send a signal to all compiler writers that the standard way to interoperate between languages is X. For obvious reasons, having multiple competing standards for language interoperability isn't desirable.

Proposed implementation

What I propose is for bindings to the Cap'n Proto IDL by @kentonv to be added to WebAssembly.

They would work in a similar fashion to WebIDL bindings: wasm modules would export functions, and use special instructions to bind them to typed signatures; other modules would import these signatures, and bind them to their own functions.

The following pseudo-syntax is meant to give an idea of what these bindings would look like; it's approximate and heavily inspired by the WebIDL proposal, and focuses more on the technical challenges than on providing an exhaustive list of instructions.

Cap'n Proto binding instructions would all be stored in a new Cap'n Proto bindings section.

Cap'n Proto types

The standard would need an internal representation of the Cap'n Proto schema language. As an example, the following Cap'n Proto type:

```capnp
struct Person {
  name @0 :Text;
  birthdate @3 :Date;

  email @1 :Text;
  phones @2 :List(PhoneNumber);

  struct PhoneNumber {
    number @0 :Text;
    type @1 :Type;

    enum Type {
      mobile @0;
      home @1;
      work @2;
    }
  }
}

struct Date {
  year @0 :Int16;
  month @1 :UInt8;
  day @2 :UInt8;
}
```

might be represented as

```wasm
(@capnproto type $Date (struct
    (field "year" Int16)
    (field "month" UInt8)
    (field "day" UInt8)
))
(@capnproto type $Person_PhoneNumber_Type (enum 0 1 2))
(@capnproto type $Person_PhoneNumber (struct
    (field "number" Text)
    (field "type" $Person_PhoneNumber_Type)
))
(@capnproto type $Person (struct
    (field "name" Text)
    (field "email" Text)
    (field "phones" (generic List $Person_PhoneNumber))
    (field "birthdate" $Date)
))
```

Serializing from linear memory

Capnproto messages pass two types of data: segments (raw bytes), and capabilities.

These roughly map to WebAssembly's linear memory and tables. As such, the simplest possible way for WebAssembly to create capnproto messages would be to pass an offset and length into linear memory for segments, and an offset and length into a table for capabilities.

(A better approach could be devised for capabilities, to avoid runtime type checks.)

Note that the actual serialization computations would take place in the glue code, if at all (see Generating the glue code).

Binding operators

| Operator | Immediates | Children | Description |
| :--- | :--- | :--- | :--- |
| segment | off-idx, len-idx | | Takes the off-idx'th and len-idx'th wasm values of the source tuple, which must both be i32s, as the offset and length of a slice of linear memory in which a segment is stored. |
| captable | off-idx, len-idx | | Takes the off-idx'th and len-idx'th wasm values of the source tuple, which must both be i32s, as the offset and length of a slice of a table in which the capability table is stored. |
| message | capnproto-type, capability-table | segments | Creates a capnproto message of type capnproto-type, using the provided capability table and segments. |
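
As a purely illustrative sketch (reusing the pseudo-syntax above; the binding and operator names here are invented and not part of any spec), a module whose exported function passes a $Person out of linear memory might declare something like:

```wasm
;; Hypothetical pseudo-syntax, in the spirit of the WebIDL bindings explainer.
;; The exported function passes a $Person as four i32s:
;; (segment offset, segment length, captable offset, captable length).
(func $send_person (export "send_person") (param i32 i32 i32 i32))

(@capnproto bind $person_from_linear (func $send_person)
  (param (message $Person (captable 2 3)   ;; params 2 and 3 locate the capability table
    (segment 0 1))))                       ;; params 0 and 1 locate the single data segment
```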

Serializing from managed memory

It's difficult to pin down specific behavior before the GC proposal lands, but the general idea is that capnproto bindings would use a single conversion operator to get capnproto types from GC types.

The conversion rules for low-level types would be fairly straightforward: i8 converts to Int8, UInt8 and bool, i16 converts to Int16, etc. High-level types would convert to their capnproto equivalents: structure and array references convert to pointers, opaque references convert to capabilities.

A more complete proposal would need to define a strategy for enums and unions.

Binding operators

| Operator | Immediates | Children | Description |
| :--- | :--- | :--- | :--- |
| as | capnproto-type, idx | | Takes the idx'th wasm value of the source tuple, which must be a reference, and produces a capnproto value of capnproto-type. |
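
Again as a purely illustrative sketch (assuming the GC proposal's struct and reference types, with invented binding names), a GC-based module could bind the same signature with a single `as` operator instead of segment/captable operators:

```wasm
;; Hypothetical pseudo-syntax for a module that keeps its data in GC memory.
(type $PersonObj (struct))  ;; the module's own GC layout; fields omitted here
(func $send_person (export "send_person") (param (ref $PersonObj)))

(@capnproto bind $person_from_gc (func $send_person)
  (param (as $Person 0)))  ;; wasm value 0 (a GC reference) is viewed as a capnproto $Person
```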

Deserializing to linear memory

Deserializing to linear memory is mostly similar to serializing from it, with one added caveat: the wasm code often doesn't know in advance how much memory the capnproto type will take, and needs to provide the host with some sort of dynamic memory management method.

In the WebIDL bindings proposal, the proposed solution is to pass allocator callbacks to the host function. For capnproto bindings, this method would be insufficient, because dynamic allocations need to happen both on the caller side and the callee side.

Another solution would be to allow incoming binding maps to bind to two incoming binding expressions (and thus two functions): one that allocates the memory for the capnproto data, and one that actually takes the data.
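
A hypothetical shape for such a two-function incoming binding (alloc-with and receive-with are invented names, used only to illustrate the idea):

```wasm
;; Hypothetical pseudo-syntax: the callee exports an allocator and a receiver,
;; and the incoming binding ties both to the same capnproto parameter.
(func $alloc_person (param $byte_len i32) (result i32))   ;; returns an offset in the callee's memory
(func $recv_person (param $off i32) (param $byte_len i32))

(@capnproto bind $person_into_linear
  (param $Person
    (alloc-with (func $alloc_person))      ;; called first, to reserve $byte_len bytes
    (receive-with (func $recv_person))))   ;; called once the segments have been written
```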

Deserializing to managed memory

Deserializing to managed memory would use the same kind of conversion operator as the opposite direction.

Generating the glue code

When linking two wasm modules together (whether statically or dynamically), the embedder should list all capnproto types common to both modules and all bindings between function types and capnproto types, and generate glue code between every pair of differing function types.

The glue code would depend on the types of the bound data. Glue code between linear memory bindings would boil down to memcpy calls. Glue code between managed memory bindings would boil down to passing references. On the other hand, glue code between linear and managed memory would involve more complicated nested conversion operations.

For instance, a Java module could export a function, taking its arguments as GC types, and bind that function to a typed signature; the interpreter should allow a Python module and a C++ module to import that type signature; the C++ binding would pass data from linear memory, whereas the Python binding would pass data from GC memory. The necessary conversions would be transparent to the Java, Python and C++ compilers.
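
For the linear-to-linear case, the generated glue would amount to little more than an allocation, a copy, and a call. A very rough sketch, assuming something like the multi-memory proposal so that both modules' memories are addressable, and with all names invented:

```wasm
;; Hypothetical embedder-generated glue for a linear-memory-to-linear-memory call.
(func $glue_send_person (param $src_off i32) (param $src_len i32)
  (local $dst_off i32)
  ;; reserve space in the callee's memory
  (local.set $dst_off (call $callee.alloc_person (local.get $src_len)))
  ;; copy the segment from the caller's memory into the callee's memory
  (memory.copy $callee_mem $caller_mem
    (local.get $dst_off) (local.get $src_off) (local.get $src_len))
  ;; hand the copied segment to the callee
  (call $callee.recv_person (local.get $dst_off) (local.get $src_len)))
```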

Alternate solutions

In this section, I'll examine alternate ways to exchange data, and how they rate on the metrics defined in the Requirements section.

Exchange JSON messages

It's the brute-force solution. I'm not going to spend too much time on that one, because its flaws are fairly obvious. It fails to meet requirements 2, 4 and 6.

Send raw bytes encoded in a serialization format

It's a partial solution. Define a way for wasm modules to pass slices of linear memory and tables to other modules, and module writers can then use a serialization format (capnproto, Protobuf or some other) to encode a structured graph into a sequence of bytes, pass the bytes, and use the same format to decode them.

It passes 1 and 3, and it can pass 2 and 4 with some tweaking (eg pass the references as indices to a table). It can pass 6 if the user makes sure to export the serialization type to a type definition in the caller's language.

However, it fails at requirements 5 and 7. It's impractical when binding between two GC implementations; for instance, a Python module calling a Java library through Protobuf would need to serialize a dictionary into linear memory, pass that slice of memory, and then deserialize it as a Java object, instead of making a few hashtable lookups that could be optimized away in a JIT implementation.

And it encourages each library writer to use their own serialization format (JSON, Protobuf, FlatBuffer, Cap'n Proto, SBE), which isn't ideal for interoperability; although that could be alleviated by defining a canonical serialization format in tool-conventions.

However, adding the possibility to pass arbitrary slices of linear memory would be a good first step.

Send GC objects

It would be possible to rely on modules sending each other GC objects.

The solution has some advantages: the GC proposal is already underway; it passes 1, 3, 4 and 7. GC-collected data is expensive to allocate, but cheap to pass around.

However, that solution is not ideal for C-like languages. For instance, a D module passing data to a Rust module would need to serialize its data into a GC graph, pass the graph to the Rust function, which would deserialize it into its linear memory. This process allocates GC nodes which are immediately discarded, for a lot of unnecessary overhead.

That aside, the current GC proposal has no built-in support for enums and unions; and error handling would either be at link time or run time instead of compile time, unless the compiler can read and understand wasm GC types.

Use other encodings

Any serialization library that defines a type system could work for wasm.

Cap'n Proto seems the most appropriate, because of its emphasis on zero-copy and its built-in object capabilities, which map neatly to reference types.

Remaining work

The following concepts would need to be fleshed out to turn this bare-bones proposal into a document that can be submitted to the Community Group.

  • Binding operators
  • GC type equivalences
  • Object capabilities
  • Bool arrays
  • Arrays
  • Constants
  • Generics
  • Type evolution
  • Add a third "getters and setters" binding type.
  • Possible caching strategies
  • Support for multiple tables and linear memories

In the meantime, any feedback on what I've already written would be welcome. The scope here is pretty vast, so I'd appreciate help narrowing down what questions this proposal needs to answer.

Most helpful comment

we can add a few bindings per IR type to cover the vast majority of languages.

This is the crucial underlying assumption which I believe is simply not true. My experience is that there are (at least!) as many representation choices as there are language implementations. And they can be arbitrarily complicated.

Take V8, which alone has a few dozen(!) representations for strings, including different encodings, heterogeneous ropes, etc.

The Haskell case is far more complicated than you describe, because lists in Haskell are lazy, which means that for every single character in a string you might need to invoke a thunk.

Other languages use funny representations for the length of a string, or don't store it explicitly but require it to be computed.

These two examples already show that a declarative data layout doesn't cut it, you'd often need to be able to invoke runtime code, which in turn might have its own calling conventions.

And that's just strings, which are a fairly simple datatype conceptually. I don't even want to think about the infinite number of ways in which languages represent product types (tuples/structs/objects).

And then there is the receiving side, where you'd have to be able to create all these data structures!

So I think it is entirely unrealistic that we would ever get even remotely close to supporting the "vast majority of languages". Instead, we would start to privilege a few, while already growing a large zoo of arbitrary stuff. That seems fatal on multiple levels.

All 61 comments

This is really interesting! I've only read through quickly, and just have some initial thoughts, but my first and foremost question would be to ask why the existing FFI mechanisms that most languages already provide/use are not sufficient for WebAssembly. Virtually every language I'm familiar with has some form of C FFI, and thus is already capable of interoperating today. Many of those languages are able to do static type checking based on those bindings as well. Furthermore, there is already a great deal of tooling around these interfaces (for example, the bindgen crate for Rust, erl_nif for Erlang/BEAM, etc.). C FFI already addresses the most important requirements, and has the key benefit of already being proven and widely used in practice.

5 - Bridge between memory layouts

An ideal type system should express semantic types, and let languages decide how to interpret them in memory. While passing data between languages with incompatible memory layouts will always incur some overhead, passing data between similar languages should ideally be cheap (eg, embedders should avoid serialization-deserialization steps if a memcpy can do the same job).

The conversion work when passing data between two modules should be transparent to the developer, as long as the semantic types are compatible.

The transparent translation of one layout to another when passing data across the FFI barrier really seems like a job for compiler backends or language runtimes to me, and likely not desirable at all in performance-sensitive languages like C/C++/Rust/etc. In particular, for things you plan on passing back and forth across FFI, it would seem to me to always be preferable to use a common ABI, rather than do any kind of translation, as the translation would likely incur too high of a cost. The benefit of choosing a layout other than the common ABI of the platform is unlikely to be worth it, but I'll readily admit I may be misunderstanding what you mean by alternative layouts.

As an aside, putting the burden of solid FFI tooling on compilers/runtimes has an additional benefit, in that any improvements made are applicable on other platforms, and vice versa, as improvements to FFI for non-Wasm platforms benefit Wasm. I think the argument has to be really compelling to essentially start from square one and build a new FFI mechanism.

Apologies if I've misunderstood the purpose of the proposal, or missed something critical, as I mentioned above, I need to read through again more carefully, but felt like I needed to raise my initial questions while I had some time.

Apache Arrow exists for this too, but is more focused on high performance applications.

I think I agree with the general motivation here and it basically lines up with discussions we've had with how Web IDL Bindings could be generalized in the future. Indeed, earlier drafts of the explainer contained an FAQ entry mentioning this inter-language use case.

My main concern (and reason for omitting that FAQ entry) is scope: the general problem of binding N languages seems likely to generate a lot of open-ended (and possibly non-terminating) discussion, especially given that no one is doing it already (which of course is a chicken-and-egg problem). In contrast, the problems addressed by Web IDL Bindings are fairly concrete and readily demonstrated with Rust/C++ today, allowing us to motivate the (non-trivial) effort to standardize/implement and also eagerly prototype/validate the proposed solution.

But my hope is that Web IDL Bindings allows us to break this chicken-and-egg problem and start getting some experience with inter-language binding that could motivate a next wave of extension or something new and not Web IDL specific. (Note that, as currently proposed, if two wasm modules using compatible Web IDL Bindings call each other, an optimizing impl can do the optimizations you're mentioning here; just without the full expressivity of Cap'n Proto.)

I should state up front that I have not yet had the time to fully grok the proposal.
The reason for that is that I believe the task to be impossible. There are two fundamental reasons for this:
a. Different languages have different semantics that are not necessarily captured in a type annotation. Case in point, Prolog's evaluation is radically different to C++'s evaluation: to the point where the languages are essentially not interoperable. (For Prolog you can substitute a number of other languages)

b. By definition, any LCD of type systems is not guaranteed to capture all of a given language's type language. That leaves the language implementer with a deeply uncomfortable choice: support their own language or forgo the benefits of their language's type system. Case in point: Haskell has 'type classes'. Any implementation of Haskell which involved not supporting type classes would effectively gut it and make it unusable.
Another example: C++'s support of generics requires compile-time elimination of the genericity; on the other hand, ML, Java (and a bunch of other languages) use a form of universal representation -- that is not compatible with the approach taken by C++.

On the other hand having two expressions of an exported/imported type seems to bring up its own issues: is the language system supposed to verify that the two expressions are consistent in some sense? Whose responsibility is it to do this work?

@lukewagner Thanks for the links! Definitely glad I got a chance to read that document!

It seems to me like there are two things kinda blended together in this particular discussion - some of what is below is written out so I can have my understanding double checked, so feel free to point out anything I may have misunderstood or missed:

  1. Efficient host bindings

    • Basically the problem WebIDL is intended to solve, at least for browser environments - an interface description that maps from module->host and host->module, essentially delegating the work of translating from one to the other to the host engine. This translation isn't necessarily guaranteed to be ideal, or even optimized at all, but optimizing engines can make use of it to do so. However, even optimized, translation is still performed to some degree, but this is acceptable because the alternative is still translation, just slower.

  2. Efficient heterogenous module-to-module bindings.

    • In other words, given two modules, one written in source and the other in dest, sharing types between them, calling from source->dest and/or dest->source

    • If no common FFI is available, and given something like WebIDL, i.e. piggy-backing on 1, the unoptimized path would be to translate through some common denominator type provided by the host environment when calling across language barriers, e.g. source type -> common type -> dest type.



      • An optimizing engine could theoretically make this translation direct from source to dest without the go-between, but still imposes translation overhead.



    • If a common FFI is available, i.e. source and dest share an ABI (e.g. C ABI), then source and dest can call each other directly with no overhead at all, via the FFI. This is probably the most likely scenario in practice.

So my take is that there are definitely benefits to leveraging WebIDL, or something like it (i.e. a superset that supports a broader set of host APIs/environments), but it is really only a solution to the problem outlined in 1, and the subset of 2 which deals with inter-language bindings where no FFI is available. The subset of 2 where FFI _is_ available, is clearly preferable to the alternatives, since it incurs no overhead per se.

Are there good reasons for using an IDL even when FFI is an option? To be clear, I definitely agree with using an IDL for the other use cases mentioned, but I'm specifically asking in the context of language interoperability, not host bindings.

A couple additional questions I have, if both C FFI (as an example, since it is most common) and IDL are used/present at the same time:

  • If both source and dest languages provide different type definitions for a shared type with the same underlying in-memory representation according to their common ABI (for example, a common representation for a variable-length array) - will the host engine try to perform a translation between those types just because the IDL directives are present, even though they could safely call each other using their standard FFI?

    • If not, and it is opt-in, that seems like the ideal scenario, since you can add IDL to support interop with languages without FFI, while supporting languages with FFI at the same time. I'm not sure how a host engine would make that work though. I haven't thought it through completely, so I'm probably missing something

    • If so, how does the host engine unify types?:



      • If the engine only cares about layout, then how can static analysis detect when a caller provides incorrect argument types to a callee? If that kind of analysis is not a goal, then it would seem that IDL is really only ideally suited for host bindings, and less so cross-language.


      • If the engine cares about more than layout, in other words the type system requires both nominal and structural compatibility:





        • Who defines the authoritative type for some function? How do I even reference the authoritative type from some language? For example, let's say I'm calling a shared library written in another language which defines an add/2 function, and add/2 expects two arguments of some type size_t. My language doesn't necessarily know about size_t nominally, it has its own ABI-compatible representation of machine-width unsigned integers, usize, so the FFI bindings for that function in my language use my language's types. Given that, how can my compiler know to generate IDL that maps usize to size_t?






  • Are there examples of IDL interfaces used to call between modules in a program, where FFI is available but explicitly left unused in favor of the IDL-described interface? Specifically something not WebAssembly, mostly interested to study the benefits in those cases.

I'll admit I'm still trying to dig through the full details of WebIDL and its predecessors, how all this fits in with the different hosts (browser vs non-browser) and so on, definitely let me know if I've overlooked something.

@bitwalker

This is really interesting!

Glad you liked it!

but my first and foremost question would be to ask why the existing FFI mechanism that most languages already provide/use are not sufficient for WebAssembly.

The C type system has a few problems as an inter-language IDL:

  • It operates under the assumption of a shared address space, which is unsafe and deliberately doesn't hold in WebAssembly. (My own experience with a JS-to-C FFI suggests that implementations tend to just trade safety for speed.)

  • It doesn't have native support for dynamic length arrays, tagged unions, default values, generics, etc.

  • There isn't a direct equivalent to reference types.

C++ solves some of these problems (not the biggest one, shared address space), but adds a bunch of concepts that aren't really useful in IPC. Of course, you could always use a superset of C or a subset of C++ as your IDL and then devise binding rules around it, but at that point you're getting almost no benefits from existing code, so you may as well use an existing IDL.

In particular, for things you plan on passing back and forth across FFI

I'm not quite sure how you mean that, but to be clear: I don't think passing mutable data back and forth between modules is possible in the general case. This proposal tries to outline a way to send immutable data and get immutable data in return, between modules that don't have any information on how the other stores its data.

The benefit of choosing a layout other than the common ABI of the platform is unlikely to be worth it, but I'll readily admit I may be misunderstanding what you mean by alternative layouts.

The thing is, right now, the common ABI is a slice of bytes stored in linear memory. But in the future, when the GC proposal is implemented, some languages (Java, C#, Python) will store very little to nothing in linear memory. Instead, they will store all their data in GC structures. If two of these languages try to communicate, serializing these structures to a stream of bytes only to immediately deserialize them would be unnecessary overhead.


@KronicDeth Thanks, I'll look into it.

Although, from skimming the doc, this seems to be a superset of Flatbuffers, specifically intended to improve performance? Either way, what are its qualities that can uniquely help WebAssembly module interoperability, compared to Flatbuffers or Capnproto?


@lukewagner

But my hope is that Web IDL Bindings allows us to break this chicken-and-egg problem and start getting some experience with inter-language binding that could motivate a next wave of extension or something new and not Web IDL specific.

Agreed. My assumption when writing this proposal was that any capnproto bindings implementation would be based on feedback from implementing the WebIDL proposal.

My main concern (and reason for omitting that FAQ entry) is scope: the general problem of binding N languages seems likely to generate a lot of open-ended (and possibly non-terminating) discussion, especially given that noone is doing it already (which of course is a chicken-and-egg problem).

I think discussing a capnproto implementation does have value, though, even this early.

In particular, I tried to outline what requirements the implementation should/could try to fulfill. I think it would also be useful to list common use cases that an inter-language type system might try to address.

Regarding the N-to-N problem, I'm focusing on these solutions:

  • Only worry about RPC-style data transfer. Don't try to pass shared mutable data, classes, pointer lifetimes, or any other type of information more complicated than "a vector has three fields: 'x', 'y', and 'z', which are all floats".

  • Try to group languages and use cases into "clusters" of data-handling strategies. Establish strategies at the center of these clusters; language compilers bind to a given strategy, and the interpreter does the rest of the NxN work.


@fgmccabe

The reason for that is that I believe the task to be impossible. There are two fundamental reasons for this:
a. Different languages have different semantics that are not necessarily captured in a type annotation. Case in point, Prolog's evaluation is radically different to C++'s evaluation: to the point where the languages are essentially not interoperable. (For Prolog you can substitute a number of other languages)

Any implementation of Haskell which involved not supporting type classes would effectively gut it and make it unusable.

Yeah, the idea isn't to define a perfect "easily compatible with all languages" abstraction.

That said, I think most languages have some similarities in how they structure their data (eg, they have a way to say "every person has a name, an email, and an age", or "every group has a list of people of arbitrary size").

I think it's possible to tap into these similarities to significantly reduce friction between modules. (see also my answer to lukewagner)

b. By definition, any LCD of type systems is not guaranteed to capture all of a given language's type language. That leaves the language implementer with a deeply uncomfortable choice: support their own language or forgo the benefits of their language's type system.

Yeah. I think the rule of thumb here is "If it's a shared library boundary, make it a capnproto type, otherwise, use your native types".

On the other hand having two expressions of an exported/imported type seems to bring up its own issues: is the language system supposed to verify that the two expressions are consistent in some sense? Whose responsibility is it to do this work?

Yeah, I initially wanted to include a section about invariant-checking, and another about type compatibility, but I lost courage.

The answer to "whose responsibility is it" is usually "the callee" (because they must assume any data they receive is suspect), but the checks could be elided if the interpreter can prove that the caller respects the type invariants.

The C type system has a few problems as an inter-language IDL

Just to be clear, I'm not suggesting it as an IDL. Rather I'm suggesting that the binary interface (the C ABI) already exists, is well-defined, and has extensive language support already. The implication then is that WebAssembly doesn't need to provide another solution unless the problem being solved goes beyond cross-language interop.

It operates under the assumption of a shared address space, which unsafe and deliberately doesn't hold in WebAssembly.

So I think I see part of the misunderstanding here. There are two classes of FFI that we're talking about here, one which involves sharing linear memory (more traditional shared memory FFI), and one which does not (more traditional IPC/RPC). I've been talking about the former, and I think you are more focused on the latter.

Sharing memory between modules when you are in control of them (such as the case where you are linking together multiple independent modules as part of an overall application) is desirable for efficiency, but does sacrifice security. On the other hand, it is possible to share a designated linear memory specifically for FFI, though I don't know how practical that is with the default tooling out there today.

Cross-module interop that _doesn't_ use shared memory FFI, i.e. IPC/RPC, definitely seems like a good match for WebIDL, capnproto or one of the other suggestions in that vein, since that is their bread-and-butter.

The part I'm not sure about then is how to blend the two categories in such a way that you don't sacrifice the benefits of either, since the choice to go one way or the other is heavily dependent on use case. At least as stated it seems we could only have one or the other, if it is possible to support both, I think that would be ideal.

It doesn't have native support for dynamic length arrays, tagged unions, default values, generics, etc.

I think this probably isn't relevant now that I realize we were talking about two different things, but just for posterity: The ABI certainly has a _representation_ for variable-length arrays and tagged unions, but you are right in that C does have a weak type system, but that's not really the point, languages aren't targeting C FFI for the C type system. The reason why the C ABI is useful is that it provides a common denominator that languages are able to use to communicate with others that may have no concept of the type system they are interacting with. The lack of higher-level type system features is not ideal, and limits the kind of things you can express via FFI, but the limitations are also part of why it is so successful at what it does, pretty much any language can find a way to represent the things exposed to it via that interface, and vice versa.

C++ solves some of these problems (not the biggest one, shared address space), but adds a bunch of concepts that aren't really useful in IPC. Of course, you could always use a superset of C or a subset of C++ as your IDL and then devise binding rules around it, but at that point you're getting almost no benefits from existing code, so you may as well use an existing IDL.

Agreed, for IPC/RPC, C is a terrible language for defining interfaces.

The thing is, right now, the common ABI is a slice of bytes stored in linear memory.

That's certainly the primitive we're working with, but the C ABI defines a lot on top of that.

But in the future, when the GC proposal is implemented, some languages (Java, C#, Python) will store very little to nothing in linear memory. Instead, they will store all their data in GC structures. If two of these languages try to communicate, serializing these structures to a stream of bytes only to immediately deserialize them would be unnecessary overhead.

I'm not convinced that those languages will jump on deferring GC to the host, but that's just speculation on my part. In any case, languages that understand the host GC managed structures could just decide on a common representation for those structures using the C ABI just as easily as they could be represented using capnproto; the only difference is where the specification of that representation lives. That said, I have only a very tenuous grasp of the details of the GC proposal and how that ties in to the host bindings proposal, so if I'm way off the mark here, feel free to disregard.

TL;DR: I think we agree with regard to module interop where shared linear memory is not in play. But I think shared memory _is_ important to support, and the C ABI is the sanest choice for that use case due to existing language support. My hope would be that this proposal as it evolves would support both.

What we need is simply a maximally efficient way to exchange buffers of bytes, and a way for languages to agree on the format. There is no need to fix this to one particular serialization system. If Cap'n Proto is the most suitable for this purpose, it can arise as a common default organically, rather than being mandated by wasm.

I am of course biased, as I made FlatBuffers, which is similar to Cap'n Proto in efficiency, but more flexible and more widely supported. I however would not recommend this format to be mandated by wasm either.

There are many other formats that could be preferable to these two given certain use cases.

Note that both Cap'n Proto and FlatBuffers are zero copy, random access, and are efficient at nesting formats (meaning a format wrapped in another is not less efficient than not being wrapped), which are the real properties to consider for inter-language communication. You could imagine an IDL that allows you to specify very precise byte layouts for a buffer, including "the following bytes are Cap'n Proto schema X".

While I am subtly self-promoting, I might point people at FlexBuffers which is kind of like schema-less FlatBuffers. It has the same desirable zero-copy, random access and cheap nesting properties, but can allow languages to communicate without agreeing on a schema, without doing codegen, similar to how JSON is used.

@aardappel

What we need is simply a maximally efficient way to exchange buffers of bytes, and a way for languages to agree on the format. There is no need to fix this to one particular serialization system. If Cap'n Proto is the most suitable for this purpose, it can arise as a common default organically, rather than being mandated by wasm.

I understand the implicit point, that wasm shouldn't be used as a way to impose one standard over its competitors, and I'm personally indifferent to which IDL gets picked.

That said, when all is said and done, the rubber needs to meet the road at some point. If wasm wants to facilitate inter-language communication (which, granted, isn't an assumption everyone shares), then it needs a standard format that can express more than "these bytes make up numbers". That format can be capnproto, C structures, flatbuffers or even something specific to wasm, but it can't be a subset of all of these at the same time, for the reasons @fgmccabe outlined.

While I am subtly self-promoting, I might point people at FlexBuffers which is kind of like schema-less FlatBuffers. It has the same desirable zero-copy, random access and cheap nesting properties, but can allow languages to communicate without agreeing on a schema, without doing codegen, similar to how JSON is used.

I see the appeal, but I don't think this is what you want most of the time when writing a library. The problem with JSON (aside from the terrible parse time) is that when you import a JSON object somewhere in your code, you end up writing lots of sanitizing code before you can use your data, eg:

```js
assert(myObj.foo);
assert(isJsonObject(myObj.foo));
assert(myObj.foo.bar);
assert(isString(myObj.foo.bar));
loadUrl(myObj.foo.bar);
```

with potential security vulnerabilities if you don't.

See also 6 - Compile-time error handling above.


@bitwalker

Right, I didn't really consider the possibility of shared linear memory. I'd need someone more familiar with webassembly design than me (@lukewagner ?) to discuss how feasible it is, and whether it's a good way to achieve inter-module calls; it would also depend on how many assumptions FFIs rely on that are invalidated by wasm's memory layout.

For instance, FFIs will often rely on the fact that their host language uses the C library, and give native libraries access to the malloc function directly. How well can that strategy be translated to wasm, in the context of two mutually suspicious modules?

I guess I should say something on this thread, as the creator of Cap'n Proto, but weirdly enough, I haven't found that I have much of an opinion. Let me express a few adjacent thoughts that may or may not be interesting.

I am also the tech lead of Cloudflare Workers, a "serverless" environment that runs JavaScript and WASM.

We've been considering supporting Cap'n Proto RPC as a protocol for workers to talk to each other. Currently, they are limited to HTTP, so the bar is set quite low. :)

In Workers, when one Worker calls another, it is very commonly the case that both run on the same machine, even in the same process. For that reason, a zero-copy serialization like Cap'n Proto obviously makes a lot of sense, especially for WASM Workers since they operate on linear memory that could, in theory, be physically shared between them.

A second, less-well-known reason we think this is a good fit is the RPC system. Cap'n Proto features a full object capability RPC protocol with promise pipelining, modeled after CapTP. This makes it easy to express rich, object-oriented interactions in a secure and performant way. Cap'n Proto RPC is not just a point-to-point protocol, but rather models interactions between any number of networked parties, which we think will be a pretty big deal.

Meanwhile in WASM land, WASI is introducing a capability-based API. It seems like there could be some interesting "synergy" here.

With all that said, several design goals of Cap'n Proto may not make sense for the specific use case of FFI:

  • Cap'n Proto messages are designed to be position-independent and contiguous so that they can be transmitted and shared between address spaces. Pointers are relative, and all objects in a message need to be allocated in contiguous memory, or at least a small number of segments. This significantly complicates the usage model compared to native objects. When using FFI within the same linear memory space, this overhead is wasted, as you could be passing native pointers to loose heap objects just fine.
  • Cap'n Proto messages are designed to be forwards- and backwards-compatible between schema versions, including the ability to copy objects and sub-trees losslessly without knowing the schema. This requires some light type information be stored directly in the content, which Cap'n Proto encodes as metadata on every pointer. If two modules communicating over an FFI are compiled at the same time, then there is no need for this metadata.
  • Cap'n Proto RPC's promise pipelining, path-shortening, and ordering guarantees make sense when there is non-negligible latency between a caller and a callee. FFI on a single CPU has no such latency, in which case the promise pipelining machinery probably just wastes cycles.

In short, I think when you have independently-deployed modules in separate sandboxes talking to each other, Cap'n Proto makes tons of sense. But for simultaneously-deployed modules in a single sandbox, it's probably overkill.

Thanks for the feedback!

Pointers are relative, and all objects in a message need to be allocated in contiguous memory, or at least a small number of segments. This significantly complicates the usage model compared to native objects. When using FFI within the same linear memory space, this overhead is wasted, as you could be passing native pointers to loose heap objects just fine.

I don't know how feasible a shared linear memory approach is for wasm (see above).

That said, either way, I don't think the overhead from relative pointers would be that bad. WebAssembly already uses offsets relative to the start of linear memory, and implementations have tricks to optimize the ADD instructions away in most cases (I think), so the overhead of using relative pointers could probably be optimized away as well.

Cap'n Proto messages are designed to be forwards- and backwards-compatible between schema versions, including the ability to copy objects and sub-trees losslessly without knowing the schema. [...] If two modules communicating over an FFI are compiled at the same time, then there is no need for this metadata.

I don't think that's true. Having a way for modules to define backwards-compatible types at their boundaries allows wasm to use a dependency tree model, while mostly avoiding Haskell's dependency diamond problem.

A bigger source of pointless overhead would be the way capnproto xors its variables against their default values, which is useful when zero-bytes are compressed away, but counter-productive in zero-copy workflows.

I don't know how feasible a shared linear memory approach is for wasm (see above).

Ah, TBH I don't think I have enough context to follow that part of the discussion. If you don't have a shared address space then, yes, Cap'n Proto starts to make a lot of sense.

I'm happy to provide advice on how to design formats like this. FWIW there's a few little things I'd change in Cap'n Proto if I didn't care about compatibility with apps that already exist today... it's mostly, like, low-level pointer encoding details, though.

A bigger source of pointless overhead would be the way capnproto xors its variables against their default values, which is useful when zero-bytes are compressed away, but counter-productive in zero-copy workflows.

A bit off-topic, but the XOR thing is an optimization, not overhead, even in the zero-copy case. It ensures that all structures are zero-initialized, which means you don't have to do any initialization on object allocation if the buffer is already zero'd (which it often would be anyway). An XOR against a compile-time constant probably costs 1 cycle whereas any kind of memory access will cost much more.

@lukewagner Any thoughts on the "sharing linear memory" part?

I think there are use cases for both sharing and not sharing linear memory and ultimately tools need to support both:

Sharing makes sense where a native app today would use static or dynamic linking: when all the code being combined is fully trusted and its combination is all using either the same toolchain or a rigorously-defined ABI. It's a more fragile software-composition model, though.

Not sharing memory makes sense for a more-loosely coupled collection of modules, where classic Unix-style design would put the code into separate processes connected by pipes. Personally, I think this is the more exciting/futuristic direction for a more compositional software ecosystem and so I've advocated for this to be the default for any toolchain aimed at participating in the ESM/npm ecosystem via ESM-integration (and indeed that is the case today with Rust's wasm-pack/wasm-bindgen). Using a mechanism in the general vicinity of Web IDL Bindings or the extension you've proposed makes a lot of sense to me as a form of efficient, ergonomic, typed (sync or async) RPC.

Having finally read this in full, it sounds a lot like my thinking in this area (which this comment box is too short to contain?).

In particular I've been thinking about the inter-module communication problem as being best described with a schema. Which is to say, we don't need the Cap'nProto serialization format, we can just use the schema. I have no opinion about Cap'nProto's schema language specifically at this time.

From the WASI/ESM+npm perspective, a solution of this form makes the most sense to me. It's an abstraction over ABIs, without depending on a shared ABI. It essentially allows one to describe an interface with a schema-lang API, and call across these language boundaries with native-seeming ABIs on both ends, letting the host handle translation.

In particular, this does not subsume the use case for having more coordination with another module: if you know for sure that you can share an ABI, you can in fact just use an ABI, any ABI, whether that be C or Haskell. If you control and compile all the wasm in question, that's a much easier problem to solve. It's only when you get into the npm case where you're loading arbitrary unknown code and you don't know its source language, that something like having schema-level interop between modules becomes incredibly attractive. Because we can either use the LCD of wasm itself - which I predict will follow a similar arc to native libraries, and use the C ABI - or we can use the LCD of languages, encoded in the schema language. And the schema can be more flexible by making requirement 2) a soft requirement, e.g. it should be possible to convert from C to Rust to Nim efficiently, but C to Haskell having more overhead isn't a dealbreaker.

In particular I've been thinking about the inter-module communication problem as being best described with a schema. Which is to say, we don't need [a] serialization format, we can just use the schema.

I tend to agree with the former, but I'm not sure that the latter follows. Who implements the schema? Even if the host does the transporting, at some point you have to define what Wasm values/bytes are actually consumed/produced on both ends, and each module has to bring its own data into a form the host understands. There may even be multiple forms available, but still that isn't dissimilar from a serialisation format, just slightly more high-level.

it should be possible to convert from C to Rust to Nim efficiently, C to Haskell having more overhead isn't a dealbreaker.

Perhaps not, but you have to be aware of the implications. Privileging C-like languages means that Haskell wouldn't use this abstraction for Haskell modules, because of the overhead induced. That in turn means that it wouldn't participate in the same "npm" ecosystem for its own libraries.

And "Haskell" here is just a stand-in for pretty much every high-level language. The vast majority of languages are not C-like.

I don't claim to have a better solution, but I think we have to stay realistic about how efficient and attractive any single ABI or schema abstraction can be for the general population of languages, beyond the usual FFI-style of oneway interoperability. In particular, I'm not convinced that a pan-linguistic package ecosystem is an overly realistic outcome.

Privileging C-like languages means that Haskell wouldn't use this abstraction for Haskell modules, because of the overhead induced. That in turn means that it wouldn't participate in the same "npm" ecosystem for its own libraries.

And "Haskell" here is just a stand-in for pretty much every high-level language. The vast majority of languages are not C-like.

Could give some specific use cases? Ideally, existing libraries in Haskell or some other language that would be awkward to translate into a serialization schema?

I suspect that it will mostly come down to utility libraries vs business libraries. Eg containers, sorting algorithms, and other utilities relying on the language's generics won't translate well to wasm, but parsers, gui widgets, and filesystem tools will.

@PoignardAzur, it's not difficult to translate them, but it requires them to copy (serialise/deserialise) all arguments/results on both ends of each cross-module call. Clearly, you don't want to pay that cost for every language-internal library call.

In Haskell specifically you also have the additional problem that copying is incompatible with the semantics of laziness. In other languages it may be incompatible with stateful data.

Who implements the schema? Even if the host does the transporting, at some point you have to define what Wasm values/bytes are actually consumed/produced on both ends, and each module has to bring its own data into a form the host understands. There may even be multiple forms available, but still that isn't dissimilar from a serialisation format, just slightly more high-level.

The host implements the schema. The schema doesn't describe bytes at all, and lets that be an implementation detail. This is borrowing from the design of the WebIDL Bindings proposal, in which the interesting bit is in the conversions from C structs to WebIDL types. This sort of a design uses Wasm Abstract Interface Types (I suggest the acronym: WAIT) instead of WebIDL types. In the WebIDL proposal we don't need or want to mandate a binary representation of data when it's been "translated to WebIDL", because we want to be able to go straight from wasm to browser APIs without a stop in between.

Privileging C-like languages means that Haskell wouldn't use this abstraction for Haskell modules, because of the overhead induced.

Oh, agree 100%. I should have finished the example to make that more clear: Meanwhile, Haskell to Elm to C# can be similarly efficient (assuming they use wasm gc types), but C# to Rust may have overhead. I don't think there's a way to avoid overhead when jumping across language paradigms.

I think your observation is correct that we need to try avoiding privileging any languages, because if we fail to be sufficiently ergonomic + performant for a given language, they will not see as much value in using the interface, and thus not participate in the ecosystem.

I believe that by abstracting over the types and not specifying a wire format, we're able to give much more leeway to hosts to optimize. I think a non-goal is to say "C-style strings are efficient", but it is a goal to say "languages that [want to] reason about C-style strings can do so efficiently". Or, no one format should be blessed, but certain compatible call chains should be efficient, and all call chains should be possible.

By call chains I mean:

  1. C -> Rust -> Zig -> Fortran, efficient
  2. Haskell -> C# -> Haskell, efficient
  3. C -> Haskell -> Rust -> Scheme, inefficient
  4. Java -> Rust, inefficient

And "Haskell" here is just a stand-in for pretty much every high-level language. The vast majority of languages are not C-like.

Yes, that was my intent behind using Haskell as a concrete language. (Although Nim was probably a bad example of a C-like language because it makes heavy use of GC too)

--

Another way I've been thinking about the abstract types is as an IR. In the same way that LLVM describes a many-to-one-to-many relationship (many languages -> one IR -> many targets), wasm abstract types can mediate a many-to-many mapping, of languages+hosts -> languages+hosts. Something in this design space takes the N^2 mapping problem and turns it into an N+N one.

The host implements the schema.

Well, that can't be enough, each module has to implement something so that the host can find the data. If the host expects C layout then you have to define this C layout, and every client has to marshal/unmarshal to/from that internally. That isn't all that different from a serialisation format.

Even if we did that, it's still useful to define a serialisation format, e.g., for applications that need to transfer data between single engines, e.g. via networking or file-based persistence.

Well, that can't be enough, each module has to implement something so that the host can find the data. If the host expects C layout then you have to define this C layout

The host shouldn't expect anything, but needs to support everything. More concretely, using the webidl-bindings proposal as an illustrative example, we have utf8-cstr and utf8-str, which take i32 (ptr) and i32 (ptr), i32 (len) respectively. There's no need to mandate in the spec "the host internally represents this as C-strings" to be able to concretely map between them.
So, each module implements something, yes, but the representation of the data doesn't need to be expressed in the abstract data/schema layer, which is how this gives us the property of abstracting over that data layout.
Additionally, this is extensible at the bindings layer that maps between concrete wasm types and abstract intermediate types. To add Haskell support (which models strings as both cons lists of chars and arrays of chars), we can add utf8-cons-str and utf8-array-str bindings, which expect (and validate) wasm types of (using current gc proposal syntax) (type $haskellString (struct (field i8) (field (ref $haskellString)))) and (type $haskellText (array i8)).
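
To make that concrete, here is a hypothetical sketch of one abstract string parameter bound three different ways (utf8-cstr and utf8-str are taken from the webidl-bindings explainer; utf8-cons-str is the addition described above; the @binding syntax itself is invented):

```wasm
;; The GC types from above, plus three bindings for the same abstract string argument.
(type $haskellString (struct (field i8) (field (ref $haskellString))))
(type $haskellText (array i8))

(@binding (utf8-cstr 0))      ;; C-like module: value 0 is an i32 pointer to a NUL-terminated string
(@binding (utf8-str 0 1))     ;; Rust-like module: values 0 and 1 are an i32 pointer and an i32 length
(@binding (utf8-cons-str 0))  ;; Haskell-like module: value 0 is a (ref $haskellString)
```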

Which is to say, each module decides how the data originates. The abstract types + bindings allow for conversions between how modules view the same data, without blessing a single representation as being somehow canonical.
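
To spell that out in one place: the two GC-proposal types below are copied verbatim from above, and the linear-memory bindings are the ones from the webidl-bindings illustration; the grouping and comments are only a sketch, since no binding-declaration syntax is actually specified yet.

  ;; GC-backed shapes that the hypothetical utf8-cons-str and
  ;; utf8-array-str bindings would expect (and validate):
  (type $haskellString (struct (field i8) (field (ref $haskellString))))  ;; cons list of chars
  (type $haskellText (array i8))                                          ;; array of chars

  ;; Linear-memory counterparts from the webidl-bindings example:
  ;;   utf8-cstr : i32 (ptr)             -- nul-terminated string
  ;;   utf8-str  : i32 (ptr), i32 (len)  -- pointer + length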

A serialization format for (a subset of) the abstract types would be useful, but it can be implemented as a consumer of the schema format, and I believe it is an orthogonal concern. FIDL, I believe, has a serialization format for the subset of types that can be transferred across the network; it disallows materializing opaque handles across the network while permitting them to be transferred within a system (IPC yes, RPC no).

What you're describing is pretty close to what I had in mind, with one big caveat: the schema must have a small, fixed number of possible representations. Bridging between different representations is an N*N problem, which means the number of representations should be kept small to avoid overburdening VM writers.

So adding Haskell support would require using existing bindings, not adding custom bindings.

Some possible representations:

  • C-style structs and pointers.
  • Actual capnproto bytes.
  • GC classes.
  • Closures serving as getters and setters.
  • Python-style dictionaries.

The idea being that while each language is different, and there are some extreme outliers, you can fit a fairly large number of languages in a fairly small number of categories.

So adding Haskell support would require using existing bindings, not adding custom bindings.

Depends on the level of granularity of the existing bindings you're thinking of. N<->N languages, each encoding every possible pairwise binding, is 2*N*N; N<->IR is 2*N; and further, if you say N<->[common binding styles]<->IR, where the number of common formats is k, you're talking 2*k, where k < N.

In particular, with the scheme I describe, you get Scheme for free (it would reuse utf8-cons-str). If Java models strings as char arrays as well, that's a utf8-array-str binding. If Nim uses string_views under the hood, utf8-str. If Zig conforms to the C ABI, utf8-cstr. (I don't know Java/Nim/Zig's ABIs, so I didn't mention them as concrete examples earlier)

So, yes, we do not want to add a binding for each possible language, but we can add a few bindings per IR type to cover the vast majority of languages. I think the space for disagreement here is: how many bindings is "a few", what's the sweet spot, and how strict should the criteria be for whether we support a language's ABI?
I don't have specific answers to these questions. I'm trying to give lots of concrete examples to better illustrate the design space.

Also, I would assert that we absolutely want to specify multiple bindings per abstract type, to avoid privileging any one style of data. If the only binding we expose for Strings is utf8-cstr, then every language without a C-style ABI has to deal with that mismatch. I'm ok with increasing VM-writing complexity by some not-small factor.
The total work in the ecosystem is O(VM effort + language implementation effort), and both of those terms scale in some way with N = number of languages. Let M = number of embedders, k = number of bindings, and a = average number of bindings a given language needs to implement, with a <= k. At a minimum we have M+N separate wasm implementations.
Naive approach, with each of the N languages independently implementing an ABI FFI with every other language: O(M + N*N). This is what we have on native systems, which is a strong signal that anything O(N*N) will lead to results no different from native systems.
Second naive approach, where every VM needs to implement all N*N bindings: O(M*N*N + N), which is clearly even worse.
What we're proposing instead is k bindings that map to and from an abstract intermediate layer, which in turn maps back to all languages. This implies k work for each VM. Each language only needs to implement a subset of the bindings. The total work is M*k + N*a, which is O(M*k + N*k). Note that in the event that k=N, the VM side is "only" M*N, so for any given VM it's "only" linear in the number of languages. Clearly, though, we want k << N, because otherwise this is still O(N*N), no better than the first solution.
Still, O(M*k + N*k) is much more palatable. If k is O(1), that makes the whole ecosystem linear in the number of implementations, which is our lower bound on the amount of effort involved. A more likely bound is k being O(log(N)), which I'm still pretty satisfied with.
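
To make the asymptotics tangible, here is the same comparison with made-up numbers, purely for illustration: say M = 5 embedders, N = 50 languages, k = 10 bindings, and a = 3 bindings per language.

  pairwise FFI between languages:   M + N*N    = 5 + 2500    = 2505
  every VM implements all pairs:    M*N*N + N  = 12500 + 50  = 12550
  k shared bindings:                M*k + N*a  = 50 + 150    = 200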

Which is a long way of saying, I'm completely ok with increasing the VM complexity for this feature by some constant factor.

we can add a few bindings per IR type to cover the vast majority of languages.

This is the crucial underlying assumption which I believe is simply not true. My experience is that there are (at least!) as many representation choices as there are language implementations. And they can be arbitrarily complicated.

Take V8, which alone has a few dozen(!) representations for strings, including different encodings, heterogeneous ropes, etc.

The Haskell case is far more complicated than you describe, because lists in Haskell are lazy, which means that for every single character in a string you might need to invoke a thunk.

Other languages use funny representations for the length of a string, or don't store it explicitly but require it to be computed.

These two examples already show that a declarative data layout doesn't cut it; you'd often need to be able to invoke runtime code, which in turn might have its own calling conventions.

And that's just strings, which are a fairly simple datatype conceptually. I don't even want to think about the infinite number of ways in which languages represent product types (tuples/structs/objects).

And then there is the receiving side, where you'd have to be able to create all these data structures!

So I think it is entirely unrealistic that we would ever get even remotely close to supporting the "vast majority of languages". Instead, we would start to privilege a few, while already growing a large zoo of arbitrary stuff. That seems fatal on multiple levels.

My experience is that there are (at least!) as many representation choices as there are language implementations. And they can be arbitrarily complicated.

I completely agree. I think trying to design types that will somehow cover most languages' internal representations of data is simply not tractable, and will make the ecosystem overly complicated.

In the end there is only one lowest common denominator between languages when it comes to data: that of the "buffer". All languages can read and construct these. They're efficient and simple. Yes, they favor languages that are directly able to address their contents, but I don't think that is an inequality that is solved by promoting (lazy) cons cells to the same level of support somehow.

In fact, you can get very far with just a single data type: the pointer + len pair. Then you just need a "schema" that says what is in those bytes. Does it promise to conform to UTF-8? Is the last byte guaranteed to always be 0? Are the first 4/8 bytes length/capacity fields? Are all these bytes little-endian floats that can be sent straight to WebGL? Are these bytes maybe an existing serialization format's schema X? Etc.

I'd propose a very simple schema specification that can answer all these questions (not an existing serialization format, but something more low level, simpler, and specific to wasm). It then becomes the burden of each language to efficiently read and write these buffers in the format specified. Layers in between can then pass around the buffers blindly without processing, either by copy, or where possible, by reference/view.
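
As a rough sketch of what that might look like at a module boundary (the function and parameter names are made up; the schema answering the questions above would live out-of-band, e.g. in a custom section):

  ;; A single "buffer" argument is just a (ptr, len) pair into linear memory.
  ;; The out-of-band schema records what the bytes mean: UTF-8 or not,
  ;; nul-terminated or not, length/capacity prefix, endianness, and so on.
  (func $draw_text (param $ptr i32) (param $len i32)
    ;; ... read the bytes however this language finds convenient ...
  )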

This is the crucial underlying assumption which I believe is simply not true. My experience is that there are (at least!) as many representation choices as there are language implementations. And they can be arbitrarily complicated.

I agree that this is the crucial underlying assumption. I disagree that it is not true, though I think that's because of a semantic nuance I haven't made clear.

The bindings are not meant to map to all language representations perfectly, they only need to map to all languages well enough.

That is the crucial underlying assumption that makes this tractable at all, regardless of encoding. @aardappel's proposal of going in the other direction, and actually reifying the bytes into a decodable buffer, is also built on the assumption that the encoding of any given program's semantics is lossy, some encodings more lossy than others.

The Haskell case is far more complicated than you describe, because lists in Haskell are lazy, which means that for every single character in a string you might need to invoke a thunk.

I had actually forgotten that, but I don't think it matters. The goal is not to represent Haskell Strings while preserving all of their semantic nuances across a module boundary. The goal is to convert a Haskell String to an IR String, by value. This necessarily involves computing the whole string.

These two examples already show that a declarative data layout doesn't cut it, you'd often need to be able to invoke runtime code, which in turn might have its own calling conventions.

The way to model that, regardless of how we specify bindings (or even IF we specify anything for bindings), is to handle that in userland. If a language's representation of a type does not map to a binding directly, it will need to convert to a representation that does. For example if Haskell's Strings are really represented as (type $haskellString (struct (field i8) (field (func (result (ref $haskellString)))))), it will either need to convert to a strict string and use a Scheme-like binding, or to a Text array and use a Java-like binding, or to a CFFIString and use a C-like binding. The value proposition of having multiple imperfect binding types is that some of those are less awkward for Haskell than others, and it's possible to construct Wasm-FFI types without needing to modify the compiler.
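
For concreteness, here are the two shapes side by side, both copied from types mentioned in this thread; the conversion from the first to the second would be ordinary user code emitted by the Haskell toolchain, not something the VM performs.

  ;; Lazy cons list: every tail sits behind a thunk that must be forced.
  (type $haskellString
    (struct (field i8) (field (func (result (ref $haskellString))))))

  ;; Strict, flattened form that an array-style string binding could consume.
  (type $haskellText (array i8))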

And that's just strings, which are a fairly simple datatype conceptually. I don't even want to think about the infinite number of ways in which languages represent product types (tuples/structs/objects).
And then there is the receiving side, where you'd have to be able to create all these data structures!

I'm confused; I see that as saying "binding between languages is completely impossible so we shouldn't try at all," but I believe what you've been saying has been more along the lines of "I don't believe the approach described here is tractable," which seems much more reasonable. In particular, my objection to this line of argument is that it does not describe a path forward. Given that this problem is "very hard", what DO we do?

Instead, we would start to privilege a few

Almost-surely. The question is one of, what is the degree to which the few optimally-supported languages are privileged? How much leeway do the underprivileged languages have in finding an ergonomic solution?

while already growing a large zoo of arbitrary stuff.

I'm not sure what you mean by that. My interpretation of what's arbitrary is "which languages do we support", but that's the same as "privileging a few", which would be double-counting. And thus this would only be fatally flawed on that one level, rather than multiple :D

@aardappel the short version is that that's my backup plan if the declarative abstract approach fails: go in the totally opposite direction and describe a serialization format. The observation is that the Web itself is built almost entirely on Text, because it is an extremely low common denominator. Text is trivially understandable by all tools, and so something like the Web is possible on top of it.

The biggest concern I have with data-in-buffers that might make that approach intractable is how can we handle reference types? The best idea I have there is to share tables and serialize indices into them, but I don't have a full picture of how well that would actually work.

@jgravelle-google maybe reference types should be kept separate? So a given function's raw signature might be ref ref i32 i32 i32 i32, which actually is 2 anyrefs followed by 2 buffers of a particular type (specified in the hypothetical schema above).
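
Spelled out as a signature (hypothetical, just restating the example above):

  ;; 2 opaque references, kept separate from the buffers, followed by
  ;; 2 buffers each passed as a (ptr, len) pair into linear memory.
  (func $example
    (param anyref anyref)  ;; reference-typed arguments
    (param i32 i32)        ;; buffer A: pointer + length
    (param i32 i32))       ;; buffer B: pointer + length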

(As an aside, I'm unfamiliar with Haskell, but the idea of strings as lazy lists of chars blows my mind. When are linked lists of bytes ever the most efficient or convenient way to do anything? I get that Haskell needs everything to be immutable, and that linked lists allow for cheap prepending, but you can get and manipulate immutable strings without using linked lists.)

The biggest concern I have with data-in-buffers that might make that approach intractable is how can we handle reference types? The best idea I have there is to share tables and serialize indices into them, but I don't have a full picture of how well that would actually work.

That's one of the reasons I proposed capnproto as an encoding. Reference tables are more or less built in.

In any case, we'd want any format we chose to have reference types as first-class citizens, that can be placed anywhere in the data graph. (eg in optionals, arrays, variants, etc)

Thanks everyone for your feedback.

I think we're starting to hit the point where we're mostly retreading the same arguments over again with very little variation, so I'm going to update the proposal and try to address everyone's concerns. I'll probably start over with a new issue once I'm done writing the updated proposal.

To summarize the feedback so far:

  • There is very little consensus over what serialization format, if any, would be the best for wasm. Alternatives include FlatBuffers, C ABI struct graphs with raw pointers, a tailor-made wasm IDL format, or some combination of the above.

  • The proposal needs stronger negative space. Multiple readers were confused by the scope of the proposal and by which use cases it was meant to facilitate (static vs dynamic linking, module-to-host vs host-to-host, sharing mutable data vs passing immutable messages).

  • @lukewagner has expressed some enthusiasm for the potential of a module system connecting mutually distrusting modules, when combined with ESM integration. The proposal's next iteration should expand on that potential; in particular, I believe that having a backwards-compatible type system would allow wasm to use an npm-like dependency tree model, while avoiding the brunt of the dependency diamond problem.

  • There has been little feedback on the subject of capabilities, that is, opaque values that can be returned and passed, but not created from raw data. I take it as a sign that the next iteration should have a lot more emphasis on them.

  • Several readers expressed concerns about the feasibility of an inter-language type system. These concerns are somewhat vague and hard to define, in part because the subject matter is very abstract, in part because the proposal so far is pretty vague itself, which echoes @lukewagner's chicken-and-egg problem. Specific failure states include:

    • Focusing too much on highly visible languages, leaving more niche languages behind.
    • Having a leaky abstraction that tries to be too general, but doesn't cover anybody's use case conveniently or efficiently.
    • Covering too many cases, creating a bloated N*N implementation that still suffers from the above problems.

The next proposal iteration needs to address these concerns, one way or another.

In particular, I think the discussion would benefit a lot from some strawman examples to reason around. These examples would include at least two libraries, written in different languages with different data layouts (eg C++ and Java), being consumed by at least two minimal programs in different languages (eg Rust and Python), to illustrate the N*N problem and strategies to address it.

Also, as readers have pointed out, the proposal currently entangles the idea of a type schema with the idea of its representation. While the proposal does a fair job of laying out the requirements of a representation format, it needs to lay out the requirements of an abstract type schema first.

Anyway, thanks again to everyone who's participated in this discussion so far. I'll try to come up with a more thorough proposal as soon as possible. If anybody here is interested in helping me write it, be sure to email me!

@jgravelle-google:

The way to model that, regardless of how we specify bindings (or even IF we specify anything for bindings), is to handle that in userland.

Yes, I agree, and my argument is similar to @aardappel's: if that's what we generally have to do anyway, then we should simply accept that and not try ad-hoc things to improve some odd cases. Userland is where the conversion belongs, in the spirit of everything else in Wasm.

I'm confused; I see that as saying "binding between languages is completely impossible so we shouldn't try at all,"

I think it's totally desirable to define a DDL (type scheme) for data interop between languages. I just don't think it's tractable to build conversions into Wasm. The conversions need to be implemented in userland. The binding layer just prescribes a format that the user code has to produce/consume.

while already growing a large zoo of arbitrary stuff.
I'm not sure what you mean by that. My interpretation of what's arbitrary is "which languages do we support", but that's the same as "privileging a few", which would be double-counting.

Sorry, I meant that I suspect there won't be anything terribly canonical about these conversions, so both their selection and their individual semantics are "arbitrary".

how can we handle reference types?

Ah, that's a good question. FWIW, we are trying to address this very problem right now for an IDL/DDL for the Dfinity platform. As long as there's only anyref, the solution is fairly simple: the serialisation format defines two pieces, a memory slice projecting the transparent data and a table slice projecting the references contained. Multiple reference types necessitate multiple table slices accordingly. The tricky question is what to do once the set of ref types is no longer finite (e.g. with typed function references).
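
A minimal sketch of that two-piece shape at a call boundary (the names and exact signature are made up, just to illustrate the memory-slice + table-slice split):

  ;; Transparent bytes travel in a linear-memory slice; the references they
  ;; contain travel in a slice of an anyref table, and the serialised bytes
  ;; refer to those references by index into that slice.
  (func $receive
    (param $mem_ptr i32) (param $mem_len i32)   ;; memory slice
    (param $tbl_off i32) (param $tbl_len i32))  ;; table slice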

Once we have GC types there should be an alternative way to supply the data, which is as a GC value. In that case references are a non-issue, because they can be freely mixed in.

@PoignardAzur:

As an aside, I'm unfamiliar with Haskell, but the idea of strings as lazy lists of chars blows my mind.

Yeah, I believe it's widely considered a mistake nowadays. But it demonstrates how much diversity there is even for "simple" data types.

@rossberg

I think it's totally desirable to define a DDL (type scheme) for data interop between languages. I just don't think it's tractable to build conversions into Wasm. The conversions need to be implemented in userland.

I agree, and to add to that: I'm skeptical about adding something to the wasm spec for this because I don't think wasm has a greater need for an inter-language solution than other platforms, and I don't think wasm has a greater ability to implement such a solution than other platforms. There's nothing obviously special to me about wasm here, and so I'm not sure why we can do better than standard solutions in this area, e.g. buffers as @aardappel mentioned. (But I do think experimentation in userspace is very interesting, as it is on all platforms!)

The one special thing wasm has, at least on the Web, are the JavaScript/Web API types for strings and arrays and so forth. Being able to interact with them is obviously important.

I don't think wasm has a greater need for an inter-language solution than other platforms

I think it does. Being used on the web by default means that code can and will be run in different contexts. In the same way that one might <script src="http://some.other.site/jquery.js">, I would love to see people combining wasm libraries in a cross-origin way. Because of the ephemerality and composability properties that the web provides, the value-add of being able to interface with a foreign module is higher than it's ever been on native systems.

and I don't think wasm has a greater ability to implement such a solution than other platforms.

And I think it does. Because wasm is run by an embedder / in a host, the code generation is effectively abstracted over. Because of that, a VM has way more tools + leeway to support higher-level constructs that aren't possible on native systems.

So I think that something in this space is more valuable and more possible than on other systems, so that's why wasm is special in this context. To me the JS interop is a special case of the more general notion that wasm modules need to be able to speak to external things with very different views of the world.

A path forward for this is to push this entirely into tool-level interop for now, and defer standardizing until we have a winning format. So if the goal is to have the predominant wasm package manager's ecosystem using a given interface format (and is that NPM or WAPM or some as-yet-created package manager?), then that can happen independently of standardization. In theory we can standardize what people are already doing, in order to get better performance, but the ergonomics can be implemented in userland. A risk there is that the winning interlanguage format doesn't lend itself to optimization, and we wind up with a suboptimal de facto standard. If we can design the format with the intention of standardizing later (a declarative style in a custom section is mostly sufficient?), that removes that risk, but also delays any performance improvements. For me, performance is one of the less exciting motivators for this sort of thing, so I'm pretty ok with that, though others may disagree.

(and is that NPM or WAPM or some as-yet-created package manager?)

I think it's way too soon for WAPM to be a viable package manager. We need features like ESM integration, WASI and some form of inter-language bindings to be standardized before a wasm package manager becomes feasible.

As it is, I don't think WAPM even has dependency management.
