Design: Disallow duplicate imports?

Created on 3 Mar 2021 · 44 comments · Source: WebAssembly/design

(This issue captures the presentation at CG-03-02, with the goal of getting more feedback in preparation for a poll at some future CG meeting.)

The Module Linking proposal proposes to recast two-level imports as sugar for single-level imports of instances. For example:

(module
  (import "a" "foo" (func))
  (import "a" "bar" (func))
)

would be desugared to the same AST as:

(module
  (import "a" (instance
    (export "foo" (func))
    (export "bar" (func))))
)

One problem with this is that currently wasm allows duplicate imports (with the same or different signatures):

(module
  (import "a" "b" (func))
  (import "a" "b" (func))
)

while duplicate exports are disallowed. Thus, desugaring the above to:

(module
  (import "a" (instance
    (export "b" (func))
    (export "b" (func))))
)

would not be valid. module-linking/#7 discusses various ways to reconcile this discrepancy, but the simplest solution would be to retroactively disallow duplicate imports.
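To make the proposed restriction concrete, here is a minimal sketch of the extra check a validator would perform. It assumes the import list is available as `{ module, name, kind }` descriptors, the shape that `WebAssembly.Module.imports()` returns in the JS API; `findDuplicateImports` is a hypothetical helper, not code from any engine. The check deliberately ignores signatures, since the proposal would reject duplicates whether their types match or not.

```javascript
// Sketch of the proposed rule: a module is rejected if the same
// (module, name) pair appears more than once in its import list.
// Input shape mirrors WebAssembly.Module.imports(): { module, name, kind }.
function findDuplicateImports(imports) {
  const seen = new Set();
  const duplicates = [];
  for (const { module, name } of imports) {
    // JSON-encode the pair so names containing separator characters
    // cannot collide with each other.
    const key = JSON.stringify([module, name]);
    if (seen.has(key)) {
      duplicates.push({ module, name });
    } else {
      seen.add(key);
    }
  }
  return duplicates;
}

// Import list of the module with two (import "a" "b" (func)) entries.
const dupModuleImports = [
  { module: "a", name: "b", kind: "function" },
  { module: "a", name: "b", kind: "function" },
];
// Import list of the module importing "a" "foo" and "a" "bar".
const okModuleImports = [
  { module: "a", name: "foo", kind: "function" },
  { module: "a", name: "bar", kind: "function" },
];

console.log(findDuplicateImports(dupModuleImports).length); // 1
console.log(findDuplicateImports(okModuleImports).length);  // 0
```

In an engine this would run over the binary's import section during validation; the sketch only shows that the check is a single pass with a set of seen names.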

Bug 1647791 added telemetry to measure duplicate imports in Firefox, and the results for Firefox Beta 86 show 0.00014% of page loads contain duplicate imports (which may just be unit tests in automation or other synthetic sources). Thus, it seems like this is a breaking change we could practically make.

Another principled reason to disallow duplicate imports is that duplicate imports with differing signatures can only be implemented by the host, which breaks the general goal of allowing wasm to always be able to virtualize (polyfill, mock, etc.) wasm imports.

One concrete bit of experience is that the spec test suite used to have duplicate imports (for different "overloads" of the print function), but this was removed because it caused problems for people wanting to run the tests.

Thus, it could be argued that this restriction is generally a good one for the wasm ecosystem, even setting aside the concerns of the Module Linking proposal.


What bad things would happen in the module linking proposal if we did not make this change? Would it just mean this desugaring wouldn't quite work and the spec description of imports would therefore be more complicated? Or would it interfere with the functionality of the module linking proposal more fundamentally? (I'm not at all opposed to this change, I'm just trying to understand the motivation better.)

As I remember from Luke's presentation, we currently disallow duplicate exports.

What if we lift this restriction and allow duplicate exports? That opens the design space for module linking to solve this problem another way.

Currently, engines only have to inspect import/export names for two reasons: UTF-8 well-formedness, and the duplicate export check. We've talked offline about lifting the UTF-8 restriction, but maybe we could lift both.

Duplicate exports would make name-based linking ambiguous.

@tlively If we don't disallow duplicate imports, then the alternative solutions we discussed in module-linking/#7 seem like they would add significant complexity to the implementation of the validation and linking, particularly around asking whether a given module A at instantiation time is acceptable for import from module B.

@titzer I think adding duplicate exports would end up creating a bunch of hard questions for each source-language integration. E.g., in the JS API: does a duplicate export show up as a single JS function object that does dynamic overload resolution? Web IDL does this and it's quite complicated. What happens if I try to import this JS function from wasm (via WebAssembly.instantiate)? I'm not saying it's an impossible design problem, but if there is an option to avoid these hard/messy questions altogether for N language integrations, that seems preferable.

Duplicate imports are quite useful to call imports from dynamic languages like JS, where the same function can have various signatures, and I'd expect it to be used rather more than less once the type system at the boundary becomes richer, say one can call the same function with a number or a string argument, or with one or two arguments. I feel like removing this feature would be very unfortunate for interop with JS.

@lukewagner, this might be a bad idea due to my lack of familiarity with the module-linking proposal, but wouldn't another option be to keep MVP imports and instance imports distinct? If they were not unified, MVP imports could still be duplicated and there would be no need for complicated intersection types.

> Duplicate imports are quite useful to call imports from dynamic languages like JS, where the same function can have various signatures, and I'd expect it to be used rather more than less once the type system at the boundary becomes richer, say one can call the same function with a number or a string argument, or with one or two arguments. I feel like removing this feature would be very unfortunate for interop with JS.

This. A thousand times this. I'm the primary developer on a project that provides a testing framework for assemblyscript developers, and I see this as probably one of the most important parts of web assembly at this time.

The 0.00014% includes as-pect, the test runner for AssemblyScript by @jtenner, which is a critical piece of the growing AssemblyScript ecosystem.

> Duplicate imports are quite useful to call imports from dynamic languages like JS, where the same function can have various signatures,

If there will at least be an alternative for this, then that will be great, because the origin of WebAssembly is in browsers with JavaScript. Let's keep things easy for that target.

@tlively That's one of the possible alternative options, yes, but it has the unfortunate consequence of having two ways to do roughly same thing, which in general is something that's nice to avoid, and it also creates more cases for the implementation to care about during validation, subtype checking, and instantiation.

@jtenner @trusktr To be clear, you could still import the same JS function multiple times, the only difference is that each distinct signature you imported it with would need a different name. So, e.g., instead of emitting:

(module
  (import "env" "log" (func (param i32)))
  (import "env" "log" (func (param f32)))
  (import "env" "log" (func (param i64)))
  (import "env" "log" (func (param f64)))
  ...
)

instantiated via:

WebAssembly.instantiate(module, { env: { log:log } });

you could emit:

(module
  (import "env" "log_i32" (func (param i32)))
  (import "env" "log_f32" (func (param f32)))
  (import "env" "log_i64" (func (param i64)))
  (import "env" "log_f64" (func (param f64)))
  ...
)

instantiated with:

WebAssembly.instantiate(module, { env: { log_i32:log, log_f32:log, log_i64:log, log_f64:log } })

Is that a possible change you could make or would it be prohibitive in some way?

> Is that a possible change you could make or would it be prohibitive in some way?

Prohibitive? No. If all I have to do is change the name of the function I'm linking against, this is fine. Annoying, but fine. I just want to make sure that I'm properly heard.

> Let's keep things easy for that target.

JavaScript devs will be most affected by this change. And since we are "web assembly," we should be very careful how we treat web devs with regard to future issues like this.

I'm all for change if it makes things better. So I have the following questions in order of what I would regard as most important to least:

  1. Is this making the web a better place for development?

As far as I can see? Maybe. It forces developers to be more explicit about imports, and discourages JavaScript developers from doing JavaScript-y things. Maybe more restrictions lead to more creativity. A conversation that never seems to be had is how this affects C# and Java developers. I believe this might negatively affect C# devs writing WebAssembly compilers too, but perhaps this complexity is not so bad.

  2. Does this make WebAssembly easier to implement, maintain, and faster to run?

I don't know the answer to this. A compiler developer would be better versed in this sort of thing. My guess is it's more convenient for developers trying to maintain wasm engines. And yeah, it would be faster, or at least module state would be more explicit and easier to validate.

  3. Is this change going to negatively impact developers who rely on the previously unchanged behavior?

As far as I can tell, the few people who are using this feature, their stuff will break, and they will be unhappy to have to go back and try to figure out how to fix their software.

After maintaining testing framework software for AssemblyScript, I've come to realize that little changes like this don't affect typical end developers. These changes affect people maintaining ecosystem software, development tools, and compilers. Subtle changes in specification like this can result in very large pull requests that take days or weeks to go over.

My final tl;dr opinion: I don't like this change, but if you're going to make the change, pull the band-aid off already. Start the chain reaction now, and make it count.

I see one more drawback of this:

(module
  (import "env" "log_i32_f32_f64" (func (param i32 f32 f64)))
  (import "env" "log_f32_i32_f64" (func (param f32 i32 f64)))
  ...
)

It may increase binary size when you need to keep the original import names (for demangling) and can't apply Binaryen's pass that compresses import names.

However, it's not too big a problem in the general case.

The core problem I see is that a user has to know all the arities and argument types upfront to manually mangle the imports object, while currently a compiler can assume that for example Math is imported as a whole, and generate imports with the same name but different arities or argument types while encountering import calls during compilation. A simple example here is Math.max, slightly more complex ones are the Typed Array ctors with wildly differing signatures, and it only becomes more complex with for example Node.js APIs, which often allow omitting arguments in the middle of things while also allowing wildly different types, and ultimately user code of course, which one never knows about for sure. And of course, whenever something related in the Wasm module's sources changes, one will also have to revisit the imports object again. That'd all be very sad.
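One way a loader could avoid a hand-maintained table of arities is to derive the imports object from the module's own import list at instantiation time. The sketch below assumes a hypothetical naming convention (a base name followed by zero or more type suffixes such as `_i32` or `_f64`) and a hypothetical helper `buildImportsFromList`; in real use the list would come from `WebAssembly.Module.imports(module)`.

```javascript
// Hypothetical convention: "<base>" plus zero or more type suffixes
// ("_i32", "_f64", ...). Suffixes only keep names unique; the base
// names the host function that implements every overload.
const TYPE_SUFFIXES = /^([\s\S]*?)((?:_(?:i32|i64|f32|f64))*)$/;

function buildImportsFromList(importList, host) {
  const importObject = {};
  for (const { module, name, kind } of importList) {
    if (kind !== "function") continue; // this sketch only wires functions
    const base = TYPE_SUFFIXES.exec(name)[1];
    const fn = host[module] && host[module][base];
    if (typeof fn !== "function") {
      throw new TypeError(`no host function for import ${module}.${name}`);
    }
    if (!importObject[module]) importObject[module] = {};
    importObject[module][name] = fn; // the same JS function under many names
  }
  return importObject;
}

// Stand-in for WebAssembly.Module.imports(module) on a compiled module.
const host = { env: { log: (...args) => console.log(...args), max: Math.max } };
const importList = [
  { module: "env", name: "log_i32", kind: "function" },
  { module: "env", name: "log_f64", kind: "function" },
  { module: "env", name: "max_f64_f64", kind: "function" },
];
const importObject = buildImportsFromList(importList, host);
console.log(Object.keys(importObject.env).join(" ")); // log_i32 log_f64 max_f64_f64
```

The trade-off is that the convention must be shared between the compiler and this generic glue, which is the same coupling the mangled-name approach already implies.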

I think what I could not express in the CG meeting is that I think the module linking proposal is trying to push down requirements from name-based linking into the engine and it seems natural to me that there is a layer here so that those requirements do not get pushed down. For example, when we restrict the binary format of core wasm, we are declaring that engines must check and reject modules that do not conform. For type-safety, reasonable size limitations, and other things that matter to an engine, this makes perfect sense.

I would argue that names do not matter to engines; they only matter to host embeddings, and engines really shouldn't ever need to look at (much less reject) names. This is evidenced in the JS overloading situation other commenters mentioned.

My comment about UTF-8 was in that direction; an engine has literally no need for UTF-8 processing except to reject bad names; same for duplicate exports. This smelled extra funny while building Wizard, as I suddenly had to implement a UTF-8 validator from scratch...hmm. Also, duplicate export checking is the only place Wizard uses a hashtable. Not that a hundred or so lines of code is a crisis, but it's an odd duck in an otherwise computation-oriented VM exercise.

I understand the goals of the module linking proposal are to provide a standard name-based linking solution for Wasm, but for a number of reasons I think it carries enough risk of not working in a general enough way that we should not resort to banning, at the engine level, things that are perfectly fine from an engine perspective.

So I stand by my earlier suggestion that we should rather be more liberal, and perhaps lift the restriction on duplicate exports.

If the module linking proposal wants to reject modules right now that cannot be linked because of naming problems, then that seems perfectly fine as a layer restriction above core wasm. But like I said, there's a risk the module linking proposal will not solve linking problems for very many programming languages and instead we'll need a more programmable solution that involves an API for first-class modules. But that's probably a longer conversation and this is a fairly narrow issue. That's the context of my thinking here.

Ah, interesting @titzer. I've been having similar thoughts. In short, compilation and linking are connected but still distinct, and this issue is a canary that they're being too closely coupled.

We could imagine taking all the linking-related aspects of WebAssembly and putting them into a separate section. A "compilation" section would still have imports, but they're unnamed (beyond their index), just holes that a "linking" section will take care of filling. A "compilation" section might still have exports, but again they're unnamed and a "linking" section is responsible for providing names.

Linking sections, then, could have embedder-specific content. For example, a linking section for Interface Types could provide adapter_funcs, whereas a linking section for JS could install rtt dictionaries as in @tebbi's WebAssembly/gc#132—in both cases, this additional content only makes sense for that particular embedder and is extremely useful for that particular embedder. The module-types proposal is essentially another linking embedder, and part of the tension is that what makes the most sense there (e.g. single names for simple link-resolution) is not what makes the most sense in JS (e.g. overloaded names).

Addendum: This could be useful for supporting jitting. Modules want their functions to be directly callable from additional modules they dynamically generate without also having to expose those functions to the embedder's linking system.

As @titzer said, this too is beyond the scope of this discussion, but I wanted to mix in my thinking to the idea as well.

@dcodeIO In all scenarios that I've seen, the JS code that instantiates a wasm module is generated by the same toolchain that generates the wasm module (b/c instantiation in general tends to have a bunch of glue code necessary), thus I'm not aware that this impl detail would bubble up to the user. But have you seen differently?

One thing I'd like to clarify and emphasize: from a purely JS ecosystem POV, duplicate imports are a hazard because both the JS API and ESM-integration have no way of wiring up two wasm instances A and B where B has a duplicate import that we want to satisfy with A. A may be able to export two functions with suitable signatures, but B will only be able to import 1 of them (with one signature, which won't in general be able to satisfy both imports).
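The wiring problem can be simulated without a real engine. In this sketch, JS functions tagged with a signature string stand in for wasm exported functions (which must match an import's type exactly); `tagged` and `countSatisfied` are hypothetical helpers. Because an imports object has a single slot per name, both of B's duplicate `env.f` imports always receive the same value, so at most one of them can type-check.

```javascript
// Simulation only: a signature tag stands in for a wasm function's type.
function tagged(sig) {
  const fn = () => {};
  fn.sig = sig;
  return fn;
}

// Instance A exports one function for each signature B needs.
const aExports = { f_i32: tagged("(i32)->()"), f_f64: tagged("(f64)->()") };

// Module B imports "env" "f" twice, with two different signatures.
const bImports = [
  { module: "env", name: "f", sig: "(i32)->()" },
  { module: "env", name: "f", sig: "(f64)->()" },
];

// Mimics reading the imports object once per import: every slot that
// shares a name necessarily receives the same value.
function countSatisfied(imports, importObject) {
  let ok = 0;
  for (const { module, name, sig } of imports) {
    const v = importObject[module][name];
    if (v.sig === sig) ok++;
  }
  return ok;
}

// Whichever of A's exports is wired up as env.f, only one of B's two
// duplicate imports is satisfied.
for (const [exportName, fn] of Object.entries(aExports)) {
  const satisfied = countSatisfied(bImports, { env: { f: fn } });
  console.log(`${exportName}: ${satisfied} of ${bImports.length} imports satisfied`);
}
```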

@titzer I understand the desire to keep names out of core wasm, but to realize your goal, instead of importing a module as currently proposed:

(module
  (import "other" (module
    (import "foo" (func))
    (export "bar" (func))
  ))
)

we'd instead have to remove the "foo" and "bar" from the import type, leaving something purely positional:

(module
  (import "other" (module
    (import (func))
    (export (func))
  ))
)

If we look forward to separate-compilation scenarios (e.g., factoring out shared libraries into modules that are widely reused), then this would lead to a wasm version of the old C++ fragile base problem since we'd be depending entirely on positional order; this is what import/export names fix. Is that what you're proposing? (Note that first-class module references and/or first-class instances/instantiation all have this same issue; it's orthogonal.)
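A toy illustration of the fragility argument, using plain JS values in place of wasm exports: a consumer built against version 1 of a provider keeps working under name-based linking when version 2 adds an export at the front, but positional linking silently binds the wrong function. `linkByPosition` and `linkByName` are hypothetical helpers written for this sketch.

```javascript
// Two versions of a provider's export list, as ordered [name, value] pairs.
// Version 2 adds a new export, "init", at the front.
const providerV1 = [["foo", () => "foo"], ["bar", () => "bar"]];
const providerV2 = [["init", () => "init"], ["foo", () => "foo"], ["bar", () => "bar"]];

// The consumer was built against version 1 and needs "foo" then "bar".
const wanted = ["foo", "bar"];

// Purely positional: take the first N exports, whatever they are.
function linkByPosition(count, provided) {
  return provided.slice(0, count).map(([, value]) => value);
}

// Name-based: look each wanted name up, failing loudly if it is missing.
function linkByName(names, provided) {
  const table = new Map(provided);
  return names.map((n) => {
    if (!table.has(n)) throw new Error(`missing export ${n}`);
    return table.get(n);
  });
}

const call = (fns) => fns.map((f) => f()).join(",");
console.log(call(linkByPosition(wanted.length, providerV1))); // foo,bar
console.log(call(linkByPosition(wanted.length, providerV2))); // init,foo (silently wrong)
console.log(call(linkByName(wanted, providerV2)));            // foo,bar
```

Note that `new Map(provided)` silently keeps the last of any duplicated names, which is one concrete form of the ambiguity that duplicate exports would introduce for name-based linking.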

But if that's not what you're proposing, then I don't see how we can get away from names being an essential part of module linking and thus the ambiguities caused by duplicate imports/exports.

No, I am not proposing positional linking. I agree that that would be very fragile indeed.

But reflecting on the Jawa use case, and also on interface types, the thing that stands out is the need to actually parse and apply logic to imports. In Jawa, the import names are not strings but actually expressions in a language, and with import arguments, they could take types and functions as imports. There's no pure wasm solution to late binding with Java, or any high-level language IMHO. A low-level name-based linking solution might serve a niche wasm ecosystem built with a very limited set of languages, but it doesn't solve the use case of higher level languages in the long run. So IMO it's building a mechanism to enable the creation of new, very limited ecosystem, as opposed to adding a general enough mechanism to build linking at a higher level.

Thus my comments about the first-class modules. In the long run we are going to need a reflection and JITing API that will allow fully programmable linking and a very rigid name-based matching system doesn't really solve any language's real problems or give them enough rope.

And name-based matching is also very fragile, too. Because we don't have any parameterization mechanism for imports yet, languages have to resort to name mangling, which is both ugly and fragile, and cannot support even simple things like subtyping and overloading.

Sorry for the extended comments; we are really discussing a larger issue now, and that discussion should probably not happen on a narrow issue like this one.

> In all scenarios that I've seen, the JS code that instantiates a wasm module is generated by the same toolchain that generates the wasm module (b/c instantiation in general tends to have a bunch of glue code necessary)

AssemblyScript, for example, deliberately avoids generating module-specific glue code. It only emits type definitions (for development purposes) for use with a tiny general-purpose companion JS library that we just call the "loader" (it can demangle higher-level export names for more convenient usage from JS), but using it is often optional, especially when using AS mostly as a slightly nicer syntax to produce a minimal Wasm module, say for a compression algorithm (just C-like function exports, for example), or when compiling for WASI. My hope is that we'll eventually no longer need name mangling like MyClass#myMethod, but I'm not sure whether this relates to this issue.

> duplicate imports are a hazard because both the JS API and ESM-integration have no way of wiring up two wasm instances A and B where B has a duplicate import that we want to satisfy with A

I would have expected that this case produces an error somewhere down the road, yeah, probably during linking of a Wasm export to a Wasm import with a signature mismatch that cannot be satisfied, while it would result in lots of fun behavior at runtime when using a mismatching signature with a JS import, or an import from any other dynamic language.

It's unclear to me why it's necessary to use the same name to import multiple Wasm functions with different signatures, since Wasm functions have fixed signatures (unlike JS host functions, which coerce all arguments and return types to the given signature). Have I misunderstood the premise?

It seems like it would be perfectly normal for this scenario to error out if the given import is a Wasm function, unless the standard were extended to allow specifying functions with different signatures invoking automatic coercion as well.

While I have originally been cautious of removing duplicate imports, I now strongly support this change.

The crucial observation is that duplicate imports cannot be handled in various environments, as @lukewagner points out. And as he says, that is problematic even for JS use cases.

Most importantly, it breaks a basic modular property, namely that all imports can be equally implemented by Wasm or by host modules. That is important for use cases like virtualisation.

Duplicate imports effectively force either an overloading or a dynamic typing mechanism on the client side, which is bad, because not every client has means to support that (including other Wasm modules).

And this is a real problem in practice, as the history of the test suite has demonstrated.

@titzer, I'm afraid I don't follow. Unambiguous naming seems like an orthogonal problem to parameterisation. Even if we had the latter, we still would want to make sure that names uniquely determine a (parameterised) entity. Probably even more so, since resolving overloaded import names by other means gets even more complicated when these other means get more complex?

Like I said, names don't matter to engines, so their uniqueness doesn't matter to engines either. They matter to _linkers_, so it's a particular linking solution's choice if it wants to enforce uniqueness on import names (or export names for that matter). And linkers in the limit will require a fully programmable solution in the end, which is not only first class modules and instances, but JITing and the ability to incrementally construct them.

I think all of the trouble is coming from trying to make modules into statically typed functors[*] from imports to exports based on names. The effect of that is that names become part of a type, and introducing that type into the language of wasm requires typechecking and therefore name matching rules. But the problem with that is that these names are low-level names, akin to mangled C/C++ names. The end result of that is three bad things:

  1. we saw already: wasm cannot even connect to host environments that do have overloading
  2. linking is constantly constrained by what can and cannot be expressed using low level names alone, leading to a constant design problem with every new wasm feature
  3. it doesn't serve languages that cannot use any low level name mangling scheme, which will need a programmable and JITing solution.

These problems follow directly from the desire to introduce a static module type, because that is motivated by getting engines to do name-based matching for a simple linking solution. But in reality, engines should never look at names; another thing--a linker--should look at names and should then resolve them for an engine. And yes, one way to solve that problem is for the API to provide a vector of imports to a linker and for the linker to return a vector of bindings--that's what I did in Wizard and that's how the C embedding API works.

[*] I used to think this too, that wasm modules were functions from imports to exports. It's an elegant way to think about things, and @rossberg has explored that in an academic context for a long time. But that's missing something important: the logic for linking that comes from a higher-level language. That linker is an entity that _processes_ a module and computes bindings based not only on names, but perhaps also on its own state, etc. Modules contain a description of what they need from the outside world, but in the source language that the linker understands. Simple string matching does _not_ suffice.
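The engine/linker split described above can be sketched as follows: the engine hands the linker an ordered list of import descriptors and receives one binding per slot, in the same order, without interpreting any name itself. The descriptor shape (including `params`) and the helpers `demoLinker` and `resolvePositionally` are assumptions for illustration; the point is that a linker resolving each slot independently, using names and types, has no trouble with duplicate names.

```javascript
// Import descriptors in module order. Types are included because a
// programmable linker may use them; note the duplicate "env"/"log" entries.
const descriptors = [
  { module: "env", name: "log", params: ["i32"] },
  { module: "env", name: "log", params: ["f64"] },
  { module: "env", name: "now", params: [] },
];

// Hypothetical linker: resolves each slot independently from both the
// name and the parameter types, so duplicate names are not ambiguous.
function demoLinker({ module, name, params }) {
  if (module === "env" && name === "log") {
    const sig = params.join(",");
    return (...args) => `log<${sig}>(${args.join(", ")})`;
  }
  if (module === "env" && name === "now") return () => 0;
  throw new Error(`unresolved import ${module}.${name}`);
}

// The engine-facing step: exactly one binding per import slot, positionally.
function resolvePositionally(importDescriptors, linker) {
  return importDescriptors.map((desc) => linker(desc));
}

const bindings = resolvePositionally(descriptors, demoLinker);
console.log(bindings[0](7));   // log<i32>(7)
console.log(bindings[1](2.5)); // log<f64>(2.5)
```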

@titzer, I agree that the module linking proposal does not provide a powerful enough linking model for many use cases (it won't be able to replace C object files, for example), but I don't think providing a general purpose linking model is a goal of the proposal. I view it as an extension of the current string-based import/export system that is sufficient and useful for bundler-style linking that is common on the Web and that is already baked into Wasm's module structure to some extent. Higher-level language-specific linkers can continue to do whatever they want with respect to naming because their modularity must be layered on top of WebAssembly anyway.

@titzer If the names of imports and exports are not part of a module's type, then how are you proposing to avoid the fragility problem that you said you also wanted to avoid?

I also don't see how you're addressing the virtualization problem. Reading @rossberg's comment, I realize that virtualization is the more fundamental pre-existing problem to be solved here, and it actually has nothing to do with the Module Linking proposal.

I'm sympathetic to the change, but looking at WebAssembly/module-linking#7, there do not seem to be many options considered. There is a lot of discussion of intersection types, but that discussion does not mention that the transformations using intersection types do not work for coercive linking systems such as the JS API, which is what some of the use cases people have mentioned here rely upon. All the options considered seem to be based on surface-language module constructs (primarily based on Andreas's knowledge of modules for specifically functional languages), but there doesn't seem to be much consideration for low-level module constructs. A low-level module system would easily make the issue here moot while still also serving the goals of the module-linking proposal (including name-based resolution). At the same time, it would be easier to build upon for a number of important extensions.

@RossTate, I'm not sure what you are referring to as a "low-level module system" here or how it is different from what has already been discussed. If you have ideas for different solutions, perhaps they can be discussed on https://github.com/WebAssembly/module-linking/issues/7 to keep this issue focused on the particular proposed solution of disallowing duplicate imports.

I've left a note on https://github.com/WebAssembly/module-linking/issues/7#issuecomment-791063260 asking whether this can simply be part of the module linking validation without causing backwards compatibility issues for existing code that may be in the wild.

This is exactly the problem that took me so long to face. Modules do not have types without considering the import processor that gives meaning to the names and constraints associated with imports. It's like asking what the type of a text file is. Of course a text file can contain text written in any language, so that question doesn't make sense without context. When we fix the language, then we can even start to talk about types, if any, in a file. It's here again the same with modules. Modules are not functors, i.e. pure functions from imports (with names) to their exports. They require a linking system to give the imports meaning. Until we get past that point, none of the rest of what I'm saying makes any sense. I feel that is a very base assumption in the module linking proposal that hasn't even been surfaced for discussion. And I feel that we're not really questioning it. I seem to be repeating myself because I am actually questioning things deeper than might seem polite. I don't mean to be impolite, of course, but I do mean to challenge that modules need to look like functors. And we're not even questioning that.

I can give example after example as to why it's clearly not the case that modules are functors.

  1. Linking wasm modules to JavaScript goes through an imports object--an imports object that can in fact be a proxy or otherwise have behavior to apply Turing-complete logic to names (e.g. decode them), and even has state, which makes it impossible to cast as a functor. It is literally defined operationally as an algorithm that interrogates a JavaScript object import by import. That entire linking ecosystem cannot be expressed with module linking.

  2. The C embedding API is the same way. It provides a module to the embedder as a fully inspectable entity, complete with names and kinds and signatures for inspection. To instantiate a module, an embedder can apply whatever logic they like, as long as they simply provide bindings to the engine (yes, in a positional way) for instantiation. That, too, is an ecosystem that is fully programmable and not expressed with module linking.

  3. Both Jawa and Pywa are like this too. Jawa is fully statically typed, but its imports are encoded in the import language in a way where simple name matching is not adequate; the Jawa runtime must synthesize (JIT) new functions, types, and other bindings to provide to the engine. Pywa is fully dynamically typed and the JS implementation of it does use overloading in similar ways.
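The first example above (an imports object that applies arbitrary logic to names and carries state) can be sketched with a JS `Proxy`. The decoding rule (strip `_i32`/`_f64`-style suffixes), `decodingNamespace`, and the lookup log are all hypothetical; the sketch assumes, as we read the JS API's instantiation algorithm, that the imports object is read once per import, and it simulates those reads directly rather than instantiating a real module.

```javascript
// Hypothetical decoding rule: a base name followed by optional type suffixes.
const SUFFIXED = /^([\s\S]*?)((?:_(?:i32|i64|f32|f64))*)$/;

// A namespace object whose property reads run arbitrary code and mutate state.
function decodingNamespace(host, lookupLog) {
  return new Proxy({}, {
    get(_target, prop) {
      if (typeof prop !== "string") return undefined; // ignore symbol probes
      lookupLog.push(prop);                           // state: every read, in order
      const base = SUFFIXED.exec(prop)[1];
      return host[base];
    },
  });
}

const lookups = [];
const importObject = {
  env: decodingNamespace({ log: console.log, max: Math.max }, lookups),
};

// WebAssembly.instantiate(bytes, importObject) would perform one read per
// import; simulated here with direct property reads.
const a = importObject.env.max_f64_f64;
const b = importObject.env.log_i32;
const c = importObject.env.log_i32; // a duplicate import is just another read
console.log(a === Math.max, b === console.log, lookups.length); // true true 3
```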

Modules and their imports only make sense in the context of the linking scheme, the language, the import processor, whatever you want to call it. Again, they are not functors. They don't have types devoid of the context they are in. Module imports are descriptions of what they need from the outside world, written in a foreign language that is not, and cannot be, understood by the engine. They only have Wasm types so that the module itself can be separately typechecked independent of its imports according to Wasm's type system, and so that imports can be checked against the assumptions spelled out in import constraints. Again, nothing about names appears here until we start trying to make modules into a wasm-y typed functor. But those names are purple! That's where all of the source language stuff survives. Of course fumbling at it with a string comparison is not going to work.

With that in mind, the desire to make a "no frills" linking solution is not a bad one. That makes sense when names aren't too crazy. It's like creating a little ecosystem where names are very simple and simple matching suffices. It just doesn't need to be primitive, because it's not the right primitive. That should be clear because it's baking its assumptions directly into the core of wasm in a way that risks being a stumbling block, or an odd wart, when we get to what is really needed, which is a fully programmable linking API. If we had the programmable API for linking, then the module linking proposal could just be a library. So we should rather be thinking about how to make the C embedding API into more of a standard wasm API and then write module linking as a library against that API.

@lukewagner The "fragility" problem: note I am not advocating discarding names. Names are obviously needed by the linking system. But they aren't needed by the engine. For the API between the engine and the linker, I just mean that the engine can pass the imports to the linker as an ordered vector, the linker can resolve (potentially a subset of them), and return them positionally. Modules themselves do not need to rely on positions.

And yeah, for this issue, there is an excellent use case for duplicate imports that I've been thinking about for a while, and yes, duplicate imports up to their arguments: sharing (or not) inline caches. If we were to, e.g., import a function that performs a JavaScript property access, we might give it the name "GetProperty[foo]". If we import it just once, what inline cache should the engine use? Should it inspect wasm code to generate a new call site for them? Well, if a module can import the same function multiple times with exactly the same name, but get a new IC every time, then a module has a measure of control over how the engine adapts to its various polymorphism. (Of course, we can always stick some silly uniquifying junk on the end of the string, but why bother? It's just a workaround for a limitation from a totally different thing that we aren't using in wasm.)
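A rough sketch of how per-import caches could look from the host side today, assuming (as we understand the JS API) that instantiation performs a separate property read for each import, so a getter can hand every import slot a fresh closure even when two slots share the name "GetProperty[foo]". `makeGetProperty` is hypothetical, and its "cache" is a crude stand-in that only tracks the last seen key set and counts hits; a real inline cache would store a shape and offset.

```javascript
// Hypothetical property-access import with its own per-closure cache state.
function makeGetProperty(key) {
  let cachedShape = null; // last receiver "shape" seen by this call site
  let hits = 0;
  function getProperty(obj) {
    const shape = Object.keys(obj).join(","); // crude stand-in for a hidden class
    if (shape === cachedShape) hits++;
    else cachedShape = shape;
    return obj[key];
  }
  getProperty.hits = () => hits;
  return getProperty;
}

// Each read of this property runs the getter, so every import slot that
// reads it receives a distinct closure and therefore a distinct cache.
const env = {
  get ["GetProperty[foo]"]() {
    return makeGetProperty("foo");
  },
};

// Simulates two duplicate imports of env "GetProperty[foo]".
const site1 = env["GetProperty[foo]"];
const site2 = env["GetProperty[foo]"];
site1({ foo: 1 });
site1({ foo: 2 });         // same shape again: one cache hit
site2({ foo: 3, bar: 0 }); // independent cache: no hits
console.log(site1 === site2, site1.hits(), site2.hits()); // false 1 0
```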

@titzer The view you're laying is a rather significant departure from how wasm is specified, implemented and widely talked about today, so I think the burden is on you to propose this change of perspective to the CG and get some buy in--I don't think it's something you can assert as being the nature of wasm. I understand that this is what you want out of wasm for the JAWA project; perhaps instead what you want is to propose some new alternative packaging of wasm instructions into some new unit that matches your needs but isn't a "module" -- what you're talking about seems more like a "template" or "macro" or "blueprint" or something.

Independently, what I've learned to be the root question here remains: should wasm modules be able to virtualize the imports of other wasm modules (i.e., implement them with wasm exports)? Completely setting aside the question of "how do wasm modules get linked together?", this seems like a basic question of principles to put forward to the CG. I wonder if maybe it deserves a separate design issue to focus just on this question, independent of questions about linking?

Virtualization is important, and programmable linking should surely support that.

I actually don't think what I am talking about is very different from today's wasm, and what I am talking about is not just geared to Jawa. The same thing arises when thinking about linking .NET or other language runtimes. I gave three examples above, two of which are extant embeddings that, by virtue of being embedded in another language, give full programmability.

I am perfectly willing to put together a presentation for the CG.

Fwiw, by my understanding, titzer's suggestions to make Wasm viable for more use cases rather than fewer are much more in line with my expectations than the alternative. I would even go as far as to suggest that a more general discussion about Wasm's direction may be needed, given all the controversy that has already taken place in similar issues for similar reasons elsewhere.

For instance, I find a statement like

The view you're laying is a rather significant departure from how wasm is specified, implemented and widely talked about today, so I think the burden is on you to propose this change of perspective to the CG and get some buy in--I don't think it's something you can assert as being the nature of wasm.

deeply concerning in context. There may be a broader problem here in how wasm is specified, and it is not clear to me which side is departing.

@titzer I was mainly referring to this part of what you were saying (and all the corollaries):

Modules do not have types without considering the import processor that gives meaning to the names and constraints associated with imports. It's like asking what the type of a text file is.

which would be a significant departure from the current spec (e.g., this part).

@lukewagner Thanks for pointing me to that.

In that formulation (as I think the discussion on the module linking proposal has pointed out), imports and exports are vectors and thus positional. To bind the (positional) exports of one module to the (positional) imports of another module, you need a mapping function. I am just pointing out that the "no frills" exact name matching function is only one choice, and we already have two other embeddings where that mapping function is fully programmable in a host language.
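To make the "mapping function" framing concrete, here is a hedged Python sketch in which exact name matching is just one pluggable policy among several. The names `exact_name_match`, `with_renames`, and `by_position` are illustrative and not taken from any existing linker.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Name = Tuple[str, str]  # (module, field) as written in the import section
Instances = Dict[str, Dict[str, object]]
# A mapping function picks, for every import position, the value to bind.
Mapper = Callable[[Sequence[Name], Instances], List[object]]


def exact_name_match(imports: Sequence[Name], instances: Instances) -> List[object]:
    """The 'no frills' policy: look up module, then field, by exact string."""
    return [instances[module][field] for module, field in imports]


def with_renames(renames: Dict[Name, Name]) -> Mapper:
    """A programmable variant: rewrite names through a table, then match exactly."""
    def mapper(imports: Sequence[Name], instances: Instances) -> List[object]:
        return exact_name_match([renames.get(n, n) for n in imports], instances)
    return mapper


def by_position(values: Sequence[object]) -> Mapper:
    """Ignore names entirely: the i-th import receives the i-th value."""
    def mapper(imports: Sequence[Name], instances: Instances) -> List[object]:
        if len(values) != len(imports):
            raise ValueError(f"expected {len(imports)} values, got {len(values)}")
        return list(values)
    return mapper


def bind(imports: Sequence[Name], instances: Instances, mapper: Mapper) -> List[object]:
    return mapper(imports, instances)
```

A rename table keyed by `(module, field)` cannot tell two identical imports apart either; only the position-based policy can give them different values.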

@titzer That's true, but notice: despite the fact that the JS API is fully programmable, if I give you a wasm module with duplicate imports (of differing types), you can't use the JS API to implement those imports with wasm exports due to the use of name lookup in WebAssembly.instantiate(). The same goes for ESM-integration. While I can imagine theoretical extensions to these two linking schemes to support duplicate import virtualization, I have a harder time imagining everyone getting motivated enough to actually specify and implement the extra complexity.

Thus, I think the question we have to ask is: "will the myriad of wasm linking schemes that will inevitably emerge across the wasm ecosystem end up supporting virtualization of duplicate imports?" and, based on our current sample set, it's looking unlikely.
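The name-lookup limitation can be modeled in a few lines. This is a simplified Python model, not the JS API itself: `instantiate_by_name` reads `import_object[module][name]` once per import, and, as we understand the JS API, an exported wasm function supplied as an import must match the import's type exactly (a plain JS function would instead be wrapped to fit any signature, which is why only the host can satisfy such imports). The `spectest`/`print` names echo the spec-test overloads mentioned at the top of the issue.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

FuncType = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (params, results)


@dataclass(frozen=True)
class WasmFunc:
    """Stand-in for an exported wasm function, which carries its exact type."""
    label: str
    type: FuncType


@dataclass(frozen=True)
class FuncImport:
    module: str
    name: str
    type: FuncType


class LinkError(Exception):
    pass


def instantiate_by_name(imports: Sequence[FuncImport],
                        import_object: Dict[str, Dict[str, WasmFunc]]) -> List[WasmFunc]:
    """Model of name-based lookup: every import reads import_object[module][name]."""
    resolved = []
    for imp in imports:
        value = import_object[imp.module][imp.name]
        # A wasm function must match the import's type exactly; no coercion.
        if value.type != imp.type:
            raise LinkError(f"{imp.module}.{imp.name}: expected {imp.type}, got {value.type}")
        resolved.append(value)
    return resolved
```

Since a dictionary holds one value per key, whichever wasm function is placed under `"print"` mismatches one of the two imports.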

Imagine that WebAssembly imports and exports had no strings associated with them. The JS API would use purely positional indices to access imports and exports. From what I can tell, it would be (relatively) easy to make the existing toolchains work in such a setting. (In some cases it might even be easier, because tools like Emscripten wouldn't have to generate arbitrary names in order to hook up "inline" JS code (a la EM_ASM).)

In such a world, there would be no virtualization issue. That is, the virtualization issue is caused by the choice to use names rather than positional indices. One could even solve this in a backwards-compatible manner by making both available in the JS API.

The point is that non-positional imports/exports are really embedder-specific decorations that are used to facilitate integration of the wasm module into an ecosystem. How those decorations are specified really should be adapted to the (existing or expected) norms of the ecosystem.

For example, with ESM-integration, the JS values exported by other modules can be overloaded, but WebAssembly does not support overloading, so an ESM import-export decoration design should bridge this gap so that WebAssembly modules can take the place of as many JS modules as possible. That could be done by having duplicate imports and type-directed coercions to extract different overloadings of a function imported from another JS module. One could even allow duplicate exports, so long as the exported types are all functions with sufficiently disjoint/distinguishable parameters that they could be combined into a single overloaded JS function. And there are of course other design options, each with pros and cons.
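A rough sketch of type-directed coercion for duplicate imports, under assumed conventions: `TO_HOST` is a made-up table of wasm-to-host parameter conversions, and `link_overloads` is a hypothetical ESM-style linker step. Each duplicate import of the same JS function gets its own wrapper specialized to that import's wasm signature.

```python
from typing import Callable, Dict, List, Sequence, Tuple

WasmType = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (params, results)

# Hypothetical wasm-to-host coercions, one per parameter type.
TO_HOST: Dict[str, Callable[[object], object]] = {
    "i32": int,
    "f64": float,
}


def coerce(host_fn: Callable[..., object], wasm_type: WasmType) -> Callable[..., object]:
    """Specialize one overloaded host function to a single wasm signature."""
    params, _results = wasm_type  # result coercion is omitted in this sketch

    def wrapper(*args: object) -> object:
        if len(args) != len(params):
            raise TypeError(f"expected {len(params)} arguments, got {len(args)}")
        return host_fn(*(TO_HOST[p](a) for p, a in zip(params, args)))

    return wrapper


def link_overloads(imports: Sequence[Tuple[str, str, WasmType]],
                   modules: Dict[str, Dict[str, Callable[..., object]]]) -> List[Callable[..., object]]:
    """Every (module, name, type) import, duplicates included, gets its own coercion."""
    return [coerce(modules[m][n], t) for m, n, t in imports]
```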

As a contrasting example, suppose the primary embedding language for WebAssembly used named parameters for functions rather than positional parameters. Then the import-export decorations would look rather different, with exports of functions assigning names to every parameter. But at the low level, a WebAssembly module would still have just positional imports and exports.

This is why I was suggesting that the problem be solved by taking things a step lower. "Core" WebAssembly module linking, i.e. what one would use to link together WebAssembly modules defined within the same file (e.g. for linking the Interface Types adapter module to the module it adapts), should be positional. "Embedded" WebAssembly module linking, i.e. what one would use to link together both WebAssembly and embedder-specific modules defined in separate files within a particular embedder, would be an embedder-specific section and would largely be responsible for defining the conversion between embedder-specific constructs and WebAssembly constructs, such as coercing functions, adding decorations to exported functions/instances/modules, or associating (single/double/lists-of) names with positional imports. Then "core" WebAssembly never has to worry about anything more than numbers and positions, and problems like duplicate imports become embedder-specific issues rather than "core" issues.
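One way to picture the proposed split, as a Python sketch with hypothetical names: `core_instantiate` only ever sees a count and an ordered list of values, while `embedder_instantiate` interprets an embedder-specific decoration (a single name, a `(module, field)` pair, or a list of alternatives) to produce that list.

```python
from typing import Dict, List, Sequence, Tuple, Union

Key = Union[str, Tuple[str, str]]
# Hypothetical embedder decoration for one import position: a single name,
# a (module, field) pair, or a list of alternatives tried in order.
Decoration = Union[Key, List[Key]]


def core_instantiate(num_imports: int, values: Sequence[object]) -> List[object]:
    """'Core' layer: purely positional; it never sees a name."""
    if len(values) != num_imports:
        raise ValueError(f"expected {num_imports} imports, got {len(values)}")
    return list(values)


def _flatten(key: Key) -> str:
    return key if isinstance(key, str) else f"{key[0]}.{key[1]}"


def embedder_instantiate(decorations: Sequence[Decoration], env: Dict[str, object]) -> List[object]:
    """'Embedded' layer: turns names into positions, then defers to the core."""
    values: List[object] = []
    for i, deco in enumerate(decorations):
        alternatives = deco if isinstance(deco, list) else [deco]
        for key in alternatives:
            flat = _flatten(key)
            if flat in env:
                values.append(env[flat])
                break
        else:  # no alternative matched
            raise LookupError(f"no binding for import #{i}: {deco!r}")
    return core_instantiate(len(decorations), values)
```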

With that mentality, this issue could be recast as "For JS API or for ES modules, should we allow two imports to be associated with the same string?" with the answer having no significance to the module-linking proposal. It's a useful question, but it's hard to come to a good answer at the moment while the main argument is based on a coupling that (I think) shouldn't exist.

I agree that positional imports/exports avoid the whole question of duplicates. The problem I'd like to avoid, though, arises with separate compilation at separate times, where you have to start caring about versioning: there, positional imports/exports have the classic fragile base class problem, which has been felt concretely in component systems like COM that are based on (positional) C++ vtable layouts.
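The fragile-base-class hazard is easy to reproduce in miniature. In this hypothetical example, a consumer was compiled against version 1 of a `libc`-like provider, and version 2 inserts `realloc` in the middle of its export list. Positional binding silently picks up the wrong function, whereas name binding survives (and, as the check in `bind_by_name` shows, name binding in turn needs export names to be unique).

```python
from typing import Dict, List, Sequence, Tuple

Exports = Sequence[Tuple[str, str]]  # (export name, value), in definition order


def bind_positional(positions: Sequence[int], exports: Exports) -> List[str]:
    """Consumer recorded export *indices* at compile time."""
    return [exports[i][1] for i in positions]


def bind_by_name(names: Sequence[str], exports: Exports) -> List[str]:
    """Consumer recorded export *names* at compile time."""
    index: Dict[str, str] = dict(exports)
    if len(index) != len(exports):
        raise ValueError("duplicate export names make name binding ambiguous")
    return [index[n] for n in names]
```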

Now, if I understand what @RossTate and @titzer are proposing, this is a problem that should be solved at the host layer, allowing the host to introduce names as appropriate for that host. The problem that I see remains with that answer is that it inherently partitions the wasm tooling ecosystem by host. E.g., now I can't just have "clang targeting wasm" if clang wishes to do any form of linking (e.g., to factor out a common libc (which is an important optimization in this emerging category of highly-multi-module wasm execution platforms)) -- I need to teach clang about all the different concrete hosts it's targeting and have the host be a parameter to clang. Moreover, now the produced output can only be run on that host, preventing the ability to produce .wasm modules that are reusable across hosts. To achieve shared tooling and reusable modules, we need some type of common abstract compilation target that speaks to linking (particularly one that doesn't assume a 1:1 association between names and instances, which is insufficient for a number of use cases) -- which is exactly what the Module Linking proposal was hoping to establish.

That being said, I can imagine a hybrid solution in which Module Linking is added as a host-agnostic spec-layer above core wasm and below the JS API. This would allow host-agnostic tools and modules to be produced (targeting this host-agnostic spec-layer) while allowing host embeddings to side-step this layer altogether, instantiating core modules directly with specialized linking schemes.

Does that sound like a reasonable way forward to resolve these conflicting goals?

Thanks, @lukewagner, for bearing with me! What you said is sounding like a more accurate understanding of where I'm coming from, and at the same time I hear your concerns about fragility and tooling/ecosystems as well.

I think a hybrid/two-layered solution is what I've been trying to suggest, where the two layers correspond to "internal" and "external" linking. Internal linking would be positional and specify instructions for building/connecting/instantiating modules—it's what every WebAssembly engine (with module linking) would have to implement. External linking would be embedder-specific and specify how to build (positional) modules from other files and the like, e.g. associating file names and/or string names to imports/exports—it's what WebAssembly embedders would have to implement. This separation of concerns would make it easier to reuse the "core" aspects of a WebAssembly program and the "core" aspects of a WebAssembly engine across multiple ecosystems. (I also think this separation has the potential to clean up aspects of internal module linking.)

But, as you mention, having too much "external" customization can lead to severe fracturing. So, while I think "internal" and "external" linking should be separated, I also think it is useful to have a "common" external linking standard. Some might develop more specialized external-linking systems in order to better fit into or take more advantage of the specifics of a particular ecosystem (using more specialized tooling to do so), but many programs' needs will fit into a simple common standard.

So, to summarize, I think there should be one "internal" linking system (that's low-level, e.g. positional), and there should be the potential for multiple "external" linking systems that sit on top of that, but with one "common" standardized such system in place. Does that make sense? (None of this speaks to the current issue of duplicate imports, unfortunately, but at least tries to characterize where within this space that issue sits, if that makes sense.)

Ok, so thinking about what this sort of change would mean concretely in the short term: reframing Module Linking as a "common linking" layer above the core spec (and, if we wanted, under the JS/Web specs) could mean that we don't need to add anything to the core spec at all. In particular, core modules' imports/exports could remain positional, such that the Module Linking layer completely ignored these names when performing instantiation. Based on that, we could close this issue with the answer "No", in which case, I'm coming around to @titzer's point above that, for symmetry, we should allow duplicate exports too so that, e.g., core modules that are exclusively instantiated by the Module Linking layer could simply supply 0-length strings for all import/export strings -- we wouldn't force core modules to include meaningless "", "a", "b", "c" ... export names just to appease validation.

(In the future, we could talk about extending core wasm with positional versions of modules/instances, but I don't think that would be additionally necessary for the initial set of use cases and requirements focused on by the Module Linking proposal.)

@rossberg @titzer @tlively Thoughts?

I am concerned that codifying that core Wasm imports and exports must be strictly positional would shut the door on https://github.com/WebAssembly/gc/issues/148, which suggests introducing a new importexport mechanism to resolve the problems of imports and exports necessarily being asymmetrical. The suggested mechanism essentially introduces weak linkage of types at the core Wasm level, and I'm not sure how it could work positionally without string identifiers.

Obviously this is hypothetical because there is no consensus to move the GC proposal in that direction, but I personally feel that it is an attractive enough option for the GC proposal that I would be sad to see it made less feasible by unrelated developments like this one right now. Am I missing a way for that proposed change to work positionally or misunderstanding this issue?

Ah, interesting connection, @tlively! It makes me think of a couple ideas.

One idea is relevant to static linking. Just like named imports help with changeability of externally linked modules, so do default imports (just like with named and default parameters for functions). Default imports make it possible to add a new configurability option to a module without breaking backwards compatibility. In a two-leveled linking approach, in the internal layer I would express this as two modules (module A has an import and module B has an export), and in the external layer the semantics would be "if given an argument for this import, link that argument to A, otherwise link B to A" so that B, which defines the default value, is only used when needed.
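A minimal sketch of the default-import rule quoted above, with hypothetical callables standing in for instantiating modules A and B. The external layer decides which value flows into A's ordinary import, and B is instantiated only when no argument is supplied.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def link_with_default(
    import_name: str,
    args: Dict[str, object],
    instantiate_default: Callable[[], object],
    instantiate_a: Callable[[object], T],
) -> T:
    """External-layer rule for one defaulted import of module A.

    Internally, A just has an ordinary import and B an ordinary export; this
    function only decides which value flows into A's import slot."""
    if import_name in args:
        value = args[import_name]  # caller supplied an argument: use it
    else:
        value = instantiate_default()  # B is instantiated only when needed
    return instantiate_a(value)
```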

Another idea is relevant to dynamic linking. Here strings are used as global keys. importexport essentially says "import the value associated with this key, and if that value is not yet defined then globally register the following as the value corresponding to that key". (Again, this is all in the external-linking layer.)
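The `importexport` behavior described here amounts to a get-or-register operation on a global table. The `Registry` class below is a hypothetical, single-threaded Python model of that external-linking semantics, not the mechanism proposed in the GC issue.

```python
from typing import Callable, Dict


class Registry:
    """Hypothetical process-wide table mapping string keys to shared values."""

    def __init__(self) -> None:
        self._table: Dict[str, object] = {}

    def import_export(self, key: str, define: Callable[[], object]) -> object:
        """Return the value registered under key, registering define() if absent."""
        # define is a thunk so that the fallback is only created when needed;
        # dict.setdefault(key, define()) would build it eagerly every time.
        if key not in self._table:
            self._table[key] = define()
        return self._table[key]
```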

Although WebAssembly/gc#148 is about nominal types, I'll note that both of the above ideas apply to exceptions as well. For static linking, I expect a common pattern for a module will be to import an exception event that by default is freshly generated but which can be instantiated to use the same event as other modules (instances?) in order to share non-local control with them. For dynamic linking, if one really wants there to be no central run-time module (which people already know my thoughts on), then any language with exceptions will need to coordinate an exception event across dynamically linked modules, and the importexport external-linking semantics is a good means to achieve that.

Neat! If I understand correctly, the importexport mechanism could be thought of as part of the module-linking layer that desugars to positional imports and exports by implicitly creating a "default" exporting module from which to import the defined type. That seems reasonable to me, so I have no further concerns about this direction.

I think the layering argument cuts both ways. You can likewise argue that it ought to be the responsibility of higher layers to map whatever (high-level-name, extra-info) pair scheme they employ for disambiguating im/exports to an unambiguous scheme of lower-level names.
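As an example of a higher layer mapping (high-level name, extra info) pairs onto unambiguous lower-level names, here is a hypothetical mangling scheme that folds a function signature into the import string. The format is made up for illustration; the length prefix on the name keeps the encoding reversible even when the name itself contains separator characters.

```python
from typing import Sequence, Tuple


def mangle(name: str, params: Sequence[str], results: Sequence[str]) -> str:
    """Fold a signature into an import name; the length prefix keeps it unambiguous."""
    return f"{len(name)}:{name}({','.join(params)})->({','.join(results)})"


def demangle(mangled: str) -> Tuple[str, Tuple[str, ...], Tuple[str, ...]]:
    length, rest = mangled.split(":", 1)
    n = int(length)
    name, sig = rest[:n], rest[n:]
    params_part, results_part = sig.split("->")
    # Strip the surrounding parentheses; an empty string means an empty list.
    params = tuple(p for p in params_part[1:-1].split(",") if p)
    results = tuple(r for r in results_part[1:-1].split(",") if r)
    return name, params, results
```

With this scheme, two "overloads" of `print` become distinct strings, so the lower level never sees a duplicate.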

For the lower level, it always seems preferable to be as explicit and unambiguous as possible. This keeps the system and interfacing with it simple -- and puts the burden of dealing with more elaborate schemes on the ecosystems that want it, instead of everybody.

As for the linking proposal as a separate layer, that sounds good in the abstract, but I'm a bit sceptical that we could make that work smoothly. For example, the proposal currently allows inner modules to refer to outer name spaces for static entities like modules or types, which seems rather central, and cannot be mapped to the core spec (except by a transformation equivalent to lambda-lifting, which would break the intended composition patterns, unless we also introduce partial application/instantiation of imports, i.e., a form of "module closures").

For example, the proposal currently allows inner modules to refer to outer name spaces for static entities like modules or types

This was one of the aspects I felt could use some simplification.

unless we also introduce partial application/instantiation of imports, i.e., a form of "module closures"

This was one of the techniques I thought could help achieve that simplification.

I generally support the idea of moving a linking specification up a layer; it's basically what I've been arguing for. Engines don't need to know names.

@rossberg I generally think that "clear and unambiguous" is great for a lower layer, but this instance is more about whether or not to be prescriptive, and lower layers would do well not to be prescriptive about things that don't concern them, IMO.

There is a related observation in https://github.com/WebAssembly/design/issues/1399#issuecomment-808401005: the issue describes calls failing when JS calls Wasm exports with omitted i64 arguments, and in that context, allowing duplicate exports and picking the one with the matching arity when calling from JS into Wasm would be a preferable solution.
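A sketch of what "pick the export with the matching arity" could look like on the host side, assuming (hypothetically) that same-named exports are collected together with their parameter counts. `make_arity_dispatcher` is an illustrative name; it rejects two exports of equal arity because they could not be told apart by argument count alone.

```python
from typing import Callable, Dict, Sequence


def make_arity_dispatcher(overloads: Sequence[Callable], arities: Sequence[int]) -> Callable:
    """Combine same-named exports into one callable that picks by argument count."""
    table: Dict[int, Callable] = {}
    for fn, n in zip(overloads, arities):
        if n in table:
            raise ValueError(f"two exports with arity {n} are not distinguishable")
        table[n] = fn

    def dispatch(*args):
        try:
            fn = table[len(args)]
        except KeyError:
            raise TypeError(f"no overload takes {len(args)} arguments") from None
        return fn(*args)

    return dispatch
```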
