Design: Initial data and endianness

Created on 3 Sep 2016  ·  6 Comments  ·  Source: WebAssembly/design

The specification says nothing about which byte order i32.load uses. Real-world programs contain a data section that provides virtual tables, static data, etc. These usually involve values of several different types. However, WebAssembly's data section is limited to sequences of bytes. With bytes alone one can't represent an int32 unambiguously, since a byte sequence carries no information about endianness.

It would be nice if WebAssembly had something similar to LLVM's structures and structure initializers. This would not only solve the problem with the endianness of static data, but also offer an opportunity to shrink the binary, since integers in the data section would likely consume fewer bytes in LEB128 representation, whereas they always consume 4 bytes when encoded as raw bytes.

Current workarounds are:

  • don't use the data section; use a large initializer function instead
  • invent a custom format, include a decoder in the WebAssembly binary, and run it from the start function
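The second workaround can be modeled roughly as follows. This is only a Python sketch of the idea, not WebAssembly code: the packed format (a plain run of signed LEB128 integers) and the names `decode_sleb128` and `init_memory` are assumptions made for illustration. In a real module the decoder would be compiled into the binary and invoked from the start function.

```python
import struct
from typing import Tuple

def decode_sleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one signed LEB128 value starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # high bit clear: this was the last byte
            if byte & 0x40:  # bit 6 of the last byte is the sign bit
                result -= 1 << shift
            return result, pos

def init_memory(packed: bytes, memory: bytearray, base: int) -> int:
    """Expand packed integers into little-endian i32 slots starting at base.

    Returns the address just past the last slot written.
    """
    pos, addr = 0, base
    while pos < len(packed):
        value, pos = decode_sleb128(packed, pos)
        struct.pack_into("<i", memory, addr, value)
        addr += 4
    return addr

memory = bytearray(16)
packed = bytes([0xE4, 0x00, 0x7F, 0x2A])  # encodes 100, -1, 42
end = init_memory(packed, memory, 0)
print(end, memory.hex())
```

Here 4 packed bytes expand into 12 bytes of linear memory, at the cost of shipping the decoder and running it at startup.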

All 6 comments

WebAssembly is little-endian. It appears AstSemantics.md doesn't mention this; I've now filed https://github.com/WebAssembly/design/pull/787 to correct this.

It is mentioned here and here and tested for here.
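To make the little-endian rule concrete, here is a small Python sketch of how an i32 would be laid out as data-segment bytes and what a load would read back. The helper names `i32_to_data_bytes` and `i32_load` are invented for this illustration and are not part of any WebAssembly tooling.

```python
import struct

def i32_to_data_bytes(value: int) -> bytes:
    """Encode a signed 32-bit integer as the 4 little-endian bytes a data segment would hold."""
    return struct.pack("<i", value)

def i32_load(memory: bytes, addr: int) -> int:
    """Mimic i32.load: read 4 bytes at addr and interpret them as little-endian."""
    return struct.unpack_from("<i", memory, addr)[0]

# Two static i32 values placed back to back, as a toolchain might emit them.
segment = i32_to_data_bytes(0x12345678) + i32_to_data_bytes(-2)

print(segment.hex())  # the least significant byte of each value comes first
print(hex(i32_load(segment, 0)), i32_load(segment, 4))
```

Because the byte order is fixed, a producer can emit multi-byte values into a byte-only data section without any extra type information.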

The idea of having LEB128-encoded data in the data section is interesting. There are a variety of ways this could be done, either with a new kind of data initializer, or with layer 1 compression. Layer 2 compression may help with large data segments in general as well.
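As a rough illustration of the potential savings, the following Python sketch encodes a few integers as signed LEB128 and compares the total size with fixed 4-byte encoding. The sample values are arbitrary, and the function name `encode_sleb128` is just a label for this example; it does not correspond to any proposed data-initializer format.

```python
def encode_sleb128(value: int) -> bytes:
    """Encode a signed integer as signed LEB128 (7 payload bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift, so negative values keep their sign
        sign_bit = byte & 0x40
        # Finished once the remaining bits are only sign extension of bit 6.
        done = (value == 0 and sign_bit == 0) or (value == -1 and sign_bit != 0)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)

values = [0, 1, -1, 100, 4096, 1_000_000]  # arbitrary sample data
leb_size = sum(len(encode_sleb128(v)) for v in values)
fixed_size = 4 * len(values)
print(leb_size, fixed_size)

# Values near the limits of the 32-bit range need 5 bytes instead of 4.
print(len(encode_sleb128(2**31 - 1)))
```

The saving depends entirely on the data: small magnitudes shrink, while values close to the 32-bit limits grow by one byte each.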

I see, thank you. Isn't it too strong a requirement? I don't know of any CPU that uses big-endian, but if for some reason one appears and becomes widely used, WebAssembly would execute on this CPU with some overhead.

There are significant advantages of not having to deal with multiple endiannesses across the ecosystem, and the probability of a CPU architecture that doesn't at least have efficient support for little-endian accesses becoming popular in the foreseeable future is believed to be very low.

On Sat, Sep 3, 2016 at 4:07 PM, Alexey Andreev wrote:

I see, thank you. Isn't it too strong a requirement? I don't know of any CPU that uses big-endian, but if for some reason one appears and becomes widely used, WebAssembly would execute on this CPU with some overhead.

MIPS and SPARC CPUs are big-endian, though SPARC has had little-endian
loads/stores for quite some time now, and MIPS has little-endian variants.

We've implemented big-endian support in V8 via explicit endianness-swapping
code but haven't measured the overheads carefully yet.
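The general technique of explicit endianness swapping can be sketched as below. This is a Python model written for this thread, not V8's actual code; `bswap32` and `i32_load_on_big_endian_host` are hypothetical names, and the big-endian host is simulated with a `">I"` read.

```python
import struct

def bswap32(x: int) -> int:
    """Reverse the byte order of an unsigned 32-bit integer."""
    return (
        ((x & 0x000000FF) << 24)
        | ((x & 0x0000FF00) << 8)
        | ((x & 0x00FF0000) >> 8)
        | ((x & 0xFF000000) >> 24)
    )

def i32_load_on_big_endian_host(memory: bytes, addr: int) -> int:
    """Model a load on a big-endian host: native big-endian read, then a swap."""
    (native,) = struct.unpack_from(">I", memory, addr)
    return bswap32(native)

memory = bytes([0x78, 0x56, 0x34, 0x12])  # little-endian encoding of 0x12345678
print(hex(i32_load_on_big_endian_host(memory, 0)))
```

The extra swap on every multi-byte load and store is the overhead in question on hosts without native little-endian accesses.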



It would be nice if WebAssembly had something similar to structures and structure initializers similar to LLVM.

This is already doable in WebAssembly exactly the same way it's done in C and C++: before main() is called the _start function can call .init_array functions. I don't think there's anything to do here.

Agreed with @sunfishcode / @titzer on endianness.

I believe this is resolved.
