AssemblyScript: .toUTF8 vs .fromUTF8 behavior is inconsistent, confusing and inefficient

Created on 23 Dec 2018  ·  3Comments  ·  Source: AssemblyScript/assemblyscript

WASM is a low-level virtual machine, so it should be able to handle strings represented as binary arrays.
There are handy methods .fromUTF8 and .toUTF8: https://github.com/AssemblyScript/assemblyscript/blob/master/std/assembly/string.ts#L499

However, they are non-symmetrical in three ways:

This is an inefficient and confusing approach. If the goal of AssemblyScript is to be a high-level, WASM-friendly language, then C-isms in the standard library, such as naked pointers to null-terminated strings, work against that goal.

My suggestion is to rename .lengthUTF8 and .toUTF8 to .lengthUTF8ZeroTerminated and .toUTF8ZeroTerminated, and to introduce .toUTF8Buffer, which returns an ArrayBuffer populated with the correct content and size. This API would be much clearer and more convenient for users.
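To make the difference concrete, here is a minimal sketch in plain TypeScript (not AssemblyScript, and not the actual standard library code). The helper names `utf8Length` and `writeUTF8` are made up for illustration, and `toUTF8ZeroTerminated` returns a `Uint8Array` here as a stand-in for the raw pointer that the real method would return. The point is only that a buffer-returning variant carries its own exact size, while the zero-terminated variant needs one extra byte and a separate length query.

```typescript
// Hypothetical helpers illustrating the proposal; these are assumptions,
// not the AssemblyScript standard library implementation.

/** Bytes needed to encode `s` as UTF-8, excluding any terminator. */
function utf8Length(s: string): number {
  let n = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) n += 1;
    else if (c < 0x800) n += 2;
    else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length &&
             (s.charCodeAt(i + 1) & 0xfc00) === 0xdc00) {
      n += 4; // valid surrogate pair: one 4-byte sequence
      i++;
    } else n += 3; // BMP character (lone surrogates are encoded as-is)
  }
  return n;
}

/** Writes the UTF-8 bytes of `s` into `out` and returns the byte count. */
function writeUTF8(s: string, out: Uint8Array): number {
  let p = 0;
  for (let i = 0; i < s.length; i++) {
    let c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length) {
      const lo = s.charCodeAt(i + 1);
      if ((lo & 0xfc00) === 0xdc00) {
        // Combine the surrogate pair into a single code point.
        c = 0x10000 + ((c - 0xd800) << 10) + (lo - 0xdc00);
        i++;
      }
    }
    if (c < 0x80) {
      out[p++] = c;
    } else if (c < 0x800) {
      out[p++] = 0xc0 | (c >> 6);
      out[p++] = 0x80 | (c & 0x3f);
    } else if (c < 0x10000) {
      out[p++] = 0xe0 | (c >> 12);
      out[p++] = 0x80 | ((c >> 6) & 0x3f);
      out[p++] = 0x80 | (c & 0x3f);
    } else {
      out[p++] = 0xf0 | (c >> 18);
      out[p++] = 0x80 | ((c >> 12) & 0x3f);
      out[p++] = 0x80 | ((c >> 6) & 0x3f);
      out[p++] = 0x80 | (c & 0x3f);
    }
  }
  return p;
}

/** Proposed shape: a buffer whose byteLength is exactly the encoded size. */
function toUTF8Buffer(s: string): ArrayBuffer {
  const buf = new ArrayBuffer(utf8Length(s));
  writeUTF8(s, new Uint8Array(buf));
  return buf;
}

/** C-style shape: encoded bytes followed by a single 0 terminator. */
function toUTF8ZeroTerminated(s: string): Uint8Array {
  // Typed arrays are zero-initialized, so the extra last byte is already 0.
  const bytes = new Uint8Array(utf8Length(s) + 1);
  writeUTF8(s, bytes);
  return bytes;
}
```

With the buffer variant, a caller can pass the result around without tracking a separate length or scanning for a terminator.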

enhancement

All 3 comments

An alternative solution would be to introduce a base type like

class MemSlice {
    constructor(readonly offset: usize, readonly length: usize) {}
    ...
}

which would be quite useful in general.
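One way such a type might be used, sketched in plain TypeScript under a few assumptions: `usize` is modeled as `number`, linear memory is modeled as a `Uint8Array`, and the `view` method is a hypothetical addition that is not part of the comment above. The elided body of the original class is not reconstructed here.

```typescript
// Hypothetical sketch of the MemSlice idea; `view` is an assumed helper.
class MemSlice {
  readonly offset: number; // start of the slice in linear memory
  readonly length: number; // size of the slice in bytes

  constructor(offset: number, length: number) {
    this.offset = offset;
    this.length = length;
  }

  /** Returns a view of the slice's bytes inside `memory` without copying. */
  view(memory: Uint8Array): Uint8Array {
    if (this.offset < 0 || this.length < 0 ||
        this.offset + this.length > memory.length) {
      throw new RangeError("slice is out of bounds");
    }
    return memory.subarray(this.offset, this.offset + this.length);
  }
}

// Stand-in for WASM linear memory, with the bytes of "hi" stored at offset 4.
const memory = new Uint8Array(16);
memory.set([0x68, 0x69], 4);

const slice = new MemSlice(4, 2);
const bytes = slice.view(memory);
```

Because the pointer and the length travel together, a function like .toUTF8 could return one value instead of relying on a terminator or a separate length call.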

Agreed, these APIs aren't ideal. Might even make sense to move them out of the string class to something specifically targeting interop (with C).

The UTF8/UTF16 API was improved in this PR
