Libelektra: Rust Bindings

Created on 28 May 2019  ·  45 Comments  ·  Source: ElektraInitiative/libelektra

Starting roughly in the middle of July, I would like to implement Rust bindings for Elektra.

I think that rust-bindgen should be able to automatically generate (some or all of) the bindings. I still expect there to be quite a bit of manual work to get them working properly. From my current understanding and from @kodebach's comment, this will result in an elektra-sys crate.
Once it is working I'll add a safe API in Rust, so that it can be used in regular Rust without the need to call unsafe code. This will then be the elektra crate.
I'll then make sure that they are correct by testing with cargo tests.

The typical way to document crates is with comments in the code. docs.rs will automatically build the documentation and make it publicly available, so I think doing documentation this way makes the most sense.

To publish the crate to crates.io, an account with an API token is needed. As discussed with @markus2330, this account should be part of ElektraInitiative such that it is accessible to future maintainers.

I'll look at the CMake integration at the start of the project, since I'm not familiar with CMake at the moment.

Is there anything else I should add?

All 45 comments

I don't know that much about Rust, but I suppose that the bindings generated by rust-bindgen can only be used in unsafe Rust? If that is the case, it would be nice to have a wrapper around those with a more idiomatic Rust API.

AFAIK most Rust bindings have one *-sys crate for the 1:1 mapping of the C API and another crate with the API that most users would actually use in Rust. If there is a way to tell Rust to automatically invoke keyDel, ksDel and friends when necessary that would be really nice.
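Rust's Drop trait is the usual mechanism for this kind of automatic cleanup. The following is only a minimal sketch of the pattern: RawKey, raw_key_new and raw_key_del are hypothetical pure-Rust stand-ins for the generated keyNew/keyDel bindings, and the DELETED counter exists only to make the effect observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how often the keyDel stand-in ran, purely for demonstration.
pub static DELETED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for the C `Key` struct (not the real layout).
pub struct RawKey {
    name: String,
}

// Stand-in for keyNew: hands out an owned raw pointer.
pub fn raw_key_new(name: &str) -> *mut RawKey {
    Box::into_raw(Box::new(RawKey { name: name.to_owned() }))
}

// Stand-in for keyDel: frees the pointer again.
pub unsafe fn raw_key_del(key: *mut RawKey) {
    unsafe { drop(Box::from_raw(key)) };
    DELETED.fetch_add(1, Ordering::SeqCst);
}

/// Safe owner of a raw key pointer.
pub struct Key {
    ptr: *mut RawKey,
}

impl Key {
    pub fn new(name: &str) -> Key {
        Key { ptr: raw_key_new(name) }
    }

    pub fn name(&self) -> &str {
        // `ptr` stays valid for as long as `self` is alive.
        unsafe { &(*self.ptr).name }
    }
}

impl Drop for Key {
    // Runs automatically when a `Key` goes out of scope.
    fn drop(&mut self) {
        unsafe { raw_key_del(self.ptr) }
    }
}
```

With a wrapper like this, a Key created inside a block is released at the closing brace without any explicit call by the user.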

If that is the case, it would be nice to have a wrapper around those with a more idiomatic Rust API.

Yes, this is the plan. And while doing so, also compare with the C API and maybe find improvements in the C API (at least in the docu).

@PhilippGackstatter As discussed: Please also find out how to upload to https://crates.io/ and how to integrate the binding in our CMake system.

@PhilippGackstatter any progress? We have someone who might also be interested in extending the Rust bindings.

@markus2330 I started a couple of days ago, but I was mostly reading up on bindgen, cmake and how to integrate it into the project, so there was nothing to show. But I have the first couple of things ready now (see #2826). I'm fully working on the project now.

To answer the questions from #2826 (please prefer asking questions in issues as PRs have the tendency to get confusing for discussions not directly related to the code):

One is, which headers in src/include I need to generate bindings for. At least kdb.h, but are there other ones that I need for the low-level api, without plugin support?

No, the low-level API is only in kdb.h

The other is, do I have to change all the docker scripts in order to install rustup (which is used to install cargo and rustc)?

Yes, you need to change the docker scripts and maybe also the Jenkinsfile. But you do not need to change all of them, if you build for recent distributions it is enough.

It would be nice if you could also make it compile with the native Rust compiler bundled in Debian Buster. The Debian Buster docker file is not yet merged (#2819).

I suppose there is no automated way to do that.

There are some ideas to do this: #730

Rust-bindgen offers two ways to generate bindings. One is via the command line, which is therefore a manual process and needs to be repeated if anything in the C API changes. The other is via a build script, which is run every time cargo build runs. That means bindings are regenerated on each build. This is what is currently implemented. However, it requires everyone who uses the bindings to have the necessary headers that Elektra needs. I imagine that someone who just installs Elektra but did not compile it may not meet all the necessary requirements. Maybe it makes more sense to regenerate the bindings every once in a while, manually, since the C API doesn't change very much any more?

Regeneration on every build seems to be the proper solution, the other bindings also work that way (they require swig to be installed). You can simply install the generated header files to avoid the troubles you are describing.

Yes, this is the plan. And while doing so, also compare with the C API and maybe find improvements in the C API (at least in the docu).

So far, I've found some minor opportunities for improvement

  • keyGetBinary: As calling code, I can't know if a return value of -1 means maxSize is 0 or type mismatch, or something else. Since Rust uses explicit error handling in its return values, I would like to be able to map a type mismatch to one error and "maxSize related" errors to another. But currently I have to use a more generic error. I could check for a type mismatch myself, but keyGetBinary does that, so I have the same check twice.
    keySetName does something similar, matching two different errors to -1. In both cases there are errors that are bugs (invalid name) and errors that can occur in sound programs (key already in keyset), so I can sort of understand the decision. But why not use -2 for explicitness and avoidance of double checking?
  • Grammatically, shouldn't keyIsDirectBelow be keyIsDirectlyBelow 🙂? If so, should I correct this in the Rust API?

Another question: keyRel is not implemented in the CPP Bindings. Should I omit this in Rust as well?

Great work, from your questions it is obvious that you already looked deeply in the API.

I could check for a type mismatch myself, but keyGetBinary does that, so I have the same check twice.

Maybe you can even use the typesystem to avoid wrong calls? (keyGetBinary is only allowed on binary keys)

But why not use -2 for explicitness and avoidance of double checking?

The reason was compatibility: the APIs initially only returned -1 and it is not possible to add other error codes without breaking existing programs (which might have ==-1 to check for errors). But with the next release (0.9) we can break the API again. And we could avoid the problem of compatibility by stating that any values below 0 indicate errors. I fully agree that bindings should yield precise errors.
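To illustrate what "any value below 0 indicates an error" could mean on the Rust side, here is a minimal sketch. The KeyError variants and the specific code -2 are assumptions for illustration only; the C API discussed here returns only -1.

```rust
/// Hypothetical error type; variants and codes are assumptions.
#[derive(Debug, PartialEq)]
pub enum KeyError {
    /// Assumed meaning of -1.
    TypeMismatch,
    /// Assumed meaning of a future -2.
    BufferTooSmall,
    /// Any other negative value is still treated as an error.
    Other(isize),
}

/// Translates a C-style `ssize_t` return value into a `Result`.
pub fn check_return(ret: isize) -> Result<usize, KeyError> {
    match ret {
        n if n >= 0 => Ok(n as usize),
        -1 => Err(KeyError::TypeMismatch),
        -2 => Err(KeyError::BufferTooSmall),
        n => Err(KeyError::Other(n)),
    }
}
```

Because every negative value maps to some error, new codes added on the C side later would fall into Other instead of being mistaken for success.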

Do you want to fix these API problems?

If so, should I correct this in the Rust API?

The APIs should not differ in spelling. If we fix it, we should fix it in the C API and all bindings (actually only Java and Go need to be adapted by hand, the others will be regenerated correctly anyway).

keyRel is not implemented in the CPP Bindings. Should I omit this in Rust as well?

Yes, as you maybe already have noticed keyIs(Direct)Below and keyRel have overlapping functionality. The idea of keyRel was to keep the API small (and thus the library small). But keyRel as-is is not usable and also slow. So we will most likely remove it within 0.9. See doc/todo/FUTURE for other candidates to be removed.

Maybe you can even use the typesystem to avoid wrong calls? (keyGetBinary is only allowed on binary keys)

That's a great idea. I could have BinaryKey and StringKey and only the first would have a get_binary() method and only the second would have a get_string() method, and so on. I'll look into this.
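A rough sketch of that idea, using plain Rust structs instead of the real FFI-backed types. The field layout and constructors are assumptions; only the method names get_string and get_binary come from the discussion.

```rust
/// Key that can only hold a string value.
pub struct StringKey {
    name: String,
    value: String,
}

/// Key that can only hold a binary value.
pub struct BinaryKey {
    name: String,
    value: Vec<u8>,
}

impl StringKey {
    pub fn new(name: &str) -> Self {
        StringKey { name: name.to_owned(), value: String::new() }
    }
    pub fn name(&self) -> &str {
        &self.name
    }
    pub fn set_string(&mut self, value: &str) {
        self.value = value.to_owned();
    }
    pub fn get_string(&self) -> &str {
        &self.value
    }
}

impl BinaryKey {
    pub fn new(name: &str) -> Self {
        BinaryKey { name: name.to_owned(), value: Vec::new() }
    }
    pub fn name(&self) -> &str {
        &self.name
    }
    pub fn set_binary(&mut self, value: &[u8]) {
        self.value = value.to_vec();
    }
    pub fn get_binary(&self) -> &[u8] {
        &self.value
    }
}
```

Calling get_binary() on a StringKey is then a compile error rather than a runtime type mismatch, because the method simply does not exist on that type.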

Do you want to fix these API problems?

I can do that. Depends on what the priority should be for me. After finishing the safe API for Rust, you said it'd be nice to also have the plugin API in Rust. You can decide what is more important.

That's a great idea. I could have BinaryKey and StringKey and only the first would have a get_binary() method and only the second would have a get_string() method, and so on. I'll look into this.

Thank you. It might require too many casts, so let us see if it is a good idea. Also another type might be useful for keys in keysets (where setName is not allowed).

I can do that. Depends on what the priority should be for me. After finishing the safe API for Rust, you said it'd be nice to also have the plugin API in Rust. You can decide what is more important.

Yes, first finish the Rust API from kdb.h and then we'll see how many more hours we have to spend.

IMO the Rust binding (and any binding for that matter) should have two versions. One that mirrors the C API as close as possible and another that is more idiomatic for the language which builds on the first version. In the idiomatic version utilising the type system with BinaryKey and StringKey (or even generics) is probably a good idea, if it makes using the API from Rust easier.

@kodebach I agree. And this seems to be also done with the elektra and elektra-sys crates.

@kodebach I agree. And this seems to be also done with the elektra and elektra-sys crates.

Yes I think so too. If the safe Rust API has limitations that need to be worked around one can import elektra_sys and call the one-to-one C binding function directly.

Thank you. It might require too many casts, so let us see if it is a good idea. Also another type might be useful for keys in keysets (where setName is not allowed).

It worked out great for the key implementation. However, for KeySets I've hit a roadblock. Any method that returns a Key has to conform to the common interface that I've created. I've implemented a method get_value that has a generic return type. For BinaryKeys that's bytes, for StringKeys it's strings. But what does the Rust version of ksNext return now? An object that satisfies the "key interface", but with what value? I have to pick one.
This is what the signature has to look like, where Value is the type that get_value returns. I can only specify either bytes (Vec<u8>) or String.
pub fn next(&mut self) -> Box<dyn WriteableKey<Value = Vec<u8>>>;

So I could unify it to bytes, but then the user has to convert to string himself. Since the only difference of StringKey and BinaryKeys are their set_value and get_value implementation, this change would remove that explicitness and I have basically just Keys again.

I guess the real problem is that KeySet in the current implementation is not explicit about what kind of keys it contains, but the keys are. But allowing an instance of KeySet to only contain StringKey or BinaryKey is too big of a restriction I suppose.
I think either both Key and KeySet have to be explicit about what they contain or none of them. I'm leaning towards generic Key and KeySet now, just to be consistent with the rest of elektra.
Any thoughts?
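The constraint described above can be reproduced in a small, self-contained sketch. Only the trait name WriteableKey, its Value type and get_value come from the discussion; the tuple structs and the simplified KeySet that stores raw bytes are assumptions.

```rust
/// Common key interface: each key type fixes its own value type.
pub trait WriteableKey {
    type Value;
    fn get_value(&self) -> Self::Value;
}

pub struct StringKey(pub String);
pub struct BinaryKey(pub Vec<u8>);

impl WriteableKey for StringKey {
    type Value = String;
    fn get_value(&self) -> String {
        self.0.clone()
    }
}

impl WriteableKey for BinaryKey {
    type Value = Vec<u8>;
    fn get_value(&self) -> Vec<u8> {
        self.0.clone()
    }
}

/// Simplified stand-in for a KeySet that only stores raw values.
pub struct KeySet {
    values: Vec<Vec<u8>>,
    cursor: usize,
}

impl KeySet {
    pub fn new(values: Vec<Vec<u8>>) -> Self {
        KeySet { values, cursor: 0 }
    }

    /// The trait object must name exactly one `Value`, so bytes are picked
    /// here and every key comes back as a `BinaryKey`.
    pub fn next(&mut self) -> Option<Box<dyn WriteableKey<Value = Vec<u8>>>> {
        let value = self.values.get(self.cursor)?.clone();
        self.cursor += 1;
        Some(Box::new(BinaryKey(value)))
    }
}
```

A caller that expects text has to run String::from_utf8 on the returned bytes, which is exactly the loss of explicitness described above.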

From the usability point of view it would make sense to return the Key which has a getter for a String, as this is the most often used variant. Setter for the name should be disabled (if easily possible), as we already know that this key (returned by next) is part of a KeySet.

In general, the type system should support the user, not get in its way. So keep it as simple as possible. The most common errors are:

  1. trying to change the keys name for a key that is in a keyset.
  2. trying to change meta-data keys (or other const keys).
  3. confusing duplication of Key/KeySet and references to them.
  4. iterating over KeySets that are also used to cut out keys in a way that the iteration does not work correctly anymore.
  5. forgetting to free a Key/KeySet

So if the type system can help there, it would be great. 5. will hopefully not be possible by design, for 1. and 2. your abstraction should help.

The binary/string confusion is actually a rather rare error (because binary keys are very untypical: they are mostly used to hold function pointers).

Btw. if you want to write a design decision about safe uses of APIs, please go ahead (doc/decision)

From the usability point of view it would make sense to return the Key which has a getter for a String, as this is the most often used variant.

But not every byte sequence is valid UTF-8 so that wouldn't really be typesafe anymore, would it?

AFAIK the macro system in Rust is very powerful, maybe there is a way to write a function that always returns the correct type. For example, in Kotlin there is a technique for maps to encode the value type in the key. The API reference over here is an example.

Alternatively a StringKeySet that only accepts StringKeys might make sense, since binary keys are so rare and mostly not used in configurations.

From the usability point of view it would make sense to return the Key which has a getter for a String, as this is the most often used variant. Setter for the name should be disabled (if easily possible), as we already know that this key (returned by next) is part of a KeySet.

But it's still possible, although rare, to have a KeySet with mixed keys. Then always returning a StringKey and calling get_string would be an error, but the type system not only allows it but guides you towards it, since there is no get_binary method on that type.

Before doing that, I suggest making KeySet generic and instantiating it as KeySet<StringKey> if the user is sure there's only StringKeys inside (for those KeySets not coming from Rust). Then it is only natural that iterating over it would produce only StringKeys.
It would also enforce that KeySets are homogeneous via the type system, at least those created by Rust users, which would be overall safer.
In rare cases where a binary key is expected the user would have to check with is_binary and is_string and then convert, which would be a safe method call.

confusing duplication of Key/KeySet and references to them.

I think the only thing I can do is to promote usage of duplicate rather than ref count. It might be worse for performance, but keyDel is invoked automatically by Rust, while ref counting is fully manual. So duplication is certainly easier to get right than ref counting.

iterating over KeySets that are also used to cut out keys in a way that the iteration does not work correctly anymore.

Do you mean modifying the keyset while iterating over it?

Btw. if you want to write a design decision about safe uses of APIs, please go ahead (doc/decision)

What would be the content of that?

AFAIK the macro system in Rust is very powerful, maybe there is a way to write a function that always returns the correct type.

I think that the "current" design of KeySet that can contain anything and concrete Key types doesn't work together, at least not nicely. But I'll look into macros.

But not every byte sequence is valid UTF-8 so that wouldn't really be typesafe anymore, would it?

Neither Elektra's strings nor its binary values need to be UTF-8. Elektra only distinguishes between string and binary (may contain 0 bytes).

AFAIK the macro system in Rust is very powerful, maybe there is a way to write a function that always returns the correct type. For example, in Kotlin there is a technique for maps to encode the value type in the key. The API reference over here is an example.

We also need to watch out to put effort in features that are useful. Binary keys are rare.

Alternatively a StringKeySet that only accepts StringKeys might make sense, since binary keys are so rare and mostly not used in configurations.

Yes, but I would see StringKeySet as the normal KeySet.

But it's still possible, although rare, to have a KeySet with mixed keys. Then always returning a StringKey and calling get_string would be an error, but the type system not only allows it but guides you towards it, since there is no get_binary method on that type.

Yes, but as said this is a minor problem. People who store binary data (like function addresses) will find out how to cast the Key (if there is some docu about it).

Before doing that, I suggest making KeySet generic and instantiating it as KeySet<StringKey> if the user is sure there's only StringKeys inside (for those KeySets not coming from Rust). Then it is only natural that iterating over it would produce only StringKeys.

It would be only fake safety because KeySets come from KDB (externally) and they might contain binary data in any case. Thus I prefer to have KeySet non-generic.

If you want to play around with generics, provide getters and setters which convert KeySets to (generic) data structures. E.g. an Elektra array of integers to Vec<i32>.
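As a sketch of such a getter, the function below converts the string values of an Elektra array into a Vec<i32>. It assumes the array elements have already been read into (name, value) pairs in order; the function name and the example key names are hypothetical.

```rust
use std::num::ParseIntError;

/// Parses the values of array elements (e.g. `.../#0`, `.../#1`) as integers.
/// Returns the first parse error if any value is not a valid `i32`.
pub fn array_to_vec(entries: &[(&str, &str)]) -> Result<Vec<i32>, ParseIntError> {
    entries
        .iter()
        .map(|(_name, value)| value.parse::<i32>())
        .collect()
}
```

For example, passing the pairs ("user/app/ports/#0", "80") and ("user/app/ports/#1", "443") would produce a vector holding 80 and 443, while a non-numeric value yields an error instead of a partial result.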

Do you mean modifying the keyset while iterating over it?

Yes cut modifies the KeySet. In general the iterators are safe to do so but many people have problems to get it right.

What would be the content of that?

A summary of what we discuss here and how you designed it.

I think that the "current" design of KeySet that can contain anything and concrete Key types doesn't work together, at least not nicely.

I agree.

But I'll look into macros.

Please do not put priority to it.

Neither Elektra's strings nor its binary values need to be UTF-8.

Then we should use OsString or CString instead of String according to a quick search.

Neither Elektra's strings nor its binary values need to be UTF-8.

Then we should use OsString or CString instead of String according to a quick search.

Right now I'm converting between Rust's String (which is UTF-8) to a CString before passing it on to elektra. The rationale is, that String is the default string and most other libraries expect to work with that.
I can instead make the high-level API ask for and return CStrings, such that the user would have to deal with conversion code, if they need a String. It would make for a thinner API and less error handling that needs to be dealt with. I suppose it comes down to how most users will want to use the API, which I don't have so much insight in.

I agree that it is best to return the languages' most common String type. Non-UTF8 strings should be rare (maybe even more rare than binaries).

I agree that it is best to return the languages' most common String type. Non-UTF8 strings should be rare (maybe even more rare than binaries).

I'm trying to figure out the best way to handle the Rust -> C direction, going from UTF-8 to a C-string. UTF-8 strings are allowed to contain zero bytes, but the only codepoint where that appears is the NUL character, not otherwise. I think it'd be reasonable to state this as a precondition in the binding documentation, that the strings are not allowed to contain the zero byte. If it does anyway, the code panics at that point.

The other possibility is to return an error from all the set functions that take a string. But then users have to deal with this NulError all the time and it is practically never even returned.
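Both options map directly onto std::ffi::CString::new, which rejects strings with an interior zero byte. A minimal sketch of the two variants follows; the function names are hypothetical.

```rust
use std::ffi::{CString, NulError};

/// Variant 1: treat an interior zero byte as a violated precondition and panic.
pub fn to_c_string_or_panic(s: &str) -> CString {
    CString::new(s).expect("string must not contain a zero byte")
}

/// Variant 2: hand the `NulError` to the caller.
pub fn to_c_string_checked(s: &str) -> Result<CString, NulError> {
    CString::new(s)
}
```

The first variant keeps setter signatures free of a Result, at the cost of a panic for input such as "a\0b"; the second forces every caller to handle an error that should almost never occur.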

keyNew can return a NULL pointer on allocation error. In Rust I can either return explicit errors or panic, but not an implicit null. The rust document on signaling errors considers out of memory a catastrophic error and the stdlib aborts in this case. The java binding seems to not handle this case, so I assume it would exit the process as well, since a NullPointerException is thrown.
Do you agree that calling panic here is best (abort doesn't allow for destructors to run)?

I think it'd be reasonable to state this as a precondition in the binding documentation, that the strings are not allowed to contain the zero byte. If it does anyway, the code panics at that point.

Yes, that is reasonable.

Do you agree that calling panic here is best (abort doesn't allow for destructors to run)?

Yes, it makes sense to panic if a malloc failed. (In Rust's case, the stdlib would do the same. In C the stdlib does not abort, so C-Elektra does not abort either.)
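One way to express "panic instead of an implicit null" is to wrap the returned pointer in std::ptr::NonNull. This is a sketch only; the helper name is hypothetical and it is generic so that it does not depend on the real Key type.

```rust
use std::ptr::NonNull;

/// Turns a possibly-NULL pointer coming from C into a `NonNull`.
/// Panics on NULL, which lets destructors run while unwinding.
pub fn non_null_or_panic<T>(ptr: *mut T) -> NonNull<T> {
    NonNull::new(ptr).expect("allocation failed: C function returned NULL")
}
```

A safe constructor could pass the result of keyNew through such a helper so that the rest of the wrapper never has to check for null again.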

Now that the Rust bindings are merged into master, I would like to publish them to crates.io.
I suggest publishing them with the version set to (the default) 0.1.0, instead of it being tied to elektra, simply because the maturity of the bindings and elektra itself differ. Do you agree @markus2330?

Publishing on https://crates.io requires a GitHub account. Ownership of crates can be transferred between accounts, so I could use my own for now. Or someone else could login and send me the API token that is required to publish.

Thank you for publishing them to crates.io :sparkle:

I would tie the version directly to Elektra, as they are part of Elektra's repos. But it is not really important: if crates.io usually has some specific version schemata, better stick to what is common there.

Or someone else could login and send me the API token that is required to publish.

I logged in and authorized. I'll send you the API token.

I would tie the version directly to Elektra, as they are part of Elektra's repos. But it is not really important: if crates.io usually has some specific version schemata, better stick to what is common there.

The major problem with this is, according to semantic versioning, if we have to do a breaking change in the bindings we can't just upgrade the version accordingly. So we could only do breaking changes when elektra does them.

According to semantic versioning you can do any breaking change as long as the version starts with 0. If we release Elektra 1.0 it is in our interest to also keep the bindings stable. And even if we fail to do so, we still have the option to make the versions independent in the future (simply increase the major version of the Rust binding). So I think it is safe to simply use Elektra's version now.

You're right, I didn't think of increasing the major versions later.

Right now, the wrapper.h specifies #include "kdb.h" to include the kdb header and generate bindings for it. But clang does not find the header (in ubuntu:18.10, for instance). So I have to explicitly tell clang to include /usr/include/elektra to get it to build.
Is elektra always installed in /usr/include/elektra so that this solution should work for most distributions?

Is elektra always installed in /usr/include/elektra so that this solution should work for most distributions?

Yes, because there is another library (I think from Kerberos) that uses /usr/include/kdb.h.

For now /usr/include/elektra has to be part of the include path, but AFAIK we want to change that so that #include <elektra/kdb.h> can be used instead.

Is elektra always installed in /usr/include/elektra so that this solution should work for most distributions?

By default it is /usr/local/include/elektra. Most distributions will use /usr/include/elektra, but there is no guarantee. That's why build systems usually have some support to locate header files. Elektra supports cmake and pkg-config.

Can you give some context about where you need this?

can be used instead.

must be used instead. The relevant PR is #2880

Changing to #include <elektra/kdb.h> definitely works in Ubuntu. So then I'll change to that path instead of including /usr/include/elektra.

Changing to #include <elektra/kdb.h> definitely works in Ubuntu. So then I'll change to that path instead of including /usr/include/elektra.

It probably won't work for all headers, some rely on /usr/include/elektra being in the include path.

Probably the best thing to do (if possible for you) would be using pkg-config or cmake --find-package to find the Elektra files (IMO cmake works better).

Can you give some context about where you need this?

So if a user compiles elektra on his machine, then he only needs rust/cargo and can use the bindings. But the other use case, which crates.io should be used for, is if someone installs elektra (and headers) via their package manager. Then the library and headers are available. Now that user includes elektra in their dependencies and cargo will fetch the elektra-sys crate. It only relies on the build script and clang to generate the bindings. But clang needs to find kdb.h somehow. So I can either pass additional hardcoded include paths in the build script or modify the #include ... statement directly.

Probably the best thing to do (if possible for you) would be using pkg-config or cmake --find-package to find the Elektra files (IMO cmake works better).

I can try to add pkg-config or cmake as a build-dependency and find kdb.h this way. I'll look into this. I agree this is the most reliable way.

Yes, you can try to call pkg-config in your build script. If pkg-config is not available, you can try hard-coded paths like /usr/include/elektra and /usr/local/include/elektra. (If crates.io does not require pkg-config to be available.)

You could try this crate

Yes, you can try to call pkg-config in your build script. If pkg-config is not available, you can try hard-coded paths like /usr/include/elektra and /usr/local/include/elektra. (If crates.io does not require pkg-config to be available.)

I added pkg-config as an optional dependency. If it's added, it will search for elektra and use the provided includedir. Otherwise it will search in the two directories you named.
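The lookup order described here can be sketched with the standard library alone. This is not the code from the crate: it calls the pkg-config command line tool directly instead of using a crate, the function names are hypothetical, and it assumes the pkg-config module is named elektra and that the reported includedir is usable as an include path as-is.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Asks the pkg-config tool for the include directory, if it is available.
pub fn include_dir_from_pkg_config() -> Option<PathBuf> {
    let output = Command::new("pkg-config")
        .args(&["--variable=includedir", "elektra"])
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    let dir = String::from_utf8(output.stdout).ok()?;
    let dir = dir.trim();
    if dir.is_empty() {
        None
    } else {
        Some(PathBuf::from(dir))
    }
}

/// Prefers the pkg-config result and falls back to the two hard-coded paths.
pub fn include_dirs(from_pkg_config: Option<PathBuf>) -> Vec<PathBuf> {
    match from_pkg_config {
        Some(dir) => vec![dir],
        None => vec![
            PathBuf::from("/usr/include/elektra"),
            PathBuf::from("/usr/local/include/elektra"),
        ],
    }
}
```

In a build script, each directory returned by include_dirs(include_dir_from_pkg_config()) would be handed to clang as an include path so that it can find kdb.h.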

The bindings are now published: elektra and elektra-sys :smiley:

Due to the missing system dependency of libelektra in the docs.rs build environment, the documentation didn't build. Additionally, they will change the build environment on September 30th.
I submitted a request to add libelektra as a dependency so that it will build correctly on September 30th. The maintainer also added the package to the existing environment, so the docs are now available :+1:

I think after #2980 is merged, this issue can be closed.

Very nice, they reacted super-fast. Will it be a problem that they build it with a quite old libelektra? (I did not check which version but if they include it from the package manager it will be definitely older than 0.9.)

No problems for the elektra crate, since it's only documentation. elektra-sys I think contains whatever version of elektra it was generated against instead of the current one. But I think hardly anyone would use the raw bindings instead of the wrappers. So a small trade off for having docs automatically built.

Can you then add the links to the docu within our repo?

It seems that elektra-sys is nearly unusable anyway. It shows hundreds of symbols which have nothing/little to do with Elektra. Furthermore there is no docu to the individual functions, e.g.

https://docs.rs/elektra-sys/0.9.0/elektra_sys/fn.keyString.html

Can you then add the links to the docu within our repo?

Yes, I'll add it to the existing PR

It seems that elektra-sys is nearly unusable anyway. It shows hundreds of symbols which have nothing/little to do with Elektra.

I will look into this.

Furthermore there is no docu to the individual functions, e.g.

It's typical for -sys crates to not have docu (for example openssl-sys), because they are one-to-one translations of the C equivalent. So one has to look up the C doc directly. I'd also have to hand copy all the docu, which adds another maintenance burden. I can link to https://doc.libelektra.org/api/current/html/index.html on the main docu page though.

It seems that elektra-sys is nearly unusable anyway. It shows hundreds of symbols which have nothing/little to do with Elektra.

I will look into this.

It's fixed in #2980, and will be fixed on docs.rs the next time we publish the crate.

I can link to https://doc.libelektra.org/api/current/html/index.html on the main docu page though.

Yes, good idea!
