Rust: Allocator traits and std::heap

Created on 8 Apr 2016  ·  412 Comments  ·  Source: rust-lang/rust

📢 This feature has a dedicated working group, please direct comments and concerns to the working group's repo.

Original Post:


FCP proposal: https://github.com/rust-lang/rust/issues/32838#issuecomment-336957415
FCP checkboxes: https://github.com/rust-lang/rust/issues/32838#issuecomment-336980230


Tracking issue for rust-lang/rfcs#1398 and the std::heap module.

  • [x] land struct Layout, trait Allocator, and default implementations in alloc crate (https://github.com/rust-lang/rust/pull/42313)
  • [x] decide where parts should live (e.g. default impls has dependency on alloc crate, but Layout/Allocator _could_ be in libcore...) (https://github.com/rust-lang/rust/pull/42313)
  • [ ] fixme from source code: audit default implementations in Layout for overflow errors (potentially switching to overflowing_add and overflowing_mul as necessary).
  • [x] decide if realloc_in_place should be replaced with grow_in_place and shrink_in_place (comment) (https://github.com/rust-lang/rust/pull/42313)
  • [ ] review arguments for/against associated error type (see subthread here)
  • [ ] determine what the requirements are on the alignment provided to fn dealloc. (See discussion on allocator rfc and global allocator rfc and trait Alloc PR.)

    • Is it required to deallocate with the exact align that you allocate with? Concerns have been raised that allocators like jemalloc don't require this, and it's difficult to envision an allocator that does require this. (more discussion). @ruuda and @rkruppe look like they've got the most thoughts so far on this.

  • [ ] should AllocErr be Error instead? (comment)
  • [x] Is it required to deallocate with the exact size that you allocate with? With the usable_size business we may wish to allow, for example, that if you allocate with (size, align) you must deallocate with a size somewhere in the range of size...usable_size(size, align). It appears that jemalloc is totally ok with this (it doesn't require you to deallocate with the precise size you allocated with) and this would also allow Vec to naturally take advantage of the excess capacity jemalloc gives it when it does an allocation. (although actually doing this is also somewhat orthogonal to this decision, we're just empowering Vec). So far @Gankro has most of the thoughts on this. (@alexcrichton believes this was settled in https://github.com/rust-lang/rust/pull/42313 due to the definition of "fits")
  • [ ] similar to previous question: Is it required to deallocate with the exact alignment that you allocated with? (See comment from 5 June 2017)
  • [x] OSX/alloc_system is buggy on huge alignments (e.g. an align of 1 << 32) https://github.com/rust-lang/rust/issues/30170 #43217
  • [ ] should Layout provide a fn stride(&self) method? (See also https://github.com/rust-lang/rfcs/issues/1397, https://github.com/rust-lang/rust/issues/17027 )
  • [x] Allocator::owns as a method? https://github.com/rust-lang/rust/issues/44302

State of std::heap after https://github.com/rust-lang/rust/pull/42313:

pub struct Layout { /* ... */ }

impl Layout {
    pub fn new<T>() -> Self;
    pub fn for_value<T: ?Sized>(t: &T) -> Self;
    pub fn array<T>(n: usize) -> Option<Self>;
    pub fn from_size_align(size: usize, align: usize) -> Option<Layout>;
    pub unsafe fn from_size_align_unchecked(size: usize, align: usize) -> Layout;

    pub fn size(&self) -> usize;
    pub fn align(&self) -> usize;
    pub fn align_to(&self, align: usize) -> Self;
    pub fn padding_needed_for(&self, align: usize) -> usize;
    pub fn repeat(&self, n: usize) -> Option<(Self, usize)>;
    pub fn extend(&self, next: Self) -> Option<(Self, usize)>;
    pub fn repeat_packed(&self, n: usize) -> Option<Self>;
    pub fn extend_packed(&self, next: Self) -> Option<(Self, usize)>;
}

pub enum AllocErr {
    Exhausted { request: Layout },
    Unsupported { details: &'static str },
}

impl AllocErr {
    pub fn invalid_input(details: &'static str) -> Self;
    pub fn is_memory_exhausted(&self) -> bool;
    pub fn is_request_unsupported(&self) -> bool;
    pub fn description(&self) -> &str;
}

pub struct CannotReallocInPlace;

pub struct Excess(pub *mut u8, pub usize);

pub unsafe trait Alloc {
    // required
    unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, AllocErr>;
    unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout);

    // provided
    fn oom(&mut self, _: AllocErr) -> !;
    fn usable_size(&self, layout: &Layout) -> (usize, usize);
    unsafe fn realloc(&mut self,
                      ptr: *mut u8,
                      layout: Layout,
                      new_layout: Layout) -> Result<*mut u8, AllocErr>;
    unsafe fn alloc_zeroed(&mut self, layout: Layout) -> Result<*mut u8, AllocErr>;
    unsafe fn alloc_excess(&mut self, layout: Layout) -> Result<Excess, AllocErr>;
    unsafe fn realloc_excess(&mut self,
                             ptr: *mut u8,
                             layout: Layout,
                             new_layout: Layout) -> Result<Excess, AllocErr>;
    unsafe fn grow_in_place(&mut self,
                            ptr: *mut u8,
                            layout: Layout,
                            new_layout: Layout) -> Result<(), CannotReallocInPlace>;
    unsafe fn shrink_in_place(&mut self,
                              ptr: *mut u8,
                              layout: Layout,
                              new_layout: Layout) -> Result<(), CannotReallocInPlace>;

    // convenience
    fn alloc_one<T>(&mut self) -> Result<Unique<T>, AllocErr>
        where Self: Sized;
    unsafe fn dealloc_one<T>(&mut self, ptr: Unique<T>)
        where Self: Sized;
    fn alloc_array<T>(&mut self, n: usize) -> Result<Unique<T>, AllocErr>
        where Self: Sized;
    unsafe fn realloc_array<T>(&mut self,
                               ptr: Unique<T>,
                               n_old: usize,
                               n_new: usize) -> Result<Unique<T>, AllocErr>
        where Self: Sized;
    unsafe fn dealloc_array<T>(&mut self, ptr: Unique<T>, n: usize) -> Result<(), AllocErr>
        where Self: Sized;
}

/// The global default allocator
pub struct Heap;

impl Alloc for Heap {
    // ...
}

impl<'a> Alloc for &'a Heap {
    // ...
}

/// The "system" allocator
pub struct System;

impl Alloc for System {
    // ...
}

impl<'a> Alloc for &'a System {
    // ...
}

Most helpful comment

@alexcrichton The decision to switch from -> Result<*mut u8, AllocErr> to -> *mut void may come as a significant surprise to people who followed the original development of the allocator RFCs.

I don't disagree with the points you make, but it nonetheless seemed like a fair number of people would have been willing to live with the "heavy-weightness" of Result over the increased likelihood of missing a null-check on the returned value.

  • I am ignoring the runtime efficiency issues imposed by the ABI because I, like @alexcrichton, assume that we could deal with those in some way via compiler tricks.

Is there some way we could get increased visibility on that late change on its own?

One way (off the top of my head): Change the signature now, in a PR on its own, on the master branch, while Allocator is still unstable. And then see who complains on the PR (and who celebrates!).

  • Is this too heavy-handed? Seems like it is by definition less heavy-handed than coupling such a change with stabilization...

All 412 comments

I unfortunately wasn't paying close enough attention to mention this in the RFC discussion, but I think that realloc_in_place should be replaced by two functions, grow_in_place and shrink_in_place, for two reasons:

  • I can't think of a single use case (short of implementing realloc or realloc_in_place) where it is unknown whether the size of the allocation is increasing or decreasing. Using more specialized methods makes it slightly more clear what is going on.
  • The code paths for growing and shrinking allocations tend to be radically different - growing involves testing whether adjacent blocks of memory are free and claiming them, while shrinking involves carving off properly sized subblocks and freeing them. While the cost of a branch inside realloc_in_place is quite small, using grow and shrink better captures the distinct tasks that an allocator needs to perform.

Note that these can be added backwards-compatibly next to realloc_in_place, but this would constrain which functions would be by default implemented in terms of which others.

For consistency, realloc would probably also want to be split into grow and shrink, but the only advantage to having an overloadable realloc function that I know of is to be able to use mmap's remap option, which does not have such a distinction.

Additionally, I think that the default implementations of realloc and realloc_in_place should be slightly adjusted - instead of checking against the usable_size, realloc should just first try to realloc_in_place. In turn, realloc_in_place should by default check against the usable size and return success in the case of a small change instead of universally returning failure.

This makes it easier to produce a high-performance implementation of realloc: all that is required is improving realloc_in_place. However, the default performance of realloc does not suffer, as the check against the usable_size is still performed.
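
For concreteness, the adjusted defaults might look roughly like this (a sketch only, using the method names from this thread and assuming realloc_in_place returns Result<(), CannotReallocInPlace>; the real default bodies differ in details):

use core::cmp;
use core::ptr;

unsafe fn realloc(&mut self,
                  ptr: *mut u8,
                  layout: Layout,
                  new_layout: Layout) -> Result<*mut u8, AllocErr> {
    // First try to grow or shrink the block in place; the default
    // realloc_in_place would fold in the usable_size check described above.
    let copy_size = cmp::min(layout.size(), new_layout.size());
    if self.realloc_in_place(ptr, layout.clone(), new_layout.clone()).is_ok() {
        return Ok(ptr);
    }
    // Fall back to allocate + copy + deallocate.
    let new_ptr = self.alloc(new_layout)?;
    ptr::copy_nonoverlapping(ptr, new_ptr, copy_size);
    self.dealloc(ptr, layout);
    Ok(new_ptr)
}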

Another issue: The doc for fn realloc_in_place says that if it returns Ok, then one is assured that ptr now "fits" new_layout.

To me this implies that it must check that the alignment of the given address matches any constraint implied by new_layout.

However, I don't think the spec for the underlying fn reallocate_inplace function implies that _it_ will perform any such check.

  • Furthermore, it seems reasonable that any client diving into using fn realloc_in_place will themselves be ensuring that the alignments work (in practice I suspect it means that the same alignment is required everywhere for the given use case...)

So, should the implementation of fn realloc_in_place really be burdened with checking that the alignment of the given ptr is compatible with that of new_layout? It is probably better _in this case_ (of this one method) to push that requirement back to the caller...

@gereeter you make good points; I will add them to the check list I am accumulating in the issue description.

(at this point I am waiting for #[may_dangle] support to ride the train into the beta channel so that I will then be able to use it for std collections as part of allocator integration)

I'm new to Rust, so forgive me if this has been discussed elsewhere.

Is there any thought on how to support object-specific allocators? Some allocators such as slab allocators and magazine allocators are bound to a particular type, and do the work of constructing new objects, caching constructed objects which have been "freed" (rather than actually dropping them), returning already-constructed cached objects, and dropping objects before freeing the underlying memory to an underlying allocator when required.

Currently, this proposal doesn't include anything along the lines of ObjectAllocator<T>, but it would be very helpful. In particular, I'm working on an implementation of a magazine allocator object-caching layer (link above), and while I can have this only wrap an Allocator and do the work of constructing and dropping objects in the caching layer itself, it'd be great if I could also have this wrap other object allocators (like a slab allocator) and truly be a generic caching layer.

Where would an object allocator type or trait fit into this proposal? Would it be left for a future RFC? Something else?

I don't think this has been discussed yet.

You could write your own ObjectAllocator<T>, and then do impl<T: Allocator, U> ObjectAllocator<U> for T { .. }, so that every regular allocator can serve as an object-specific allocator for all objects.

Future work would be modifying collections to use your trait for their nodes, instead of plain ole' (generic) allocators directly.
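
For illustration, that blanket impl might look roughly like this (ObjectAllocator and its method names are hypothetical, not part of the RFC; Alloc and Layout are as listed at the top of this issue, and the sketch ignores zero-sized types):

use std::heap::{Alloc, AllocErr, Layout};

trait ObjectAllocator<T> {
    unsafe fn alloc_object(&mut self) -> Result<*mut T, AllocErr>;
    unsafe fn dealloc_object(&mut self, ptr: *mut T);
}

// Every byte-level allocator can serve as an object allocator for any type.
impl<A: Alloc, T> ObjectAllocator<T> for A {
    unsafe fn alloc_object(&mut self) -> Result<*mut T, AllocErr> {
        self.alloc(Layout::new::<T>()).map(|p| p as *mut T)
    }

    unsafe fn dealloc_object(&mut self, ptr: *mut T) {
        self.dealloc(ptr as *mut u8, Layout::new::<T>());
    }
}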

@pnkfelix

(at this point I am waiting for #[may_dangle] support to ride the train into the beta channel so that I will then be able to use it for std collections as part of allocator integration)

I guess this has happened?

@Ericson2314 Yeah, writing my own is definitely an option for experimental purposes, but I think there'd be much more benefit to it being standardized in terms of interoperability (for example, I plan on also implementing a slab allocator, but it would be nice if a third-party user of my code could use somebody _else's_ slab allocator with my magazine caching layer). My question is simply whether an ObjectAllocator<T> trait or something like it is worth discussing. Although it seems that it might be best for a different RFC? I'm not terribly familiar with the guidelines for how much belongs in a single RFC and when things belong in separate RFCs...

@joshlf

Where would an object allocator type or trait fit into this proposal? Would it be left for a future RFC? Something else?

Yes, it would be another RFC.

I'm not terribly familiar with the guidelines for how much belongs in a single RFC and when things belong in separate RFCs...

that depends on the scope of the RFC itself, which is decided by the person who writes it, and then feedback is given by everyone.

But really, as this is a tracking issue for this already-accepted RFC, thinking about extensions and design changes isn't really for this thread; you should open a new one over on the RFCs repo.

@joshlf Ah, I thought ObjectAllocator<T> was supposed to be a trait. I meant prototype the trait, not a specific allocator. Yes, that trait would merit its own RFC as @steveklabnik says.


@steveklabnik yeah, further discussion would now be better elsewhere. But @joshlf was also raising the issue in case it exposed a hitherto-unforeseen flaw in the accepted but unimplemented API design. In that sense it matches the earlier posts in this thread.

@Ericson2314 Yeah, I thought that was what you meant. I think we're on the same page :)

@steveklabnik Sounds good; I'll poke around with my own implementation and submit an RFC if it ends up seeming like a good idea.

@joshlf I don't see any reason why custom allocators would go into the compiler or standard library. Once this RFC lands, you could easily publish your own crate that does an arbitrary sort of allocation (even a fully-fledged allocator like jemalloc could be custom-implemented!).

@alexreg This isn't about a particular custom allocator, but rather a trait that specifies the type of all allocators which are parametric on a particular type. So just like RFC 1398 defines a trait (Allocator) that is the type of any low-level allocator, I'm asking about a trait (ObjectAllocator<T>) that is the type of any allocator which can allocate/deallocate and construct/drop objects of type T.

@alexreg See my early point about using standard library collections with custom object-specific allocators.

Sure, but I’m not sure that would belong in the standard library. Could easily go into another crate, with no loss of functionality or usability.

I think you’d want to use standard-library collections (any heap-allocated value) with an arbitrary custom allocator; i.e. not limited to object-specific ones.

Sure, but I’m not sure that would belong in the standard library. Could easily go into another crate, with no loss of functionality or usability.

Yes but you probably want some standard library functionality to rely on it (such as what @Ericson2314 suggested).

I think you’d want to use standard-library collections (any heap-allocated value) with an arbitrary custom allocator; i.e. not limited to object-specific ones.

Ideally you'd want both - to accept either type of allocator. There are very significant benefits to using object-specific caching; for example, both slab allocation and magazine caching give very significant performance benefits - take a look at the papers I linked to above if you're curious.

But the object allocator trait could simply be a subtrait of the general allocator trait. It’s as simple as that, as far as I’m concerned. Sure, certain types of allocators can be more efficient than general-purpose allocators, but neither the compiler nor the standard really need to (or indeed should) know about this.

But the object allocator trait could simply be a subtrait of the general allocator trait. It’s as simple as that, as far as I’m concerned. Sure, certain types of allocators can be more efficient than general-purpose allocators, but neither the compiler nor the standard really need to (or indeed should) know about this.

Ah, so the problem is that the semantics are different. Allocator allocates and frees raw byte blobs. ObjectAllocator<T>, on the other hand, would allocate already-constructed objects and would also be responsible for dropping these objects (including being able to cache constructed objects which could be handed out later in lieu of constructing a newly-allocated object, which is expensive). The trait would look something like this:

trait ObjectAllocator<T> {
    fn alloc(&mut self) -> T;
    fn free(&mut self, t: T);
}

This is not compatible with Allocator, whose methods deal with raw pointers and have no notion of type. Additionally, with Allocators, it is the caller's responsibility to drop the object being freed first. This is really important - knowing about the type T allows ObjectAllocator<T> to do things like call T's drop method, and since free(t) moves t into free, the caller _cannot_ drop t first - it is instead the ObjectAllocator<T>'s responsibility. Fundamentally, these two traits are incompatible with one another.

Ah right, I see. I thought this proposal already included something like that, i.e. a “higher-level” allocator over the byte level. In that case, a perfectly fair proposal!

@alexreg Ah yes, I was hoping so too :) Oh well - it'll have to wait for another RFC.

Yes, do kick start that RFC, I’m sure it would get plenty of support! And thanks for the clarification (I hadn’t kept up with the details of this RFC at all).

A crate for testing custom allocators would be useful.

Forgive me if I'm missing something obvious, but is there a reason for the Layout trait described in this RFC not to implement Copy as well as Clone, since it's just POD?

I can't think of any.

Sorry to be bringing this up so late in the process, but...

Might it be worth adding support for a dealloc-like function that isn't a method, but rather a free function? The idea would be to use alignment to infer from a pointer where in memory its parent allocator is, and thus be able to free without needing an explicit allocator reference.

This could be a big win for data structures that use custom allocators. This would allow them to not keep a reference to the allocator itself, but rather only need to be parametric on the _type_ of the allocator to be able to call the right dealloc function. For example, if Box is eventually modified to support custom allocators, then it'd be able to keep being only a single word (just the pointer) as opposed to having to be expanded to two words to store a reference to the allocator as well.

On a related note, it might also be useful to support a non-method alloc function to allow for global allocators. This would compose nicely with a non-method dealloc function - for global allocators, there'd be no need to do any kind of pointer-to-allocator inference since there'd only be a single static instance of the allocator for the whole program.

@joshlf The current design allows you to get that by just having your allocator be a (zero-sized) unit type - i.e. struct MyAlloc; that you then implement the Allocator trait on.
Storing references or nothing at all, always, is less general than storing the allocator by-value.

I could see that being true for a directly-embedded type, but what if a data structure decides to keep a reference instead? Does a reference to a zero-sized type take up zero space? That is, if I have:

struct Foo;

struct Blah<'a> {
    foo: &'a Foo,
}

Does Blah have zero size?

Actually, even if it's possible, you might not want your allocator to have zero size. For example, you might have an allocator with a non-zero size that you allocate _from_, but that has the ability to free objects w/o knowing about the original allocator. This would still be useful for making a Box take only a word. You'd have something like Box::new_from_allocator which would have to take an allocator as an argument - and it might be a nonzero-sized allocator - but if the allocator supported freeing without the original allocator reference, the returned Box<T> could avoid storing a reference to the allocator that was passed in the original Box::new_from_allocator call.

For example, you might have an allocator with a non-zero size that you allocate from, but that has the ability to free objects w/o knowing about the original allocator.

I recall long, long ago proposing factoring out separate allocator and deallocator traits (with associated types connecting the two) for basically this reason.

Is/should the compiler be allowed to optimize away allocations with these allocators?

Is/should the compiler be allowed to optimize away allocations with these allocators?

@Zoxc What do you mean?

I recall long, long ago proposing factoring out separate allocator and deallocator traits (with associated types connecting the two) for basically this reason.

For posterity, let me clarify this statement (I talked to @Ericson2314 about it offline): The idea is that a Box could be parametric just on a deallocator. So you could have the following implementation:

trait Allocator {
    type D: Deallocator;

    fn get_deallocator(&self) -> Self::D;
}

trait Deallocator {}

struct Box<T, D: Deallocator> {
    ptr: *mut T,
    d: D,
}

impl<T, D: Deallocator> Box<T, D> {
    fn new_from_allocator<A: Allocator<D = D>>(x: T, a: A) -> Box<T, D> {
        ...
        Box {
            ptr: ptr,
            d: a.get_deallocator()
        }
    }
}

This way, when calling new_from_allocator, if A::D is a zero-sized type, then the d field of Box<T, A::D> takes up zero size, and so the size of the resulting Box<T, A::D> is a single word.

Is there a timeline for when this will land? I'm working on some allocator stuff, and it'd be nice if this stuff were there for me to build off of.

If there's interest, I'd be happy to lend some cycles to this, but I'm relatively new to Rust, so that might just create more work for the maintainers in terms of having to code review a newbie's code. I don't want to step on anybody's toes, and I don't want to make more work for people.

Ok we've recently met to evaluate the state of allocators and I think there's some good news for this as well! It looks like support has not yet landed in libstd for these APIs, but everyone's still happy with them landing at any time!

One thing we discussed is that changing over all the libstd types may be a bit premature due to possible inference issues, but regardless of that it seems like a good idea to land the Allocator trait and the Layout type in the proposed std::heap module for experimentation elsewhere in the ecosystem!

@joshlf if you'd like to help out here I think that'd be more than welcome! The first piece will likely be landing the basic type/trait from this RFC into the standard library, and then from there we can start experimenting and playing around with collections in libstd as well.

@alexcrichton I think your link is broken? It points back here.

One thing we discussed is that changing over all the libstd types may be a bit premature due to possible inference issues

Adding the trait is a good first step, but without refactoring existing APIs to use it they won't see much usage. In https://github.com/rust-lang/rust/issues/27336#issuecomment-300721558 I propose we can refactor the crates behind the facade immediately, but add newtype wrappers in std. Annoying to do, but allows us to make progress.

@alexcrichton What would be the process for getting object allocators in? My experiments so far (soon to be public; I can add you to the private GH repo if you're curious) and the discussion here have led me to believe that there's going to be a near-perfect symmetry between the allocator traits and object allocator traits. E.g., you'll have something like (I changed Address to *mut u8 for symmetry with *mut T from ObjectAllocator<T>; we'd probably end up with Address<T> or something like that):

unsafe trait Allocator {
    unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, AllocErr>;
    unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout);
}
unsafe trait ObjectAllocator<T> {
    unsafe fn alloc(&mut self) -> Result<*mut T, AllocErr>;
    unsafe fn dealloc(&mut self, ptr: *mut T);
}

Thus, I think experimenting with both allocators and object allocators at the same time could be useful. I'm not sure if this is the right place for that, though, or whether there should be another RFC or, at the very least, a separate PR.

Oh I meant to link here which also has information about the global allocator. @joshlf is that what you're thinking?

It sounds like @alexcrichton wants a PR that provides the Allocator trait and Layout type, even if it's not integrated into any collection in libstd.

If I understand that correctly, then I can put up a PR for that. I had not done so because I keep trying to get at least integration with RawVec and Vec prototyped. (At this point I have RawVec done, but Vec is a bit more challenging due to the many other structures that build off of it, like Drain and IntoIter etc...)

Actually, my current branch seems like it might build (and the one test for integration with RawVec passed), so I went ahead and posted it: #42313

@hawkw asked:

Forgive me if I'm missing something obvious, but is there a reason for the Layout trait described in this RFC not to implement Copy as well as Clone, since it's just POD?

The reason I made Layout only implement Clone and not Copy is that I wanted to leave open the possibility of adding more structure to the Layout type. In particular, I am still interested in having Layout attempt to track any type structure used to construct it (e.g. 16-array of struct { x: u8, y: [char; 215] }), so that allocators would have the option of exposing instrumentation routines that report on what types their current contents are composed from.

This would almost certainly have to be an optional feature, i.e. it seems like the tide is firmly against forcing developers to use the type-enriched Layout constructors, so any instrumentation of this form would need to include something like an "unknown memory blocks" category to handle allocations that do not have the type information.

But nonetheless, features like this were the main reason why I did not opt to make Layout implement Copy; I basically figured that an implementation of Copy would be a premature constraint on Layout itself.

@alexcrichton @pnkfelix

Looks like @pnkfelix has this one covered, and that PR is getting traction, so let's just go with that. I'm looking over it and making comments now, and it looks great!

Currently the signature for Allocator::oom is:

    fn oom(&mut self, _: AllocErr) -> ! {
        unsafe { ::core::intrinsics::abort() }
    }

It was brought to my attention, though, that Gecko at least likes to know the allocation size as well on OOM. We may wish to consider this when stabilizing to perhaps add context like Option<Layout> for why OOM is happening.

@alexcrichton
Might it be worth having either multiple oom_xxx variants or an enum of different argument types? There are a couple of different signatures for methods that could fail (e.g., alloc takes a layout, realloc takes a pointer, an original layout, and a new layout, etc), and there might be cases in which an oom-like method would want to know about all of them.

@joshlf that's true, yes, but I'm not sure if it's useful. I wouldn't want to just add features because we can, they should continue to be well motivated.


A point for stabilization here is also "determine what the requirements are on the alignment provided to fn dealloc", and the current implementation of dealloc on Windows uses align to determine how to correctly free. @ruuda you may be interested in this fact.

A point for stabilization here is also "determine what the requirements are on the alignment provided to fn dealloc", and the current implementation of dealloc on Windows uses align to determine how to correctly free.

Yes, I think this is how I initially ran into this; my program crashed on Windows because of this. As HeapAlloc makes no alignment guarantees, allocate allocates a bigger region and stores the original pointer in a header, but as an optimization this is avoided if the alignment requirements would be satisfied anyway. I wonder if there is a way to convert HeapAlloc into an alignment-aware allocator that does not require alignment on free, without losing this optimization.
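
That header trick can be sketched as follows (raw_alloc and raw_dealloc are hypothetical stand-ins for HeapAlloc/HeapFree that guarantee only some fixed minimum alignment; the sketch assumes align is a power of two and at least pointer-sized, and omits the skip-when-already-aligned optimization mentioned above):

use std::mem;

unsafe fn raw_alloc(_size: usize) -> *mut u8 { unimplemented!() }
unsafe fn raw_dealloc(_ptr: *mut u8) { unimplemented!() }

unsafe fn alloc_overaligned(size: usize, align: usize) -> *mut u8 {
    // Over-allocate so there is room both to realign and to stash a header.
    let header = mem::size_of::<*mut u8>();
    let raw = raw_alloc(size + align + header);
    // Round the first address past the header up to the requested alignment.
    let aligned = ((raw as usize + header + align - 1) & !(align - 1)) as *mut u8;
    // Store the original pointer in the word just below the aligned address.
    *(aligned as *mut *mut u8).offset(-1) = raw;
    aligned
}

unsafe fn dealloc_overaligned(ptr: *mut u8) {
    // The header makes deallocation alignment-independent.
    raw_dealloc(*(ptr as *mut *mut u8).offset(-1));
}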

@ruuda

As HeapAlloc makes no alignment guarantees

It does provide a minimum alignment guarantee of 8 bytes on 32-bit or 16 bytes on 64-bit; it just doesn't provide any way to guarantee alignment higher than that.

The _aligned_malloc provided by the CRT on Windows can provide allocations of higher alignment, but notably it must be paired with _aligned_free, using free is illegal. So if you don't know whether an allocation was done via malloc or _aligned_malloc then you're stuck in the same conundrum that alloc_system is in on Windows if you don't know the alignment for deallocate. The CRT does not provide the standard aligned_alloc function which can be paired with free, so even Microsoft hasn't been able to solve this problem. (Although it is a C11 function and Microsoft doesn't support C11 so that's a weak argument.)

Do note that deallocate only cares about the alignment to know whether it is overaligned, the actual value itself is irrelevant. If you wanted a deallocate that was truly alignment independent, you could simply treat all allocations as overaligned, but you'd waste a lot of memory on small allocations.

@alexcrichton wrote:

Currently the signature for Allocator::oom is:

    fn oom(&mut self, _: AllocErr) -> ! {
        unsafe { ::core::intrinsics::abort() }
    }

It was brought to my attention, though, that Gecko at least likes to know the allocation size as well on OOM. We may wish to consider this when stabilizing to perhaps add context like Option<Layout> for why OOM is happening.

The AllocErr already carries the Layout in the AllocErr::Exhausted variant. We could just add the Layout to the AllocErr::Unsupported variant as well, which I think would be simplest in terms of client expectations. (It does have the drawback of slightly increasing the size of the AllocErr enum itself, but maybe we shouldn't worry about that...)

Oh I suspect that's all that's needed, thanks for the correction @pnkfelix!

I'm going to start repurposing this issue for the tracking issue for std::heap in general as it will be after https://github.com/rust-lang/rust/pull/42727 lands. I'll be closing a few other related issues in favor of this.

Is there a tracking issue for converting collections? Now that the PRs are merged, I'd like to

  • Discuss the associated error type
  • Discuss converting collections to use any local allocator, (especially leveraging associated error type)

I've opened https://github.com/rust-lang/rust/issues/42774 to track integration of Alloc into std collections. With historical discussion in the libs team that's likely to be on a separate track of stabilization than an initial pass of the std::heap module.

While reviewing allocator-related issues I also came across https://github.com/rust-lang/rust/issues/30170, which @pnkfelix filed a while back. It looks like the OSX system allocator is buggy with high alignments, and when running that program with jemalloc it's segfaulting during deallocation on Linux at least. Worth considering during stabilization!

I've opened #42794 as a place to discuss the specific question of whether zero-sized allocations need to match their requested alignment.

(oh wait, zero-sized allocations are illegal in user allocators!)

Since the alloc::heap::allocate function and friends are now gone in Nightly, I’ve updated Servo to use this new API. This is part of the diff:

-use alloc::heap;
+use alloc::allocator::{Alloc, Layout};
+use alloc::heap::Heap;
-        let ptr = heap::allocate(req_size as usize, FT_ALIGNMENT) as *mut c_void;
+        let layout = Layout::from_size_align(req_size as usize, FT_ALIGNMENT).unwrap();
+        let ptr = Heap.alloc(layout).unwrap() as *mut c_void;

I feel the ergonomics are not great. We went from importing one item to importing three from two different modules.

  • Would it make sense to have a convenience method for allocator.alloc(Layout::from_size_align(…))? (See the sketch after this list.)
  • Would it make sense to make the <Heap as Alloc> methods available as free functions or inherent methods? (To have one fewer item to import, the Alloc trait.)

Alternatively, could the Alloc trait be in the prelude or is it too niche of a use case?
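
A possible shape for the convenience method from the first bullet above (AllocExt and alloc_bytes are hypothetical names, not proposed API; AllocErr::invalid_input is from the API listing at the top of this issue):

use std::heap::{Alloc, AllocErr, Layout};

trait AllocExt: Alloc {
    unsafe fn alloc_bytes(&mut self, size: usize, align: usize)
                          -> Result<*mut u8, AllocErr> {
        // A bad size/align pair becomes an error instead of a separate
        // Option to handle at every call site.
        let layout = Layout::from_size_align(size, align)
            .ok_or_else(|| AllocErr::invalid_input("invalid size/align"))?;
        self.alloc(layout)
    }
}

impl<A: Alloc> AllocExt for A {}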

@SimonSapin IMO there isn't much of a point in optimizing the ergonomics of such a low-level API.

@SimonSapin

I feel the ergonomics are not great. We went from importing one item to importing three from two different modules.

I had the exact same feeling with my codebase - it's pretty clunky now.

Would it make sense to have a convenience method for allocator.alloc(Layout::from_size_align(…))?

Do you mean in the Alloc trait, or just for Heap? One thing to consider here is that there's now a third error condition: Layout::from_size_align returns an Option, so it could return None in addition to the normal errors you can get when allocating.

Alternatively, could the Alloc trait be in the prelude or is it too niche of a use case?

IMO there isn't much of a point in optimizing the ergonomics of such a low-level API.

I agree that it's probably too low-level to put in the prelude, but I still think that there's value in optimizing the ergonomics (selfishly, at least - that was a really annoying refactor 😝 ).

@SimonSapin did you not handle OOM before? Also in std all three types are available in the std::heap module (they're supposed to be in one module). Also did you not handle the case of overflowing sizes before? Or zero-sized types?

did you not handle OOM before?

When it existed the alloc::heap::allocate function returned a pointer without a Result and did not leave a choice in OOM handling. I think it aborted the process. Now I’ve added .unwrap() to panic the thread.

they're supposed to be in one module

I see now that heap.rs contains pub use allocator::*;. But when I clicked Alloc in the impl listed on the rustdoc page for Heap I was sent to alloc::allocator::Alloc.

As to the rest, I haven’t looked into it. I’m porting to a new compiler a big pile of code that was written years ago. I think these are callbacks for FreeType, a C library.

When it existed the alloc::heap::allocate function returned a pointer without a Result and did not leave a choice in OOM handling.

It did give you a choice. The pointer it returned could have been a null pointer which would indicate the heap allocator failed to allocate. This is why I'm so glad it switched to Result so people don't forget to handle that case.

Oh well, maybe the FreeType ended up doing a null check, I don’t know. Anyway, yes, returning a Result is good.

Given #30170 and #43097, I am tempted to resolve the OS X issue with ridiculously big alignments by simply specifying that users cannot request alignments >= 1 << 32.

One very easy way to enforce this: Change the Layout interface so that align is denoted by a u32 instead of a usize.

@alexcrichton do you have thoughts on this? Should I just make a PR that does this?

@pnkfelix Layout::from_size_align would still take usize and return an error on u32 overflow, right?

@SimonSapin what reason is there to have it continue taking a usize align, if a static precondition is that it is unsafe to pass a value >= 1 << 32?

and if the answer is "well some allocators might support an alignment >= 1 << 32", then we're back to the status quo and you can disregard my suggestion. The point of my suggestion is basically a "+1" to comments like this one

Because std::mem::align_of returns usize

@SimonSapin ah, good old stable API's... sigh.

@pnkfelix limiting to 1 << 32 seems reasonable to me!

@rfcbot fcp merge

Ok, this trait and its types have baked for a while now and have also been the underlying implementation of the standard collections since its inception. I would propose starting out with a particularly conservative initial offering, namely only stabilizing the following interface:

pub struct Layout { /* ... */ }

extern {
    pub type void;
}

impl Layout {
    pub fn from_size_align(size: usize, align: usize) -> Option<Layout>;
    pub unsafe fn from_size_align_unchecked(size: usize, align: usize) -> Layout;

    pub fn size(&self) -> usize;
    pub fn align(&self) -> usize;
}

pub unsafe trait Alloc {
    unsafe fn alloc(&mut self, layout: Layout) -> *mut void;
    unsafe fn alloc_zeroed(&mut self, layout: Layout) -> *mut void;
    unsafe fn dealloc(&mut self, ptr: *mut void, layout: Layout);

    // all other methods are default and unstable
}

/// The global default allocator
pub struct Heap;

impl Alloc for Heap {
    // ...
}

impl<'a> Alloc for &'a Heap {
    // ...
}

/// The "system" allocator
pub struct System;

impl Alloc for System {
    // ...
}

impl<'a> Alloc for &'a System {
    // ...
}

Original proposal

pub struct Layout { /* ... */ }

impl Layout {
    pub fn from_size_align(size: usize, align: usize) -> Option<Layout>;
    pub unsafe fn from_size_align_unchecked(size: usize, align: usize) -> Layout;

    pub fn size(&self) -> usize;
    pub fn align(&self) -> usize;
}

// renamed from AllocErr today
pub struct Error {
    // ...
}

impl Error {
    pub fn oom() -> Self;
}

pub unsafe trait Alloc {
    unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, Error>;
    unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout);

    // all other methods are default and unstable
}

/// The global default allocator
pub struct Heap;

impl Alloc for Heap {
    // ...
}

impl<'a> Alloc for &'a Heap {
    // ...
}

/// The "system" allocator
pub struct System;

impl Alloc for System {
    // ...
}

impl<'a> Alloc for &'a System {
    // ...
}

Notably:

  • Only stabilizing alloc, alloc_zeroed, and dealloc methods on the Alloc trait for now. I think this solves the most pressing issue we have today, defining a custom global allocator.
  • Remove the Error type in favor of just using raw pointers.
  • Change the u8 type in the interface to void
  • A stripped down version of the Layout type.

There are still open questions such as what to do with dealloc and alignment (precise alignment? fits? unsure?), but I'm hoping we can resolve them during FCP as it likely won't be an API-breaking change.

+1 to getting something stabilized!

Renames AllocErr to Error and moves the interface to be a bit more conservative.

Does this eliminate the option for allocators to specify Unsupported? At the risk of harping more on something I've been harping on a lot, I think that #44557 is still an issue.

Layout

It looks like you've removed some of the methods from Layout. Did you mean to have the ones you left out actually removed, or just left as unstable?

impl Error {
    pub fn oom() -> Self;
}

Is this a constructor for what’s today AllocErr::Exhausted? If so, shouldn’t it have a Layout parameter?

Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged teams:

  • [x] @BurntSushi
  • [x] @Kimundi
  • [x] @alexcrichton
  • [x] @aturon
  • [x] @cramertj
  • [x] @dtolnay
  • [x] @eddyb
  • [x] @nikomatsakis
  • [x] @nrc
  • [x] @pnkfelix
  • [x] @sfackler
  • [x] @withoutboats

Concerns:

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

I'm really excited about getting to stabilize some of this work!

One question: in the above thread, @joshlf and @Ericson2314 raised an interesting point about the possibility of separating the Alloc and Dealloc traits in order to optimize for cases in which alloc requires some data, but dealloc requires no extra information, so the Dealloc type can be zero-sized.

Was this question ever resolved? What are the disadvantages of separating the two traits?

@joshlf

Does this eliminate the option for allocators to specify Unsupported?

Yes and no; it would mean that such an operation is not supported in stable Rust immediately, but we could continue to support it in unstable Rust.

Did you mean to have the ones you left out actually removed, or just left as unstable?

Indeed! Again though I'm just proposing a stable API surface area, we can leave all the other methods as unstable. Over time we can continue to stabilize more of the functionality. I think it's best to start as conservative as we can.


@SimonSapin

Is this a constructor for what’s today AllocErr::Exhausted? If so, shouldn’t it have a Layout parameter?

Aha, good point! I sort of wanted to leave open the possibility of making Error a zero-sized type if we really needed it, but we can of course keep the layout-taking methods unstable and stabilize them if necessary. Or do you think that the layout-preserving Error should be stabilized in the first pass?


@cramertj

I hadn't personally seen such a question/concern yet (I think I missed it!), but I wouldn't see it as being worth it. Two traits is twice the boilerplate in general, as now everyone would have to type Alloc + Dealloc in collections, for example. I would expect that such a specialized use shouldn't inform the interface all other users end up using.

@cramertj @alexcrichton

I hadn't personally seen such a question/concern yet (I think I missed it!), but I wouldn't personally see it as being worth it.

In general I agree that it's not worth it with one glaring exception: Box. Box<T, A: Alloc> would, given the current definition of Alloc, have to be at least two words large (the pointer that it already has and a reference to an Alloc at the very least) except in the case of global singletons (which can be implemented as ZSTs). A 2x (or more) blowup in the space required to store such a common and fundamental type is concerning to me.

@alexcrichton

as now everyone would have to type Alloc + Dealloc in collections for example

We could add something like this:

trait Allocator: Alloc + Dealloc {}
impl<T> Allocator for T where T: Alloc + Dealloc {}

A 2x (or more) blowup in the space required to store such a common and fundamental type

Only when you use a custom allocator that is not process-global. std::heap::Heap (the default) is zero-size.

Or do you think that the layout-preserving Error should be stabilized in the first pass?

@alexcrichton I don’t really understand why this proposed first pass is as it is at all. There’s barely more than what could already be done by abusing Vec, and not enough, for example, to use https://crates.io/crates/jemallocator.

What still needs to be resolved to stabilize the whole thing?

Only when you use a custom allocator that is not process-global. std::heap::Heap (the default) is zero-size.

That seems like the primary use case of having parametric allocators, no? Imagine the following simple definition of a tree:

struct Node<T, A: Alloc> {
    t: T,
    left: Option<Box<Node<T, A>>>,
    right: Option<Box<Node<T, A>>>,
}

A tree constructed from those with a 1-word Alloc would have a ~1.7x blowup in size for the whole data structure compared to a ZST Alloc. That seems pretty bad to me, and these kinds of applications are sort of the whole point of having Alloc be a trait.

@cramertj

We could add something like this:

We're also going to have actual trait aliases :) https://github.com/rust-lang/rust/issues/41517

@glaebhoerl Yeah, but stabilization still seems a ways off since there isn't an implementation yet. If we disable manual impls of Allocator I think we can switch to trait-aliases backwards-compatibly when they arrive ;)

@joshlf

A 2x (or more) blowup in the space required to store such a common and fundamental type is concerning to me.

I'd imagine all implementations today are just a zero-size type or a pointer large, right? Isn't the possible optimization that some pointer-size-types can be zero sized? (or something like that?)


@cramertj

We could add something like this:

Indeed! We've then taken one trait to three though. In the past we've never had a great experience with such traits. For example Box<Both> does not cast to Box<OnlyOneTrait>. I'm sure we could wait for language features to smooth all this out, but it seems like those are a long way off, at best.


@SimonSapin

What still needs to be resolved to stabilize the whole thing?

I don't know. I wanted to start with the absolute smallest thing so there would be less debate.

I'd imagine all implementations today are just a zero-size type or a pointer large, right? Isn't the possible optimization that some pointer-size-types can be zero sized? (or something like that?)

Yeah, the idea is that, given a pointer to an object allocated from your type of allocator, you can figure out which instance it came from (e.g., using inline metadata). Thus, the only information you need to deallocate is type information, not runtime information.

To come back to alignment on deallocate, I see two ways forward:

  • Stabilize as proposed (with alignment on deallocate). Giving away ownership of manually allocated memory would be impossible unless the Layout is included. In particular, it is impossible to build a Vec or Box or String or other std container with a stricter alignment than required (for instance because you don't want the boxed element to straddle a cache line), without deconstructing and deallocating it manually later (which is not always an option). Another example of something that would be impossible, is filling a Vec using simd operations, and then giving it away.

  • Do not require alignment on deallocate, and remove the small-allocation optimization from Windows’ HeapAlloc-based alloc_system. Always store the alignment. @alexcrichton, as you committed that code, do you remember why it was put there in the first place? Do we have any evidence that it saves a significant amount of memory for real-world applications? (With microbenchmarks it is possible to make the results come out either way depending on the allocation size — unless HeapAlloc was rounding up sizes anyway.)

In any case, this is a very difficult trade off to make; the memory and performance impact will depend highly on the type of application, and which one to optimize for is application-specific as well.

I think we may actually be Just Fine (TM). Quoting the Alloc docs:

Some of the methods require that a layout fit a memory block.
What it means for a layout to "fit" a memory block means (or
equivalently, for a memory block to "fit" a layout) is that the
following two conditions must hold:

  1. The block's starting address must be aligned to layout.align().

  2. The block's size must fall in the range [use_min, use_max], where:

    • use_min is self.usable_size(layout).0, and

    • use_max is the capacity that was (or would have been)
      returned when (if) the block was allocated via a call to
      alloc_excess or realloc_excess.

Note that:

  • the size of the layout most recently used to allocate the block
    is guaranteed to be in the range [use_min, use_max], and

  • a lower-bound on use_max can be safely approximated by a call to
    usable_size.

  • if a layout k fits a memory block (denoted by ptr)
    currently allocated via an allocator a, then it is legal to
    use that layout to deallocate it, i.e. a.dealloc(ptr, k);.

Note that last bullet. If I allocate with a layout with alignment a, then it should be legal for me to deallocate with alignment b < a because an object which is aligned to a is also aligned to b, and thus a layout with alignment b fits an object allocated with a layout with alignment a (and with the same size).

What this means is that you should be able to allocate with an alignment which is greater than the minimum alignment required for a particular type and then allow some other code to deallocate with the minimum alignment, and it should work.
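
Under that reading, something like the following would be legal (a sketch; as later comments point out, alloc_system on Windows does not actually support deallocating with a different alignment today, which is exactly the open question here):

use std::heap::{Alloc, Heap, Layout};

fn main() {
    unsafe {
        // Allocate with a stricter alignment than the type requires...
        let big = Layout::from_size_align(64, 64).unwrap();
        let ptr = Heap.alloc(big).unwrap();

        // ...and deallocate with a weaker one. A 64-byte-aligned block is
        // also 8-byte aligned, so the smaller layout still "fits" the block.
        let small = Layout::from_size_align(64, 8).unwrap();
        Heap.dealloc(ptr, small);
    }
}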

Isn't the possible optimization that some pointer-size-types can be zero sized? (or something like that?)

There was an RFC for this recently and it seems very unlikely that it could be done due to compatibility concerns: https://github.com/rust-lang/rfcs/pull/2040

For example Box<Both> does not cast to Box<OnlyOneTrait>. I'm sure we could wait for language features to smooth all this out, but it seems like those are a long way off, at best.

Trait object upcasting on the other hand seems uncontroversially desirable, and mostly a question of effort / bandwidth / willpower to get it implemented. There was a thread recently: https://internals.rust-lang.org/t/trait-upcasting/5970

@ruuda I was the one who wrote that alloc_system implementation originally. alexcrichton merely moved it around during the great allocator refactors of <time period>.

The current implementation requires that you deallocate with the same alignment specified that you allocated a given memory block with. Regardless of what the documentation may claim, this is the current reality that everyone must abide by until alloc_system on Windows is changed.

Allocations on Windows always use a multiple of MEMORY_ALLOCATION_ALIGNMENT (although they remember the size you allocated them with, to the byte). MEMORY_ALLOCATION_ALIGNMENT is 8 on 32-bit and 16 on 64-bit. For overaligned types, because the alignment is greater than MEMORY_ALLOCATION_ALIGNMENT, the overhead caused by alloc_system is consistently the amount of alignment specified, so a 64-byte aligned allocation would have 64 bytes of overhead.

If we decided to extend that overaligned trick to all allocations (which would get rid of the requirement to deallocate with the same alignment that you specified when allocating), then more allocations would have overhead. Allocations whose alignments are identical to MEMORY_ALLOCATION_ALIGNMENT will suffer a constant overhead of MEMORY_ALLOCATION_ALIGNMENT bytes. Allocations whose alignments are less than MEMORY_ALLOCATION_ALIGNMENT will suffer an overhead of MEMORY_ALLOCATION_ALIGNMENT bytes approximately half of the time. If the size of the allocation rounded up to MEMORY_ALLOCATION_ALIGNMENT is greater than or equal to the size of the allocation plus the size of a pointer, then there is no overhead, otherwise there is. Considering 99.99% of allocations will not be overaligned, do you really want to incur that sort of overhead on all those allocations?
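
For concreteness, on 64-bit (MEMORY_ALLOCATION_ALIGNMENT = 16, pointer size = 8), under that always-overalign scheme: a 24-byte request rounds up to 32 bytes, and 32 >= 24 + 8, so the back-pointer fits in the rounding slack at no extra cost; a 28-byte request also rounds up to 32, but 32 < 28 + 8, so it costs an extra 16-byte chunk just for the header.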

@ruuda

I personally feel that the implementation of alloc_system today on Windows is a bigger benefit than having the ability to relinquish ownership of an allocation to another container like Vec. AFAIK though there's no data to measure the impact of always padding with the alignment and not requiring an alignment on deallocation.

@joshlf

I think that comment is wrong; as pointed out, alloc_system on Windows relies on the same alignment being passed to deallocation as was passed on allocation.

Considering 99.99% of allocations will not be overaligned, do you really want to incur that sort of overhead on all those allocations?

It depends on the application whether the overhead is significant, and whether to optimize for memory or performance. My suspicion is that for most applications either is fine, but a small minority cares deeply about memory, and they really cannot afford those extra bytes. And another small minority needs control over alignment, and they really need it.

@alexcrichton

I think that comment is wrong, as pointed out alloc_system on Windows relies on the same alignment being passed to deallocation as was passed on allocation.

Doesn't that imply that alloc_system on Windows doesn't actually properly implement the Alloc trait (and thus maybe we ought to change the requirements of the Alloc trait)?


@retep998

If I'm reading your comment correctly, isn't that alignment overhead present for all allocations regardless of whether we need to be able to deallocate with a different alignment? That is, if I allocate 64 bytes with 64 byte alignment and I also deallocate with 64 byte alignment, the overhead you described is still present. Thus, it's not a feature of being able to deallocate with different alignments so much as it is a feature of requesting larger-than-normal alignments.

@joshlf The overhead caused by alloc_system currently is due to requesting larger-than-normal alignments. If your alignment is less than or equal to MEMORY_ALLOCATION_ALIGNMENT, then there is no overhead caused by alloc_system.

However, if we changed the implementation to allow deallocating with different alignments, then the overhead would apply to nearly all allocations, regardless of alignment.

Ah I see; makes sense.

What is the meaning of implementing Alloc for both Heap and &Heap? In what cases would the user use one of those impls vs the other?

Is this the first standard library API in which *mut u8 would mean "pointer to whatever"? There is String::from_raw_parts but that one really does mean pointer to bytes. I am not a fan of *mut u8 meaning "pointer to whatever" -- even C does better. What are some other options? Maybe a pointer to opaque type would be more meaningful.

@rfcbot concern *mut u8

@dtolnay Alloc for Heap is sort of "standard" and Alloc for &Heap is like Write for &T where the trait requires &mut self but the implementation does not. Notably that means that types like Heap and System are threadsafe and do not need to be synchronized when allocating.

More importantly, though, usage of #[global_allocator] requires that the static it's attached to, which has type T, have an impl of Alloc for &T. (aka all global allocators must be threadsafe)
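
A sketch of that pattern against the unstable std::heap API as it stands in this thread (MyGlobal is hypothetical, and the forwarding bodies are illustrative; a real global allocator would do its own thing):

use std::heap::{Alloc, AllocErr, Layout, System};

struct MyGlobal; // hypothetical zero-sized allocator that forwards to System

unsafe impl<'a> Alloc for &'a MyGlobal {
    unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, AllocErr> {
        // `self` is `&mut &MyGlobal`: the allocator value itself is only
        // shared, so this impl must be internally thread-safe.
        System.alloc(layout)
    }

    unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: MyGlobal = MyGlobal;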

For *mut u8 I think that *mut () may be interesting, but I don't personally feel too compelled to "get this right" here per se.

The main advantage of *mut u8 is that it is very convenient to use .offset with byte offsets.

For *mut u8 I think that *mut () may be interesting, but I don't personally feel too compelled to "get this right" here per se.

If we go with *mut u8 in a stable interface, aren't we locking ourselves in? In other words, once we stabilize this, we won't have a chance to "get this right" in the future.

Also, *mut () seems a bit dangerous to me in case we ever make an optimization like RFC 2040 in the future.

The main advantage of *mut u8 is that it is very convenient to use .offset with byte offsets.

True, but you could easily do let ptr = (foo as *mut u8) and then go about your merry way. That doesn't seem like enough of a motivation to stick with *mut u8 in the API if there are compelling alternatives (which, to be fair, I'm not sure there are).

Also, *mut () seems a bit dangerous to me in case we ever make an optimization like RFC 2040 in the future.

That optimization will probably never happen anyway; it'd break too much existing code. Even if it did, it would be applied to &() and &mut (), not *mut ().

If RFC 1861 were close to being implemented/stabilized, I'd suggest using it:

extern { pub type void; }

pub unsafe trait Alloc {
    unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut void, Error>;
    unsafe fn dealloc(&mut self, ptr: *mut void, layout: Layout);
    // ...
}

It's probably too far away though, right?

@joshlf I thought I saw a PR open about them, the remaining unknown is DynSized.

Will this work for struct hack like objects? Say I have a Node<T> that looks like so:

struct Node<T> {
   size: u32,
   data: T,
   // followed by `size` bytes
}

and a value type:

struct V {
  a: u32,
  b: bool,
}

Now I want to allocate Node<V> with a string of size 7 in a single allocation. Ideally I want to make an allocation of size 16 align 4 and fit everything in it: 4 for u32, 5 for V, and 7 for the string bytes. This works because the last member of V has alignment 1 and the string bytes also have alignment 1.

Note that this is not allowed in C/C++ if the types are composed like above, as writing to packed storage is undefined behavior. I think this is a hole in the C/C++ standard that unfortunately cannot be fixed. I can expand on why this is broken but let's focus on Rust instead. Can this work? :-)

With respect to the size and alignment of the Node<V> structure itself, you're pretty much at the whim of the Rust compiler. It's UB (undefined behavior) to allocate with any size or alignment smaller than what Rust requires since Rust may make optimizations based on the assumption that any Node<V> object - on the stack, on the heap, behind a reference, etc - has a size and alignment matching what is expected at compile time.

In practice, it looks like the answer is unfortunately no: I ran this program and found that, at least on the Rust Playground, Node<V> is given a size of 12 and an alignment of 4, meaning that any objects after the Node<V> must be offset by at least 12 bytes. It looks like the offset of the data.b field within the Node<V> is 8 bytes, implying that bytes 9-11 are trailing padding. Unfortunately, even though those padding bytes "aren't used" in some sense, the compiler still treats them as part of the Node<V>, and reserves the right to do anything it likes with them (most importantly, including writing to them when you assign to a Node<V>, implying that if you try to squirrel away extra data there, it may get overwritten).
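
The program in question amounted to something like this (a reconstruction; repr(Rust) layout is not guaranteed, and the numbers in the comments are what that Playground run reported):

use std::mem;

struct V { a: u32, b: bool }
struct Node<T> { size: u32, data: T }

fn main() {
    println!("size  = {}", mem::size_of::<Node<V>>());  // 12
    println!("align = {}", mem::align_of::<Node<V>>()); // 4
}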

(note, btw: you can't treat a type as packed that the Rust compiler doesn't think is packed. However, you _can_ tell the Rust compiler that something is packed, which will change the type's layout (removing padding), using repr(packed))

However, with respect to laying out one object after another without having them both be part of the same Rust type, I'm almost 100% sure this is valid - after all, it's what Vec does. You can use the methods of the Layout type to dynamically calculate how much space is needed for the total allocation:

let node_layout = Layout::new::<Node<V>>();
// NOTE: This is only valid if the node_layout.align() is at least as large as mem::align_of_val("a")!
// NOTE: I'm assuming that the alignment of all strings is the same (since str is unsized, you can't do mem::align_of::<str>())
let padding = node_layout.padding_needed_for(mem::align_of_val("a"));
let total_size = node_layout.size() + padding + 7;
let total_layout = Layout::from_size_align(total_size, node_layout.align()).unwrap();

Would something like this work?

#[repr(C)]
struct Node<T> {
   size: u32,
   data: T,
   bytes: [u8; 0],
}

… then allocate with a larger size, and use slice::from_raw_parts_mut(node.bytes.as_mut_ptr(), size) ?

Thanks @joshlf for the detailed answer! The TLDR for my usecase is that I can get a Node<V> of size 16 but only if V is repr(packed). Otherwise the best I can do is size 19 (12 + 7).

@SimonSapin not sure; I will try.

Haven't really caught up with this thread, but I am dead set against stabilizing anything yet. We've not made progress on implementing the hard problems yet:

  1. Allocator-polymorphic collections

    • not even non-bloated box!

  2. Fallible collections

I think the design of the fundamental traits will affect the solutions of those: I've had little time for Rust for the past few months, but have argued this at times. I doubt I will have time to fully make my case here either, so I can only hope we first at least write up a complete solution to all of those: somebody prove me wrong that it's impossible to be rigorous (force correct usage), flexible, and ergonomic with the current traits. Or even just finish checking the boxes at the top.

Re: @Ericson2314's comment

I think that a relevant question related to the conflict between that perspective and @alexcrichton's desire to stabilize something is: How much benefit do we get from stabilizing a minimal interface? In particular, very few consumers will call Alloc methods directly (even most collections will probably use Box or some other similar container), so the real question becomes: what does stabilizing buy for users who will not be calling Alloc methods directly? Honestly the only serious use case I can think of is that it paves a path for allocator-polymorphic collections (which will likely be used by a much broader set of users), but it seems like that's blocked on #27336, which is far from being resolved. Maybe there are other large use cases I'm missing, but based on that quick analysis, I'm inclined to lean away from stabilization as having only marginal benefits at the cost of locking us into a design that we might later find to be suboptimal.

@joshlf it allows people to define and use their own global allocators.

Hmmm good point. would it be possible to stabilize specifying the global allocator without stabilizing Alloc? I.e., the code that implements Alloc would have to be unstable, but that would probably be encapsulated in its own crate, and the mechanism to mark that allocator as the global allocator would itself be stable. Or am I misunderstanding how stable/unstable and the stable compiler/nightly compiler interact?

Ah @joshlf remember that #27336 is a distraction, as per https://github.com/rust-lang/rust/issues/42774#issuecomment-317279035. I'm pretty sure we'll run into other problems: problems with the traits as they exist, which is why I want work to commence on that now. It's a lot easier to discuss those problems once they arrive for all to see than to debate predicted futures post-#27336.

@joshlf But you can't compile the crate that defines the global allocator with a stable compiler.

@sfackler Ah yes, there's that misunderstanding I was afraid of :P

I find the name Excess(ptr, usize) a bit confusing because the usize is not the excess in size of the requested allocation (as in the extra size allocated), but the total size of the allocation.

IMO Total, Real, Usable, or any name that conveys that the size is the total size or real size of the allocation is better than "excess", which I find misleading. The same applies for the _excess methods.

I agree with @gnzlbg above, I think a plain (ptr, usize) tuple would be just fine.

Note that Excess is not proposed to be stabilized in the first pass, however

Posted this thread for discussion on reddit, which has some people with concerns: https://www.reddit.com/r/rust/comments/78dabn/custom_allocators_are_on_the_verge_of_being/

After further discussion with @rust-lang/libs today, I'd like to make a few tweaks to the stabilization proposal which can be summarized with:

  • Add alloc_zeroed to the set of stabilized methods, otherwise having the same signature as alloc.
  • Change *mut u8 to *mut void in the API using extern { type void; } support, solving @dtolnay's concern and providing a way forward for unifying c_void across the ecosystem.
  • Change the return type of alloc to *mut void, removing the Result and the Error

Perhaps the most contentious is the last point so I want to elaborate on that as well. This came out of discussion with the libs team today and specifically revolved around how (a) the Result-based interface has a less efficient ABI than a pointer-returning one and (b) almost no "production" allocators today provide the ability to learn anything more than "this just OOM'd". For performance we can mostly paper over it with inlining and such but it remains that the Error is an extra payload that's difficult to remove at the lowest layers.

The thinking for returning payloads of errors is that allocators can provide implementation-specific introspection to learn why an allocation failed and otherwise almost all consumers should only need to know whether the allocation succeeded or failed. Additionally this is intended to be a very low-level API which isn't actually called that often (instead, the typed APIs which wrap things up nicely should be called instead). In that sense it's not paramount that we have the most usable and ergonomic API for this location, but rather it's more important about enabling use cases without sacrificing performance.

The main advantage of *mut u8 is that it is very convenient to use .offset with byte offsets.

In the libs meeting we also suggested impl *mut void { fn offset } which does not conflict with the existing offset defined for T: Sized. Could also be byte_offset.
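
Something along these lines, as a sketch (a hypothetical extension trait, since user code can't add inherent methods to *mut void, and it requires the unstable extern-types feature):

extern {
    type void;
}

trait ByteOffset {
    unsafe fn byte_offset(self, count: isize) -> Self;
}

impl ByteOffset for *mut void {
    unsafe fn byte_offset(self, count: isize) -> Self {
        // Do the arithmetic in byte units, then cast back.
        (self as *mut u8).offset(count) as *mut void
    }
}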

+1 for using *mut void and byte_offset. Is there going to be an issue with stabilization of the extern types feature, or can we sidestep that issue because only the definition is unstable (and liballoc can do unstable things internally) and not the use (e.g., let a: *mut void = ... isn't unstable)?

Yep, we don't need to block on extern type stabilization. Even if extern type support gets deleted, the void we define for this can always be a magical type worst case.

Was there any further discussion in the libs meeting about whether or not Alloc and Dealloc should be separate traits?

We didn't touch on that specifically, but we generally felt that we shouldn't be diverting from prior art unless we have a particularly compelling reason to do so. In particular, C++'s Allocator concept does not have a similar split.

I'm not sure that's an apt comparison in this case. In C++, everything's explicitly freed, so there's no equivalent to Box that needs to store a copy of (or a reference to) its own allocator. That's what causes the large size blowup for us.

std::unique_ptr is generic on a "Deleter".

@joshlf unique_ptr is the equivalent of Box, vector is the equivalent of Vec, unordered_map is the equivalent of HashMap, etc.

@cramertj Ah, interesting, I was only looking at collections types. It seems like that might be a thing to do then. We can always add it in later via blanket impls but it'd probably be cleaner to avoid that.

The blanket impl approach might be cleaner, actually:

pub trait Dealloc {
    fn dealloc(&mut self, ptr: *mut void, layout: Layout);
}

impl<T> Dealloc for T
where
    T: Alloc
{
    fn dealloc(&mut self, ptr: *mut void, layout: Layout) {
        <T as Alloc>::dealloc(self, ptr, layout)
    }
}

One less trait to worry about for the majority of use cases.

  • Change the return type of alloc to *mut void, removing the Result and the Error

Perhaps the most contentious is the last point so I want to elaborate on that as well. This came out of discussion with the libs team today and specifically revolved around how (a) the Result-based interface has a less efficient ABI than a pointer-returning one and (b) almost no "production" allocators today provide the ability to learn anything more than "this just OOM'd". For performance we can mostly paper over it with inlining and such but it remains that the Error is an extra payload that's difficult to remove at the lowest layers.

I’m concerned that this would make it very easy to use the returned pointer without checking for null. It seems that the overhead could also be removed without adding this risk by returning Result<NonZeroPtr<void>, AllocErr> and making AllocErr zero-size?

(NonZeroPtr is a merged ptr::Shared and ptr::Unique as proposed in https://github.com/rust-lang/rust/issues/27730#issuecomment-316236397.)

@SimonSapin something like Result<NonZeroPtr<void>, AllocErr> requires three types to be stabilized, all of which are brand new and some of which have historically been languishing quite a bit for stabilization. Something like void isn't even required and is a nice-to-have (in my opinion).

I agree it's "easy to use it without checking for null", but this is, again, a very low-level API that's not intended to be heavily used, so I don't think we should optimize for caller ergonomics.

People can also build higher level abstractions like alloc_one on top of the low level alloc that could have more complex return types like Result<NonZeroPtr<void>, AllocErr>.

I agree that AllocErr would not be useful in practice, but how about just Option<NonZeroPtr<void>>? APIs that are impossible to accidentally misuse, without overhead, is one of the things that set Rust apart from C, and returning to C-style null pointers feels like a step backwards to me. Saying it is “a very low-level API that's not intended to be heavily used” is like saying that we should not care about memory safety on uncommon microcontroller architectures because they are very low-level and not heavily used.

Every interaction with the allocator involves unsafe code regardless of the return type of this function. Low level allocation APIs are possible to misuse whether the return type is Option<NonZeroPtr<void>> or *mut void.

Alloc::alloc in particular is the API that is low level and not intended to be heavily used. Methods like Alloc::alloc_one<T> or Alloc::alloc_array<T> are the alternatives that would be more heavily used and have a "nicer" return type.
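
As a sketch of what such a wrapper might look like under the pointer-returning proposal (alloc_one here is illustrative, not the final signature, and NonNull stands in for whatever non-null pointer type ends up stable):

use std::heap::{Alloc, Layout};
use std::ptr::NonNull;

unsafe fn alloc_one<T, A: Alloc>(a: &mut A) -> Option<NonNull<T>> {
    // The raw API returns a bare pointer that is null on failure;
    // the typed wrapper is where the null check lives.
    let p = a.alloc(Layout::new::<T>());
    NonNull::new(p as *mut T)
}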

A stateful AllocError is not worth it, but a zero sized type that implements Error and has a Display of allocation failure is nice to have. If we go the NonZeroPtr<void> route, I see Result<NonZeroPtr<void>, AllocError> as preferable to Option<NonZeroPtr<void>>.

Why the rush to stabilize :(!! Result<NonZeroPtr<void>, AllocErr> is indisputably nicer for clients to use. Saying this is a "very low-level API" that need not be nice is just depressingly unambitious. Code at all levels should be as safe and maintainable as possible; obscure code that isn't constantly edited (and thus paged in people's short term memories) all the more so!

Furthermore, if we are to have user-written allocator-polymorphic collections, which I certainly hope for, that's an open amount of fairly complex code using allocators directly.

Re deallocation, operationally, we almost certainly want to reference/clone the allocator just once per tree-based collection. That means passing in the allocator to each custom-allocator-box being destroyed. But it's an open problem on how best to do this in Rust without linear types. Contrary to my previous comment, I'd be OK with some unsafe code in collections implementations for this, because the ideal usecase changes the implementation of Box, not the implementation of split allocator and deallocator traits. I.e. we can make stabilizable progress without blocking on linearity.

@sfackler I think we need some associated types connecting the deallocator to the allocator; it may be not possible to retrofit them.

@Ericson2314 There is a "rush" to stabilize because people want to use allocators to do real things in the real world. This is not a science project.

What would that associated type be used for?

@sfackler people can still pin a nightly, and the type of people who care about this sort of advanced feature should be comfortable doing that. [If the issue is unstable rustc vs unstable Rust, that's a different problem that needs a policy fix.] Baking in lousy APIs, on the contrary, hamstrings us forever, unless we want to split the ecosystem with a new 2.0 std.

The associated types would relate the deallocator to the allocator. Each needs to know about that other for this to work. [There's still the issue of using the wrong (de)allocator of the right type, but I accept that no one has remotely proposed a solution to that.]

If people can just pin to a nightly, why do we have stable builds at all? The set of people who are directly interacting with allocator APIs is much smaller than the people who want to take advantage of those APIs by e.g. replacing the global allocator.

Can you write some code that shows why a deallocator needs to know the type of its associated allocator? Why doesn't C++'s allocator API need a similar mapping?

If people can just pin to a nightly, why do we have stable builds at all?

To indicate language stability. Code you write against this version of things will never break on a newer compiler. You pin a nightly when you need something so badly that it's not worth waiting for the final iteration of the feature, the one deemed of quality worthy of that guarantee.

The set of people who are directly interacting with allocator APIs is much smaller than the people who want to take advantage of those APIs by e.g. replacing the global allocator.

Aha! This would be for moving jemalloc out of tree, etc? No one has proposed stabilizing the awful hacks that allow choosing the global allocator, just the heap static itself? Or did I read the proposal wrong?

The awful hacks that allow for choosing the global allocator are proposed to be stabilized, which is half of what allows us to move jemalloc out of tree. This issue is the other half.

#[global_allocator] attribute stabilization: https://github.com/rust-lang/rust/issues/27389#issuecomment-336955367

Yikes

@Ericson2314 What do you think would be a non-awful way to select the global allocator?

(Responded in https://github.com/rust-lang/rust/issues/27389#issuecomment-342285805)

The proposal has been amended to use *mut void.

@rfcbot resolved *mut u8

@rfcbot reviewed

After some discussion on IRC, I'm approving this with the understanding that we _do not_ intend to stabilize a Box generic on Alloc, but instead on some Dealloc trait with an appropriate blanket impl, as suggested by @sfackler here. Please let me know if I've misunderstood the intention.

@cramertj Just to clarify, it's possible to add that blanket impl after the fact and not break the Alloc definition that we're stabilizing here?

@joshlf yep, it'd look like this: https://github.com/rust-lang/rust/issues/32838#issuecomment-340959804

How will we specify the Dealloc for a given Alloc? I'd imagine something like this?

pub unsafe trait Alloc {
    type Dealloc: Dealloc = Self;
    ...
}

I guess that puts us in thorny territory WRT https://github.com/rust-lang/rust/issues/29661.

Yeah, I don't think there's a way to have the addition of Dealloc be backwards-compatible with existing definitions of Alloc (which don't have that associated type) without having a default.

If you wanted to automatically be able to grab the deallocator corresponding to an allocator, you'd need more than just an associated type, but a function to produce a deallocator value.

But, this can be handled in the future with that stuff being attached to a separate subtrait of Alloc I think.

@sfackler i'm not sure I understand. Can you write out the signature of Box::new under your design?

This is ignoring placement syntax and all of that, but one way you could do it would be

pub struct Box<T, D>(NonZeroPtr<T>, D);

impl<T, D> Box<T, D>
where
    D: Dealloc
{
    fn new<A>(mut alloc: A, value: T) -> Box<T, D>
    where
        A: Alloc<Dealloc = D>
    {
        let ptr = alloc.alloc_one().unwrap_or_else(|_| alloc.oom());
        ptr::write(ptr.as_ptr(), value);
        let deallocator = alloc.deallocator();
        Box(ptr, deallocator)
    }
}

Notably, we need to actually be able to produce an instance of the deallocator, not just know its type. You could also parameterize the Box over the Alloc and store A::Dealloc instead, which might help with type inference. We can make this work after this stabilization by moving Dealloc and deallocator to a separate trait:

pub trait SplitAlloc: Alloc {
    type Dealloc;

    fn deallocator(&self) -> Self::Dealloc;
}

But what would the impl of Drop look like?

impl<T, D> Drop for Box<T, D>
where
    D: Dealloc
{
    fn drop(&mut self) {
        unsafe {
            ptr::drop_in_place(self.0.as_ptr());
            self.1.dealloc_one(self.0);
        }
    }
}

But assuming we stabilize Alloc first, then not all Allocs will implement Dealloc, right? And I thought impl specialization was still a ways off? In other words, in theory, you'd want to do something like the following, but I don't think it works yet?

impl<T, D> Drop for Box<T, D> where D: Dealloc { ... }
impl<T, A> Drop for Box<T, A> where A: Alloc { ... }

If anything, we'd have a

default impl<T> SplitAlloc for T
where
    T: Alloc { ... }

But I don't think that'd really be necessary. The use cases for custom allocators and global allocators are distinct enough that I wouldn't assume there'd be a ton of overlap between them.

I suppose that could work. It seems much cleaner to me, though, to just have Dealloc right off the bat so we can have the simpler interface. I imagine we could have a pretty simple, uncontroversial interface that would require no change to existing code that already implements Alloc:

unsafe trait Dealloc {
    fn dealloc(&mut self, ptr: *mut void, layout: Layout);
}

impl<T> Dealloc for T
where
    T: Alloc
{
    fn dealloc(&mut self, ptr: *mut void, layout: Layout) {
        <T as Alloc>::dealloc(self, ptr, layout)
    }
}

unsafe trait Alloc {
    type Dealloc: Dealloc = &mut Self;
    fn deallocator(&mut self) -> Self::Dealloc { self }
    ...
}

I thought associated type defaults were problematic?

A Dealloc that's a mutable reference to the allocator seems not all that useful - you can only allocate one thing at a time, right?

I thought associated type defaults were problematic?

Oh I guess associated type defaults are far enough away that we can't rely on them.

Still, we could have the simpler:

unsafe trait Dealloc {
    fn dealloc(&mut self, ptr: *mut void, layout: Layout);
}

impl<T> Dealloc for T
where
    T: Alloc
{
    fn dealloc(&mut self, ptr: *mut void, layout: Layout) {
        <T as Alloc>::dealloc(self, ptr, layout)
    }
}

unsafe trait Alloc {
    type Dealloc: Dealloc;
    fn deallocator(&mut self) -> Self::Dealloc;
    ...
}

and just require the implementor to write a bit of boilerplate.

A Dealloc that's a mutable reference to the allocator seems not all that useful - you can only allocate one thing at a time, right?

Yeah, good point. Probably a moot point anyway given your other comment.

Should deallocator take self, &self, or &mut self?

Probably &mut self to be consistent with the other methods.

Are there any allocators that would prefer to take self by value so they don't have to e.g. clone state?

The problem with taking self by value is that it precludes getting a Dealloc and then continuing to allocate.

I'm thinking of a hypothetical "oneshot" allocator, though I don't know how much of a real thing that is.

Such an allocator might exist, but taking self by value would require that _all_ allocators work that way, and would preclude any allocators allowing allocation after deallocator has been called.

I would still like to see some of this implemented and used in collections before we think about stabilizing it.

Do you think https://github.com/rust-lang/rust/issues/27336 or the points discussed in https://github.com/rust-lang/rust/issues/32838#issuecomment-339066870 will allow us to move forward on collections?

I'm worried about the type alias approach's impact on documentation readability. A (very verbose) way to allow progress would be to wrap types:

pub struct Vec<T>(alloc::Vec<T, Heap>);

impl<T> Vec<T> {
    // forwarding impls for everything
}

I know it's a pain, but it seems like the changes we're discussing here are big enough that if we decide to go forward with split alloc/dealloc traits, we should try them out in std first and then re-FCP.

What is the timeline on waiting on this stuff to get implemented?

The grow_in_place method doesn't return any kind of excess capacity. It currently calls usable_size with a layout, extends the allocation to _at least_ fit this layout, but if the allocation is extended beyond that layout users have no way to know.

I am having a hard time understanding the advantage of the alloc and realloc methods over alloc_excess and realloc_excess.

An allocator needs to find a suitable memory block to perform an allocation: this requires knowing the size of the memory block. Whether the allocator then returns a pointer, or the tuple "pointer and size of the memory block" does not make any measurable performance differences.

So alloc and realloc just increase the API surface and seem to encourage writing less performant code. Why do we have them in the API at all? What's their advantage?


EDIT: or in other words: all potentially allocating functions in the API should return Excess, which basically removes the need for all the _excess methods.

Excess is only interesting for use cases that involve arrays which can grow. It's not useful or relevant for Box or BTreeMap, for example. There may be some cost in computing what the excess is, and there's certainly a more complex API, so it doesn't seem to me like code that doesn't care about excess capacity should be forced to pay for it.

There may be some cost in computing what the excess is

Can you give an example? I don't know, and cannot imagine, an allocator that is able to allocate memory but that does not know how much memory it is actually allocating (which is what Excess is: real amount of memory allocated; we should rename it).

The only commonly used Allocator where this might be slightly controversial is POSIX malloc, which even though it always computes the Excess internally, does not expose it as part of its C API. However, returning the requested size as the Excess is ok, portable, simple, incurs no cost at all, and is what everybody using POSIX malloc is already assuming anyways.

jemalloc and basically any other Allocator out there provide API's that returns the Excess without incurring any costs, so for those allocators, returning the Excess is zero cost as well.

There may be some cost in computing what the excess is, and there's certainly a more complex API, so it doesn't seem to me like code that doesn't care about excess capacity should be forced to pay for it.

Right now everybody is already paying the price of the allocator trait having two APIs for allocating memory. And while one can build an Excess-less API on top of an Excess-full one, the opposite is not true. So I wonder why it isn't done like this:

  • Alloc trait methods always return Excess
  • add an ExcessLessAlloc trait that just drops off the Excess from Alloc methods for all users that 1) care enough to use Alloc but 2) don't care about the real amount of memory currently being allocated (looks like a niche to me, but I still think that such an API is nice to have)
  • if one day somebody discovers a way to implement Allocators with fast-paths for Excess-less methods, we can always provide a custom implementation of ExcessLessAlloc for it.

FWIW I just landed on this thread again because I can't implement what I want on top of Alloc. I mentioned that it is missing grow_in_place_excess before, but I just got stuck again because it is also missing alloc_zeroed_excess (and who knows what else).

I'd be more comfortable if the stabilization here would focus on stabilizing an Excess-full API first. Even if its API is not the most ergonomic for all uses, such an API would at least allow all uses which is a necessary condition to show that the design is not flawed.

Can you give an example? I don't know, and cannot imagine, an allocator that is able to allocate memory but that does not know how much memory it is actually allocating (which is what Excess is: real amount of memory allocated; we should rename it).

Most allocators today use size classes, where each size class allocates only objects of a particular fixed size, and allocation requests that don't fit a particular size class are rounded up to the smallest size class that they fit inside. In this scheme, it's common to do things like having an array of size class objects and then doing classes[size / SIZE_QUANTUM].alloc(). In that world, figuring out what size class is used takes extra instructions: e.g., let excess = classes[size / SIZE_QUANTUM].size. It may not be a lot, but the performance of high-performance allocators (like jemalloc) are measured in single clock cycles, so it could represent meaningful overhead, especially if that size ends up getting passed through a chain of function returns.
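
In code, the scheme described above looks roughly like this (all types hypothetical; a real size-class allocator is far more involved). The excess-less path never touches the class's object size, while the excess-full path pays one extra field read plus a wider return value:

use std::ptr;

const SIZE_QUANTUM: usize = 16;

struct SizeClass {
    size: usize,        // object size served by this class
    free: Vec<*mut u8>, // stand-in for a real free list
}

impl SizeClass {
    fn pop_free(&mut self) -> *mut u8 {
        self.free.pop().unwrap_or(ptr::null_mut())
    }
}

struct ClassAllocator {
    classes: Vec<SizeClass>,
}

impl ClassAllocator {
    // Excess-less path: the class's object size is never read.
    fn alloc(&mut self, size: usize) -> *mut u8 {
        self.classes[size / SIZE_QUANTUM].pop_free()
    }

    // Excess-full path: the extra load described above.
    fn alloc_excess(&mut self, size: usize) -> (*mut u8, usize) {
        let class = &mut self.classes[size / SIZE_QUANTUM];
        (class.pop_free(), class.size)
    }
}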

Can you give an example?

At least going off of your PR to alloc_jemalloc, alloc_excess is pretty clearly running more code than alloc: https://github.com/rust-lang/rust/pull/45514/files.

In this scheme, it's common to do things like having an array of size class objects and then doing classes[size / SIZE_QUANTUM].alloc(). In that world, figuring out what size class is used takes extra instructions: e.g., let excess = classes[size / SIZE_QUANTUM].size

So let me see if I follow properly:

// This happens in both cases:
let size_class = classes[size / SIZE_QUANTUM];
let ptr = size_class.alloc(); 
// This would happen only if you need to return the size:
let size = size_class.size;
return (ptr, size);

Is that it?


At least going off of your PR to alloc_jemalloc, alloc_excess is pretty clearly running more code than alloc

That PR was a bugfix (not a perf fix), there are many things wrong with the current state of our jemalloc layer perf-wise but since that PR it at least returns what it should:

  • nallocx is a const function in the GCC sense, that is, a true pure function. This means it has no side-effects, its result depends on its arguments only, it accesses no global state, its arguments are not pointers (so the function cannot access global state through them), and for C/C++ programs LLVM can use this information to elide the call if the result is not used. AFAIK Rust currently cannot mark FFI C functions as const fn or similar. So this is the first thing that could be fixed and that would make realloc_excess zero-cost for those that don't use the excess as long as inlining and optimizations work properly.
  • nallocx is always computed for aligned allocations inside mallocx, that is, all code is already computing it, but mallocx throws its result away, so here we are actually computing it twice, and in some cases nallocx is almost as expensive as mallocx... I have a fork of jemallocator that has some benchmarks for things like this in its branches, but this must be fixed upstream by jemalloc by providing an API that does not throw this away. This fix, however, only affects those that are currently using the Excess.
  • and then is the issue that we are computing the align flags twice, but that is something that LLVM can optimize on our side (and trivial to fix).

So yes, it looks like more code, but this extra code is code that we are actually calling twice, because the first time that we called it we threw the results away. It is not impossible to fix, but I haven't found the time to do this yet.


EDIT: @sfackler I managed to free some time for it today and was able to make alloc_excess "free" with respect to alloc in jemallocs slow path, and have only an overhead of ~1ns in jemallocs' fast path. I haven't really looked at the fast path in much detail, but it might be possible to improve this further. The details are here: https://github.com/jemalloc/jemalloc/issues/1074#issuecomment-345040339

Is that it?

Yes.

So this is the first thing that could be fixed and that would make realloc_excess zero-cost for those that don't use the excess as long as inlining and optimizations work properly.

When used as the global allocator, none of this can be inlined.

Even if its API is not the most ergonomic for all uses, such an API would at least allow all uses which is a necessary condition to show that the design is not flawed.

There is literally zero code on Github that calls alloc_excess. If this is such a crucially important feature, why has no one ever used it? C++'s allocation APIs do not provide access to excess capacity. It seems incredibly straightforward to add/stabilize these features in the future in a backwards compatible way if there is actual concrete evidence that they improve performance and anyone actually cares enough to use them.

When used as the global allocator, none of this can be inlined.

Then that is a problem that we should try to solve, at least for LTO builds, because global allocators like jemalloc rely on this: nallocx is the way it is _by design_, and the first recommendation jemalloc's devs made us regarding alloc_excess performance is that we should have those calls inlined, and we should propagate C attributes properly, so that the compiler removes the nallocx calls from the call-sites that do not use the Excess, like C and C++ compilers do.

Even if we can't do that, the Excess API can still be made zero-cost by patching the jemalloc API (I have an initial implementation of such a patch in my rust-lang/jemalloc fork). We could either maintain that API ourselves, or try to land it upstream, but for that to land upstream we must make a good case about why these other languages can perform these optimizations and Rust cannot. Or we must have another argument, like this new API is significantly faster than mallocx + nallocx for those users that do need the Excess.

If this is such a crucially important feature, why has no one ever used it?

That's a good question. std::Vec is the poster-child for using the Excess API, but it currently does not use it, and all my previous comments stating "this and that are missing from the Excess API" were me trying to make Vec use it (it is still missing at least grow_in_place_excess and alloc_zeroed_excess).

I cannot know why nobody is using this API. But given that not even the std library can use it for the data-structure it is best suited for (Vec), if I had to guess, I would say that the main reason is that this API is currently broken.

If I had to guess even further, I would say that not even those who designed this API have used it, mainly because no single std collection uses it (which is where I expect this API to be tested at first), and also because using _excess and Excess everywhere to mean usable_size/allocation_size is extremely confusing/annoying to program with.

This is probably because more work was put into the Excess-less APIs, and when you have two APIs, it is hard to keep them in sync, it is hard for users to discover both and know which to use, and finally, it is hard for users to prefer convenience over doing the right thing.

Or in other words, if I have two competing APIs, and I put 100% of the work into improving one, and 0% of the work into improving the other, it isn't surprising to reach the conclusion that one is in practice significantly better than the other.

As far as I can tell, these are the only two calls to nallocx outside of jemalloc tests on Github:

https://github.com/facebook/folly/blob/f2925b23df8d85ebca72d62a69f1282528c086de/folly/detail/ThreadLocalDetail.cpp#L182
https://github.com/louishust/mysql5.6.14_tokudb/blob/4897660dee3e8e340a1e6c8c597f3b2b7420654a/storage/tokudb/ft-index/ftcxx/malloc_utils.hpp#L91

Neither of them resemble the current alloc_excess API, but are rather used standalone to compute an allocation size before it's made.

Apache Arrow looked into using nallocx in their implementation but found things did not work out well:

https://issues.apache.org/jira/browse/ARROW-464

These are basically the only references to nallocx I can find. Why is it important that the initial implementation of allocator APIs support such an obscure feature?

As far as I can tell, these are the only two calls to nallocx outside of jemalloc tests on Github:

Off the top of my head I know that at least facebook's vector type is using it via facebook's malloc implementation (malloc and fbvector growth policy; that is, a big chunk of the C++ vectors at facebook use this) and also that Chapel used it to improve the performance of their String type (here and the tracking issue). So maybe today wasn't Github's best day?

Why is it important that the initial implementation of allocator APIs support such an obscure feature?

The initial implementation of an allocator API does not need to support this feature.

But good support for this feature should block the stabilization of such an API.

Why should it block stabilization if it can be added backwards-compatibly later?

Why should it block stabilization if it can be added backwards-compatibly later?

Because for me at least it means that only half of the design space has been sufficiently explored.

Do you expect the non-excess related portions of the API will be affected by the design of the excess-related functionality? I admit I've only followed that discussion half-heartedly but it seems unlikely to me.

If we can't make this API:

fn alloc(...) -> (*mut u8, usize) { 
   // worst case system API:
   let ptr = malloc(...);
   let excess = malloc_excess(...);
   (ptr, excess)
}
let (ptr, _) = alloc(...); // drop the excess

as efficient as this one:

fn alloc(...) -> *mut u8 { 
   // worst case system API:
   malloc(...)
}
let ptr = alloc(...);

then we have bigger problems.

Do you expect the non-excess related portions of the API will be affected by the design of the excess-related functionality?

So yes, I expect a good excess-API to have a huge effect on the design of the non-excess related functionality: it would completely remove it.

That would prevent the current situation of having two APIs that are out of sync, and in which the excess-api has less functionality than the excess-less one. While one can build an excess-less API on top of an excess-full one, the opposite is not true.

Those who want to drop the Excess should just drop it.

To clarify, if there were some way of adding an alloc_excess method after the fact in a backwards-compatible way, then you'd be OK with it? (but of course, stabilizing without alloc_excess means that adding it later would be a breaking change; I'm just asking so I understand your reasoning)

@joshlf It is very straightforward to do that.

:bell: This is now entering its final comment period, as per the review above. :bell:

Those who want to drop the Excess should just drop it.

Alternatively, the 0.01% of people that care about excess capacity can use another method.

@sfackler This is what I get for taking a two-week break from rust - I forget about default method impls :)

Alternatively, the 0.01% of people that care about excess capacity can use another method.

Where are you getting this number?

All of my Rust data-structures are flat in memory. The ability to do that is the only reason I use Rust; if I could just Box everything I would be using a different language. So I don't care about the Excess the 0.01% of the time, I care about it all the time.

I understand that this is domain specific, and that in other domains people would never care about the Excess, but I doubt that only 0.01% of Rust users care about this (I mean, a lot of people use Vec and String, which are the poster-child data-structures for Excess).

I am getting that number from the fact that there are approx 4 things in total that use nallocx, compared to the set of things that use malloc.

@gnzlbg

Are you suggesting that if we did it "right" from the start, we'd have just fn alloc(layout) -> (ptr, excess) and no fn alloc(layout) -> ptr at all? That seems far from obvious to me. Even if excess is available, it seems natural to have the latter API for the use cases where excess doesn't matter (e.g., most tree structures), even if it's implemented as alloc_excess(layout).0.

@rkruppe

That seems far from obvious to me. Even if excess is available, it seems natural to have the latter API for the use cases where excess doesn't matter (e.g., most tree structures), even if it's implemented as alloc_excess(layout).0.

Currently, the excess-full API is implemented on top of the excess-less one. Implementing Alloc for an excess-less allocator requires the user to provide the alloc and dealloc methods.

However, if I want to implement Alloc for an excess-full allocator, I need to provide more methods (at least alloc_excess, but this grows if we go into realloc_excess, alloc_zeroed_excess, grow_in_place_excess, ...).

If we were to do it the other way around, that is, implement the excess-less API as a nicety on top of the excess-full one, then implementing alloc_excess and dealloc suffices for supporting both types of allocators.

The users that don't care or can't return or query the excess can just return the input size or layout (which is a tiny inconvenience), but the users that can handle and want to handle the excess don't need to implement any more methods.
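
A sketch of that inversion (the trait name is hypothetical, and Excess is spelled out locally with the shape it has in the unstable std::heap): alloc_excess and dealloc are the required methods, and the excess-less alloc is derived from them.

use std::heap::{AllocErr, Layout};

pub struct Excess(pub *mut u8, pub usize); // shape as in std::heap

pub unsafe trait AllocExcessFirst {
    // Required: allocators that can't report a real excess just return
    // layout.size() as the usable size.
    unsafe fn alloc_excess(&mut self, layout: Layout) -> Result<Excess, AllocErr>;
    unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout);

    // Excess-less convenience, derived by dropping the usable size.
    unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, AllocErr> {
        self.alloc_excess(layout).map(|Excess(ptr, _)| ptr)
    }
}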


@sfackler

I am getting that number from the fact that there are approx 4 things in total that use nallocx, compared to the set of things that use malloc.

Given these facts about _excess usage in the Rust ecosystem:

  • 0 things in total use _excess in the rust ecosystem
  • 0 things in total use _excess in the rust std library
  • not even Vec and String can use the _excess API properly in the rust std library
  • the _excess API is unstable, out-of-sync with the excess-less API, buggy until very recently (did not even return the excess at all), ...

and given these facts about the usage of _excess in other languages:

  • jemalloc's API is not natively supported by C or C++ programs due to backwards compatibility

  • C and C++ programs that want to use jemalloc's excess API need to go way out of their way to use it by:

    • opting out of the system allocator and into jemalloc (or tcmalloc)

    • re-implement their language's std library (in the case of C++, implement an incompatible std library)

    • write their whole stack on top of this incompatible std library

  • some communities (firefox uses it, facebook reimplements the collections in the C++ standard library to be able to use it, ...) still go out of their way to use it.

These two arguments look plausible to me:

  • The excess API in std is not usable, therefore the std library cannot use it, therefore nobody can, which is why it isn't used even once in the Rust ecosystem.
  • Even though C and C++ make it close to impossible to use this API, big projects with manpower go to great lengths to use it, therefore at least some potentially tiny community of people care _a lot_ about it.

Your argument:

  • Nobody uses the _excess API, therefore only 0.01% of the people care about it.

does not.

@alexcrichton The decision to switch from -> Result<*mut u8, AllocErr> to -> *mut void may come as a significant surprise to people who followed the original development of the allocator RFC's.

I don't disagree with the points you make, but it nonetheless seemed like a fair number of people would have been willing to live with the "heavy-weightness" of Result over the increased likelihood of missing a null-check on the returned value.

  • I am ignoring the runtime efficiency issues imposed by the ABI because I, like @alexcrichton, assume that we could deal with those in some way via compiler tricks.

Is there some way we could get increased visibility on that late change on its own?

One way (off the top of my head): Change the signature now, in a PR on its own, on the master branch, while Allocator is still unstable. And then see who complains on the PR (and who celebrates!).

  • Is this too heavy-handed? Seems like it is by definition less heavy-handed than coupling such a change with stabilization...

On the subject of whether to return *mut void or to return Result<*mut void, AllocErr>: It's possible that we should be revisiting the idea of separate "high-level" and "low-level" allocator traits, as discussed in take II of the Allocator RFC.

(Obviously if I had a serious objection to the *mut void return value, then I would file it as a concern via the fcpbot. But at this point I'm pretty much trusting the judgement of the libs team, perhaps in some part due to fatigue over this allocator saga.)

@pnkfelix

The decision to switch from -> Result<*mut u8, AllocErr> to -> *mut void may come as a significant surprise to people who followed the original development of the allocator RFC's.

The latter implies that, as discussed, the only error we care to express is OOM. Thus, a slightly lighter-weight in-between that still has the benefit of protection against accidental failure to check for errors is -> Option<*mut void>.

@gnzlbg

The excess API in std is not usable, therefore the std library cannot use it, therefore nobody can, which is why it isn't used even once in the Rust ecosystem.

Then go fix it.

@pnkfelix

On the subject of whether to return *mut void or to return Result<*mut void, AllocErr>: It's possible that we should be revisiting the idea of separate "high-level" and "low-level" allocator traits, as discussed in take II of the Allocator RFC.

Those were basically our thoughts, except that the high level API would be in Alloc itself as alloc_one, alloc_array etc. We can even let those develop in the ecosystem first as extension traits to see what APIs people converge on.

@pnkfelix

The reason I made Layout only implement Clone and not Copy is that I wanted to leave open the possibility of adding more structure to the Layout type. In particular, I still am interested in trying to have the Layout attempt to track any type structure used to construct it (e.g. 16-array of struct { x: u8, y: [char; 215] }), so that allocators would have the option of exposing instrumentation routines that report on what types their current contents are composed from.

Has this been experimented with somewhere?

@sfackler I have done most of it already, and all of it can be done with the duplicated API (no excess + _excess methods). I would be fine with having two APIs and not having a complete _excess API right now.

The only thing that still worries me a bit is that to implement an allocator right now one needs to implement alloc + dealloc, but alloc_excess + dealloc should also work as well. Would it be possible to give alloc a default implementation in terms of alloc_excess later, or is that not possible, or a breaking change? In practice, most allocators are going to implement most methods anyways, so this is not a big deal, but more like a wish.


jemallocator implements Alloc twice (for Jemalloc and &Jemalloc), where the Jemalloc implementation for some method is just a (&*self).method(...) that forwards the method call to the &Jemalloc implementation. This means that one must manually keep both implementations of Alloc for Jemalloc in sync. Whether getting different behaviors from the by-value and by-reference implementations could be tragic or not, I don't know.


I've found it very hard to find out what people are actually doing with the Alloc trait in practice. The only projects that I've found that are using it are going to stay using nightly anyways (servo, redox), and are only using it to change the global allocator. It worries me a lot that I could not find any project that uses it as a collection type parameter (maybe I was just unlucky and there are some?). I was particularly looking for examples of implementing SmallVec and ArrayVec on top of a Vec-like type (since std::Vec doesn't have an Alloc type parameter yet), and also was wondering how cloning between these types (Vecs with a different Allocator) would work (the same probably applies to cloning Boxes with different Allocs). Are there examples of how these implementations would look like somewhere?

The only projects that I've found that are using it are going to stay using nightly anyways (servo, redox)

For what it’s worth, Servo is trying to move off of unstable features where possible: https://github.com/servo/servo/issues/5286

This is also a chicken-and-egg problem. Many projects don’t use Alloc yet because it’s still unstable.

It's not really clear to me why we should have a full complement of _excess APIs in the first place. They originally existed to mirror jemalloc's experimental *allocm API, but those were removed in 4.0 several years ago in favor of not duplicating their entire API surface. It seems like we could follow their lead?

Would it be possible to give alloc a default implementation in terms of alloc_excess later or is that a not possible or a breaking change?

We can add a default implementation of alloc in terms of alloc_excess, but alloc_excess will need to have a default implementation in terms of alloc. Everything works fine if you implement one or both, but if you don't implement either, your code will compile but infinitely recurse. This has come up before (maybe for Rand?), and we could have some way of saying that you need to implement at least one of those functions but we don't care which.
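
The pitfall looks like this (simplified, safe signatures just for illustration):

trait MutualDefault {
    fn alloc(&mut self, size: usize) -> *mut u8 {
        self.alloc_excess(size).0
    }
    fn alloc_excess(&mut self, size: usize) -> (*mut u8, usize) {
        (self.alloc(size), size)
    }
}

struct Lazy;
impl MutualDefault for Lazy {} // compiles without complaint...

// ...but calling alloc on a Lazy loops: alloc -> alloc_excess -> alloc -> ...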

It worries me a lot that I could not find any project that uses it as a collection type parameter (maybe I was just unlucky and there are some?).

I don't know of anyone that is doing that.

The only projects that I've found that are using it are going to stay using nightly anyways (servo, redox)

One big thing preventing this from moving forward is that stdlib collections don't support parametric allocators yet. That pretty much precludes most other crates as well, since most external collections use internal ones under the hood (Box, Vec, etc).

The only projects that I've found that are using it are going to stay using nightly anyways (servo, redox)

One big thing preventing this from moving forward is that stdlib collections don't support parametric allocators yet. That pretty much precludes most other crates as well, since most external collections use internal ones under the hood (Box, Vec, etc).

This applies to me -- I've got a toy kernel, and if I could I'd be using Vec<T, A>, but instead I have to have an interior-mutable global allocator facade, which is gross.

@remexre how is parameterizing your data structures going to avoid global state with interior mutability?

There will still be interior-mutable global state I suppose, but it feels a lot safer to have a setup where the global allocator is unusable until memory is fully mapped than to have a global set_allocator function.


EDIT: Just realized I didn't answer the question. Right now, I've got something like:

struct BumpAllocator{ ... }
struct RealAllocator{ ... }
struct LinkedAllocator<A: 'static + AreaAllocator> {
    head: Mutex<Option<Cons<A>>>,
}
#[global_allocator]
static KERNEL_ALLOCATOR: LinkedAllocator<&'static mut (AreaAllocator + Send + Sync)> =
    LinkedAllocator::new();

where AreaAllocator is a trait that lets me (at runtime) verify that the allocators aren't accidentally "overlapping" (in terms of the address ranges they allocate into). BumpAllocator is only used very early on, for scratch space when mapping the rest of memory to create the RealAllocators.

Ideally, I'd like to have a Mutex<Option<RealAllocator>> (or a wrapper that makes it "insert-only") be the only allocator, and have everything allocated early on be parameterized by the early-boot BumpAllocator. This'd also let me ensure that the BumpAllocator doesn't get used after early-boot, since the stuff I allocate couldn't outlive it.

@sfackler

It's not really clear to me why we should have a full complement of _excess APIs in the first place. They originally existed to mirror jemalloc's experimental *allocm API, but those were removed in 4.0 several years ago in favor of not duplicating their entire API surface. It seems like we could follow their lead?

Currently shrink_in_place calls xallocx which returns the real allocation size. Because shrink_in_place_excess does not exist, it throws this size away, and users must call nallocx to recompute it, whose cost really depends on how big the allocation is.

So at least some jemalloc allocation functions that we are already using are returning us the usable size, but the current API does not allow us to use it.

@remexre

When I was working on my toy kernel, avoiding the global allocator to ensure no allocation happened until an allocator was set up was a goal of mine too. Glad to hear I'm not the only one!

I do not like the word Heap for the default global allocator. Why not Default?

Another point of clarification: RFC 1974 puts all this stuff in std::alloc but it's currently in std::heap. Which location is being proposed for stabilization?

@jethrogb "Heap" is a pretty canonical term for "that thing malloc gives you pointers to" - what are your concerns with the term?

@sfackler

"that thing malloc gives you pointers to"

Except in my mind that is what System is.

Ah sure. Global is another name then maybe? Since you use #[global_allocator] to select it.

There can be multiple heap allocators (e.g. libc and prefixed jemalloc). How about renaming std::heap::Heap to std::heap::Default and #[global_allocator] to #[default_allocator]?

The fact that it’s what you get if you don’t specify otherwise (presumably when for example Vec gains an extra type parameter / field for the allocator) is more important than the fact that it doesn’t have "per-instances" state (or instances really).

The final comment period is now complete.

Regarding FCP, I think that the API subset that was proposed for stabilization is of very limited use. For example, it does not support the jemallocator crate.

In what way? jemallocator may have to gate some of the impls of unstable methods behind a feature flag, but that's it.

If jemallocator on stable Rust cannot implement for example Alloc::realloc by calling je_rallocx but needs to rely on the default alloc + copy + dealloc impl, then it’s not an acceptable replacement for the standard library’s alloc_jemalloc crate IMO.

Sure, you could get something to compile, but it’s not a particularly useful thing.

Why? C++ doesn't have any concept of realloc at all in its allocator API and that doesn't appear to have crippled the language. It's obviously not ideal, but I don't understand why it would be unacceptable.

C++ collections generally don’t use realloc because C++ move constructors can run arbitrary code, not because realloc is not useful.

And the comparison is not with C++, it’s with the current Rust standard library with built-in jemalloc support. Switching to an out-of-std allocator with only this subset of the Alloc API would be a regression.

And realloc is an example. jemallocator currently also implements alloc_zeroed, alloc_excess, usable_size, grow_in_place, etc.

alloc_zeroed is proposed to be stabilized. As far as I can tell (look upthread), there are literally zero uses of alloc_excess in existence. Could you show some code that will regress if that falls back to a default implementation?

More generally, though, I don't see why this is an argument against stabilizing a portion of these APIs. If you don't want to use jemallocator, you can continue not using it.

Could Layout::array<T>() be made a const fn?

It can panic, so not at the moment.

It can panic, so not at the moment.

I see... I'd settle for const fn Layout::array_elem<T>() that would be a non-panicking equivalent of Layout::<T>::repeat(1).0.

@mzabaluev I think what you're describing is equivalent to Layout::new<T>(). It can currently panic, but that's just because it's implemented using Layout::from_size_align and then .unwrap(), and I expect it could be done differently.

@joshlf I think this struct has a size of 5, while as elements of an array its instances are placed at every 8 bytes due to alignment:

struct Foo {
    bar: u32,
    baz: u8
}

I'm not sure that an array of Foo would include the padding of the last element for its size calculation, but that's my strong expectation.

In Rust, the size of an object is always a multiple of its alignment so that the address of the nth element of an array is always array_base_pointer + n * size_of<T>(). So the size of an object in an array is always the same as the size of that object on its own. See the Rustonomicon page on repr(Rust) for more details.
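To make that concrete with the Foo above (stable std::mem functions; the exact numbers assume a typical target where u32 is 4-byte aligned):

use std::mem::{align_of, size_of};

struct Foo {
    bar: u32,
    baz: u8,
}

fn main() {
    assert_eq!(align_of::<Foo>(), 4); // alignment of the largest field
    assert_eq!(size_of::<Foo>(), 8); // 5 bytes of fields, rounded up to a multiple of the alignment
    assert_eq!(size_of::<[Foo; 2]>(), 16); // stride == size; trailing padding is included
}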

OK, it turns out that a struct is padded to its alignment, but AFAIK this is not a stable guarantee except in #[repr(C)].
Anyway, making Layout::new a const fn would be welcome as well.

This is the documented (and so guaranteed) behavior of a stable function:

https://doc.rust-lang.org/std/mem/fn.size_of.html

Returns the size of a type in bytes.

More specifically, this is the offset in bytes between successive elements in an array with that item type including alignment padding. Thus, for any type T and length n, [T; n] has a size of n * size_of::<T>().

Thanks. I just realized that any const fn that multiplies the result of Layout::new would be inherently panicky in turn (unless it's done with saturating_mul or some such), so I'm back to square one. Continuing with a question about panics in the const fn tracking issue.

The panic!() macro is currently not supported in constant expressions, but panics from checked arithmetic are generated by the compiler and not affected by that limitation:

error[E0080]: constant evaluation error
 --> a.rs:1:16
  |
1 | const A: i32 = i32::max_value() * 2;
  |                ^^^^^^^^^^^^^^^^^^^^ attempt to multiply with overflow

error: aborting due to previous error

This is related to Alloc::realloc but not to stabilization of the minimal interface (realloc is not part of it):

Currently, because Vec::reserve/double call RawVec::reserve/double which call Alloc::realloc, the default impl of Alloc::realloc copies dead vector elements (in the [len(), capacity()) range). In the absurd case of a huge empty vector that wants to insert capacity() + 1 elements and thus reallocates, the cost of touching all that memory is not insignificant.

In theory, if the default Alloc::realloc implementation would also take a "bytes_used" range, it could just copy the relevant part on reallocation. In practice, at least jemalloc overrides Alloc::realloc default impl with a call to rallocx. Whether doing an alloc/dealloc dance copying only the relevant memory is faster or slower than a rallocx call will probably depend on many things (does rallocx manage to expand the block in place? how much unnecessary memory will rallocx copy? etc.).
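A sketch of that idea, with a hypothetical bytes_used parameter (this is not the actual default impl; it assumes the unstable std::heap API):

use std::heap::{Alloc, AllocErr, Layout};
use std::ptr;

unsafe fn realloc_used_prefix<A: Alloc>(
    a: &mut A,
    ptr: *mut u8,
    layout: Layout,
    bytes_used: usize, // e.g. vec.len() * size_of::<T>()
    new_layout: Layout,
) -> Result<*mut u8, AllocErr> {
    let new_ptr = a.alloc(new_layout)?;
    // Copy only the live prefix, not the whole [0, layout.size()) range.
    ptr::copy_nonoverlapping(ptr, new_ptr, bytes_used);
    a.dealloc(ptr, layout);
    Ok(new_ptr)
}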

https://github.com/QuiltOS/rust/tree/allocator-error I've started to demonstrate how I think associated error type solves our collections and error-handling problems by doing the generalization itself. In particular, note how in the modules I change that I

  • Always reuse the Result<T, A::Err> implementation for the T implementation
  • Never unwrap or anything else partial
  • No oom(e) outside of AbortAdapter.

This means that the changes I'm making are quite safe, and quite mindless too! Working with both the error-returning and error-aborting should not require extra effort to maintain mental invariants---the type checker does all the work.

I recall---I think in @Gankro's RFC? or the pre-rfc thread---Gecko/Servo people saying it was nice to not have the fallibility of collections be part of their type. Well, I can add a #[repr(transparent)] to AbortAdapter so that collections can safely be transmuted between Foo<T, A> and Foo<T, AbortAdapter<A>> (within safe wrappers), allowing one to freely switch back and forth without duplicating every method. [For back-compat, the standard library collections will need to be duplicated in any event, but user methods need not be as Result<T, !> is these days quite easy to work with.]
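For flavor, a heavily simplified sketch of the shape being described (the trait here is invented for the example; the linked branch generalizes Alloc itself, and ! requires the nightly never_type feature):

#![feature(never_type)]

// Invented stand-in for an Alloc trait with an associated error type.
pub trait FallibleAlloc {
    type Err;
    unsafe fn alloc(&mut self, size: usize, align: usize) -> Result<*mut u8, Self::Err>;
}

// Turns allocation failure into an abort, so users see Result<T, !>.
#[repr(transparent)]
pub struct AbortAdapter<A>(pub A);

impl<A: FallibleAlloc> FallibleAlloc for AbortAdapter<A> {
    type Err = !;
    unsafe fn alloc(&mut self, size: usize, align: usize) -> Result<*mut u8, !> {
        match self.0.alloc(size, align) {
            Ok(p) => Ok(p),
            Err(_) => ::std::process::abort(),
        }
    }
}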

Unfortunately the code won't fully type check because changing the type params of a lang item (box) confuses the compiler (surprise!). But hopefully what I have so far gives a flavor of what I am doing. The ICE-causing box commit is the last one---everything before it is good. @eddyb fixed rustc in #47043!

edit @joshlf I was informed of your https://github.com/rust-lang/rust/pull/45272, and incorporated that in here. Thanks!

Persistent Memory (eg http://pmem.io) is the next big thing, and Rust needs to be positioned to work well with it.

I've been working recently on a Rust wrapper for a persistent memory allocator (specifically, libpmemcto). Whatever decisions are made regarding the stabilisation of this API, it needs to:-

  • Be able to support a performant wrapper around a persistent memory allocator like libpmemcto;
  • Be able to specify (parameterize) collection types by allocator (at the moment, one needs to duplicate Box, Rc, Arc, etc)
  • Be able to clone data across allocators
  • Be able to support persistent-memory-stored structs with fields that are re-initialized on instantiation of a persistent memory pool, i.e., some persistent memory structs need to have fields that are only stored temporarily on the heap. My current use cases are a reference to the persistent memory pool used for allocation, and transient data used for locks.

As an aside, the pmem.io development (Intel's PMDK) makes heavy use of a modified jemalloc allocator under the covers - so using jemalloc as an example API consumer seems prudent.

Would it be possible to reduce the scope of this to cover only GlobalAllocators first until we gain more experience with using Allocators in collections?

IIUC this already would serve servo's needs and would allow us to experiment with parametrizing containers in parallel. In the future we can either move collections to use GlobalAllocator instead or just add a blanket impl of Alloc for GlobalAllocator so that these can be used for all collections.

Thoughts?

@gnzlbg For the #[global_allocator] attribute to be useful (beyond selecting heap::System) the Alloc trait needs to be stable too, so that it can be implemented by crates like https://crates.io/crates/jemallocator. There is no type or trait named GlobalAllocator at the moment, are you proposing some new API?

There is no type or trait named GlobalAllocator at the moment, are you proposing some new API?

What I suggested is renaming the "minimal" API that @alexcrichton suggested to stabilize here from Alloc to GlobalAllocator to represent only global allocators, and leaving the door open for collections to be parametrized by a different allocator trait in the future (which does not mean that we can't parametrize them by the GlobalAllocator trait).

IIUC servo currently only needs to be able to switch the global allocator (as opposed to also being able to parametrize some collections by an allocator). So maybe instead of trying to stabilize a solution that should be future proofed for both use-cases, we can address only the global allocator issue now, and figure out how to parametrize collections by allocators later.

Don't know if that makes sense.

IIUC servo currently only needs to be able to switch the global allocator (as opposed to also being able to parametrize some collections by an allocator).

That is correct, but:

  • If a trait and its methods are stable so that it can be implemented, then it can also be called directly without going through std::heap::Heap. So it’s not only a trait for the global allocator, it’s a trait for allocators in general (even if we end up making a different one for collections generic over allocators), and GlobalAllocator is not a particularly good name.
  • The jemallocator crate currently implements alloc_excess, realloc, realloc_excess, usable_size, grow_in_place, and shrink_in_place, which are not part of the proposed minimal API. These can be more efficient than the default impls, so removing them would be a performance regression.

Both points make sense. I just thought that the only way to significantly accelerate the stabilization of this feature was to cut out a dependency on it also being a good trait for parametrizing collections over it.

[It would be nice if Servo could be like (stable | official mozilla crate), and cargo could enforce this, to remove a little pressure here.]

@Ericson2314 servo is not the only project that wants to use these APIs.

@Ericson2314 I don’t understand what this means, could you rephrase?

For context: Servo currently uses a number of unstable features (including #[global_allocator]), but we’re trying to slowly move away from that (either by updating to a compiler that has stabilized some features, or by finding stable alternatives.) This is tracked at https://github.com/servo/servo/issues/5286. So stabilizing #[global_allocator] would be nice, but it’s not blocking any Servo work.

Firefox relies on the fact that Rust std defaults to the system allocator when compiling a cdylib, and that mozjemalloc, which ends up being linked into the same binary, defines symbols like malloc and free that "shadow" (I don’t know the proper linker terminology) those from libc. So allocations from Rust code in Firefox end up using mozjemalloc. (This is on Unix; I don’t know how it works on Windows.) This works out, but it feels fragile to me. Firefox uses stable Rust, and I’d like it to use #[global_allocator] to explicitly select mozjemalloc to make the whole setup more robust.

@SimonSapin the more that I play with allocators and collections, the more I tend to think that we don't want to parametrize the collections by Alloc yet, because depending on the allocator, a collection might want to offer a different API, the complexity of some operations change, some collection details actually depend on the allocator, etc.

So I would like to suggest a way in which we can make progress here.

Step 1: Heap allocator

We could restrict ourselves at first to try to let users select the allocator for the heap (or the system/platform/global/free-store allocator, or however you prefer to name it) in stable Rust.

The only thing that we initially parametrize by it is Box, which only needs to allocate (new) and deallocate (drop) memory.

This allocator trait could initially have the API that @alexcrichton proposed (or somewhat extended), and this allocator trait could, on nightly, still have a slightly extended API to support the std:: collections.

Once we are there, users that want to migrate to stable will be able to do so, but might get a performance hit, because of the unstable API.

Step 2: Heap allocator without performance hit

At that point, we can re-evaluate the users that can't move to stable because of a performance hit, and decide how to extend this API and stabilize that.

Steps 3 to N: supporting custom allocators in std collections.

First, this is hard, so it might never happen, and I think it never happening isn't a bad thing.

When I want to parametrize a collection with a custom allocator I either have a performance problem or a usability problem.

If I have a usability problem I typically want a different collection API that exploits features of my custom allocator, as my SliceDeque crate does, for example. Parametrizing a collection by a custom allocator won't help me here.

If I have a performance problem, it would still be very hard for a custom allocator to help me. I am going to consider Vec in the next sections only, because it is the collection I've reimplemented most often.

Reduce the number of system allocator calls (Small Vector Optimization)

If I want to allocate some elements inside the Vec object to reduce the number of calls to the system allocator, today I just use SmallVec<[T; M]>. However, a SmallVec is not a Vec:

  • moving a Vec is O(1) in the number of elements, but moving a SmallVec<[T; M]> is O(N) for N < M and O(1) afterwards,

  • pointers to the Vec elements are invalidated on move if len() <= M but not otherwise; that is, if len() <= M, operations like into_iter need to move the elements into the iterator object itself, instead of just taking pointers.

Could we make Vec generic over an allocator to support this? Everything is possible, but I think that the most important costs are:

  • doing so makes the implementation of Vec more complex, which might impact users not using this feature
  • the documentation of Vec would become more complex, because the behavior of some operations would depend on the allocator.

I think these costs are non-negligible.

Make use of allocation patterns

The growth-factor of a Vec is tailored to a particular allocator. In std we can tailor it to the common ones jemalloc/malloc/... but if you are using a custom allocator, chances are that the growth-factor we choose by default won't be the best for your use case. Should every allocator be able to specify a growth-factor for vec-like allocation patterns? I don't know but my gut feeling tells me: probably not.

Exploit extra features of your system allocator

For example, an over-committing allocator is available in most of the Tier 1 and Tier 2 targets. On Linux-like and macOS systems the heap allocator overcommits by default, while the Windows API exposes VirtualAlloc, which can be used to reserve memory (e.g. on Vec::reserve/with_capacity) and commit memory on push.

Currently the Alloc trait doesn't expose a way to implement such an allocator on Windows, because it doesn't separate the concepts of committing and reserving memory (on Linux a non-over-committing allocator can be hacked in by just touching each page once). It also doesn't expose a way for an allocator to state whether it over-commits or not by default on alloc. A sketch of what a split API could look like is below.
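For illustration, a hypothetical split (names invented here, not part of any proposal) that a VirtualAlloc-backed allocator could implement:

use std::heap::{AllocErr, Layout};

pub unsafe trait ReserveCommitAlloc {
    /// Reserve address space without backing memory (VirtualAlloc + MEM_RESERVE).
    unsafe fn reserve(&mut self, layout: Layout) -> Result<*mut u8, AllocErr>;
    /// Commit a previously reserved range so it may be touched (MEM_COMMIT).
    unsafe fn commit(&mut self, ptr: *mut u8, layout: Layout) -> Result<(), AllocErr>;
    /// Whether plain allocation over-commits by default on this platform.
    fn overcommits(&self) -> bool;
}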

That is, we would need to extend the Alloc API to support this for Vec, and that would IMO be for little win. Because when you have such an allocator, Vec semantics change again:

  • Vec doesn't need to grow ever again, so operations like reserve make little sense
  • push is O(1) instead of amortized O(1).
  • iterators to live objects are never invalidated as long as the object is alive, which allows some optimizations

Exploit more extra features of your system allocator

Some system allocators like cudaMalloc/cudaMemcpy/... differentiate between pinned and non-pinned memory, and allow you to allocate memory in disjoint address spaces (so we would need an associated Pointer type in the Alloc trait), ...

But using these on collections like Vec does again change the semantics of some operations in subtle ways, like whether indexing a vector suddenly invokes undefined behavior or not, depending on whether you do so from a GPU kernel or from the host.

Wrapping up

I think that trying to come up with an Alloc API that can be used to parametrize all collections (or only even Vec) is hard, probably too hard.

Maybe after we get global/system/platform/heap/free-store allocators right, and Box, we can rethink the collections. Maybe we can reuse Alloc, maybe we need a VecAlloc, VecDequeAlloc, HashMapAlloc, ... or maybe we just say, "you know what, if you really need this, just copy-paste the standard collection into a crate, and mold it to your allocator". Maybe the best solution is to just make this easier, by having std collections in their own crate (or crates) in the nursery, using only stable features, maybe implemented as a set of building blocks.

Anyways, I think trying to tackle all these problems here at once and trying to come up with an Alloc trait that is good for everything is too hard. We are at step 0. I think that the best way to get to step 1 and step 2 quick is to leave collections out of the picture until we are there.

Once we are there, users that want to migrate to stable will be able to do so, but might get a performance hit, because of the unstable API.

Picking a custom allocator is usually about improving performance, so I don’t know who this initial stabilization would serve.

Picking a custom allocator is usually about improving performance, so I don’t know who this initial stabilization would serve.

Everybody? At least right now. Some of the methods you complain are missing from the initial stabilization proposal (alloc_excess, for example) are AFAIK not used by anything in the standard library yet. Or did this change recently?

Vec (and other users of RawVec) use realloc in push

@SimonSapin

The jemallocator crate currently implements alloc_excess, realloc, realloc_excess, usable_size, grow_in_place, and shrink_in_place

From these methods, AFAIK realloc, grow_in_place, and shrink_in_place are used, but grow_in_place is only a naive wrapper over shrink_in_place (for jemalloc at least), so if we implemented the default unstable impl of grow_in_place in terms of shrink_in_place in the Alloc trait, that would cut it down to two methods: realloc and shrink_in_place.

Picking a custom allocator is usually about improving performance,

While this is true, you might get more performance from a better-suited allocator without these methods than from a bad allocator that has them.

IIUC the main use case for servo was to use Firefox jemalloc instead of having a second jemalloc around, was that right?

Even if we add realloc and shrink_in_place to the Alloc trait in an initial stabilization, that would only delay the performance complaints.

For example, the moment we add any unstable API to the Alloc trait that ends up being used by the std collections, you wouldn't be able to get the same performance on stable as you can on nightly. That is, if we add realloc_excess and shrink_in_place_excess to the Alloc trait and make Vec/String/... use them, then having stabilized realloc and shrink_in_place wouldn't have helped you a single bit.

IIUC the main use case for servo was to use Firefox jemalloc instead of having a second jemalloc around, was that right?

Although they share some code, Firefox and Servo are two separate projects/applications.

Firefox use mozjemalloc, which is a fork of an old version of jemalloc with a bunch of features added. I think that some unsafe FFI code relies for correctness and soundness on mozjemalloc being used by Rust std.

Servo uses jemalloc which happens to be Rust’s default for executables at the moment, but there are plans to change that default to the system’s allocator. Servo also has some unsafe memory usage reporting code that relies for soundness on jemalloc being used indeed. (Passing Vec::as_ptr() to je_malloc_usable_size.)

Servo uses jemalloc which happens to be Rust’s default for executables at the moment, but there are plans to change that default to the system’s allocator.

It would be good to know if the system allocators in the systems that servo targets provide optimized realloc and shrink_to_fit APIs like jemalloc does? realloc (and calloc) are very common, but shrink_to_fit (xallocx) is AFAIK specific to jemalloc. Maybe the best solution would be to stabilize realloc and alloc_zeroed (calloc) in the initial implementation, and leave shrink_to_fit for later. That should allow servo to work with system allocators in most platforms without performance issues.

Servo also has some unsafe memory usage reporting code that relies for soundness on jemalloc being used indeed. (Passing Vec::as_ptr() to je_malloc_usable_size.)

As you know, the jemallocator crate has APIs for this. I expect that crates similar to the jemallocator crate will pop up for other allocators, offering similar APIs, as the global allocator story begins getting stabilized. I haven't thought about whether these APIs belong in the Alloc trait at all.

I don’t think malloc_usable_size needs to be in the Alloc trait. Using #[global_allocator] to be confident what allocator is used by Vec<T> and separately using a function from the jemallocator crate is fine.

@SimonSapin once the Alloc trait becomes stable, we'll probably have a crate like jemallocator for Linux malloc and Windows. These crates could have an extra feature to implement the parts that they can of the unstable Alloc API (like, e.g., usable_size on top of malloc_usable_size) and some other things that are not part of the Alloc API, like memory reporting on top of mallinfo. Once there are usable crates for the systems that servo targets it would be easier to know which parts of the Alloc trait to prioritize stabilizing, and we'll probably find out newer APIs that should be at least experimented with for some allocators.

@gnzlbg I'm a bit skeptical of the things in https://github.com/rust-lang/rust/issues/32838#issuecomment-358267292. Leaving out all those system-specific stuff, it's not hard to generalize collections for alloc--I've done it. Trying to incorporate that seems like a separate challenge.

@SimonSapin Does Firefox have a no-unstable-Rust policy? I think I was getting confused: Firefox and Servo want this, but if so it's Firefox's use-case that would be adding pressure to stabilize.

@sfackler See that ^. I was trying to make a distinction between projects that need vs want this stable, but Servo is on the other side of that divide.

I have a project that wants this and requires it to be stable. There is nothing particularly magical about either Servo or Firefox as consumers of Rust.

@Ericson2314 Correct, Firefox uses stable: https://wiki.mozilla.org/Rust_Update_Policy_for_Firefox. As I explained though there’s working solution today, so this is not a real blocker for anything. It’d be nicer / more robust to use #[global_allocator], that’s all.

Servo does use some unstable features, but as mentioned we’re trying to change that.

FWIW, parametric allocators are very useful to implement allocators. A lot of the less performance-sensitive bookkeeping becomes much easier if you can use various data structures internally and parametrize them by some simpler allocator (like bsalloc). Currently, the only way to do that in a std environment is to have a two-phase compilation in which the first phase is used to set your simpler allocator as the global allocator and the second phase is used to compile the larger, more complicated allocator. In no-std, there's no way to do it at all.

@Ericson2314

Leaving out all those system-specific stuff, it's not hard to generalize collections for alloc--I've done it. Trying to incorporate that seems like a separate challenge.

Do you have an implementation of ArrayVec or SmallVec on top of Vec + custom allocators that I could look at? That was the first point I mentioned, and that's not system specific at all. Arguably that would be the simplest two allocators imaginable, one is just a raw array as storage, and the other one can be built on top of the first one by adding a fallback to the Heap once the array runs out of capacity. The main difference is that those allocators are not "global", but each of the Vecs has its own allocator independent of all others, and these allocators are stateful.

Also, I am not arguing to never do this. I am just stating that this is very hard: C++ has been trying for 30 years with only partial success. GPU allocators and GC allocators work due to the generic pointer types, but implementing ArrayVec and SmallVec on top of Vec does not result in a zero-cost abstraction in C++ land (P0843r1 discusses some of the issues for ArrayVec in detail).

So I'd just prefer that we pursue this after we stabilize the pieces that do deliver something useful, as long as those pieces don't close the door on custom collection allocators in the future.


I talked a bit with @SimonSapin on IRC and if we were to extend the initial stabilization proposal with realloc and alloc_zeroed, then Rust in Firefox (which only uses stable Rust) would be able to use mozjemalloc as a global allocator in stable Rust without the need for any extra hacks. As mentioned by @SimonSapin, Firefox currently has a workable solution for this today, so while this would be nice, it doesn't seem to be very high priority.

Still, we could start there, and once we are there, move servo to stable #[global_allocator] without a performance loss.


@joshlf

FWIW, parametric allocators are very useful to implement allocators.

Could you elaborate a bit more on what you mean? Is there a reason why you can't parametrize your custom allocators with the Alloc trait? Or your own custom allocator trait and just implement the Alloc trait on the final allocators (these two traits do not necessarily need to be equal)?

I don't understand where the use case of "SmallVec = Vec + special allocator" comes from. It's not something I've seen mentioned a lot before (neither in Rust nor in other contexts), precisely because it has many serious problems. When I think of "improving performance with a specialized allocator", that's not at all what I think of.

Looking over the Layout API, I was wondering about the differences in error handling between from_size_align and align_to, where the former returns None in case of an error, while the latter panics (!).

Wouldn't it be more helpful and consistent to add a suitably defined and informative LayoutErr enum and return a Result<Layout, LayoutErr> in both cases (and perhaps use it for the other functions that currently return an Option as well)?
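A sketch of what that could look like (LayoutErr and its variants are hypothetical names; shown over a raw (size, align) pair to keep it self-contained):

#[derive(Debug, PartialEq)]
pub enum LayoutErr {
    InvalidAlign,
    SizeOverflow,
}

pub fn from_size_align(size: usize, align: usize) -> Result<(usize, usize), LayoutErr> {
    if !align.is_power_of_two() {
        return Err(LayoutErr::InvalidAlign);
    }
    // The size, rounded up to a multiple of align, must not overflow usize.
    if size > usize::max_value() - (align - 1) {
        return Err(LayoutErr::SizeOverflow);
    }
    Ok((size, align))
}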

@rkruppe

I don't understand where the use case of "SmallVec = Vec + special allocator" comes from. It's not something I've seen mentioned a lot before (neither in Rust nor in other contexts), precisely because it has many serious problems. When I think of "improving performance with a specialized allocator", that's not at all what I think of.

There are two independent ways of using allocators in Rust and C++: the system allocator, used by all allocations by default, and as a type argument for a collection parametrized by some allocator trait, as a way to create an object of that particular collection that uses a particular allocator (which can be the system's allocator or not).

What follows focuses only on this second use case: using a collection and an allocator type to create an object of that collection that uses a particular allocator.

In my experience with C++, parametrizing a collection with an allocator serves two use cases:

  • improve performance of a collection object by making the collection use a custom allocator targeted at a specific allocation pattern, and/or
  • add a new feature to a collection allowing it to do something that it couldn't do before.

Adding new features to collections

This is the use case of allocators that I see in C++ code-bases 99% of the time. The fact that adding a new feature to a collection improves performance is, in my opinion, coincidental. In particular, none of the following allocators improves performance by targeting an allocation pattern. They do so by adding features that, in some cases, as @Ericson2314 mentions, can be considered "system-specific". These are some examples:

  • stack allocators for doing small buffer optimizations (see Howard Hinnant's stack_alloc paper). They let you use std::vector or flat_{map,set,multimap,...} and by passing it a custom allocator you add in a small buffer optimization with (SmallVec) or without (ArrayVec) heap fall back. This allows, for example, putting a collection with its elements on the stack or static memory (where it would have otherwise used the heap).

  • segmented memory architectures (like 16-bit wide pointer x86 targets and GPGPUs). For example, C++17 Parallel STL was, during C++14, the Parallelism Technical Specification. Its precursor library, by the same author, is NVIDIA's Thrust library, which includes allocators that allow container classes to use GPGPU memory (e.g. thrust::device_malloc_allocator) or pinned memory (e.g. thrust::pinned_allocator; pinned memory allows faster host-device transfer in some cases).

  • allocators to solve parallelism-related issues, like false sharing (e.g. Intel Thread Building Blocks cache_aligned_allocator) or over-alignment requirements of SIMD types (e.g. Eigen3's aligned_allocator).

  • interprocess shared memory: Boost.Interprocess has allocators that allocate the collection's memory using OS interprocess shared memory facilities (e.g. System V shared memory). This allows one to directly use a std container to manage memory used to communicate between different processes.

  • garbage collection: Herb Sutter's deferred memory allocation library uses a user-defined pointer type to implement allocators that garbage collect memory. So that, for example, when a vector grows, the old chunk of memory is maintained alive till all pointers to that memory have been destroyed, avoiding iterator invalidation.

  • instrumented allocators: Bloomberg's Software Libraries' bslma_testallocator allows you to log memory allocation/deallocation (and C++-specific object construction/destruction) patterns of the objects where you use it. You don't know if a Vec allocates after reserve? Plug in such an allocator, and it will tell you if it happens. Some of these allocators let you name them, so that you can use them on multiple objects and get logs saying which object is doing what.

These are the types of allocators that I see most often in the wild in C++. As I mentioned before, the fact that they improve performance in some cases, is, in my opinion, coincidental. The important part is that none of them tries to target a particular allocation pattern.

Targeting allocation patterns to improve performance.

AFAIK there aren't any widely used C++ allocators that do this and I'll explain why I think this is in a second. The following libraries target this use case:

However, these libraries do not really provide a single allocator for a particular use case. Instead, they provide allocator building blocks that you can use to build custom allocators targeted at the particular allocation pattern in a particular part of an application.

The general advice that I recall from my C++ days was to just "don't use them" (they are the very last resort) because:

  • matching the system allocator's performance is very hard, beating it is very very hard,
  • the chances of someone else's application memory allocation pattern matching yours is slim, so you really need to know your allocation pattern and know what allocator building blocks you need to match it
  • they are not portable because different vendors have different C++ standard library implementations which do use different allocation patterns; vendors typically target their implementation at their system allocators. That is, a solution tailored to one vendor might perform horribly (worse than the system allocator) in another.
  • there are many alternatives one can exhaust before trying to use these: using a different collection, reserving memory, ... Most of the alternatives are lower effort and can deliver larger wins.

This does not mean that libraries for this use case aren't useful. They are, which is why libraries like foonathan/memory are blooming. But at least in my experience they are way less used in the wild than allocators that "add extra features", because to deliver a win you must beat the system allocator, which requires more time than most users are willing to invest (Stackoverflow is full of questions of the type "I used Boost.Pool and my performance got worse, what can I do? Not use Boost.Pool.").

Wrapping up

IMO I think it is great that the C++ allocator model, though far from perfect, supports both use cases, and I think that if Rust's std collections are to be parametrized by allocators, they should support both use cases as well, because at least in C++ allocators for both cases have turned out to be useful.

I just think this problem is slightly orthogonal to being able to customize the global/system/platform/default/heap/free-store allocator of a particular application, and that trying to solve both problems at the same time might delay the solution for one of them unnecessarily.

What some users want to do with a collection parametrized by an allocator might be way different from what some other users want to do. If @rkruppe starts from "matching allocation patterns" and I start from "preventing false sharing" or "using a small buffer optimization with heap fallback" it's just going to be hard to first, understand the needs of each other, and second, arrive at a solution that works for both.

@gnzlbg Thanks for the comprehensive write-up. Most of it doesn't address my original question and I disagree with some of it, but it's good to have it spelled out so we don't talk past each other.

My question was specifically about this application:

stack allocators for doing small buffer optimizations (see Howard Hinnant's stack_alloc paper). They let you use std::vector or flat_{map,set,multimap,...} and by passing it a custom allocator you add in a small buffer optimization with (SmallVec) or without (ArrayVec) heap fall back. This allows, for example, putting a collection with its elements on the stack or static memory (where it would have otherwise used the heap).

Reading about stack_alloc, I realize now how that can work. It's not what people usually mean by SmallVec (where the buffer is stored inline in the collection), which is why I missed that option, but it side-steps the problem of having to update pointers when the collection moves (and also makes those moves cheaper). Also note that short_alloc allows multiple collections to share one arena, which makes it even more unlike typical SmallVec types. It's more like a linear/bump-pointer allocator with graceful fallback to heap allocation when running out of allotted space.

I disagree that this sort of allocator and cache_aligned_allocator are fundamentally adding new features. They are used differently, and depending on your definition of "allocation pattern" they may not optimize for a specific allocation pattern. However, they certainly optimize for specific use cases and they don't have any significant behavioral differences from general-purpose heap allocators.

I do however agree that use cases like Sutter's deferred memory allocation, which substantially change what a "pointer" even means, are a separate application that may need a separate design if we want to support it at all.

Reading about stack_alloc, I realize now how that can work. It's not what people usually mean by SmallVec (where the buffer is stored inline in the collection), which is why I missed that option, but it side steps the problem of having to update pointers when the collection moves (and also makes those moves cheaper).

I mentioned stack_alloc because it is the only such allocator with "a paper", but it was released in 2009 and precedes C++11 (C++03 did not support stateful allocators in collections).

The way this work in C++11 (which supports stateful allocators), in a nutshell, is:

  • std::vector stores an Allocator object inside it just like Rust RawVec does.
  • the Allocator interface has an obscure property called Allocator::propagate_on_container_move_assignment (POCMA from now on) that user-defined allocators can customize; this property is true by default. If this property is false, on move assignment, the allocator cannot be propagated, so a collection is required by the standard to move each of its elements to the new storage manually.

So when a vector with the system allocator is moved, first the storage for the new vector on the stack is allocated, then the allocator is moved (which is zero-sized), and then the 3 pointers are moved, which are still valid. Such moves are O(1).

OTOH, when a vector with a POCMA == false allocator is moved, first the storage for the new vector on the stack is allocated and initialized with an empty vector, then the old collection is drained into the new one, so that the old one is empty and the new one full. This moves each element of the collection individually, using their move assignment operators. This step is O(N) and fixes internal pointers of the elements. Finally, the original, now empty, collection is dropped. Note that this looks like a clone, but isn't, because the elements themselves aren't cloned but moved in C++.

Does that make sense?

The main problem with this approach in C++ are that:

  • the vector growth-policy is implementation defined
  • the allocator API does not have _excess methods
  • the combination of the two issues above means that if you know your vector can at most hold 9 elements, you can't have a stack allocator that can hold 9 elements, because your vector might try to grow when it has 8 with a growth factor of 1.5 so you need to pessimize and allocate space for 18 elements.
  • the complexity of the vector operation changes depending on allocator properties (POCMA is just one of many properties that the C++ Allocator API has; writing C++ allocators is non-trivial). This makes specifying the API of vector a pain because sometimes copying or moving elements between different allocators of the same type has extra costs, which change the complexity of the operations. It also makes reading the spec a huge pain. Many online sources of documentation like cppreference put the general case upfront, and the obscure details of what changes if one allocator property is true or false in tiny small letters to avoid bothering 99% of the users with them.

There are many people working on improving C++'s allocator API to fix these issues, for example, by adding _excess methods and guaranteeing that standard conforming collections use them.

I disagree that this sort of allocator and cache_aligned_allocator are fundamentally adding new features.

Maybe what I meant is that they allow you to use std collections in situations or for types for which you couldn't use them before. For example, in C++ you can't put the elements of a vector in the static memory segment of your binary without something like a stack allocator (yet you can write your own collection that does it). OTOH, the C++ standard does not support over-aligned types like SIMD types, and if you try to heap allocate one with new you will invoke undefined behavior (you need to use posix_memalign or similar). Using the object typically manifests the undefined behavior via a segfault (*). Things like aligned_allocator allow you to heap allocate these types, and even put them in std collections, without invoking undefined behavior, by using a different allocator. Sure the new allocator will have different allocation patterns (these allocators basically overalign all memory btw...), but what people use them for is to be able to do something that they couldn't do before.

Obviously, Rust is not C++. And C++ has problems that Rust doesn't have (and vice-versa). An allocator that adds a new feature in C++ might be unnecessary in Rust, which, for example, doesn't have any problems with SIMD types.

(*) Users of Eigen3 suffer this deeply, because to avoid undefined behavior when using C++ and STL containers, you need to guard the containers against SIMD types, or types containing SIMD types (Eigen3 docs), and also you need to guard yourself from ever using new on your types by overloading operator new for them (more Eigen3 docs).

@gnzlbg thanks, I was also confused by the smallvec example. That would require non-moveable types and some sort of alloca in Rust---two RFCs in review and then more follow-up work---so I have no qualms punting on that for now. The existing smallvec strategy of always using all the stack space you'll need seems fine for now.

I also agree with @rkruppe that in your revised list, the new capabilities of the allocator need not be known to the collection using the allocator. Sometimes the full Collection<Allocator> has new properties (existing entirely in pinned memory, let's say) but that's just a natural consequence of using the allocator.

The one exception here I see is allocators that only allocate a single size/type (the NVidia ones do this, as do slab allocators). We could have a separate ObjAlloc<T> trait that's blanket implemented for normal allocators: impl<A: Alloc, T> ObjAlloc<T> for A. Then, collections would use ObjAlloc bounds if they just needed to allocate a few items. But, I feel somewhat silly even bringing this up as it should be doable backwards compatibly later.
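A sketch of that blanket impl, assuming the unstable std::heap API (the ObjAlloc trait and its method names are invented here):

use std::heap::{Alloc, AllocErr, Layout};

pub trait ObjAlloc<T> {
    unsafe fn alloc_obj(&mut self) -> Result<*mut T, AllocErr>;
    unsafe fn dealloc_obj(&mut self, ptr: *mut T);
}

impl<A: Alloc, T> ObjAlloc<T> for A {
    unsafe fn alloc_obj(&mut self) -> Result<*mut T, AllocErr> {
        self.alloc(Layout::new::<T>()).map(|p| p as *mut T)
    }
    unsafe fn dealloc_obj(&mut self, ptr: *mut T) {
        self.dealloc(ptr as *mut u8, Layout::new::<T>());
    }
}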

Does that make sense?

Sure but it's not really relevant to Rust since we have no move constructors. So a (movable) allocator that directly contains the memory it hands out pointers to just isn't possible, period.

For example, in C++ you can't put the elements of a vector in the static memory segment of your binary without something like a stack allocator (yet you can write your own collection that does it).

This is not a behavioral change. There are many valid reasons to control where collections get their memory, but all of them related to "externalities" like performance, linker scripts, control over the whole program's memory layout, etc.

Things like aligned_allocator allow you to heap allocate these types, and even put them in std collections, without invoking undefined behavior, by using a different allocator.

This is why I specifically mentioned TBB's cache_aligned_allocator and not Eigen's aligned_allocator. cache_aligned_allocator does not seem to guarantee any specific alignment in its documentation (it just says that it's "typically" 128 byte), and even if it did it usually wouldn't be used for this purpose (because its alignment is probably too large for common SIMD types and too small for things like page-aligned DMA). Its purpose, as you state, is to avoid false sharing.

@gnzlbg

FWIW, parametric allocators are very useful to implement allocators.

Could you elaborate a bit more on what you mean? Is there a reason why you can't parametrize your custom allocators with the Alloc trait? Or your own custom allocator trait and just implement the Alloc trait on the final allocators (these two traits do not necessarily need to be equal)?

I think I wasn't clear; let me try to explain better. Let's say I'm implementing an allocator that I expect to use either:

  • As the global allocator
  • In a no-std environment

And let's say that I'd like to use Vec under the hood to implement this allocator. I can't just use Vec directly as it exists today because

  • If I'm the global allocator, then using it will just introduce a recursive dependency on myself
  • If I'm in a no-std environment, there is no Vec as it exists today

Thus, what I need is to be able to use a Vec that is parametrized on another allocator that I use internally for simple internal bookkeeping. This is the goal of bsalloc (and the source of the name - it is used to bootstrap other allocators).

In elfmalloc, we are still able to be a global allocator by:

  • When compiling ourselves, statically compile jemalloc as the global allocator
  • Produce a shared object file that can be dynamically loaded by other programs

Note that in this case, it's important that we don't compile ourselves using the system allocator as the global allocator because then, once loaded, we'd re-introduce the recursive dependency because, at that point, we are the system allocator.

But it doesn't work when:

  • Somebody wants to use us as the global allocator in Rust in the "official" way (as opposed to by creating a shared object file first)
  • We're in a no-std environment

OTOH, the C++ standard does not support over-aligned types like SIMD types, and if you try to heap allocate one with new you will invoke undefined behavior (you need to use posix_memalign or similar).

Since our current Alloc trait takes alignment as a parameter, I assume this class of problem (the "I can't work without a different alignment" problem) goes away for us?

@gnzlbg - a comprehensive write-up (thank you) but none of the use cases cover persistent memory*.

This use case must be considered. In particular, it strongly influences what the right thing to do is:-

  • That more than one allocator is in use; in particular, when the allocator used is for persistent memory, it would never be the system allocator (indeed, there could be multiple persistent memory allocators)
  • The cost of 're-implementing' the standard collections is high, and leads to incompatible code with third-party libraries.
  • The allocator's lifetime is not necessarily 'static.
  • That objects stored in persistent memory need additional state that must be populated from the heap, i.e., they need state re-initialized. This is particularly so for mutexes and the like. What was once disposable is no longer disposed.

Rust has a superb opportunity to seize the initiative here, and make it a first-class platform for what will replace HDs, SSDs and even PCI-attached storage.

*Not a surprise, really, because until very recently it's been a bit special. It's now widely supported in Linux, FreeBSD and Windows.

@raphaelcohn

This really isn't the place to work out persistent memory. Yours is not the only school of thought regarding the interface to persistent memory - for instance, it may turn out that the prevailing approach is simply to treat it like faster disk, for data integrity reasons.

If you have a use case for using persistent memory this way it would probably be better to make that case elsewhere somehow first. Prototype it, come up with some more concrete changes to the allocator interface, and ideally make the case that those changes are worth the impact they would have on the average case.

@rpjohnst

I disagree. This is exactly the sort of place it belongs. I want to avoid a decision being made that creates a design that is the result of too narrow a focus and search for evidence.

The current Intel PMDK - which is where a lot of effort for low-level user space support is focused - approaches it far more as allocated, regular memory with pointers - memory similar to that obtained via mmap, say. Indeed, if one wants to work with persistent memory on Linux then I believe it's pretty much your only port of call at the moment. In essence, one of the most advanced toolkits for using it - the prevailing one, if you will - treats it as allocated memory.

As for prototyping it - well, that's exactly what I said I have done:-

I've been working recently on a Rust wrapper for a persistent memory allocator (specifically, libpmemcto).

(You can use an early-days version of my crate at https://crates.io/crates/nvml. There's a lot more experimentation in source control in the cto_pool module).

My prototype is built with what is needed to replace a data storage engine in a real-world, large-scale system in mind. A similar mindset is behind many of my open source projects. I've found over many years that the best libraries, like the best standards, are the ones which derive _from_ real usage.

Nothing like trying to fit a real-world allocator to the current interface. Frankly, the experience of using the Alloc interface, then copying the whole of Vec, then tweaking it, was painful. A lot of places assume that allocators aren't passed in, e.g. Vec::new().

In doing it, I made some observations in my original comment about what would be required of an allocator, and what would be required of an user of such an allocator. I think those are very valid on a discussion thread about an allocator interface.

The good news is your first 3 bullet points from https://github.com/rust-lang/rust/issues/32838#issuecomment-358940992 are shared by other use-cases.

I just wanted to add that I did not add non-volatile memory to the list because the list covered use cases of allocators parametrizing containers in the C++ world that are "widely" used, at least in my experience (the allocators I mentioned are mostly from very popular libraries used by many). While I know of the efforts of the Intel SDK (some of their libraries target C++), I don't personally know any projects using them (do they have an allocator that can be used with std::vector? I don't know). This doesn't mean that they aren't used or important. I'd be interested in knowing about these, but the main point of my post was that parametrizing containers by allocators is very complex, and we should try to make progress with system allocators without closing any doors for containers (but we should tackle that later).


I tried to read most of what has been written already, so this may have been covered before; if so, I'm sorry I missed it, but here goes:

Something that is fairly common for games (in C/C++) is to use "per-frame scratch allocation". What this means is that there is a linear/bump allocator that is used for allocations that are alive for a certain period of time (a game frame) and then "destroyed".

Destroyed in this case meaning that you reset the allocator back to its starting position. There is no "destruction" of objects at all, as these objects have to be of POD type (thus no destructors are executed).

I wonder if something like this will fit with the current allocator design in Rust?

(edit: It should be there is NO destruction of objects)

@emoon

Something that is fairly common for games (in C/C++) is is to use "per frame scratch allocation" What this means is there is an linear/bump allocator that is used for allocations that are alive for a certain period of time (in a game frame) and then "destroyed".

Destroyed in this case meaning that you reset the allocator back to it's starting position. There is "destruction" of objects at all as these objects has to be of POD type (thus no destructors are being executed)

Should be doable. Off the top of my head, you'd need one object for the arena itself and another object that is a per-frame handle on the arena. Then, you could implement Alloc for that handle, and assuming you were using high-level safe wrappers for allocation (e.g., imagine that Box becomes parametric on Alloc), the lifetimes would ensure that all of the allocated objects were dropped before the per-frame handle was dropped. Note that dealloc would still be called for each object, but if dealloc was a no-op, then the entire drop-and-deallocate logic might be completely or mostly optimized away.

You could also use a custom smart pointer type that doesn't implement Drop, which would make a lot of things easier elsewhere.
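A minimal sketch of that scheme, assuming the unstable std::heap::Alloc trait (construction of the backing buffer is elided, and the names are invented):

use std::cell::Cell;
use std::heap::{Alloc, AllocErr, Layout};

pub struct Arena {
    base: *mut u8,       // backing buffer, e.g. mapped once at startup
    cap: usize,
    offset: Cell<usize>, // bump pointer
}

impl Arena {
    /// End of frame: everything allocated this frame is reclaimed at once.
    pub fn reset(&self) {
        self.offset.set(0);
    }
}

unsafe impl<'a> Alloc for &'a Arena {
    unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, AllocErr> {
        // Round the bump pointer up to the requested alignment.
        let start = (self.offset.get() + layout.align() - 1) & !(layout.align() - 1);
        match start.checked_add(layout.size()) {
            Some(end) if end <= self.cap => {
                self.offset.set(end);
                Ok(self.base.offset(start as isize))
            }
            _ => Err(AllocErr::Exhausted { request: layout }),
        }
    }

    unsafe fn dealloc(&mut self, _ptr: *mut u8, _layout: Layout) {
        // No-op: memory is reclaimed in bulk by reset().
    }
}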

Thanks! I made a typo in my original post. It's to say that there is no destruction of objects.

For people who aren't expert in allocators, and can't follow this thread, what's the current consensus: do we plan to support custom allocators for the stdlib collection types?

@alexreg I'm not sure what the final plan is, but there are confirmed to be no technical difficulties in doing so. OTOH we don't have a good way to then expose that in std because default type variables are suspect, but I have no problem with just making it an alloc-only thing for now so we can make progress on the lib side unimpeded.

@Ericson2314 Okay, good to hear. Are default type variables implemented yet? Or at RFC stage perhaps? As you say, if they're just restricted to things related to alloc / std::heap, it should all be okay.

I really think AllocErr should be Error. It would be more consistent with other modules (e.g. io).

impl Error for AllocError probably makes sense and doesn’t hurt, but I’ve personally found the Error trait to be useless.

I was looking at at the Layout::from_size_align function today, and the "align must not exceed 2^31 (i.e. 1 << 31)," limitation did not make sense to me. And git blame pointed to #30170.

I must say that was quite a deceptive commit message, talking about align fitting in a u32, which is only incidental, when the actual thing being "fixed" (more like worked around) is a system allocator misbehaving.

Which leads me to this note: the "OSX/alloc_system is buggy on huge alignments" item here should not be checked. While the direct issue has been dealt with, I don't think the fix is right for the long term: the fact that a system allocator misbehaves shouldn't prevent implementing an allocator that behaves, and the arbitrary limitation on Layout::from_size_align does just that.

@glandium Is it useful to request alignment to a multiple of 4 gigabytes or more?

I can imagine cases where one may want to have an allocation of 4GiB aligned at 4GiB, which is not possible currently, but hardly more. But I don't think arbitrary limitations should be added just because we don't think of such reasons now.

I can imagine cases where one may want to have an allocation of 4GiB aligned at 4GiB

What are those cases?

I can imagine cases where one may want to have an allocation of 4GiB aligned at 4GiB

What are those cases?

Concretely, I just added support for arbitrarily large alignments in mmap-alloc in order to support allocating large, aligned slabs of memory for use in elfmalloc. The idea is to have the slab of memory be aligned to its size so that, given a pointer to an object allocated from that slab, you merely need to mask off the low bits to find the containing slab. We don't currently use slabs that are 4GB in size (for objects that large, we go directly to mmap), but there's no reason that we couldn't, and I could totally imagine an application with large RAM requirements that wanted to do that (that is, if it allocated multi-GB objects frequently enough that it didn't want to accept the overhead of mmap).
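Concretely, the mask trick is just this (SLAB_SIZE is hypothetical and must be a power of two; a 64-bit target is assumed for the 4 GiB example):

const SLAB_SIZE: usize = 1 << 32; // a 4 GiB slab, aligned to 4 GiB

fn containing_slab(obj: *mut u8) -> *mut u8 {
    ((obj as usize) & !(SLAB_SIZE - 1)) as *mut u8
}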

Here's a possible use case for > 4GiB alignment: alignment to a large page boundary. There already are platforms that support > 4 GiB pages. This IBM document say "the POWER5+ processor supports four virtual memory page sizes: 4 KB, 64 KB, 16 MB, and 16 GB." Even x86-64 isn't far off: "huge pages" typically are 2 MiB, but it also supports 1 GiB.

All the non-typed functions in the Alloc trait are dealing with *mut u8. Which means they could take or return null pointers, and all hell would break loose. Should they use NonNull instead?

There are many pointers that they could return from which all hell would break loose.

A more compelling reason to use NonNull is that it would allow the Results currently returned from Alloc methods (or Options, if we switch to that in the future) to be smaller.

A more compelling reason to use NonNull is that it would allow the Results currently returned from Alloc methods (or Options, if we switch to that in the future) to be smaller.

I don't think it would because AllocErr has two variants.

There are many pointers that they could return from which all hell would break loose.

But a null pointer is clearly more wrong than any other pointer.

I like to think that the Rust type system helps with footguns and is used to encode invariants. The documentation for alloc clearly says "If this method returns an Ok(addr), then the addr returned will be a non-null address", but its return type doesn't encode that. As things are, Ok(malloc(layout.size())) would be a valid implementation, when it clearly isn't.

Note, there are also notes about Layout size needing to be non-zero, so I'd also argue it should encode that as a NonZero.

The fact that all those functions are inherently unsafe doesn't mean we shouldn't have some footgun prevention.
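
A small sketch of the invariant-encoding point (function names here are hypothetical): with a raw pointer the signature happily accepts a null return, while NonNull forces the null case to be handled before it can escape:

use std::ptr::NonNull;

// Compiles despite violating the documented "non-null" invariant.
unsafe fn alloc_raw(_size: usize) -> *mut u8 {
    std::ptr::null_mut()
}

// Here a null result cannot leak out silently: it must be converted,
// either via the checked constructor or an explicit unchecked assertion.
unsafe fn alloc_checked(size: usize) -> Option<NonNull<u8>> {
    NonNull::new(alloc_raw(size))
}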

Out of all the possible errors in using (edit: and implementing) allocators, passing a null pointer is one of the easiest to track down (you always get a clean segfault on dereference, at least if you have an MMU and didn't do very weird things with it), and usually one of the most trivial ones to fix as well. It's true that unsafe interfaces can try to prevent footguns, but this footgun seems disproportionally small (compared to the other possible errors, and to the verbosity of encoding this invariant in the type system).

Besides, it seems likely that allocator implementations would just use the unchecked constructor of NonNull "for performance": since a correct allocator would never return null anyway, it would want to skip the NonNull::new(...).unwrap(). In that case you won't actually get any tangible footgun prevention, just more boilerplate. (The Result size benefits, if real, may still be a compelling reason for it.)

allocator implementations would just use the unchecked constructor of NonNull

The point is less to help allocator implementations than to help their users. If MyVec contains a NonNull<T> and Heap.alloc() already returns a NonNull, that's one less checked or unsafe-unchecked call that I need to make.

Note that pointers are not only return types, they are also input types to e.g. dealloc and realloc. Are those functions supposed to harden against their input being possibly null or not? The documentation would tend to say no, but the type system would tend to say yes.

Quite similarly with layout.size(). Are allocation functions supposed to handle the requested size being 0 somehow, or not?

(The Result size benefits, if real, may still be a compelling reason for it.)

I doubt there are size benefits, but with something like #48741, there would be codegen benefits.

If we continue that principle of being more flexible for users of the API, pointers should be NonNull in return types but not in arguments. (This doesn’t mean that those arguments should be null-checked at runtime.)

I think a Postel's law approach is the wrong one to take here. Is there any case in which passing a null pointer to an Alloc method is valid? If not, then that flexibility is basically just giving the footgun a slightly more sensitive trigger.

The point is less to help allocator implementations than to help their users. If MyVec contains a NonNull and Heap.alloc() already returns a NonNull, that's one less checked or unsafe-unchecked call that I need to make.

Ah this makes sense. Doesn't fix the footgun, but centralizes the responsibility for it.

Note that pointers are not only return types, they are also input types to e.g. dealloc and realloc. Are those functions supposed to harden against their input being possibly null or not? The documentation would tend to say no, but the type system would tend to say yes.

Is there any case in which passing a null pointer to an Alloc method is valid? If not, then that flexibility is basically just giving the footgun a slightly more sensitive trigger.

The user absolutely has to read the documentation and keep the invariants in mind. Many invariants can't be enforced via the type system at all -- if they could, the function wouldn't be unsafe to begin with. So this is solely a question of whether putting NonNull in any given interface will actually help users by

  • reminding them to read the docs and think about the invariants
  • offering convenience (@SimonSapin's point wrt alloc's return value)
  • giving some material advantage (e.g., layout optimizations)

I don't see any strong advantage in making e.g., the argument of dealloc into NonNull. I see roughly two classes of uses of this API:

  1. Relatively trivial use, where you call alloc, store the returned pointer somewhere, and after a while pass the stored pointer to dealloc.
  2. Complicated scenarios involving FFI, lots of pointer arithmetic, etc. where there's significant logic involved in ensuring you pass the right thing to dealloc at the end.

Taking NonNull here basically only helps the first kind of use case, because those will store the NonNull in some nice place and just pass it back to dealloc unaltered. Theoretically it could prevent some typos (passing foo when you meant bar) if you're juggling multiple pointers and only one of them is NonNull, but this doesn't seem too common or important. The disadvantage of dealloc taking a raw pointer (assuming alloc returns NonNull, which @SimonSapin has convinced me should happen) would be that it requires an as_ptr in the dealloc call, which is potentially annoying but doesn't impact safety either way.

The second kind of use case is not helped because it likely can't keep using NonNull throughout the whole process, so it would have to manually re-create a NonNull from the raw pointer it got by whatever means. As I argued earlier, this would likely become an unchecked/unsafe assertion rather than an actual run time check, so no footguns are prevented.

This is not to say I am in favor of dealloc taking a raw pointer. I just don't see any of the claimed advantages wrt footguns. Consistency of the types probably just wins by default.

I'm sorry, but I read this as "Many invariants can't be enforced via the type system at all... therefore let's not even try". Don't let the perfect be the enemy of the good!

I think it's more about the tradeoffs between the guarantees provided by NonNull and the ergonomics lost from having to transition back and forth between NonNull and raw pointers. I don't have a particularly strong opinion either way-- neither side seems unreasonable.

@cramertj Yeah, but I don't really buy the premise of that sort of argument. People say Alloc is for obscure, hidden, and largely unsafe use-cases. Well, in obscure, hard-to-read code, I would like to have as much safety as possible, precisely because it is so rarely touched that it's likely the original author won't be around. Conversely, if the code is being read years later, screw ergonomics. If anything, it is counter-productive. The code should strive to be very explicit so an unfamiliar reader can better figure out what on earth is going on. Less noise < clearer invariants.

The second kind of use case is not helped because it likely can't keep using NonNull throughout the whole process, so it would have to manually re-create a NonNull from the raw pointer it got by whatever means.

This is simply a coordination failure, not a technical inevitability. Sure, right now many unsafe APIs might use raw pointers. So something has to lead the way in switching to a superior interface using NonNull or other wrappers. Then other code can more easily follow suit. I see 0 reason to constantly fall back on hard-to-read, uninformative raw pointers in greenfield, all-Rust, unsafe code.

Hi!

I just want to say that, as the author/maintainer of a Rust custom allocator, I am in favor of NonNull. Pretty much for all the reasons that have already been laid out in this thread.

Also, I'd like to point out that @glandium is the maintainer of firefox's fork of jemalloc, and has lots of experience hacking on allocators as well.

I could write much much more responding to @Ericson2314 and others, but it's quickly becoming a very detached and philosophical debate, so I'm cutting it short here. I was arguing against what I believe is an over-statement of the safety benefits of NonNull in this sort of API (there are other benefits, of course). That is not to say there are no safety benefits, but as @cramertj said, there are trade-offs and I think the "pro" side is overstated. Regardless, I have already said I lean towards using NonNull in various places for other reasons -- for the reason @SimonSapin gave in alloc, and in dealloc for consistency. So let's do that and not go off on any more tangents.

If there's a few NonNull use-cases that everyone is on board with, that's a great start.

We'll probably want to update Unique and friends to use NonNull instead of NonZero to keep friction at least within liballoc low. But this does look like something we are already doing anyways at one abstraction level over the allocator, and I can't think of any reasons for not doing this at the allocator level as well. Looks like a reasonable change to me.

(There’s already two implementations of the From trait that convert safely between Unique<T> and NonNull<T>.)

Considering I need something very much like the allocator API in stable rust, I extracted the code from the rust repo and put it in a separate crate:

This /could/ be used to iterate on experimental changes to the API, but as of now, it's a plain copy of what is in the rust repo.

[Off topic] Yeah, it would be nice if std could use out-of-tree stable code crates like that, so that we can experiment with unstable interfaces in stable code. This is one of the reasons I like having std as a facade.

std could depend on a copy of a crate from crates.io, but if your program also depends on the same crate it wouldn’t "look like" the same crate/types/traits to rustc anyway so I don’t see how it would help. Anyway, regardless of the facade, making unstable features unavailable on the stable channel is a very deliberate choice, not an accident.

It seems we have some agreement wrt using NonNull. What's the way forward for this to actually happen? Just a PR doing it? An RFC?

Unrelatedly, I've been looking at some assembly generated from Boxing things, and the error paths are rather large. Should something be done about it?

Examples:

pub fn bar() -> Box<[u8]> {
    vec![0; 42].into_boxed_slice()
}

pub fn qux() -> Box<[u8]> {
    Box::new([0; 42])
}

compiles to:

example::bar:
  sub rsp, 56
  lea rdx, [rsp + 8]
  mov edi, 42
  mov esi, 1
  call __rust_alloc_zeroed@PLT
  test rax, rax
  je .LBB1_1
  mov edx, 42
  add rsp, 56
  ret
.LBB1_1:
  mov rax, qword ptr [rsp + 8]
  movups xmm0, xmmword ptr [rsp + 16]
  movaps xmmword ptr [rsp + 32], xmm0
  mov qword ptr [rsp + 8], rax
  movaps xmm0, xmmword ptr [rsp + 32]
  movups xmmword ptr [rsp + 16], xmm0
  lea rdi, [rsp + 8]
  call __rust_oom@PLT
  ud2

example::qux:
  sub rsp, 104
  xorps xmm0, xmm0
  movups xmmword ptr [rsp + 58], xmm0
  movaps xmmword ptr [rsp + 48], xmm0
  movaps xmmword ptr [rsp + 32], xmm0
  lea rdx, [rsp + 8]
  mov edi, 42
  mov esi, 1
  call __rust_alloc@PLT
  test rax, rax
  je .LBB2_1
  movups xmm0, xmmword ptr [rsp + 58]
  movups xmmword ptr [rax + 26], xmm0
  movaps xmm0, xmmword ptr [rsp + 32]
  movaps xmm1, xmmword ptr [rsp + 48]
  movups xmmword ptr [rax + 16], xmm1
  movups xmmword ptr [rax], xmm0
  mov edx, 42
  add rsp, 104
  ret
.LBB2_1:
  movups xmm0, xmmword ptr [rsp + 16]
  movaps xmmword ptr [rsp + 80], xmm0
  movaps xmm0, xmmword ptr [rsp + 80]
  movups xmmword ptr [rsp + 16], xmm0
  lea rdi, [rsp + 8]
  call __rust_oom@PLT
  ud2

That's a rather large amount of code to add to any place creating boxes. Compare with 1.19, which didn't have the allocator api:

example::bar:
  push rax
  mov edi, 42
  mov esi, 1
  call __rust_allocate_zeroed@PLT
  test rax, rax
  je .LBB1_2
  mov edx, 42
  pop rcx
  ret
.LBB1_2:
  call alloc::oom::oom@PLT

example::qux:
  sub rsp, 56
  xorps xmm0, xmm0
  movups xmmword ptr [rsp + 26], xmm0
  movaps xmmword ptr [rsp + 16], xmm0
  movaps xmmword ptr [rsp], xmm0
  mov edi, 42
  mov esi, 1
  call __rust_allocate@PLT
  test rax, rax
  je .LBB2_2
  movups xmm0, xmmword ptr [rsp + 26]
  movups xmmword ptr [rax + 26], xmm0
  movaps xmm0, xmmword ptr [rsp]
  movaps xmm1, xmmword ptr [rsp + 16]
  movups xmmword ptr [rax + 16], xmm1
  movups xmmword ptr [rax], xmm0
  mov edx, 42
  add rsp, 56
  ret
.LBB2_2:
  call alloc::oom::oom@PLT

If this is actually significant, then it's indeed annoying. However maybe LLVM optimizes this out for larger programs?

There are 1439 calls to __rust_oom in the latest Firefox nightly. Firefox doesn't use Rust's allocator, though, so we get direct calls to malloc/calloc, followed by a null check that then jumps to the oom preparation code, which is usually two movq and a lea, filling the AllocErr and getting its address to pass it to __rust_oom. That's the best case scenario, essentially, but that's still 20 bytes of machine code for the two movq and the lea.

If I look at ripgrep, there are 85, and they are all in identical _ZN61_$LT$alloc..heap..Heap$u20$as$u20$alloc..allocator..Alloc$GT$3oom17h53c76bda50c6b65aE.llvm.nnnnnnnnnnnnnnn functions. All of them are 16 bytes long. There are 685 calls to those wrapper functions, most of which are preceded by code similar to what I pasted in https://github.com/rust-lang/rust/issues/32838#issuecomment-377097485.

@nox was looking today at enabling the mergefunc llvm pass, I wonder if that makes any difference here.

mergefunc apparently doesn't get rid of the multiple identical _ZN61_$LT$alloc..heap..Heap$u20$as$u20$alloc..allocator..Alloc$GT$3oom17h53c76bda50c6b65aE.llvm.nnnnnnnnnnnnnnn functions (tried with -C passes=mergefunc in RUSTFLAGS).

But what makes a big difference is LTO, which is actually what makes Firefox call malloc directly, leaving the creation of the AllocErr to right before calling __rust_oom. That also makes the creation of the Layout unnecessary before calling the allocator, leaving it to when the AllocErr is filled.

This makes me think the allocation functions, except __rust_oom, should probably be marked inline.

BTW, having looked at the generated code for Firefox, I'm thinking it would ideally be desirable to use moz_xmalloc instead of malloc. This is not possible without a combination of the Allocator traits and being able to replace the global heap allocator, but it brings up the possible need for a custom error type for the Allocator trait: moz_xmalloc is infallible and never returns in case of failure. IOW, it handles OOM itself, and the Rust code wouldn't need to call __rust_oom in that case. Which would make it desirable for the allocator functions to optionally return ! instead of AllocErr.
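
A hedged sketch of that idea (the trait shape is hypothetical, not the actual Alloc trait): with an associated error type, an infallible allocator can pick an uninhabited error so callers' error branches compile away. Infallible stands in for ! here:

use std::alloc::Layout;
use std::convert::Infallible; // stable stand-in for `!`
use std::ptr::NonNull;

trait FallibleAlloc {
    type Err;
    unsafe fn alloc(&mut self, layout: Layout) -> Result<NonNull<u8>, Self::Err>;
}

struct HandlesOomItself;

impl FallibleAlloc for HandlesOomItself {
    type Err = Infallible;
    unsafe fn alloc(&mut self, _layout: Layout) -> Result<NonNull<u8>, Infallible> {
        // A moz_xmalloc-style allocator either returns memory or aborts
        // internally, so there is no error path to represent.
        unimplemented!()
    }
}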

We’ve discussed making AllocErr a zero-size struct, which might also help here. With the pointer also made NonNull, the entire return value could be pointer-sized.
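
A quick check of that size claim on a 64-bit target (a sketch; AllocErr here is a stand-in zero-size struct): the null niche of NonNull can hold the Result discriminant, while a raw pointer offers no such niche:

use std::mem::size_of;
use std::ptr::NonNull;

struct AllocErr; // zero-size

fn main() {
    // The niche makes the whole Result pointer-sized...
    assert_eq!(size_of::<Result<NonNull<u8>, AllocErr>>(), size_of::<usize>());
    // ...whereas Result<*mut u8, _> needs a separate discriminant word.
    assert_eq!(size_of::<Result<*mut u8, AllocErr>>(), 2 * size_of::<usize>());
}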

https://github.com/rust-lang/rust/pull/49669 makes a number of changes to these APIs, with the goal of stabilizing a subset covering global allocators. Tracking issue for that subset: https://github.com/rust-lang/rust/issues/49668. In particular, a new GlobalAlloc trait is introduced.

Will this PR allow us to do things like Vec::new_with_alloc(alloc) where alloc: Alloc soon?

@alexreg no

@sfackler Hmm, why not? What do we need before we can do that? I don't really get the point of this PR otherwise, unless it's for simply changing the global allocator.

@alexreg

I don't really get the point of this PR otherwise, unless it's for simply changing the global allocator.

I think it's simply for changing the global allocator.

@alexreg If you mean on stable, there are a number of unresolved design questions that we're not ready to stabilize. On Nightly, this is supported by RawVec and probably fine to add as #[unstable] for Vec for anyone who feels like working on that.

And yes, as mentioned in the PR, its point is to allow changing the global allocator, or allocating (e.g. in a custom collection type) without abusing Vec::with_capacity.

FWIW, the allocator_api crate mentioned in https://github.com/rust-lang/rust/issues/32838#issuecomment-376793369 has RawVec<T, A> and Box<T, A> on the master branch (not released yet). I'm thinking of it as an incubator for what collections generic over the allocator type could look like (plus the fact that I do need a Box<T, A> type for stable rust). I haven't started porting vec.rs to add Vec<T, A> just yet, but PRs are welcome. vec.rs is large.

I'll note that the codegen "issues" mentioned in https://github.com/rust-lang/rust/issues/32838#issuecomment-377097485 should be gone with the changes in #49669.

Now, with some more thought given to using the Alloc trait to help implement an allocator in layers, there are two things that I think would be useful (at least to me):

  • as mentioned earlier, optionally being able to specify a different AllocErr type. This can be useful to make it !, or, now that AllocErr is empty, to optionally have it convey more information than "failed".
  • optionally being able to specify a different Layout type. Imagine you have two layers of allocators: one for page allocations, and one for larger regions. The latter can rely on the former, but if they both take the same Layout type, then both layers need to do their own validation: at the lowest level, that size and alignment are a multiple of the page size, and at the higher level, that size and alignment match the requirements for the larger regions. But those checks are redundant. With specialized Layout types, the validation could be delegated to the Layout creation instead of the allocator itself, and conversions between the Layout types would allow skipping the redundant checks.
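
A rough sketch of that layering idea (all names and the page size are hypothetical): validation moves into the construction of a page-level layout type, so the page allocator itself never re-checks:

use std::alloc::Layout;

const PAGE_SIZE: usize = 4096; // assumed page size

struct PageLayout(Layout);

impl PageLayout {
    // Validate once, at conversion time, instead of on every call.
    fn from_layout(layout: Layout) -> Option<PageLayout> {
        if layout.size() % PAGE_SIZE == 0 && layout.align() % PAGE_SIZE == 0 {
            Some(PageLayout(layout))
        } else {
            None
        }
    }
}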

@cramertj @SimonSapin @glandium Okay, thanks for clarifying. I may just submit a PR for some of the other collections-prime types. Is it best to do this against your allocator-api repo/crate, @glandium, or rust master?

@alexreg considering the amount of breaking changes to the Alloc trait in #49669, it's probably better to wait for it to merge first.

@glandium Fair enough. That doesn't seem too far away from landing. I just noticed the https://github.com/pnkfelix/collections-prime repo too... what's that in relation to yours?

I would add one more open question:

  • Is Alloc::oom allowed to panic? Currently the docs say that this method must abort the process. This has implications for code that uses allocators since they must then be designed to handle unwinding properly without leaking memory.

I think that we should allow panicking since a failure in a local allocator does not necessarily mean that the global allocator will fail as well. In the worst case, the global allocator's oom will be called which will abort the process (doing otherwise would break existing code).

@alexreg It's not. It just seems to be a plain copy of what's in std/alloc/collections. Well, a two-year-old copy of it. My crate is much more limited in scope (the published version only has the Alloc trait as of a few weeks ago, the master branch only has RawVec and Box on top of that), and one of my goals is to keep it building with stable rust.

@glandium Okay, in that case it probably makes sense for me to wait until that PR lands, then create a PR against rust master and tag you, so you know when it gets merged into master (and can then merge it into your crate), fair?

@alexreg makes sense. You /could/ start working on it now, but that would likely induce some churn on your end if/when bikeshedding changes things in that PR.

@glandium I've got other things to keep me busy with Rust for now, but I'll be onto it when that PR gets approved. It will be great to get allocator-generic heap allocation / collections on both nightly and stable soon. :-)

Is Alloc::oom allowed to panic? Currently the docs say that this method must abort the process. This has implications for code that uses allocators since they must then be designed to handle unwinding properly without leaking memory.

@Amanieu This RFC was merged: https://github.com/rust-lang/rfcs/pull/2116 The docs and implementation might just not have been updated yet.

There is one change to the API that I'm considering submitting a PR for:

Split the Alloc trait into two parts: "implementation" and "helpers". The former would be functions like alloc, dealloc, realloc, etc., and the latter alloc_one, dealloc_one, alloc_array, etc. While there are some hypothetical benefits from being able to have custom implementations of the latter, it's far from the most common need, and when you need to implement generic wrappers (which I've found to be incredibly common, to the point I've actually started to write a custom derive for that), you still need to implement all of them because the wrappee might be customizing them.

OTOH, if an Alloc trait implementer does try to do fancy things in e.g. alloc_one, they're not guaranteed that dealloc_one will be called for that allocation. There are multiple reasons for this:

  • The helpers are not used consistently. Just one example, raw_vec uses a mix of alloc_array, alloc/alloc_zeroed, but only uses dealloc.
  • Even with consistent use of e.g. alloc_array/dealloc_array, one can still safely convert a Vec into a Box, which would then use dealloc.
  • Then there are some parts of the API that just don't exist (no zeroed version of alloc_one/alloc_array)

So even though there are actual use cases for specialization of e.g. alloc_one (and as a matter of fact, I do have such a need for mozjemalloc), one is better off using a specialized allocator instead.

Actually, it's worse than that: in the rust repo, there's exactly one use of alloc_array, and no use of alloc_one, dealloc_one, realloc_array, or dealloc_array. Not even box syntax uses alloc_one; it uses exchange_malloc, which takes a size and align. So those functions are more meant as a convenience for clients than for implementers.

With something like impl<A: Alloc> AllocHelpers for A (or AllocExt, whatever name is chosen), we'd still have the convenience of those functions for clients, while not allowing implementers to shoot themselves in the foot if they thought they'd do fancy things by overriding them (and making it easier on people implementing proxy allocators).
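
A minimal sketch of that split (trait names and signatures are illustrative only, not the real API): implementers provide just the core methods, and the helpers live in an extension trait whose blanket impl cannot be overridden:

use std::alloc::Layout;
use std::ptr::NonNull;

pub struct AllocErr;

pub unsafe trait Alloc {
    unsafe fn alloc(&mut self, layout: Layout) -> Result<NonNull<u8>, AllocErr>;
    unsafe fn dealloc(&mut self, ptr: NonNull<u8>, layout: Layout);
}

pub trait AllocExt: Alloc {
    // Convenience for clients; implementers cannot hijack it.
    unsafe fn alloc_one<T>(&mut self) -> Result<NonNull<T>, AllocErr> {
        self.alloc(Layout::new::<T>()).map(|p| p.cast::<T>())
    }
}

impl<A: Alloc + ?Sized> AllocExt for A {}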

There is one change to the API that I'm considering submitting a PR for

Did so in #50436

@glandium

(and as a matter of fact, I do have such a need for mozjemalloc),

Could you elaborate on this use case?

mozjemalloc has a base allocator that purposefully leaks. Except for one kind of object, where it keeps a free list. I can do that by layering allocators rather than doing tricks with alloc_one.

Is it required to deallocate with the exact alignment that you allocated with?

Just to reinforce that the answer to this question is YES, I have this lovely quote from Microsoft themselves:

aligned_alloc() will probably never be implemented, as C11 specified it in a way that’s incompatible with our implementation (namely, that free() must be able to handle highly aligned allocations)

Using the system allocator on Windows will always require knowing the alignment when deallocating in order to correctly deallocate highly aligned allocations, so can we please just mark that question as resolved?

Using the system allocator on Windows will always require knowing the alignment when deallocating in order to correctly deallocate highly aligned allocations, so can we please just mark that question as resolved?

It’s a shame, but it is the way it is. Let’s give up on overaligned vectors then. :confused:

Let’s give up on overaligned vectors then

How come? You just need Vec<T, OverAlignedAlloc<U16>> that both allocates and deallocates with overalignment.

How come? You just need Vec<T, OverAlignedAlloc<U16>> that both allocates and deallocates with overalignment.

I should have been more specific. I meant moving overaligned vectors into an API outside of your control, i.e. one that takes a Vec<T> and not Vec<T, OverAlignedAlloc<U16>>. (For example CString::new().)

You should rather use

#[repr(align(16))]
struct OverAligned16<T>(T);

and then Vec<OverAligned16<T>>.

You should rather use

That depends. Suppose you want to use AVX intrinsics (256 bits wide, 32-byte alignment requirement) on a vector of f32s:

  • Vec<T, OverAlignedAlloc<U32>> solves the problem, one can use AVX intrinsics directly on the vector elements (in particular, aligned memory loads), and the vector still derefs into a &[f32] slice making it ergonomic to use.
  • Vec<OverAligned32<f32>> does not really solve the problem. Each f32 takes 32 bytes of space due to the alignment requirement. The padding introduced prevents the direct use of AVX operations, since the f32s are no longer in contiguous memory. And I personally find the deref to &[OverAligned32<f32>] a bit tedious to deal with.

For a single element in a Box, Box<T, OverAligned<U32>> vs Box<OverAligned32<T>>, both approaches are more equivalent, and the second approach might indeed be preferable. In any case it's nice to have both options.
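
The padding cost is easy to check (a short sketch, with OverAligned32 as in the example above):

use std::mem::size_of;

#[repr(align(32))]
struct OverAligned32<T>(T);

fn main() {
    assert_eq!(size_of::<f32>(), 4);
    // Size is rounded up to alignment, so a Vec<OverAligned32<f32>> spends
    // 28 of every 32 bytes on padding, and the floats are not contiguous.
    assert_eq!(size_of::<OverAligned32<f32>>(), 32);
}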

The tracking post at the top of this issue is horribly out of date (was last edited in 2016). We need an updated list of active concerns to continue the discussion productively.

The discussion would also significantly benefit from an up-to-date design document, containing the current unresolved questions, and the rationale for the design decisions.

There are multiple threads of diffs from "what's currently implemented on nightly" to "what was proposed in the original Alloc RFC" spawning thousands of comments on different channels (rfc repo, rust-lang tracking issue, global alloc RFC, internal posts, many huge PRs, etc.), and what's being stabilized in the GlobalAlloc RFC does not look that much like what was proposed in the original RFC.

This is something that we need anyways to finish updating the docs and the reference, and would be helpful in the current discussions as well.

I think that before we even think about stabilizing the Alloc trait, we should first try implementing allocator support in all of the standard library collections. This should give us some experience with how this trait will be used in practice.

I think that before we even think about stabilizing the Alloc trait, we should first try implementing allocator support in all of the standard library collections. This should give us some experience with how this trait will be used in practice.

Yes, absolutely. Especially Box, since we don't yet know how to avoid having Box<T, A> take up two words.

Yes, absolutely. Especially Box, since we don't yet know how to avoid having Box take up two words.

I don't think we should worry the size of Box<T, A> for the initial implementation, but this is something that can be added later in a backward-compatible way by adding a DeAlloc trait which only supports deallocation.

Example:

// Sketch only: relies on the unstable associated_type_defaults feature,
// and the provided body of into_dealloc only type-checks for impls that
// keep the default `DeAlloc = Self`.
trait DeAlloc {
    fn dealloc(&mut self, ptr: NonNull<Opaque>, layout: Layout);
}

trait Alloc {
    // In addition to the existing trait items (alloc, dealloc, ...):
    type DeAlloc: DeAlloc = Self;
    fn into_dealloc(self) -> Self::DeAlloc {
        self
    }
}

// Every Alloc trivially acts as its own DeAlloc.
impl<T: Alloc> DeAlloc for T {
    fn dealloc(&mut self, ptr: NonNull<Opaque>, layout: Layout) {
        Alloc::dealloc(self, ptr, layout);
    }
}

I think that before we even think about stabilizing the Alloc trait, we should first try implementing allocator support in all of the standard library collections. This should give us some experience with how this trait will be used in practice.

I think @Ericson2314 has been working on this, per https://github.com/rust-lang/rust/issues/42774. Would be nice to get an update from him.

I don't think we should worry the size of Box<T, A> for the initial implementation, but this is something that can be added later in a backward-compatible way by adding a DeAlloc trait which only supports deallocation.

That's one approach, but it's not at all clear to me that it's definitely the best one. It has distinct disadvantages, for example: a) it only works when a pointer -> allocator lookup is possible (this isn't true of, e.g., most arena allocators), and b) it adds significant overhead to dealloc (namely, to do the reverse lookup). It may end up being the case that the best solution to this problem is a more general-purpose effect or context system like this proposal or this proposal. Or perhaps something different altogether. So I don't think we should assume that this will be easy to solve in a manner which is backwards-compatible with the current incarnation of the Alloc trait.

@joshlf Considering the fact that Box<T, A> only has access to itself when it is dropped, this is the best thing that we can do with safe code only. Such a pattern might be useful for arena-like allocators which have a no-op dealloc and just free memory when the allocator is dropped.

For more complicated systems where the allocator is owned by a container (e.g. LinkedList) and manages multiple allocations, I expect that Box will not be used internally. Instead, the LinkedList internals will use raw pointers which are allocated and freed with the Alloc instance that is contained in the LinkedList object. This will avoid doubling the size of every pointer.

Considering the fact that Box<T, A> only has access to itself when it is dropped, this is the best thing that we can do with safe code only. Such a pattern might be useful for arena-like allocators which have a no-op dealloc and just free memory when the allocator is dropped.

Right, but Box doesn't know that dealloc is no-op.

For more complicated systems where the allocator is owned by a container (e.g. LinkedList) and managed multiple allocations, I expect that Box will not be used internally. Instead, the LinkedList internals will use raw pointers which are allocated and freed with the Alloc instance that is contained in the LinkedList object. This will avoid doubling the size of every pointer.

I think it would really be a shame to require people to use unsafe code in order to write any collections at all. If the goal is to make all collections (presumably including those outside of the standard library) optionally parametric on an allocator, and Box isn't allocator-parametric, then a collections author must either not use Box at all or use unsafe code (and keep in mind that remembering to always free things is one of the most common types of memory unsafety in C and C++, so it's difficult-to-get-right unsafe code at that). That seems like an unfortunate bargain.

Right, but Box doesn't know that dealloc is no-op.

Why not adapt what C++'s unique_ptr does? That is: store a pointer to the allocator if it's "stateful", and do not store it if the allocator is "stateless" (e.g. a global wrapper around malloc or mmap). This would require splitting the current Alloc trait into two traits: StatefulAlloc and StatelessAlloc. I realize that it is a very crude and inelegant solution (and probably someone has already proposed it in previous discussions). Despite its inelegance, this solution is simple and backward compatible (without performance penalties).

I think it would really be a shame to require people to use unsafe code in order to write any collections at all. If the goal is to make all collections (presumably including those outside of the standard library) optionally parametric on an allocator, and Box isn't allocator-parametric, then a collections author must either not use Box at all or use unsafe code (and keep in mind that remembering to always free things is one of the most common types of memory unsafety in C and C++, so it's difficult-to-get-right unsafe code at that). That seems like an unfortunate bargain.

I'm afraid that an implementation of an effect or context system which could allow one to write node-based containers like lists, trees, etc. in a safe manner might take too much time (if it's possible in principle). I haven't seen any papers or academic languages that tackle this problem (please correct me if such works actually exist).

So resorting to unsafe in the implementation of node-based containers might be a necessary evil, at least in the short term.

@eucpp Note that unique_ptr doesn't store an allocator-- it stores a Deleter:

Deleter must be FunctionObject or lvalue reference to a FunctionObject or lvalue reference to function, callable with an argument of type unique_ptr::pointer

I see this as roughly equivalent to us providing split Alloc and Dealloc traits.

@cramertj Yes, you are right. Still, two traits are required - stateful and stateless Dealloc.

Wouldn't a ZST Dealloc be sufficient?

Wouldn't a ZST Dealloc be sufficient?

@remexre I suppose it would :)

I didn't know that the Rust compiler supports ZSTs out of the box. In C++ it would require at least some tricks around the empty base optimization. I'm pretty new to Rust, so sorry for any obvious mistakes.

I don’t think we need separate traits for stateful vs. stateless.

With Box augmented with an A type parameter, it would contain a value of A directly, not a reference or pointer to A. That type can be zero-sized for a stateless (de)allocator. Or A itself can be something like a reference or handle to a stateful allocator that can be shared between multiple allocated objects. So instead of impl Alloc for MyAllocator, you might want to do something like impl<'r> Alloc for &'r MyAllocator.
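
A sketch of the size implications (MyBox and both allocator types are hypothetical):

use std::mem::size_of;
use std::ptr::NonNull;

#[derive(Copy, Clone)]
struct Global; // zero-size handle to a stateless global allocator

struct Arena; // some stateful allocator

// A hypothetical Box that carries its (de)allocator by value.
struct MyBox<T, A> {
    ptr: NonNull<T>,
    alloc: A,
}

fn main() {
    // A zero-size allocator adds nothing to the box...
    assert_eq!(size_of::<MyBox<u32, Global>>(), size_of::<usize>());
    // ...while a reference to a stateful allocator makes it two words.
    assert_eq!(size_of::<MyBox<u32, &Arena>>(), 2 * size_of::<usize>());
}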

By the way, a Box that only knows how to deallocate and not how to allocate would not implement Clone.

@SimonSapin I'd expect that Cloneing would require specifying an allocator again, the same way that creating a new Box would (that is, it wouldn't be done using the Clone trait).

@cramertj Wouldn't that be inconsistent compared to Vec and other containers that implement Clone? What are the downsides of storing an instance of Alloc inside Box rather than Dealloc? Then Box might implement Clone as well as clone_with_alloc.

I don't think the split traits really affect Clone in a huge way - the impl would just look like impl<T, A> Clone for Box<T, A> where A: Alloc + Dealloc + Clone { ... }.

@sfackler I wouldn't be opposed to that impl, but I would also expect to have a clone_into or something that uses a provided allocator.

Would it make sense to add an alloc_copy method to Alloc? This could be used to provide faster memcpy (Copy/Clone) implementations for large allocations, e.g. by doing copy-on-write clones of pages.

That would be pretty cool, and trivial to provide a default implementation for.

What would be using such an alloc_copy function? impl Clone for Box<T, A>?

Yeah, ditto for Vec.

Having looked into it some more, it seems like approaches to create copy-on-write pages within the same process range between hacky and impossible, at least if you want to do it more than one level deep. So alloc_copy wouldn't be a huge benefit.

Instead a more general escape hatch that allows future virtual memory shenanigans might be of some use. I.e. if an allocation is large, backed by mmap anyway and stateless then the allocator could promise to be oblivious about future changes to the allocation. The user could then move that memory to a pipe, unmap it or similar things.
Alternatively there could be a dumb mmap-all-the-things allocator and a try-transfer function.

Instead a more general escape hatch that allows future virtual memory

Memory allocators (malloc, jemalloc, ...) do not generally let you steal any kind of memory from them, and they do not generally let you query or change the properties of the memory they own. So what does this general escape hatch have to do with memory allocators?

Also, virtual memory support differs greatly between platforms, so much that using virtual memory effectively often requires different algorithms per platform often with completely different guarantees. I have seen some portable abstractions over virtual memory, but I haven't seen one yet that wasn't crippled to the point of being useless in some situations due to their "portability".

You're right. Any such use-case (I was mostly thinking of platform-specific optimizations) is probably best served by using a custom allocator in the first place.

Any thoughts on the Composable Allocator API described by Andrei Alexandrescu in his CppCon presentation? The video is available on YouTube here: https://www.youtube.com/watch?v=LIb3L4vKZ7U (he starts to describe his proposed design around 26:00, but the talk is entertaining enough you may prefer to watch it through).

It sounds like the inevitable conclusion of all this is that collections libraries should be generic WRT allocation, and the app programmer himself should be able to compose allocators and collections freely at the construction site.

Any thoughts on the Composable Allocator API described by Andrei Alexandrescu in his CppCon presentation?

The current Alloc API allows writing composable allocators (e.g. MyAlloc<Other: Alloc>) and you can use traits and specialization to achieve pretty much everything that's achieved in Andrei's talk. However, beyond the "idea" that one should be able to do that, pretty much nothing from Andrei's talk can apply to Rust, since the way Andrei builds the API sits on unconstrained generics + SFINAE/static if from the very start, and Rust's generics system is completely different from that one.

I would like to propose stabilizing the rest of the Layout methods. These are already useful with the current global allocator API.

Are these all the methods that you mean?

  • pub fn align_to(&self, align: usize) -> Layout
  • pub fn padding_needed_for(&self, align: usize) -> usize
  • pub fn repeat(&self, n: usize) -> Result<(Layout, usize), LayoutErr>
  • pub fn extend(&self, next: Layout) -> Result<(Layout, usize), LayoutErr>
  • pub fn repeat_packed(&self, n: usize) -> Result<Layout, LayoutErr>
  • pub fn extend_packed(&self, next: Layout) -> Result<(Layout, usize), LayoutErr>
  • pub fn array<T>(n: usize) -> Result<Layout, LayoutErr>

@gnzlbg Yes.

@Amanieu Sounds ok to me, but this issue is already huge. Consider filing a separate issue (or even a stabilization PR) that we could FCP separately?

from Allocators and lifetimes:

  1. (for allocator impls): moving an allocator value must not invalidate its outstanding memory blocks.

    All clients can assume this in their code.

    So if a client allocates a block from an allocator (call it a1) and then a1 moves to a new place (e.g. via let a2 = a1;), then it remains sound for the client to deallocate that block via a2.

Does this imply that an Allocator must be Unpin?

Good catch!

Since the Alloc trait is still unstable, I think we still get to change the rules if we want to and modify this part of the RFC. But it is indeed something to keep in mind.

@gnzlbg Yes, I'm aware of the huge differences in the generics systems, and that not everything he details is implementable in the same way in Rust. I've been working on the library on-and-off since posting, though, and I'm making good progress.

Does this imply that an Allocator _must_ be Unpin?

It doesn't. Unpin is about the behavior of a type when wrapped in a Pin, there's no particular connection to this API.

But can't Unpin be used to enforce the mentioned constraint?

Another question regarding dealloc_array: Why does the function return Result? In the current implementation, this may fail in two cases:

  • n is zero
  • capacity overflow for n * size_of::<T>()

For the first we have two cases (as in the documentation, the implementer can choose between these):

  • The allocation returns Ok for n == 0 => dealloc_array should also return Ok.
  • The allocation returns Err for n == 0 => there is no pointer that can be passed to dealloc_array.

The second is ensured by the following safety constraint:

the layout of [T; n] must fit that block of memory.

This means, that we must call dealloc_array with the same n as in the allocation. If an array with n elements could be allocated, n is valid for T. Otherwise, the allocation would have failed.

Edit: Regarding the last point: Even if usable_size returns a higher value than n * size_of::<T>(), this is still valid. Otherwise the implementation violates this trait constraint:

The block's size must fall in the range [use_min, use_max], where:

  • [...]
  • use_max is the capacity that was (or would have been) returned when (if) the block was allocated via a call to alloc_excess or realloc_excess.

This only holds because the trait requires an unsafe impl.

For the first we have two cases (as in the documentation, the implementer can choose between these):

  • The allocation returns Ok for n == 0

Where did you get this information from?

All Alloc::alloc_ methods in the docs specify that the behavior of zero-sized allocations is undefined under their "Safety" clause.

Docs of core::alloc::Alloc (highlighted relevant parts):

A note regarding zero-sized types and zero-sized layouts: many methods in the Alloc trait state that allocation requests must be non-zero size, or else undefined behavior can result.

  • However, some higher-level allocation methods (alloc_one, alloc_array) are well-defined on zero-sized types and can optionally support them: it is left up to the implementor whether to return Err, or to return Ok with some pointer.
  • If an Alloc implementation chooses to return Ok in this case (i.e. the pointer denotes a zero-sized inaccessible block) then that returned pointer must be considered "currently allocated". On such an allocator, all methods that take currently-allocated pointers as inputs must accept these zero-sized pointers, without causing undefined behavior.

  • In other words, if a zero-sized pointer can flow out of an allocator, then that allocator must likewise accept that pointer flowing back into its deallocation and reallocation methods.

So one of the error conditions of dealloc_array is definitely suspicious:

/// # Safety
///
/// * the layout of `[T; n]` must *fit* that block of memory.
///
/// # Errors
///
/// Returning `Err` indicates that either `[T; n]` or the given
/// memory block does not meet allocator's size or alignment
/// constraints.

If [T; N] does not meet the allocator size or alignment constraints, then AFAICT it does not fit the block of memory of the allocation, and the behavior is undefined (per the safety clause).

The other error condition is "Always returns Err on arithmetic overflow.", which is pretty generic. It's hard to tell whether it is a useful error condition. For each Alloc trait implementation one might be able to come up with a different one that could do some arithmetic that could in theory wrap, so 🤷‍♂️


Docs of core::alloc::Alloc (highlighted relevant parts):

Indeed. I find it weird that so many methods (e.g. Alloc::alloc) state that zero-sized allocations are undefined behavior, but then we just provide Alloc::alloc_array(0) with implementation-defined behavior. In some sense Alloc::alloc_array(1) is a litmus test to check whether an allocator supports zero-sized allocations or not.

If [T; N] does not meet the allocator size or alignment constraints, then AFAICT it does not fit the block of memory of the allocation, and the behavior is undefined (per the safety clause).

Yup, I think this error condition can be dropped as it's redundant. Either we need the safety clause or an error condition, but not both.

The other error condition is "Always returns Err on arithmetic overflow." which is pretty generic. It's hard to tell whether it is an useful error condition.

IMO, it is guarded by the same safety clause as above; if the capacity of [T; N] would overflow, it does not fit that memory block to deallocate. Maybe @pnkfelix could elaborate on this?

In some sense Alloc::alloc_array(1) is a litmus test to check whether an allocator supports zero-sized allocations or not.

Did you mean Alloc::alloc_array(0)?

IMO, it is guarded by the same safety clause as above; if the capacity of [T; N] would overflow, it does not _fit_ that memory block to deallocate.

Note that this trait can be implemented by users for their own custom allocators, and that these users can override the default implementations of these methods. So when considering whether this should return Err for arithmetic overflow or not, one should not only focus on what the current default implementation of the default method does, but also consider what it might make sense for users implementing these for other allocators.

Did you mean Alloc::alloc_array(0)?

Yes, sorry.

Note that this trait can be implemented by users for their own custom allocators, and that these users can override the default implementations of these methods. So when considering whether this should return Err for arithmetic overflow or not, one should not only focus on what the current default implementation of the default method does, but also consider what it might make sense for users implementing these for other allocators.

I see, but implementing Alloc requires an unsafe impl, and implementors have to follow the safety rules mentioned in https://github.com/rust-lang/rust/issues/32838#issuecomment-467093527.

Every API left pointing here for a tracking issue is the Alloc trait or related to the Alloc trait. @rust-lang/libs, do you feel it’s useful to keep this open in addition to https://github.com/rust-lang/rust/issues/42774?

Simple background question: What is the motivation behind the flexibility with ZSTs? It seems to me that, given that we know at compile-time that a type is a ZST, we can completely optimize out both the allocation (to return a constant value) and the deallocation. Given that, it seems to me that we should say one of the following:

  • It's always up to the implementor to support ZSTs, and they can't return Err for ZSTs
  • It's always UB to allocate ZSTs, and it's the caller's responsibility to short-circuit in this case
  • There's some sort of alloc_inner method that implementors provide, and an alloc method with a default implementation which does the short-circuiting; alloc must support ZSTs, but alloc_inner may NOT be called for a ZST (this is just so that we can add the short-circuiting logic in a single place - in the trait definition - in order to save implementors some boilerplate)

Is there a reason that the flexibility we have with the current API is needed?

Is there a reason that the flexibility we have with the current API is needed?

It's a trade-off. Arguably, the Alloc trait is used more often than it is implemented, so it might make sense to make using Alloc as easy as possible by providing built-in support for ZSTs.

This would mean that implementers of the Alloc trait will need to take care of this, but more importantly (to me) that those trying to evolve the Alloc trait will need to keep ZSTs in mind on every API change. It also complicates the docs of the API by having to explain how ZSTs are handled (or that handling is "implementation defined").

C++ allocators pursue this approach, where the allocator tries to solve many different problems. This not only made them harder to implement and harder to evolve, but also harder for users to actually use, because of how all these problems interact in the API.

I think that handling ZSTs, and allocating/deallocating raw memory are two orthogonal and different problems, and therefore we should keep the Alloc trait API simple by just not handling them.

Users of Alloc like libstd will need to handle ZSTs, e.g., in each collection. That's definitely a problem worth solving, but I don't think the Alloc trait is the place for that. I'd expect a utility to solve this problem to pop up within libstd out of necessity, and when that happens, we can maybe try to RFC such a utility and expose it in std::heap.

That all sounds reasonable.

I think that handling ZSTs, and allocating/deallocating raw memory are two orthogonal and different problems, and therefore we should keep the Alloc trait API simple by just not handling them.

Doesn't that imply that we should have the API explicitly not handle ZSTs rather than be implementation-defined? IMO, an "unsupported" error is not very helpful at runtime since the vast majority of callers will not be able to define a fallback path, and will therefore have to assume that ZSTs are unsupported anyway. Seems cleaner to just simplify the API and declare that they're _never_ supported.

Would specialization be used by alloc users to handle ZST? Or just if size_of::<T>() == 0 checks?

Would specialization be used by alloc users to handle ZST? Or just if size_of::<T>() == 0 checks?

The latter should be sufficient; the appropriate code paths would be trivially removed at compile time.
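
Concretely, the caller-side short-circuit usually looks like this (a sketch using the global allocator for brevity; the helper name is hypothetical):

use std::alloc::{alloc, handle_alloc_error, Layout};
use std::mem::size_of;
use std::ptr::NonNull;

fn allocate_value<T>() -> NonNull<T> {
    if size_of::<T>() == 0 {
        // Never hand a zero-sized request to the allocator; a dangling,
        // well-aligned pointer is the canonical stand-in.
        NonNull::dangling()
    } else {
        let layout = Layout::new::<T>();
        NonNull::new(unsafe { alloc(layout) } as *mut T)
            .unwrap_or_else(|| handle_alloc_error(layout))
    }
}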

Doesn't that imply that we should have the API explicitly not handle ZSTs rather than be implementation-defined?

For me, an important constraint is that if we ban zero-sized allocations, the Alloc methods should be able to assume that the Layout passed to them is not zero-sized.

There are multiple ways to achieve this. One would be to add another Safety clause to all Alloc methods stating that if the Layout is zero-sized the behavior is undefined.

Alternatively, we could ban zero-sized Layouts, then Alloc doesn't need to say anything about zero-sized allocations since these cannot safely happen, but doing that would have some downsides.

For example, some types like HashMap build up the Layout from multiple Layouts, and while the final Layout might not be zero-sized, the intermediate ones might be (e.g. in HashSet). So these types would need to use "something else" (e.g. a LayoutBuilder type) to build up their final Layouts, and pay for a "non-zero-sized" check (or use an _unchecked method) when converting to Layout.
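
For illustration, composing a layout the way a table might (a sketch using the Layout::array/extend methods discussed earlier; real HashMap code differs): the intermediate array layouts can be zero-sized (e.g. for ZST keys) even when the final layout is not:

use std::alloc::Layout;

fn table_layout<K, V>(n: usize) -> Option<Layout> {
    let keys = Layout::array::<K>(n).ok()?; // zero-sized if K is a ZST
    let vals = Layout::array::<V>(n).ok()?;
    let (combined, _val_offset) = keys.extend(vals).ok()?;
    Some(combined)
}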

Would specialization be used by alloc users to handle ZST? Or just if size_of::<T>() == 0 checks?

We can't specialize on ZSTs yet. Right now all code uses size_of::<T>() == 0.

There are multiple ways to achieve this. One would be to add another Safety clause to all Alloc methods stating that if the Layout is zero-sized the behavior is undefined.

It'd be interesting to consider whether there are ways to make this a compile-time guarantee, but even a debug_assert that the layout is non-zero-sized should be sufficient to catch 99% of bugs.

I haven't paid any attention to discussions about allocators, so sorry about that. But I've long wished that the allocator had access to the type of the value it's allocating. There may be allocator designs that could use it.

Then we'd probably have the same issues as C++ and its allocator API.

But I've long wished that the allocator had access to the type of the value it's allocating.

What do you need this for?

@gnzlbg @brson Today I had a potential use case for knowing _something_ about the type of the value being allocated.

I'm working on a global allocator which can be switched between three underlying allocators - a thread-local one, a global one with locks, and one for coroutines to use - the idea being that I can constrain a coroutine representing a network connection to a maximum amount of dynamic memory usage (in the absence of being able to control allocators in collections, esp. in third party code)*.

It'd possibly be handy to know if I'm allocating a value that might move across threads (e.g. Arc) vs one that won't. Or it might not. But it is a possible scenario. At the moment the global allocator has a switch which the user uses to tell it from which allocator to make allocations (not needed for realloc or free; we can just look at the memory address for that).

*[It also allows me to use NUMA local memory wherever possible without any locking, and, with a 1 core 1 thread model, to cap total memory usage].

@raphaelcohn

I'm working on a global allocator

I don't think any of this would (or could) apply to the GlobalAlloc trait, and the Alloc trait already has generic methods that can make use of type information (e.g. alloc_array<T>(1) allocates a single T, where T is the actual type, so the allocator can take the type into account while performing the allocation). I think it would be more useful for the purposes of this discussion to actually see code implementing allocators that make use of type information. I haven't heard any good argument about why these methods need to be part of some generic allocator trait, as opposed to just being part of the allocator API, or some other allocator trait.

I think it would also be very interesting to know which of the types parametrized by Alloc do you intend to combine with allocators that use type information, and what do you expect to be the outcome.

AFAICT, the only interesting type for that would be Box because it allocates a T directly. Pretty much all other types in std never allocate a T, but some private internal type that your allocator can't know anything about. For example, Rc and Arc could allocate (InternalRefCounts, T), List / BTreeSet / etc. allocate internal node types, Vec/Deque/... allocate arrays of Ts, but not Ts themselves, etc.

For Box and Vec we could add, in backwards-compatible ways, a BoxAlloc and an ArrayAlloc trait with blanket impls for Alloc that allocators could specialize to hijack how those behave, if there is ever a need to attack these problems in a generic way. But is there a reason why providing your own MyAllocBox and MyAllocVec types that conspire with your allocator to exploit type information isn't a viable solution?

As we now have a dedicated repository for the allocators WG, and the list in the OP is out of date, this issue may be closed to keep discussions and tracking of this feature in one place?

A good point @TimDiekmann! I'm going to go ahead and close this in favor of discussion threads in that repository.

This is still the tracking issue that some #[unstable] attribute point to. I think it should not be closed until these features have been either stabilized or deprecated. (Or we could change the attributes to point to a different issue.)

Yeah unstable features referenced in git master should definitely have an open tracking issue.

Agreed. Also added a notice and link to the OP.
