Rust: Tracking issue for Vec::drain_filter and LinkedList::drain_filter

Created on 15 Jul 2017  ·  119Comments  ·  Source: rust-lang/rust

    /// Creates an iterator which uses a closure to determine if an element should be removed.
    ///
    /// If the closure returns true, then the element is removed and yielded.
    /// If the closure returns false, it will try again, and call the closure
    /// on the next element, seeing if it passes the test.
    ///
    /// Using this method is equivalent to the following code:
    ///
    /// ```
    /// # let mut some_predicate = |x: &mut i32| { *x == 2 };
    /// # let mut vec = vec![1, 2, 3, 4, 5];
    /// let mut i = 0;
    /// while i != vec.len() {
    ///     if some_predicate(&mut vec[i]) {
    ///         let val = vec.remove(i);
    ///         // your code here
    ///     }
    ///     i += 1;
    /// }
    /// ```
    ///
    /// But `drain_filter` is easier to use. `drain_filter` is also more efficient,
    /// because it can backshift the elements of the array in bulk.
    ///
    /// Note that `drain_filter` also lets you mutate ever element in the filter closure,
    /// regardless of whether you choose to keep or remove it.
    ///
    ///
    /// # Examples
    ///
    /// Splitting an array into evens and odds, reusing the original allocation:
    ///
    /// ```
    /// let mut numbers = vec![1, 2, 3, 4, 5, 6, 8, 9, 11, 13, 14, 15];
    ///
    /// let evens = numbers.drain_filter(|x| *x % 2 == 0).collect::<Vec<_>>();
    /// let odds = numbers;
    ///
    /// assert_eq!(evens, vec![2, 4, 6, 8, 14]);
    /// assert_eq!(odds, vec![1, 3, 5, 9, 11, 13, 15]);
    /// ```
    fn drain_filter<F>(&mut self, filter: F) -> DrainFilter<T, F>
        where F: FnMut(&mut T) -> bool,
    { ... }

I'm sure there's an issue for this somewhere, but I can't find it. Someone nerd sniped me into implementing it. PR incoming.

A-collections B-unstable C-tracking-issue Libs-Tracked T-libs

Most helpful comment

Is there anything blocking this from being stabilized?

All 119 comments

Maybe this doesn't need to include the kitchen sink, but it _could_ have a range parameter, so that it's like a superset of drain. Any drawbacks to that? I guess adding bounds checking for the range is a drawback, it's another thing that can panic. But drain_filter(.., f) can not.

Is there any chance this will stabilize in some form in the not to far future?

If the compiler is clever enough to eliminate the bounds checks
in the drain_filter(.., f) case I would opt for doing this.

( And I'm pretty sure you can implement it in a way
which makes the compiler clever eneugh, in the worst
case you could have a "in function specialization",
basically something like if Type::of::<R>() == Type::of::<RangeFull>() { dont;do;type;checks; return } )

I know this is bikeshedding to some extent, but what was the reasoning behind naming this drain_filter rather than drain_where? To me, the former implies that the whole Vec will be drained, but that we also run a filter over the results (when I first saw it, I thought: "how is this not just .drain(..).filter()?"). The former on the other hand indicates that we only drain elements where some condition holds.

No idea, but drain_where sounds much better and is much more intuitive.
Is there still a chance to change it?

.remove_if has been a prior suggestion too

I think drain_where does explains it the best. Like drain it returns values, but it does not drain/remove all values but just such where a given condition is true.

remove_if sounds a lot like a conditional version of remove which just removes a single item by index if a condition is true e.g. letters.remove_if(3, |n| n < 10); removes the letter at index 3 if it's < 10.

drain_filter on the other hand is slightly ambiguous, does it drain then filter in a more optimized way (like filter_map) or does if drain so that a iterator is returned comparble to the iterator filter would return,
and if so shouldn't it be called filtered_drain as the filter get logically used before...

There is no precedent for using _where or _if anywhere in the standard library.

@Gankro is there a precedent for using _filter anywhere? I also don't know that that is that a reason for not using the less ambiguous terminology? Other places in the standard library already use a variety of suffixes such as _until and _while.

The "said equivalent" code in the comment is not correct... you have to minus one from i at the "your code here" site, or bad things happens.

IMO it's not filter that's the issue. Having just searched for this (and being a newbie), drain seems to be fairly non-standard compared to other languages.

Again, just from a newbie perspective, the things I would search for if trying to find something to do what this issue proposes would be delete (as in delete_if), remove, filter or reject.

I actually _searched_ for filter, saw drain_filter and _kept searching_ without reading because drain didn't seem to represent the simple thing that I wanted to do.

It seems like a simple function named filter or reject would be much more intuitive.

On a separate note, I don't feel as though this should mutate the vector it's called on. It prevents chaining. In an ideal scenario one would want to be able to do something like:

        vec![
            "",
            "something",
            a_variable,
            function_call(),
            "etc",
        ]
            .reject(|i| { i.is_empty() })
            .join("/")

With the current implementation, what it would be joining on would be the rejected values.

I'd like to see both an accept and a reject. Neither of which mutate the original value.

You can already do the chaining thing with filter alone. The entire point of drain_filter is to mutate the vector.

@rpjohnst so I searched here, am I missing filter somewhere?

Yes, it's a member of Iterator, not Vec.

Drain is novel terminology because it represented a fourth kind of ownership in Rust that only applies to containers, while also generally being a meaningless distinction in almost any other language (in the absence of move semantics, there is no need to combine iteration and removal into a single ""atomic"" operation).

Although drain_filter moves the drain terminology into a space that other languages would care about (since avoiding backshifts is relevant in all languages).

I came across drain_filter in docs as a google result for rust consume vec. I know that due to immutability by default in rust, filter doesn't consume the data, just couldn't recall how to approach it so I could manage memory better.

drain_where is nice, but as long as the user is aware of what drain and filter do, I think it's clear that the method drains the data based on a predicate filter.

I still feel as though drain_filter implies that it drains (i.e., empties) and then filters. drain_where on the other hand sounds like it drains the elements where the given condition holds (which is what the proposed function does).

Shouldn't linked_list::DrainFilter implement Drop as well, to remove any remaining elements that match the predicate?

Yes

Why exactly does dropping the iterator cause it to run through to the end? I think that's surprising behaviour for an iterator and it could also be, if desired, done explicitly. The inverse of taking only as many elements out as you need is impossible on the other hand because mem::forgeting the iterator runs into leak amplification.

I've been using this function a lot and I always have to remember to return true for the entries I want to drain, which feels counter-intuitive compared to retain()/retain_mut().
On an intuitive logical level, it would make more sense to return true for entries I want to keep, does anyone else feel this way? (Especially considering that retain() already works this way)
Why not do that, and rename drain_filter() to retain_iter() or retain_drain() (or drain_retain())?
Then it would also mirror retain() more closely!

This is why I proposed to rename it to
drain_where as it then is clear that:

  1. It is a form of drain so we use drain in the name

  2. By using where in combination with drain it is clear that the
    elements _where_ the predicate is true get drained, i.e removed and returned

  3. It is more in sync with other names in std, normally if you have a
    function consisting of two predicates you can emulate it (roughly) by using
    functions representing each of the predicates in a "and then" fashion, e.g.
    filter_map can be emulated (roughly) as filter an then map. The current
    name indicates it is drain and then filter, but it isn't even close to it
    as it does not do a complete drain at all.

On Sun, Feb 25, 2018, 17:04 Boscop notifications@github.com wrote:

I've been using this function a lot and I always have to remember to
return true for the entries I want to drain, which feels
counter-intuitive compared to retain_mut().
On a primal level, it would make more sense to return true for entries I
want to keep, does anyone else feel this way?
Why not do that, and rename drain_filter() to retain_filter()?


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/43244#issuecomment-368320990,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AHR0kfwaNvz8YBwCE4BxDkeHgGxLvcWxks5tYYRxgaJpZM4OY1me
.

But with drain_where() the closure would still have to return true for elements that should be removed, which is the opposite of retain() which makes it inconsistent..
Maybe retain_where?
But I think you're right that it makes sense to have "drain" in the name, so I think drain_retain() makes the most sense: It's like drain() but retaining the elements where the closure returns true.

While it does not change, that you have to return true it makes clear that
you have to do so.

I personally would use either:

a. drain_where
b. retain_where
c. retain

I would not use drain_retain.
Drain and retain speak about the same kind of process but from opposite
perspectives, drain speaks about what you remove (and r return) retain
speaks about what you keep (in the way it is already used in std).

Currently retaim is already implemented in std with the major difference
that it is discarding elements while drain is returning them, which makes
retain (sadly) unsuited to be use in the name (except if you want to call
it retain_and_return or similar).

Another point which speaks for drain is ease of migration, i.e. migrating
to drain_where is as easy as running a word based search and replace on
the code, while changing it to retain would need a additional negation of
all used predicates/filter functions.

On Sun, Feb 25, 2018, 18:01 Boscop notifications@github.com wrote:

But with drain_where() the closure would still have to return true for
elements that should be removed, which is the opposite of retain() which
makes it inconsistent..
Maybe retain_where?
But I think you're right that it makes sense to have "drain" in the name,
so I think drain_retain() makes the most sense: It's like drain() but
retaining the elements where the closure returns true.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/43244#issuecomment-368325374,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AHR0kfG4oZHxGfpOSK8DjXW3_2O1Eo3Rks5tYZHxgaJpZM4OY1me
.

But how often would you migrate from drain() to drain_filter()?
In all cases until now, I migrated from retain() to drain_filter() because there is no retain_mut() in std and I need to mutate the element! So then I had to invert the closure return value..
I think drain_retain() makes sense because the drain() method drains unconditionally all elements in the range, whereas drain_retain() retains the elements where the closure returns true, it combines the effects of the drain() and retain() methods.

Sorry I mean to migrate from drain_filter to drain_where.

That you have a solution using retain and then need to use
drain_filter is a aspects I have not yet considered.

On Sun, Feb 25, 2018, 19:12 Boscop notifications@github.com wrote:

But why would you migrate from drain() to drain_filter()?
In all cases until now, I migrated from retain() to drain_filter()
because there is no retain_mut() in std and I need to mutate the element!
I think drain_retain() makes sense because the drain() method drains
unconditionally all elements in the range, whereas drain_retain() retains
the elements where the closure returns true.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/43244#issuecomment-368330896,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AHR0kSayIk_fbp5M0RsZW5pYs3hDICQIks5tYaJ0gaJpZM4OY1me
.

Ah yes, but I think the "price" of inverting the closures in current code that uses drain_filter() is worth it, to get a consistent and intuitive API in std and then in stable.
It's only a small fixed cost (and eased by the fact that it would go along with a renaming of the function, so the compiler error could tell the user that the closure has to be inverted, so it wouldn't silently introduce a bug), compared to the cost of standardizing drain_filter() and then people always having to invert the closure when migrating from retain() to drain_filter().. (on top of the mental cost of remembering to do that, and the costs of making it harder to find it in the docs, coming from retain() and searching for "something like retain() but passing &mut to its closure, which is why I think it makes sense that the new name of this function has "retain" in the name, so that people find it when searching in the docs).

Some anecdotal data: In my code, I always only needed the retain_mut() aspect of drain_filter() (often they were using retain() before), I never had use cases where I needed to process the drained values. I think this will also the most common use case for others in the future (since retain() doesn't pass &mut to its closure so that drain_filter() has to cover that use case, too, and it's a more common use case than needing to process the drained values).

The reason why I'm agains drain_retain is because of the way names are currently used in std wrt. collections:

  1. you have function names using predicates which have producing/consuming concepts associated with them (wrt. rust, iterations). For example drain, collect, fold, all, take, ...
  2. this predicates have sometimes modifiers e.g. *_where, *_while
  3. you have function names using predicates which have modifying properties (map, filter, skip, ...)

    • here it's vague if it is element or iteration modifying (map vs. filter/skip)

  4. function names chaining multiple predicates using modifying properties e.g. filter_map

    • having a concept of roughly apply modifier_1 and then apply modifier_2, just that it's faster or more flexible doing this in one step

You sometimes might have:

  1. function names combining producing/consuming predicates with modifying ones (e.g. drain_filter)

    • _but_ most times it is better/less confusing to combine them with modifiers (e.g. drain_where)

You normally do not have:

  1. two of the produceing/consuming predicates combined into one name, i.e. we do not have thinks like take_collect as it is easily confusing

drain_retain does kinda make sense but falls in the last category, while you probably can guess what it does it basically says remove and return all elements "somehow specified" and then keep all elements "somehow specified" discarding other elements.


One the other hand I don't know why there should not be a retain_mut maybe opening a quick RFC introducing retain_mut as a efficient way to combine modify + retain I have a hunch it might be faster
stabilized then this function. Until then you could consider writing a extension trait providing
you own retain_mut using iter_mut + a bool-array (or bitarray, or...) to keep track of which elements
have to be reomved. Or providing your own drain_retain which internally uses drain_filer/drain_where
but wraps the predicate into a not |ele| !predicate(ele).

@dathinab

  1. We are talking about a method on collections here, not on Iterator. map, filter, filter_map, skip, take_while etc are all methods on Iterator. Btw, which methods do you mean that use *_where?
    So we have to compare the naming scheme to methods that already exist on collections, e.g. retain(), drain(). There is no confusion with Iterator methods which transform one iterator into another iterator.
  2. AFAIK the consensus was that retain_mut() would not be added to std because drain_filter() will already be added and people were advised to use that. Which brings us back to the use case of migrating from retain() to drain_filter() being very common, so it should have a similar name and API (closure returning true means keep the entry)..

I haven't been aware that retain_mut was already discussed.

We are talking about general naming schemes in std mainly wrt. to
collections, which does include Iterators as they are one of the main
access methods for collections in rust, especially when it's about
modifying more than one entry.

  • _where is just a example for suffixes to express a slightly modified
    function. Suffixes of this kind which currently are used in std are mainly
    _until, _while, _then, _else , _mut and _back.

The reason why drain_retain is confusing is that it's not clear if it is
drain or retain based, if it is drain based, returning true would remove
the value, if is retain based it would keep it. Using _where makes it at
last clear what is expected of the passed in function.

On Mon, Feb 26, 2018, 00:25 Boscop notifications@github.com wrote:

@dathinab https://github.com/dathinab

  1. We are talking about a method on collections here, not on Iterator.
    map, filter, filter_map, skip, take_while etc are all methods on Iterator.
    Btw, which methods do you mean that use *_where?
    So we have to compare the naming scheme to methods that already exist
    on collections, e.g. retain(), drain(). There is no confusion with
    Iterator methods which transform the iterator into another iterator.
  2. AFAIK the consensus was that retain_mut() would not be added to std
    because drain_filter() will already be added and people were advised
    to use that. Which brings us back to the use case of migrating from
    retain() to drain_filter() being very common, so it should have a
    similar name and API (closure returning true means keep the entry)..


You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/43244#issuecomment-368355110,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AHR0kfkRAZ5OtLFZ-SciAmjHDEXdgp-0ks5tYevegaJpZM4OY1me
.

I've been using this function a lot and I always have to remember to return true for the entries I want to drain, which feels counter-intuitive compared to retain()/retain_mut().

FWIW, I think retain is the counter-intuitive name here. I usually find myself wanting to delete certain elements from a vector, and with retain I always have to invert that logic.

But retain() is already in stable, so we have to live with it.. And so my intuition got used to that..

@Boscop: and so is drain which is the inverse of retain but also returns the removed elements and the usage of suffixes like _until,_while for making functions available which are just a slightly modified version of a existing functionality.

I mean if I would describe drain it would be something like:

_remove and return all elements specified "in some way", keep all other elements_
where _"in some way"_ is _"by slicing"_ for all sliceable collection types and _"all"_ for the rest.

The description for the function discussed here is _the same_ except that
_"in some way"_ is _"where a given predicate returns true"_.

One the other hand the description I would give retain is:
_only retain (i.e. keep) elements where a given predicate returns true, discard the rest_

(yes, retain could have been used in a way where it does not discard the rest, sadly it wasn't)


I do think that it would have been really nice if retain would have
passed &mut T to the predicate and maybe returned the removed values.
Because I think retain is a more intuitive name base.

But independent of this I also think that both drain_filter/drain_retain are suboptimal
as they do not make it clear if the predicate has to return true/false to keep/drain a entry.
(drain indicates true does remove it as it speaks about removing elements while filter
and retrain speaks about which elements to keep, at last in rust)


In the end it's not _that_ important which of the names is used, it would be just nice if it gets stabilized.

Doing a poll and/or letting someone from the language team decide might be the best way to move thinks forward?

I think something like drain_where, drain_if, or drain_when, is much clearer than drain_filter.

@tmccombs Out of those 3, I think drain_where makes the most sense. (Because if implies either do the whole thing (in this case draining) or not and when is temporal.)
Compared to drain_filter the closure return value is the same with drain_where (true to remove an element) but that fact is made clearer/explicit by the name, so it eliminates the risk of accidentally interpreting the meaning of the closure return value wrongly.

I think it’s more than time to stabilize. Summary of this thread:

  • Should a R: RangeArgument parameter be added?
  • Should the boolean value be inverted? (I think the current logic makes sense: returning true from the callback causes that item to be included in the iterator.)
  • Naming. (I like drain_where.)

@Gankro, what do you think?

The libs team discussed this and the consensus was to not stabilize more drain-like methods at the moment. (The existing drain_filter method can stay in Nightly as unstable.) https://github.com/rust-lang/rfcs/pull/2369 proposes adding another drain-like iterator that doesn’t do anything when dropped (as opposed to consuming the iterator to its end).

We’d like to see experimentation to attempt to generalize into a smaller API surface various combinations of draining:

  • A sub-range (through RangeArgument a.k.a. RangeBounds) v.s. the entire collection (though the latter could be achieved by passing .., a value of type RangeFull).
  • Draining everything (possibly within that range) v.s. only elements that match a boolean predicate
  • Self-exhausting on drop v.s. not (leaving the rest of the elements in the collection).

Possibilities might include "overloading" a method by making it generic, or a builder pattern.

One constraint is that the drain method is stable. It can possibly be generalized, but only in backward-compatible ways.

We’d like to see experimentation to attempt to generalize into a smaller API surface various combinations of draining:

How and where does the team foresee this type of experimentation happening?

How: come up with and propose a concrete API design, possibly with a proof-of-concept implementation (which can be done out of tree through at least Vec::as_mut_ptr and Vec::set_len). Where doesn’t matter too much. Could be a new RFC or a thread in https://internals.rust-lang.org/c/libs, and link it from here.

I've been playing around with this for a bit. I'll open a thread on internals in the next days.

I think a general API that works like this makes sense:

    v.drain(a..b).where(pred)

So it's a builder-style API: If .where(pred) is not appended, it will drain the whole range unconditionally.
This covers the capabilities of the current .drain(a..b) method as well as .drain_filter(pred).

If the name drain can't be used because it's already in use, it should be a similar name like drain_iter.

The where method shouldn't be named *_filter to avoid confusion with filtering the resulting iterator, especially when where and filter are used in combination like this:

    v.drain(..).where(pred1).filter(pred2)

Here, it will use pred1 to decide what will be drained (and passed on in the iterator) and pred2 is used to filter the resulting iterator.
Any elements that pred1 returns true for but pred2 returns false for will still get drained from v but won't get yielded by this combined iterator.

What do you think about this kind of builder-style API approach?

For a second I forgot that where can't be used as function name because it's already a keyword :/

And drain is already stabilized so the name can't be used either..

Then I think the second best overall option is to keep the current drain and rename drain_filter to drain_where, to avoid the confusion with .drain(..).filter().

(As jonhoo said above: )

what was the reasoning behind naming this drain_filter rather than drain_where? To me, the former implies that the whole Vec will be drained, but that we also run a filter over the results (when I first saw it, I thought: "how is this not just .drain(..).filter()?"). The former on the other hand indicates that we only drain elements where some condition holds.

I've opened a thread on internals.
The TLDR is that I think that non-selfexhaustion is a bigger can of worms than expected in the general case and that we should stabilize drain_filter sooner rather than later with a RangeBounds parameter. Unless someone has a good idea for solving the issues outlined there.

Edit: I've uploaded my experimental code: drain experiments
There are also drain and clearing benches and some tests but don't expect clean code.

Totally missed out on this thread. I've had an old impl that I've fixed up a bit and copy pasted to reflect a few of the options described in this thread. The one nice thing about the impl that I think will be non-controversial is that it implements DoubleEndedIterator. View it here.

@Emerentius but then we should at least rename drain_filter to drain_where, to indicate that the closure has to return true to remove the element!

@Boscop Both imply the same 'polarity' of true => yield. I personally don't care whether it's called drain_filter or drain_where.

@Popog Can you summarize the differences and pros & cons? Ideally over at the internals thread. I think DoubleEndedIterator functionality could be added backwards compatibly with zero or low overhead (but I haven't tested that).

How about drain_or_retain? It's a grammatically meaningful action, and it signals that it does one or the other.

@askeksa But that doesn't make it clear whether returning true from the closure means "drain" or "retain".
I think with a name like drain_where, it's very clear that returning true drains it, and it should be clear to everyone that the elements that aren't drained are retained.

It would be nice if there was some way to limit/stop/cancel/abort the drain. For example, if I wanted to drain the first N even numbers, it would be nice to be able to just do vec.drain_filter(|x| *x % 2 == 0).take(N).collect() (or some variant of that).

As it's currently implemented, DrainFilter's drop method will always run the drain to completion; it can't be aborted (at least I haven't figured out any trick that would do that).

If you want that behaviour you should just close over some state that tracks how many you've seen and start returning false. Running to completion on drop is necessary to make adaptors behave reasonably.

I just noticed that the way drain_filter is currently implemented is not unwind safe but
actually a safety hazard wrt. unwind + resume safety. Additionally it easily causes an abord, both
of which are behaviours a method in std really shouldn't have. And while writing this I noticed
that it's current implementation is unsafe

I know that Vec is by default not unwind safe, but the behaviour of drain_filer when the
predicate panics is well surprising because:

  1. it will continue calling the closure which paniced when drop
    if the closure panics again this will cause an aboard and while some people
    like all panics to be aboard other work with error-kernel patterns and for them
    ending up with a aboard is quite bad
  2. if will not correctly continue the draining potentially one value
    and containing one value already dropped potentially leading to use after free

An example of this behaviour is here:
play.rust-lang.org

While the 2. point should be solvable I think the first point on itself should
lead to an reconsideration of the behaviour of DrainFilter to run to completation
on drop, reasons for changing this include:

  • iterators are lazy in rust, executing an iterator when dropping is kinda unexpected behaviour
    deriving from what is normally expected
  • the predicate passed to drain_filter might panic under some circumstances (e.g. a lock
    got poisoned) in which case it's likely-ish to panic again when called during drop leading
    to an double panic and therefore aboard, which is quite bad for anyone using error kernel
    patterns or at last wanting to shut down in a controlled way, it's fine if you use panic=aboard anyway
  • if you have side effects in the predicate and don't run DrainFilter to completion you might get
    surprising bugs when it is then run to completion when dropped (but you might have done
    other thinks between draining it to a point and it being dropped)
  • you can not opt-out of this behaviour without modifying the predicate passed to it, which you
    might not be able to do without wrapping it, on the other hand you can always opt-in to run
    it to completion by just running the iterator to completion (yes this last argument is a bit
    handwavey)

Arguments for running to completion include:

  • drain_filter is similar to ratain which is a function, so people might be surprised when they
    "just" drop DrainFilter instead of running it to completion

    • this argument was countered many times in other RFC's and is why #[unused_must_use]

      exist's, which in some situations already recommend to use .for_each(drop) which ironically

      happens to be what DrainFilter does on drop

  • drain_filter is often used for it's side effect only, so it's to verbose

    • using it that way makes it rougly equal to retain



      • but retain use &T, drain_filter used &mut T



  • others??
  • [EDIT, ADDED LATER, THX @tmccombs ]: not completing on drop can be very confusing when combined with adapters like find, all, any which I quite a good reason to keep the current behaviour.

It might be just me or I missed some point but changing the Drop behaviour and
adding #[unused_must_use] seems to be preferable?

If .for_each(drop) is to long we might instead consider to add an RFC for iterators meant for
there side effect adding a method like complete() to the iterator (or well drain() but this
is a complete different discussion)

others??

I can't find the original reasoning, but I remember there was also some problem with adapters working with a DrainFilter that doesn't run to completion.

See also https://github.com/rust-lang/rust/issues/43244#issuecomment-394405057

Good point, e.g. find would cause drain to drain just until it hit's the first
match, similar all, any do short circuit, which can be quite confusing
wrt. drain.

Hm, maybe I should change my opinion. Through this might be a general problem
with iterators having side-effects and maybe we should consider a general solution
(independent of this tracking issue) like a .allways_complete() adapter.

I have personally not found any safety reason why drain needs to run to completion but as I've written here a couple posts above, the side-effects on next() interact in a suboptimal way with adapters such as take_while, peekable and skip_while.

This also brings up the same issues as my RFC on non-selfexhausting drain and its companion selfexhausting iter adapter RFC.

It's true that drain_filter can easily cause aborts but can you show an example of where it violates safety?

Yup, I already did: play.rust-lang.org

Which is this:

#![feature(drain_filter)]

use std::panic::catch_unwind;

struct PrintOnDrop {
    id: u8
}

impl Drop for PrintOnDrop {
    fn drop(&mut self) {
        println!("dropped: {}", self.id)
    }
}

fn main() {
    println!("-- start --");
    let _ = catch_unwind(move || {
        let mut a: Vec<_> = [0, 1, 4, 5, 6].iter()
            .map(|&id| PrintOnDrop { id })
            .collect::<Vec<_>>();

        let drain = a.drain_filter(|dc| {
            if dc.id == 4 { panic!("let's say a unwrap went wrong"); }
            dc.id < 4
        });

        drain.for_each(::std::mem::drop);
    });
    println!("-- end --");
    //output:
    // -- start --
    // dropped: 0    <-\
    // dropped: 1       \_ this is a double drop
    // dropped: 0  _  <-/
    // dropped: 5   \------ here 4 got leaked (kind fine)  
    // dropped: 6
    // -- end --

}

But that's an implementation internal think, which went wrong.
Basically the open question is how to handle the panic of an predicate function:

  1. skip the element it panicked on, leak it and increase the del counter

    • requires some form of panic detection

  2. do not advance idx before calling the predicate

    • but this means on drop will call it again with the same predicate

Another question is if it's a good idea to run functions which can be seen as api user input on drop
in general, but then this is the only way not to make find, any, etc. behave confusing.

Maybe a consideration could be something like:

  1. set a flag when entering next, unset it before returning from next
  2. on drop if the flag is still set we know we paniced and hence leak
    the remaining items OR drop all remaining items

    1. can be quite a big leak with unexpected side effects if you e.g. leak an Arc

    2. can be very surprising if you have Arc and Weak's

Maybe there is an better solution.
Through whichever it is it should be documented in rustdoc once implemented.

@dathinab

Yup, I already did

Leaking is undesirable but fine and may be hard to avoid here, but a double-drop is definitely not. Good catch! Would you like to report a separate issue about this safety problem?

Does drain_filter do reallocations every time it removes an item from collection? Or it does reallocate only once and works like std::remove and std::erase (in pair) in C++? I'd prefer such behavior because of exactly one allocation: we simply put our elements to the end of collection and then removes shrink it to proper size.

Also, why there is no try_drain_filter ? Which returns Option type, and None value if we should stop? I have a very big collection and it is meaningless to continue for me when I have already got what I needed.

The last time I took a at the code it did something like: created a "gap"
when moving out elements and move a element which is not drained to the
begin of the gap when it finds one. With this each element which has to be
moved (either out or to a new place in the array) is only moved once.
Also like e.g. remove it doesn't reallocate. The freed part just becomes
part of the unused capacity.

On Fri, Aug 10, 2018, 07:11 Victor Polevoy notifications@github.com wrote:

Does drain_filter do reallocations every time it removes an item from
collection? Or it does reallocation only once and works like std::remove
and std::erase (in pair) in C++? I'd prefer such behavior because of
exactly one allocation: we simply put our elements to the end of collection
and then removes shrink it to proper size.

Also, why there is no try_drain_filter ? Which returns Option type, and
None value if we should stop? I have a very big collection and it is
meaningless to continue for me when I have already got what I needed.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/43244#issuecomment-411977001,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AHR0kdOZm4bj6iR9Hj831Qh72d36BxQSks5uPRYNgaJpZM4OY1me
.

@rustonaut thanks. What is your opinion about try_drain_filter? :)

P.S. Just looked at the code too, it looks as it works the way we wanted.

It advances element by element when iteration, so normally you could expect
it to stop iterating when being dropped, but it was deemed to be too
confusing so it actually drains to the end when dropped.
(Which drastically increases the likely hood of double panics and stuff
like that).

So it's unlikely that you will get a try version which behaves like you
expect.

For fairness drain early stopping when iterating really can be confusing in
some situations e.g. thing.drain_where(|x| x.is_malformed()).any(|x| x.is_dangerus()) would not drain all malformed ones but just until one of
found which is also dangerous. (The current Impl. does drain all malformed
by continuing draining on drop).

On Fri, Aug 10, 2018, 10:52 Victor Polevoy notifications@github.com wrote:

@rustonaut https://github.com/rustonaut thanks. What is your opinion
about try_drain_filter? :)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/43244#issuecomment-412020490,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AHR0kcEMHCayqvhI6D4LK4ITG2x5di-9ks5uPUnpgaJpZM4OY1me
.

I think this would be more versatile:

fn drain_filter_map<F>(&mut self, f: F) -> DrainFilterMap<T, F> where F: FnMut(T) -> Option<T>

Hi, I was searching for the drain_filter functionality for HashMap but it doesn't exist, and was asked to open an issue when I found this one. Should it be in a separate issue?

Is anything currently blocking this from stabilization? Is it still unwind-unsafe as reported above?

This seems like a pretty small feature, and it has been in limbo for over a year.

I think this would be more versatile:

fn drain_filter_map<F>(&mut self, f: F) -> DrainFilterMap<T, F> where F: FnMut(T) -> Option<T>

I don't think this is better than a composition of drain_filter and map.

Is it still unwind-unsafe as reported above?

There appears to be a tough choice between not draining all matching elements if the iteration stops early, or a potential panic during unwind if filtering and draining to the end is done at the drop of a DrainFilter.
I think this feature is rife with problems either way and as such should not be stabilized.

Is there any particular problem with having it behave differently on unwind?

Possibilities:

  • It could run to completion normally, but leave behind matching elements on unwind (so that all remaining elements are assumed not to match).
  • It could run to completion normally, but truncate the vector after the last written position on unwind (so that all remaining elements are assumed to match).
  • It could run to completion normally, but truncate the vector to length 0 on unwind.

The most comprehensible counter-argument I can think of is that drop code which depends on the invariant usually provided by drain_filter (that, at the end, the elements in the vec will be exactly those which failed the condition) may be arbitrarily distant from the code (most likely normal, safe code) which uses drain_filter.

However, suppose there was such a case. This code will be buggy no matter how the user writes it. E.g. if they write an imperative loop that went in reverse and swap-removed elements, then if their condition can panic and their drop impl relies heavily on the filter condition being false, the code still has a bug. Having a function like drop_filter whose documentation can call attention to this edge case seems like an improvement in comparison.

Also, thanks, I found this playground example posted earlier in the thread which demonstrates that the current implementation still double-drops elements. (so it definitely cannot be stabilized as is!)

It might be worth opening a separate issue for the soundness bug? That can then be marked as I-unsound.

As far as I know you can't Mark or as unsound as double panic _is sound_
just very inconvenient as it aborts. Also as far as I remember the
possibility of double panicking isn't a bug but the behavior implicitly but
knowingly chosen.

The options are basically:

  1. Do not run to completion on drop.
  2. Do run to completion but potentially abort due to double panic
  3. Do drop all "unchecked" elements on drop during panic.
  4. Do not complete on drop during panic.

The problems are:

  1. => Unexpected behaviour in many use cases.
  2. => Unexpected abort if the predicate can panic, especially if you use it
    to "just" remove elements, i.e. you don't use the returned iterator.
  3. => Unexpected difference between drop in and outside of a panic. Just
    consider someone _using_ drain_filter in a drop function.
  4. => See 3.

Or with other words 1. leads to confusion in normal use cases, 2. can lead
to abort if the predicate can panic 3.,4. make it so that you can't really
use it in a drop method, but how do you now a function you use there
doesn't use it internally.

As a result of this option 3.,4. are no-go. The problems with option 2. are
more rare then the ones in 1. so 2. was chosen.

IMHO it would be better to have a drain+drain_filter API which do not run
to completion on drop + a general iterator combinator which does run to
completion on drop + a method which complete a iterator but just drops all
remaining items. The problem is drain is already stable, the iterator
combinator adds overhead as it needs to fuse the inner iterator and drain
might not be the most appropriate name.

On Mon, May 20, 2019, 09:28 Ralf Jung notifications@github.com wrote:

It might be worth opening a separate issue for the soundness bug? That can
then be marked as I-unsound.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/43244?email_source=notifications&email_token=AB2HJEL7FS6AA2A2KF5U2S3PWJHK7A5CNFSM4DTDLGPKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODVX5RWA#issuecomment-493869272,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB2HJEMHQFRTCH6RCQQ64DTPWJHK7ANCNFSM4DTDLGPA
.

Double-drops are unsound though.

Created #60977 for the soundness issue

Thanks, I feel stupid for reading double drop as double panic :man_facepalming: .

  1. => Unexpected difference between drop in and outside of a panic. Just
    consider someone _using_ drain_filter in a drop function.

3.,4. make it so that you can't really
use it in a drop method, but how do you now a function you use there
doesn't use it internally.

This to me is still not a big deal.

If somebody uses drain_filter in a Drop impl with a condition that can panic, the problem isn't that they chose to use drain_filter; the problem is that their Drop impl contains code that can panic! Likewise it doesn't matter whether a method uses drain_filter internally; what matters is whether calling the method may panic (i.e. has the caller upheld the contract of the method?).

Sorry, I responded too early. I think I see what you mean now. I'll mull this over a bit more.

Alright, so your argument is that code inside a drop impl that uses drain_filter may mysteriously break if it happens to run during unwind. (this isn't about drain_filter panicking, but about other code panicking which causes drain_filter to run):

impl Drop for Type {
    fn drop(&mut self) {
        self.vec.drain_filter(|x| x == 3);

        // Do stuff that assumes the vector has no 3's
        ...
    }
}

This drop impl would suddenly misbehave during unwind.

This is indeed a compelling argument against having DrainFilter naively detect if the current thread is panicking.

The drain_filter terminology makes the most since to me. Given that we already have drain to remove all items, selecting which items to remove would be a filter. When paired together the naming feels very consistent.

The fix for the double-panic soundness issue leaves the remainder of the Vec in-tact in the event of a panic in the predicate. The items are back-shifted to fill the gap, but are otherwise left alone (and dropped via Vec::drop during unwind or otherwise handled by the user if the panic is caught).

Dropping a vec::DrainFilter prematurely continues to behave just as if it were fully consumed (which is the same as vec::Drain). If the predicate panics during vec::Drain::drop, the remaining items are back-shifted normally, but no further items are dropped and the predicate is not called again. It essentially behaves the same regardless of whether a panic in the predicate happens during normal consumption or when the vec::DrainFilter is dropped.

Assuming that the fix for soundness hole is correct, what else is holding back stabilization of this feature?

Can Vec::drain_filter be stabilized independently of LinkedList::drain_filter?

The problem with the drain_filter terminology, is that with drain_filter, there are really two outputs: the return value, and the original collection, and it isn't really clear which side the "filtered" items go on. I think that even filtered_drain is a little more clear.

it isn't really clear which side the "filtered" items go on

Vec::drain sets a precedent. You specify the range of the items that you want to _remove_. Vec::drain_filter operates the same way. You identify the items that you want to _remove_.

and it isn't really clear which side the "filtered" items go on

I’m of the opinion that this is already true for Iterator::filter, so I’ve resigned myself to having to look at the docs / write a test whenever I use that. I don’t mind the same for drain_filter.


I wish we had picked Ruby’s select and reject terminology, but that ship has long since sailed.

Any progress on this? Is the name the only thing that's still in limbo?

Is there anything blocking this from being stabilized?

It looks like DrainFilter's Drop impl will leak items if any of their destructors panic the predicate panics. This is the root cause of https://github.com/rust-lang/rust/issues/52267. Are we sure we want to stabilize an API like that?

Nevermind, that was fixed by https://github.com/rust-lang/rust/pull/61224 apparently.

Gonna poke at this tracking issue a little too, would love to see this feature hit stable :D Are there any blockers?

cc @Dylan-DPC

Was a decision ever made in favor or against having drain_filter take a RangeBounds parameter, like drain does? Passing .. seems easy enough when you want to filter the entire vector, so I'd probably be in favor of adding it.

I think this would be more versatile:

fn drain_filter_map<F>(&mut self, f: F) -> DrainFilterMap<T, F> where F: FnMut(T) -> Option<T>

Almost, the more general version would take an FnMut(T) -> Option<U>, like Iterator::{filter_map, find_map, map_while} do. I have no idea if it's worth to generalize filter_map in this way, but it might be worth considering.

I arrived here because I was looking more more or less exactly the method @jethrogb suggested above:

fn drain_filter_map<F>(&mut self, f: F) -> DrainFilterMap<T, F>
    where F: FnMut(T) -> Option<T>

The only difference from what I had in mind (which I was calling update in my head) was that I hadn't thought to make it return a draining iterator, but that seems like a clear improvement, since it provides a single, reasonably straightforward interface that supports updating items in place, removing existing items, and delivering the removed items to the caller.

Almost, the more general version would take an FnMut(T) -> Option<U>, like Iterator::{filter_map, find_map, map_while} do. I have no idea if it's worth to generalize filter_map in this way, but it might be worth considering.

The function has to return Option<T> because the values it produces are stored in a Vec<T>.

@johnw42 I'm not sure I follow, wouldn't the Some value immediately be yielded by the iterator?

Actually, I guess the input value of that function still needs to be &T or &mut T instead of T in case you don't want to drain it. Or maybe the function could be something like FnMut(T) -> Result<U, T>. But I don't see why the item type couldn't be some other type.

@timvermeulen I think we're interpreting the proposal differently.

The way I interpreted it, the Some value is stored back into the Vec, and None means the original value is yielded by the iterator. That allows the closure to either update the value in place, or move it out of the Vec. In writing this, I realized my version doesn't really add anything because you can implement it in terms of drain_filter:

fn drain_filter_map<F>(
    &mut self,
    mut f: F,
) -> DrainFilter<T, impl FnMut(&mut T) -> bool>
where
    F: FnMut(&T) -> Option<T>,
{
    self.drain_filter(move |value| match f(value) {
        Some(new_value) => {
            *value = new_value;
            false
        }
        None => true,
    })
}

Conversely, I was thinking your interpretation isn't very useful because it's equivalent to mapping the result of drain_filter, but I tried to write it, and it's not, for the same reason filter_map isn't equivalent to calling filter followed by map.

@johnw42 Ah, yeah, I thought you wanted None to mean that the value should stay in the Vec.

So it seems that FnMut(T) -> Result<U, T> would be the most general, though it's probably not very ergonomic. FnMut(&mut T) -> Option<U> isn't really an option actually because that wouldn't allow you to take ownership of the T in the general case. I think FnMut(T) -> Result<U, T> and FnMut(&mut T) -> bool are the only options.

@timvermeulen I started to say something earlier about a "most general" solution, and my "most general" solution was different from yours, but I reached the same conclusion, which is that trying to make a function too general results in a something you wouldn't actually want to use.

Although maybe there's still some value in making a very general method that advanced users can build nicer abstractions on top of. As far as I can tell, the point of drain and drain_filter isn't that they're particularly ergonomic APIs--they're not--but that they support use cases that occur in practice, and which can't be written any other way without a lot of redundant moves (or using unsafe operations).

With drain, you get the following nice properties:

  • Any contiguous selection of elements can be removed.
  • Dropping the removed items is as simple as discarding the returned iterator.
  • The removed items don't have to be dropped; the caller can examine each one individually and choose what to do with it.
  • The contents of the Vec need not support Copy or Clone.
  • No memory for the Vec itself needs to be allocated or freed.
  • Values left in the Vec are moved at most once.

With drain_filter, you gain the ability to remove an arbitrary set of items from the Vec rather than just a contiguous range. A less obvious advantage is that even if a contiguous range of items is removed, drain_filter can still offer a performance boost if finding the range to pass to drain would involve making a separate pass over the Vec to inspect its contents. Because the argument to the closure is a &mut T, it's even possible to update the items left in the Vec. Hooray!

Here are some other things you might want to do with an in-place operation like drain_filter:

  1. Transform the removed items before returning them through the iterator.
  2. Abort the operation early and report an error.
  3. Instead of just removing the item or leaving it in place (while possibly mutating it), add the ability to remove the item and replace it with a new value (which might be a clone of the original value, or something else entirely).
  4. Replace the removed item with multiple new items.
  5. After doing something with the current item, skip over some number of following items, leaving them in place.
  6. After doing something with the current item, remove some number of following items without inspecting them first.

And here's my analysis of each:

  1. This doesn't add anything useful because the caller can already transform the items as they're returned by the iterator. It also kind of defeats the purpose of the iterator, which is to avoid cloning values by delivering them to the caller only after they've been removed from the Vec.
  2. Being able to abort early could potentially improve the asymptotic complexity in some cases. Reporting an error as part of the API adds nothing new because you could do the same thing by having the closure mutate a captured variable, and it's not clear how to even do it, because the value won't be produced until after the iterator has been consumed.
  3. This, I think, adds some real generality.
  4. This is what I was originally going to propose as the "kitchen sink" option, but I decided it's not helpful, because if a single item can be replaced by multiple items, it's impossible to maintain the property that items in the Vec are moved at most once, and it might be necessary to re-allocate the buffer. If you need to do something like that, it's not necessarily more efficient than just building a whole new Vec, and it could be worse.
  5. This could be helpful if the Vec is organized in such a way that you can just skip over a large portion of the items without stopping to inspect them. I didn't include it in my sample code below, but it could be supported by changing the closure to return an additional usize specifying how many of the following items to jump over before continuing.
  6. This seems complementary to item 5, but it's not very useful if you still need to return the removed items through the iterator. It might still be a useful optimization, though, if the items you've removing have no destructor and you just want to make them disappear. In that case case, the usize above could be replaced with a choice of Keep(usize) or Drop(usize) (where Keep(0) and Drop(0) are semantically equivalent).

I think we can support the essential use cases by having the closure return an enum with 4 cases:

fn super_drain(&mut self, f: F) -> SuperDrainIter<T>
    where F: FnMut(&mut T) -> DrainAction<T>;

enum DrainAction<T>  {
    /// Leave the item in the Vec and don't return anything through
    /// the iterator.
    Keep,

    /// Remove the item from the Vec and return it through the
    /// iterator.
    Remove,

    /// Remove the item from the Vec, return it through the iterator,
    /// and swap a new value into the location of the removed item.
    Replace(T),

    /// Leave the item in place, don't return any more items through
    /// the iterator, and don't call the closure again.
    Stop,
}

One last option I'd like to present is to get rid of the iterator entirely, pass items to the closure by value, and allow the caller to leave an item unchanged by replacing it with itself:

fn super_drain_by_value(&mut self, f: F)
    where F: FnMut(T) -> DrainAction<T>;

enum DrainAction<T>  {
    /// Don't replace the item removed from the Vec.
    Remove,

    /// Replace the item removed from the Vec which a new item.
    Replace(T),

    Stop,
}

I like this approach best because it's simple and it supports all the same use cases. The potential downside is that even if most of the items are left in place, they still need to be moved into the closure's stack frame and then moved back when the closure returns. One would hope those moves could be reliably optimized away when the closure just returns its argument, but I'm not sure if that's something we should count on. If other people like it enough to include it, I think update would be a good name for it, because if I'm not mistaken, it can be used to implement any single-pass in-place update of a Vec's contents.

(BTW, I completely ignored linked lists above because I forgot about them entirely until I looked at the title of this issue. If we're talking about a linked list, it changes the analysis of points 4-6, so I think a different API would be appropriate for linked lists.)

@johnw42 you can already do 3. if you have a mutable reference, by using mem::replace or mem::take.

@johnw42 @jplatte

(3) only really makes sense if we allow the item type of the returned Iterator to be different from the item type of the collection.
(3) is a special case, because you both return the element from the Iterator, and put a new element back into the Vec.

Bikeshedding: I would kinda reverse the function of Replace(T) and replace it with PushOut(T), with the purpose to "submit" the inner value of PushOut onto the iterator, while keeping the original (parameter) item in the Vec.

Stop should probably carry the ability to return an Error type (or work a bit like try_fold?).

I implemented my super_drain_by_value function last night, and I learned several things.

The headline item should probably be that, at least w.r.t. Vec, everything we're talking about here is in the "nice to have" category (as opposed to adding a fundamentally new capability), because Vec already essentially provides direct read and write access to all of its fields through the existing API. In the stable version, there's a small caveat that you can't observe the pointer field of an empty Vec, but the unstable into_raw_parts method removes that restriction. What we're really talking about is expanding the set of operations that can be performed efficiently by safe code.

In terms of code generation, I found that in the easy cases (e.g. Vec<i32>), redundant moves in and out of the Vec are a non-issue, and calls that amount to simple things like a no-op or truncating the Vec are transformed into code that's too simple to be improved upon (zero and three instructions, respectively). The bad news is that for harder cases, both my proposal and and the drain_filter method do a lot of unnecessary copying, largely defeating the purpose of the methods. I tested this by looking at the assembly code generated for a Vec<[u8; 1024]>, and in both cases, each iteration has two calls to memcpy that aren't optimized away. Even a no-op call ends up copying the entire buffer twice!

In terms of ergonomics, my API, which looks pretty nice at first glance, isn't so nice in practice; returning an enum value from the closure becomes pretty verbose in all but the simplest cases, and the variant I proposed where the closure returns a pair of enum values is even uglier.

I also tried extending DrainAction::Stop to carry an R value that's returned from super_drain_by_value as an Option<R>, and that's even worse, because in the (presumably typical) case where the returned value isn't needed, the compiler can't infer R and you have to explicitly annotate the type of a value you're not even using. For this reason, I don't think it's a good idea to support returning a value from the closure to the caller of super_drain_by_value; it's roughly analogous to why a loop {} construct can return a value, but any other kind of loop evaluates to ().

Regarding generality, I realized the there are actually two cases for premature termination: one where the rest of the Vec is dropped, and another there it's left in place. If premature termination doesn't carry a value (as I think it shouldn't), it becomes semantically equivalent to returning Keep(n) or Drop(n), where n is the number of items not yet examined. I do, however, think premature termination should be treated as a separate case, because compared to using Keep/Drop, it's easier to use goes through a simpler code path.

To make the API a bit more friendly, I think a better option would be to make the closure return () and pass it a helper object (which I'll refer to here as an "updater") that can be used to inspect each element of the Vec and control what happens to it. These methods can have familiar names like borrow, borrow_mut, and take, with extra methods like keep_next(n) or drop_remainder(). Using this kind of API, the closure is much simpler in simple cases and no more complex in the complex cases. By having most of the updater's methods take self by value, it's easy to prevent the caller from doing things like calling take more than once, or giving conflicting instructions for what to do in subsequent iterations.

But we can still do better! I realized this morning that, as is so often the case, this problem is analogous to one that has been definitively solved in functional languages, and we can solve it with an analogous solution. I'm talking about "zipper" APIs, first described in this short paper with sample code in OCaml, and described here with Haskell code and links to other relevant papers. Zippers provide a very general way to traverse a data structure and update it "in place" using whatever operations that particular data structure supports. Another way to think of it is that a zipper is a sort of turbocharged iterator with extra methods for performing operations on a specific type of data structure.

In Haskell, you get "in place" semantics by making the zipper a monad; in Rust, you can do the same thing using lifetimes by having the zipper hold a mut reference to the Vec. A zipper for a Vec is very similar to the updater I described above, except that instead of passing it repeatedly to a closure, the Vec just provides a method to create a zipper on itself, and the zipper has exclusive access to the Vec as long as it exists. The caller then becomes responsible to writing a loop to traverse the array, calling a method at each step to either remove the current item from the Vec, or leave it in place. Early termination can be implemented by calling a method that consumes the zipper. Because the loop is under the caller's control, it becomes possible to do things like process more than one item in each iteration of the loop, or handle a fixed number of items without using a loop at all.

Here's a very contrived example showing some of the things a zipper can do:

/// Keep the first 100 items of `v`.  In the next 100 items of `v`,
/// double the even values, unconditionally keep anything following an
/// even value, discard negative values, and move odd values into a
/// new Vec.  Leave the rest of `v` unchanged.  Return the odd values
/// that were removed, along with a boolean flag indicating whether
/// the loop terminated early.
fn silly(v: &mut Vec<i32>) -> (bool, Vec<i32>) {
    let mut odds = Vec::new();
    // Create a zipper, which get exclusive access to `v`.
    let mut z = v.zipper();
    // Skip over the first 100 items, leaving them unchanged.
    z.keep_next(100);
    let stopped_early = loop {
        if let Some(item /* &mut i32 */) = z.current_mut() {
            if *item < 0 {
                // Discard the value and advance the zipper.
                z.take();
            } else if *item % 2 == 0 {
                // Update the item in place.
                *item *= 2;

                // Leave the updated item in `v`.  This has the
                // side-effect of advancing `z` to the next item.
                z.keep();

                // If there's another value, keep it regardless of
                // what it is.
                if z.current().is_some() {
                    z.keep();
                }
            } else {
                // Move an odd value out of `v`.
                odds.push(z.take());
            }
            if z.position() >= 200 {
                // This consumes `z`, so we must break out of the
                // loop!
                z.keep_rest();
                break true;
            }
        } else {
            // We've reached the end of `v`.
            break false;
        }
    }
    (stopped_early, odds)

    // If the zipper wasn't already consumed by calling
    // `z.keep_rest()`, the zipper is dropped here, which will shift
    // the contents of `v` to fill in any gaps created by removing
    // values.
}

For comparison, here's more or less the same function using drain_filter, except it only pretends to stop early. It's about the same amount of code, but IMHO it's a lot harder to read because the meaning of the value returned by the closure isn't obvious, and it's using mutable boolean flags to carry information from one iteration to the next, where the zipper version achieves the same thing with control flow. Because removed items are always yielded by the iterator, we need a separate filter step to remove negative values from the output, which means we need to check for negative values in two places instead of one. It's also kind of ugly that it has to keep track of the position in v; the implementation of drain_filter has that information, but the caller has no access to it.

fn drain_filter_silly(v: &mut Vec<i32>) -> (bool, Vec<i32>) {
    let mut position: usize = 0;
    let mut keep_next = false;
    let mut stopped_early = false;
    let removed = v.drain_filter(|item| {
        position += 1;
        if position <= 100 {
            false
        } else if position > 200 {
            stopped_early = true;
            false
        } else if keep_next {
            keep_next = false;
            false
        } else if *item >= 0 && *item % 2 == 0 {
            *item *= 2;
            false
        } else {
            true
        }
    }).filter(|item| item >= 0).collect();
    (stopped_early, removed)
}

@johnw42 Your earlier post reminded me of the scanmut crate, specifically the Remover struct, and the "zipper" concept you mentioned seems very similar! That does seem a lot more ergonomic than a method that takes a closure for when you want total control.

Either way, this is probably not very relevant to whether drain_filter should be stabilized, since we can always swap out the internals later. drain_filter itself will always be very useful because of how convenient it is. The only change I'd still like to see before stabilization is a RangeBounds parameter.

@timvermeulen I think it makes sense to add a RangeBounds parameter, but keep the current closure signature (F: FnMut(&mut T) -> bool).
You can always post-process the drained elements with filter_map or whatever you want.
(For me, it's very important that the closure allows mutating the element, because retain doesn't allow it (it was stabilized before this mistake was discovered).)

Yeah, that seems to be the perfect balance between convenience and usefulness.

@timvermeulen Yeah, I was straying rather far off the main topic.

One thing I noticed that is relevant to the original topic is that it's kind of hard to remember what the return value of the closure means--is it saying whether to keep the item or to remove it? I think it would be helpful for the docs to point out that v.drain_filter(p) is equivalent to v.iter().filter(p) with side-effects.

With filter, using a boolean value is still less than ideal for clarity, but it's a very well-known function, and IMHO it's at least somewhat intuitive that the predicate answers the question "should I keep this?" rather than "should I discard this?" With drain_filter, the same logic applies if you think about it from the perspective of the iterator, but if you think about it from the perspective of the input Vec, the question becomes "should I NOT keep this?"

As for the exact wording, I propose renaming the filter parameter to predicate (to match Iterator::filter) and adding this sentence somewhere in the description:

To remember how the return value of predicate is used, it may be helpful to keep in mind that drain_filter is identical to Iterator::filter with the additional side-effect of removing the selected items from self.

@johnw42 Yes, good point. I think a name like drain_where would be much clearer.

If you are going to get into naming bikeshedding; please make sure you’ve read all the comments; even hidden ones. Many variants have been proposed already, e.g. https://github.com/rust-lang/rust/issues/43244#issuecomment-331559537

But… it has to be named draintain()! No other name is as beautiful!

I'm quite interested in this issue, and I read the whole thread, so I might as well try to summarize what everyone said, in the hopes of helping this get stabilized. I've added some of my own comments along the way, but I've tried to keep them as neutral as possible.

Naming

Here is a non-opinionated summary of the names I saw proposed:

  • drain_filter: The name used in the current implementation. Consistent with other names suchs as filter_map. Has the advantage of being analogous to drain().filter(), but with more side-effects.
  • drain_where: Has the benefit of indicating whether true results in draining _out_ or filtering _in_, which can be hard to remember with other names. There is no precedent in std for the _where suffix, but there's plenty of precedents for similar suffixes.
  • A variation of drain().where(), since where is already a keyword.
  • drain_retain: Consistent with retain, but retain and drain have opposite interpretations of the boolean values returned by the closure, which may be confusing.
  • filtered_drain
  • drain_if
  • drain_when
  • remove_if

Parameters

It might be worth adding a range argument for consistency with drain.

Two closure formats have been suggested, FnMut(&mut T) -> bool and FnMut(T) -> Result<T, U>. The latter is more flexible, but also clumsier.

Reversing the boolean condition (true means "keep in the Vec") to be consistent with retain was discussed, but then it wouldn't be consistent with drain (true means "drain from the Vec").

Unwinding

When the filter closure panics, the DrainFilter iterator is dropped. The iterator then should finish draining the Vec, but to do so it must call the filter closure again, risking a double panic. There are some solutions, but all of them are compromises:

  • Don't finish draining on drop. This is pretty counterintuitive when used with adaptors such as find or all. Besides, it renders the v.drain_filter(...); idiom useless since iterators are lazy.

  • Always finish draining on drop. This risks double panics (which result in aborts), but makes behaviour consistent.

  • Only finish draining on drop if not currently unwinding. This fixes double panics entirely, but makes the behaviour of drain_filter unpredictable: dropping DrainFilter in a destructor might _sometimes_ not actually do its job.

  • Only finish draining on drop if the filter closure did not panic. This is the current compromise made by drain_filter. A nice property of this approach is that panics in the filter closure "short-circuit", which arguably is quite intuitive.

Note that the current implementation is sound and never leaks as long as the DrainFilter struct is dropped (although it may cause an abort). Previous implementations were not safe/leak-free though.

Drain-on-drop

DrainIter could either finish draining the source vector when it's dropped, or it could only drain when next is called (lazy iteration).

Arguments in favor of drain-on-drop:

  • Consistent with the behaviour of drain.

  • Interacts well with other adaptors such as all, any, find, etc...

  • Enables the vec.drain_filter(...); idiom.

  • The lazy functionality could be explicitly enabled through drain_lazy-style methods or a lazy() adapter on DrainIter (and even on Drain, since it's backwards-compatible to add methods).

Arguments in favor of lazy iteration:

  • Consistent with almost all other iterators.

  • The "drain-on-drop" functionality could be explicitly enabled through adapters on DrainIter, or even through a general Iterator::exhausting adapter (see RFC #2370).

I might have missed some stuff, but at least I hope it helps newcomers when skimming the thread.

@negamartin

Wouldn't the drain-on-drop option require that the iterator returns a reference to the item instead of the owned value? I think that would make it impossible to use drain_filter as a mechanism to remove and take ownership of items matching a specific condition (which was my original use-case).

I don't think so, since the behaviour of the current implementation is precisely drain-on-drop while yielding owned values. Either way, I don't see how drain-on-drop would require borrowing elements, so I think we have two different ideas on what drain-on-drop means.

Just to be clear, when I say drain-on-drop I only mean the behaviour when the iterator is not fully consumed: Should all items matching the closure be drained even if the iterator isn't fully consumed? Or only up to the element that was consumed, leaving the rest untouched?

In particular, it's the difference between:

let mut v = vec![1, 5, 3, 6, 4, 7];
v.drain_where(|e| *e > 4).find(|e| *e == 6);

// Drain-on-drop
assert_eq!(v, &[1, 3, 4]);

// Lazy
assert_eq!(v, &[1, 3, 4, 7]);

Just throwing out an idea, but another possible API could be something like:

 fn drain_filter_into<F, D>(&mut self, filter: F, drain: D)
        where F: FnMut(&mut T) -> bool, 
                   D: Extend<T>
    { ... }

It's less flexible than the other options, but does avoid the problem of what to do when DrainFilter is dropped.

It feels to me like this whole thing is looking to me less and less like retain_mut() (retain() with a mutable reference passed to the closure), which is what it was first and foremost intended to provide. Could we provide retain_mut() for now in addition to working on the design of filtered drain thing? Or am I missing something?

@BartMassey

which is what it was first and foremost intended to provide.

I don't think that's the case. I specifically use drain_filter to take ownership of items based on the filter criteria. Drain and DrainFilter yeild the item where as retain does not.

@negamartin

Just to be clear, when I say drain-on-drop I only mean the behaviour when the iterator is not fully consumed

Ah, Ok. That's my mistake. I misunderstood your definition. I had interpreted it as "nothing is removed from the vec until dropped", which doesn't really make any sense.

Arguments in favor of lazy iteration

I think it needs be consistent with drain. The Iterator::exhausting RFC was not accepted and it would be very odd if drain and drain_filter had seemingly opposite drain behaviors.

@negamartin

drain_filter: The name used in the current implementation. Consistent with other names suchs as filter_map. Has the advantage of being analogous to drain().filter(), but with more side-effects.

It is not analogous (that's why we need retain_mut/drain_filter):
drain().filter() would drain even those elements for which the filter closure returns false!

I just noticed a small line in a comment by the lib team in #RFC 2870:

Possibilities might include "overloading" a method by making it generic, or a builder pattern.

Is it backwards-compatible to make a method generic if it still accepts the previous concrete type? If so, I believe that would be the best way forward.

(The builder pattern is a bit unintuitive with iterators, since methods on iterators are usually adaptors, not behaviour-changers. Besides, there are no precedents, for example chunks and chunks_exact are two separate methods, not a chunks().exact() combo.)

No, not with the current design as far as I know as type inference which
worked before could now fail due to type ambiguity. Genetics with default
types for functions would help but are super tricky to do right.

On Fri, Jun 12, 2020, 21:21 negamartin notifications@github.com wrote:

I just noticed a small line in a comment by the lib team in #RFC 2870
https://github.com/rust-lang/rfcs/pull/2369:

Possibilities might include "overloading" a method by making it generic,
or a builder pattern.

Is it backwards-compatible to make a method generic if it still accepts
the previous concrete type? If so, I believe that would be the best way
forward.

(The builder pattern is a bit unintuitive with iterators, since methods on
iterators are usually adaptors, not behaviour-changers. Besides, there are
no precedents, for example chunks and chunks_exact are two separate
methods, not a chunks().exact() combo.)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/43244#issuecomment-643444213,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AB2HJELPWXNXJMX2ZDA6F63RWJ53FANCNFSM4DTDLGPA
.

Is it backwards-compatible to make a method generic if it still accepts the previous concrete type? If so, I believe that would be the best way forward.

No, since it breaks type inference in some cases. E.g. foo.method(bar.into()) will work with a conrete type argument, but not with a generic argument.

I think drain_filter as it is implemented now is very useful. Could it be stabilized as is? If better abstractions are discovered in the future, nothing stops them from being introduced as well.

What process do I need to initiate to try to get retain_mut() added, independent of whatever happens with drain_filter()? It seems to me that the requirements have diverged, and that retain_mut() would still be useful to have regardless of what happens with drain_filter().

@BartMassey for new unstable library APIs, I think making a PR with an implementation should be fine. There are instructions at https://rustc-dev-guide.rust-lang.org/implementing_new_features.html for implementing the feature and at https://rustc-dev-guide.rust-lang.org/getting-started.html#building-and-testing-stdcorealloctestproc_macroetc for testing your changes.

I've been struggling with API differences between HashMap and BTreeMap today and I just wanted to share a word of caution that I think it's important that various collections strive to maintain a coherent API whenever in makes sense, something which at this point isn't always the case.

For instance String, Vec, HashMap, HashSet, BinaryHeap and VecDeque have a retain method, but LinkedList and BTreeMap do not. I find it especially odd since retain seems like a more natural method for a LinkedList or a Map than for vectors where random deletion is a very expensive operation.

And when you dig a bit deeper it's even more perplexing: HashMap::retain's closure is passed the value in a mutable reference, but the other collections get an immutable reference (and String gets a simple char).

Now I see that new APIs like drain_filter are being added that 1/ seem to overlap with retain and 2/ are not stabilized for all collections at the same time:

  • HashMap::drain_filter is in the upstream repo but not yet shipped with Rust's std AFAIK (it doesn't appear in the docs)
  • BTreeMap::drain_filter, Vec::drain_filter, LinkedList::drain_filter are in Rust's std, but feature gated
  • VecDeque::drain_filter doesn't appear to exist at all, it doesn't appear in the docs
  • String::drain_filter doesn't exist either

I don't have a strong opinion on the best way to implement these features, or if we need drain_filter, retain or both, but I very strongly believe that these APIs should remain consistent across collections.

And perhaps more importantly, similar methods of different collections should have the same semantics. Something that the current implementations of retain violate IMO.

Was this page helpful?
0 / 5 - 0 ratings