@pedrodesu
Contributor

@pedrodesu pedrodesu commented Oct 30, 2025

Handles #52

@pedrodesu
Contributor Author

pedrodesu commented Oct 30, 2025

Furthermore using RC doesn't seem like a real possibility here. Doing so would make the iterator read-only and non-consuming, so it would no longer work like IntoIter truly does. This implementation is quite similar to std's.
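
For reference, the eager approach can be sketched against std alone. std's Clone impl for std::vec::IntoIter copies the remaining elements into a fresh allocation, which is roughly what the helper below does. The name clone_remaining is hypothetical and only serves the illustration; it is not part of std or ecow.

```rust
use std::vec::IntoIter;

/// Hypothetical helper: eagerly clones the not-yet-yielded elements into a
/// new allocation, so the returned iterator owns its own buffer.
fn clone_remaining<T: Clone>(iter: &IntoIter<T>) -> IntoIter<T> {
    // `as_slice` exposes only the elements that have not been yielded yet.
    iter.as_slice().to_vec().into_iter()
}
```

Both iterators can then be advanced independently, at the cost of one extra allocation plus one clone per remaining element at the time of the call.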

@laurmaedje
Member

> Furthermore using RC doesn't seem like a real possibility here.

I think it's possible to use the same backing storage and clone the individual items upon iteration (a code path for that already exists). That would save the extra top-level allocation.

@pedrodesu
Contributor Author

While cloning items during iteration could technically work, I believe it would change the semantics: IntoIter is meant to consume ownership, not share it. Sharing the backing storage means the iterator no longer truly owns its data. Just as std does with its implementations, I believe it's much better to consume the original elements, even if that means one extra allocation. Otherwise you're deviating from how IntoIter works.

@laurmaedje
Member

That argument doesn't really make sense to me. When there is no other clone of IntoIter, it will consume the backing EcoVec as usual. If there is another clone (either of the backing vector or the iterator), it has to clone. Whether you are cloning upfront and then consuming as you iterate or cloning as you go doesn't really make a difference, except for the extra allocation; you are cloning the elements either way.

For what it's worth, the existing .into_iter() impl already does clone as it goes if the backing EcoVec was shared:

ecow/src/vec.rs

Lines 1139 to 1151 in 9639e9f

if self.unique {
    // Safety:
    // - We have unique ownership over the underlying allocation.
    // - The pointer returned by `data()` is valid for `len` reads.
    // - We know that `prev < self.back <= len`.
    // - We take ownership of the value and don't drop it again
    //   in our drop impl.
    unsafe { ptr::read(self.vec.data().add(prev)) }
} else {
    // Safety:
    // - We know that `prev < self.back <= len`.
    unsafe { self.vec.get_unchecked(prev).clone() }
}

Of course, std::vec::IntoIter<T> works a bit differently, but then so does Vec compared to EcoVec. I think having the iterator type be ref-counted just like the vector itself makes a lot of sense.
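
The "move if unique, otherwise clone on yield" behavior can be modeled in safe Rust. The sketch below is an assumption-laden model, not ecow's code: Rc<Vec<T>> stands in for EcoVec<T>, and UniqueOrShared and model_into_iter are hypothetical names. Uniqueness is decided once, when the iterator is created, just like the unique flag in the snippet above.

```rust
use std::rc::Rc;

/// Model of an iterator that moves items out when it owns the storage
/// exclusively and clones them on yield otherwise.
enum UniqueOrShared<T> {
    Unique(std::vec::IntoIter<T>),
    Shared { data: Rc<Vec<T>>, front: usize },
}

/// Hypothetical stand-in for `EcoVec::into_iter`.
fn model_into_iter<T>(vec: Rc<Vec<T>>) -> UniqueOrShared<T> {
    match Rc::try_unwrap(vec) {
        // Sole owner: take the vector and move items out of it.
        Ok(owned) => UniqueOrShared::Unique(owned.into_iter()),
        // Other handles exist: keep sharing and clone on yield.
        Err(shared) => UniqueOrShared::Shared { data: shared, front: 0 },
    }
}

impl<T: Clone> Iterator for UniqueOrShared<T> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        match self {
            UniqueOrShared::Unique(iter) => iter.next(),
            UniqueOrShared::Shared { data, front } => {
                let item = data.get(*front)?.clone();
                *front += 1;
                Some(item)
            }
        }
    }
}
```

In this model, calling model_into_iter on two handles to the same allocation yields two Shared iterators, so neither moves out of the storage and the items are dropped once, when the last handle goes away.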

@pedrodesu
Contributor Author

I believe this cannot be implemented like this. I attempted to do so, but you end up violating Rust’s move semantics:
IntoIter::clone() shares the same backing vec (same pointer), so both iterators think they can advance independently. Each call to next() uses ptr::read to extract elements from that shared memory. While ptr::read itself doesn’t modify memory, it creates a new owned T from the same bytes — and both iterators now believe they uniquely own those values.

This results in two independent drops of the same elements, which is undefined behavior. The UB checker catches it as overlapping or invalid pointer operations during subsequent moves.

To implement Clone soundly, the iterator would need to either:

  • deep-clone the remaining elements so each iterator has its own owned buffer (as I did originally) or
  • stop consuming the elements and instead clone them on yield (effectively turning it into a read-only iterator)

The latter doesn't seem like a valid option because, as I stated originally, this is NOT how an IntoIter should operate.

Anything that allows multiple IntoIter to ptr::read from the same underlying allocation will inherently violate Rust’s aliasing guarantees.
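
To make the ownership hazard concrete, here is a minimal sketch of a single, sound ptr::read. The Tracked type and move_out_once function are hypothetical and exist only to count drops. The slot is wrapped in ManuallyDrop so that the bitwise copy produced by ptr::read is the only owner that ever runs a destructor.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::ptr;
use std::rc::Rc;

/// Increments a shared counter every time a value is dropped.
struct Tracked(Rc<Cell<usize>>);

impl Drop for Tracked {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Moves a value out of a slot exactly once and returns the drop count.
fn move_out_once() -> usize {
    let drops = Rc::new(Cell::new(0));
    // `ManuallyDrop` guarantees the slot itself never runs `Tracked::drop`.
    let slot = ManuallyDrop::new(Tracked(Rc::clone(&drops)));

    // Safety: the slot holds a valid `Tracked`, and it is neither read nor
    // dropped again afterwards, so `owned` is the unique owner.
    let owned: Tracked = unsafe { ptr::read(&*slot) };

    // A second `ptr::read(&*slot)` here would create another owner of the
    // same value; dropping both would be a double drop (undefined behavior).
    drop(owned);
    drops.get()
}
```

The value is dropped exactly once here. Two iterators that each ptr::read the same element would correspond to the second read mentioned in the comment.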

@laurmaedje
Member

laurmaedje commented Nov 5, 2025

Certainly we can't do multiple ptr::reads on the same backing storage. That's not what I was proposing.

However, I think cloning on yield is a perfectly fine option.

> The latter doesn't seem like a valid option because, as I stated originally, this is NOT how an IntoIter should operate.

I tried to argue why I believe this is okay in my previous message, but you did not really respond to my argument except for saying that it's not how it should operate.

It already happens when you do let v = eco_vec!["hello".to_string(), "world".to_string()]; v.clone().into_iter(); v.into_iter(), so it's not really related to impl Clone for IntoIter. The .into_iter() should consume the EcoVec (which it does), but if the EcoVec does not have unique ownership of its items, it can't consume the items, so it needs to clone them one way or another. A consumer that relies on whether we clone upfront or while iterating is seriously cursed and if there's a use case for this (I can't really think of one), it's not something I want to support in ecow.

@pedrodesu
Contributor Author

pedrodesu commented Nov 5, 2025

You're right that in cases where the EcoVec isn't uniquely owned, the iterator can't truly consume the elements, so it must clone them at some point — either upfront or lazily during iteration.

My concern is mainly semantic: an IntoIter that conditionally clones items no longer behaves like the standard library’s consuming iterator model. It becomes conceptually closer to an Iter than an IntoIter, since cloning on yield breaks the usual assumption that all items are moved out, and it would effectively mean making the iterator read-only (to avoid UB), much like Iter.

@laurmaedje
Member

laurmaedje commented Nov 5, 2025

I'm not sure what you mean by "read-only". Under well-behaved usage, I think you wouldn't be able to observe any difference between EcoVec and Vec with either option. The copy-on-write optimization EcoVec performs is fully transparent and the iterators behave exactly like their std counterparts in terms of semantics.

The only way to observe when something is cloned is with interior mutability or impure clone implementations. These are things that need to be very carefully mixed with any kind of reference counting (same when using Arc).
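
As a rough illustration of what "impure clone implementation" means here, the following std-only sketch counts clones through a shared counter. Counted, make_items, first_eager and first_lazy are hypothetical names; a plain slice stands in for shared backing storage.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// A value whose `Clone` impl has a visible side effect.
struct Counted {
    value: u32,
    clones: Rc<Cell<usize>>,
}

impl Clone for Counted {
    fn clone(&self) -> Self {
        self.clones.set(self.clones.get() + 1);
        Counted { value: self.value, clones: Rc::clone(&self.clones) }
    }
}

fn make_items(clones: &Rc<Cell<usize>>) -> Vec<Counted> {
    (1..=3).map(|value| Counted { value, clones: Rc::clone(clones) }).collect()
}

/// Eager: clone every element up front, then consume only the first.
fn first_eager(shared: &[Counted]) -> Option<u32> {
    shared.to_vec().into_iter().next().map(|c| c.value)
}

/// Lazy: clone each element only when it is yielded.
fn first_lazy(shared: &[Counted]) -> Option<u32> {
    shared.iter().cloned().next().map(|c| c.value)
}
```

Both functions return the same value. Only the counter differs when iteration stops early (three clones for the eager variant, one for the lazy variant), and that difference is visible solely because Clone mutates shared state.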

Perhaps you could show a concrete code example that you think would behave differently under both options?

@pedrodesu
Contributor Author

pedrodesu commented Nov 6, 2025

You can't implement IntoIterator for EcoVec<T> with "move if last iterator, else clone" semantics:

  • Using Rc and moving if it's the only clone doesn't work. Calling into_iter on two EcoVecs that point to the same memory (an original EcoVec and a clone of it) gives two independent Rcs, each thinks it's unique, and both will attempt to move the same values in memory. UB. Other cases exist, but this seems like the most direct example.
  • Using a per-element AtomicUsize to track refs seems to be the only way to reliably know when an element is last nexted and ready to be moved, and clone otherwise. I think this is the only way to implement the logic you suggested. But this would be a huge overhead, and you'd need to store both the value and its counter. That effectively changes the structure and forces you to allocate a new vector, which is what we're trying to avoid. While this approach might be technically possible, it clearly defeats its own point.
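
The storage cost of a per-element counter can be estimated with a small sketch. CountedSlot and overhead_per_element are hypothetical names for illustration; the layout is whatever the compiler picks for a plain struct.

```rust
use std::mem::size_of;
use std::sync::atomic::AtomicUsize;

/// Hypothetical element layout that pairs every value with its own counter.
#[allow(dead_code)]
struct CountedSlot<T> {
    refs: AtomicUsize,
    value: T,
}

/// Extra bytes each element needs compared to storing `T` alone.
fn overhead_per_element<T>() -> usize {
    size_of::<CountedSlot<T>>() - size_of::<T>()
}
```

For small element types the counter plus alignment padding dominates the element itself, which is the overhead described above. It also changes the element layout, so existing storage could not be reused as-is.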

These are my reasons against the two implementations I can think of, short of simple, dumb, eager cloning.

I don't think an edge case needs to be presented: I don't think it can be implemented this way for generic, "stable" use.

If you see any other way to go about this, let me know and I can try to implement it. But I've already tried both of these implementations.

@laurmaedje
Member

Thanks for the detailed reply, that makes your point clearer.

I think there are three relevant variants of the clone-on-yield idea to distinguish:

  1. All IntoIter always clone on yield instead of moving. I agree that this is bad.

  2. An IntoIter that was ever shared will clone on yield, while an IntoIter that was always unique will still move out of the vector. This is already the case when two iterators are created from two EcoVecs pointing to the same allocation. There, too, both iterators will clone and neither will move.

  3. An IntoIter always moves an item if and only if it's the last iterator that hasn't yet yielded that item. I agree that this is prohibitively expensive and complex to implement.

So (1) is disqualified by its semantics (I think this was your main concern) and (3) by feasibility. For me, (2) would be fine in terms of semantics. It's already how into_iter() on cloned vectors behaves and the rule would be fairly simple: "If an iterator had shared access to the underlying allocation at any time, it will forever clone on yield." I'm not sure whether you would consider (2) fine in terms of semantics.

Now, the remaining question is whether (2) is feasible to implement. I believe the answer is yes, but it's possible I'm missing something. Basically, the problematic case is when an originally unique iterator is cloned. Then, we need to make it shared, but some of the elements have already been moved out. To avoid double-dropping, we would need to have an extra field in IntoIter to save the range that hasn't yet been moved out and use this upon Drop of the last iterator. (Naively, this would add two usizes, but I think we could reuse the existing vector's len field for the end of the range, so one usize might suffice.)
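
Variant (2) can be modeled in safe Rust to check that the rule is coherent. This is only a sketch under stated assumptions, not a proposal for ecow's internals: Rc stands in for the ref-counted allocation, Option slots play the role of the "not yet moved out" range bookkeeping, and Backing and ModelIntoIter are hypothetical names.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

/// Shared storage. A `None` slot has already been moved out.
struct Backing<T> {
    slots: RefCell<Vec<Option<T>>>,
    /// Set once an iterator over this storage is cloned; never reset.
    ever_shared: Cell<bool>,
}

struct ModelIntoIter<T> {
    backing: Rc<Backing<T>>,
    front: usize,
}

impl<T> ModelIntoIter<T> {
    fn new(items: Vec<T>) -> Self {
        let slots = RefCell::new(items.into_iter().map(Some).collect());
        let backing = Rc::new(Backing { slots, ever_shared: Cell::new(false) });
        Self { backing, front: 0 }
    }
}

impl<T: Clone> Clone for ModelIntoIter<T> {
    fn clone(&self) -> Self {
        // From now on, every iterator over this storage clones on yield.
        self.backing.ever_shared.set(true);
        Self { backing: Rc::clone(&self.backing), front: self.front }
    }
}

impl<T: Clone> Iterator for ModelIntoIter<T> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        let mut slots = self.backing.slots.borrow_mut();
        let slot = slots.get_mut(self.front)?;
        self.front += 1;
        if self.backing.ever_shared.get() {
            slot.clone() // shared at some point: clone on yield
        } else {
            slot.take() // always unique so far: move the item out
        }
    }
}
```

A clone starts at the original's current position, so it never visits a slot that was already moved out. The remaining items are dropped once, when the last iterator releases the storage.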

I'm not sure whether it's worth it to implement it like this. That said, I also don't like the implementation that immediately allocates new backing storage. It's not what I would intuitively expect from ecow. If we think that (2) isn't a good option, I would rather not support Clone at all. This would make it explicit that there is no perfect implementation. And calling EcoVec::from(self.as_slice()).into_iter() in user space (or creating a newtype if the iterator is used as a field) is not that bad.

@pedrodesu
Contributor Author

pedrodesu commented Nov 6, 2025

Ah yes, now I see what you mean. The second idea might be possible to implement in theory, though I think it still leads to the problem I described in the first scenario. If you have two EcoVec clones and call into_iter on both of them (mind that we're cloning the EcoVecs, not the IntoIters), both IntoIters think they're unique, but both point to the same location. As such, they'll attempt to move and trigger UB.

If this reasoning is correct, I think we'll need to either go with the dumb implementation or discard the idea altogether. Let me know what you think.

@laurmaedje
Member

If we have two EcoVec clones in the first place, then I'd imagine the (existing) unique field would be false from the start and no moving would happen at all (just like today).

@pedrodesu
Contributor Author

pedrodesu commented Nov 7, 2025

So in this rationale, if we have IntoIter a and two a.clone()s called b and c, would all of them have unique: false or would a be unique but not b and c? And if the answer is the latter, what would happen if you next()ed on a before b or c? We can't clone a value that was moved.

The first idea should be doable. The second one doesn't seem so.

@laurmaedje
Member

All would have unique: false. This is in line with what a and b have below in the currently released version of ecow:

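use ecow::eco_vec;

let v1 = eco_vec![1, 2, 3];
let v2 = v1.clone(); // shares the backing allocation with v1
let a = v1.into_iter(); // not unique: v2 still refers to the allocation
let b = v2.into_iter(); // not unique: a still refers to the allocation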

I think it should work fine, it's just the question of whether the complexity and extra field are worth it... I'm not sure.

@pedrodesu
Contributor Author

pedrodesu commented Nov 11, 2025

Isn't this worse though? If we have only a, its items will be moved. If we have b, and b = a.clone(), then both a and b will have unique: false. This means that now no "original value" will be moved, and we will clone for every yield of both a and b. While the "dumb solution" I presented initially would have one additional allocation for b's backing EcoVec, we would still be able to move a's elements directly without cloning them. Isn't this better than saving on b's EcoVec allocation but having to do one extra allocation for every element yielded by a? Am I missing something?

@laurmaedje
Member

My point is that this is already the case regardless of your PR for ecow if you clone the vector rather than the iterator. And I don't really see a way around that (except for eagerly allocating on into_iter when the underlying vector is shared, which seems like a non-starter).

@pedrodesu
Contributor Author

Indeed, all my initial implementation does anyway is act as a helper for cloning the vector. So you think it's better to just not implement Clone for IntoIter? I don't see a way to implement this that is doable and that allocates less than simply cloning the vector.

@laurmaedje
Member

I think there's still a misunderstanding. I am speaking about ecow as it is released on crates.io right now, completely unrelated to your PR. If you have an EcoVec a, clone it to b and then call .into_iter() on both a and b, the items will be cloned individually and neither will move out. Then once both iterators are dropped, the items in the backing storage shared by a and b will be dropped. I don't see a way around that.

@laurmaedje
Member

I've decided that I'd rather not support this in ecow, as the additional field and additional complexity do not feel worth it for the rather niche use case of cloning the iterator. Thank you for the discussion and the work you've put into this, still!

@laurmaedje laurmaedje closed this Nov 14, 2025
@pedrodesu pedrodesu deleted the clone-intoiter branch November 14, 2025 18:15