Skip to content
This repository has been archived by the owner on Oct 8, 2024. It is now read-only.

A flaw in JavaScript iteration leaves a weird behavior with take, first, some, etc. #293

Closed
benlesh opened this issue Feb 20, 2024 · 19 comments

Comments

@benlesh
Copy link

benlesh commented Feb 20, 2024

TL;DR: Just make sure that take, some, first, etc call return() on their source before yielding their last value

This is just something I wanted to draw to your attention because it's been the source of edge case bugs in observable which is the "dual" of iterable:

Basically, as a principle, a consumer should not be allowed to act on anything from the producer before that producer finalizes (if that producer knows it can finalize). The basic examples of this are all over for "done" (completion/error) paths, but for values it can't be done in JavaScript in a straight forward way because of our avante garde single-step iteration contract.

Put another way, it's impossible for the consuming code to act on the information that iteration is done before the iterable itself cleans up:

function* producer() {
  try {
    yield 1;
    yield 2;
    yield 3;
  } finally {
    console.log('producer cleaned up');
  }
}

function consumer() {
  for (const value of producer()) {
    console.log(value);
  }
  console.log('consumer knows producer is done');
}

// 1
// 2
// 3
// producer cleaned up
// consumer knows produce is done

And this is a good thing. RxJS is changing our notification ordering to ensure that we adhere to this.

Why take, some, first etc are a problem

The issue at hand is that in cases where we know we want to end iteration before we yield the next value, there's no way to force the finally block to be hit at the moment we yield. That gives the consumer the chance to iterate N more times, possibly causing more side effects, before the teardown occurs.

import * as fs from 'node:fs';
import * as path from 'path:fs';

function* copyFiles(directoryPath) {
  const files = fs.readdirSync(directoryPath);

  for (let file of files) {
    const originalFilePath = path.join(directoryPath, file);
    const copyFilePath = path.join(directoryPath, `${file}.copy`);

    if (fs.statSync(originalFilePath).isFile()) {
      const content = fs.readFileSync(originalFilePath);
      fs.writeFileSync(copyFilePath, content);
      yield copyFilePath;
    }
  }
}

const fileCopier = copyFiles('/some/directory');

fileCopier.take(3).forEach((copiedFile, i) => {
  console.log(`Copy: ${copiedFile}`);
  if (i === 2) {
    // on the last one, just `next()`.
    console.log(fileCopier.next());
  }
});

The problem the code above illustrates is that the source iterator for take(3) isn't finalized BEFORE take yields the value. That means that you can still next() off of the source, which is this case will even cause one more file to be copied.

This is an admittedly contrived example, but at scale with RxJS observables, which are the "dual" of iterable, for better or worse, I have seen users hit this issue for semi-valid reasons, so I'd caution you all to be too dismissive of this edge case.

The fix for this:

Just ensure that return() is called on the iteration source if it exists before yielding the last value you take from take, first, some, etc.

The root cause:

Unfortunately the ship has sailed on this, but the design of iteration in JavaScript is to blame for this issue. There's no way to yield and simultaneously finalize from within an iterable/generator. We might have used { done: true, value: 'last value' } to do that, but unfortunately, due to JavaScript's implicit undefined return, there would be no way to differentiate between function* () { yeild 1; return undefined; } and function* () { yield 1; }. Which is why for..of omits { done: true, value: undefined: } from iterated values. If JavaScript had gone with a more traditional moveNext(), current() approach to iteration, this wouldn't have been an issue. 🤷 But there's nothing that can be done there.

Related "duality" discussion: ReactiveX/rxjs#7443

@ljharb
Copy link
Member

ljharb commented Feb 20, 2024

I think this may have been discussed in #122?

@benlesh
Copy link
Author

benlesh commented Feb 20, 2024

@ljharb Not really. that was more about forwarding coroutine/generator arguments. This is more about the iteration protocol itself. The source iterator needs to be "shut down" prior to yielding the final value for any operation that "knows" when it has a final value. I don't particularly care about the arguments for generators being threaded through.

@bakkot
Copy link
Collaborator

bakkot commented Feb 20, 2024

For some this isn't an issue, because our some just returns a value rather than an iterator, so of course it must close the underlying iterator before doing so - there is no later time at which it could close.

For take, this specific question was discussed in #219. The outcome was not doing this, though I don't remember if we discussed it in committee or just in that thread.

@ljharb

This comment was marked as outdated.

@bakkot
Copy link
Collaborator

bakkot commented Feb 20, 2024

Thanks for raising this.

I find the example in the OP compelling, but trading it off against the arguments in #219 - i.e., this is "too clever" and can cause weird behavior in some cases, especially if e.g. closing an async iterator takes a while - I lean towards keeping the current "late closing" behavior, despite the potential footgun. The footgun only arises when re-using the original iterator after invoking an iterator helper on it, and that's just always going to be fraught - unlike observables, iterators are very much designed to be single-consumer. This is only one example of many where re-using the underlying iterator is going to break assumptions.

@benlesh
Copy link
Author

benlesh commented Feb 20, 2024

Yeah. I had forgotten that the committee made the choice of composing over the iterator rather than the iterable itself. Yeah... I mean, I guess since what we have is a stateful mechanism for traversing values, rather than a monad, there's probably no reason to worry about controlling the lifecycle or cleaning up anything from sources.

That's unfortunate. It would have been very, very nice to have implicit teardown from composing async iterables. These things aren't as useful without that.

Yeah. I guess this is a non-issue when we're doing take() on an iterator, because 7 different things could be iterating off of it besides the take().

async function* fromSomeSocketAPI(url) {
  const socketAPI = createSocketAPI(url);
  
  try {
    await socketAPI.connected();
    
    for await (const message of socketAPI.messages) {
      yield message;
    }
  } finally {
    if (socketAPI.connected) {
      socketAPI.close();
    }
  }
}

const s = fromSomeSocketAPI('wss://something');

s.take(3).forEach(console.log);

// LATER: Oops the socket is still open.
s.return();

@benlesh
Copy link
Author

benlesh commented Feb 20, 2024

lean towards keeping the current "late closing" behavior, despite the potential footgun.

@bakkot ... Sorry I must be misreading the other threads... I thought it was determined that take() would not call return() at all?

@bakkot
Copy link
Collaborator

bakkot commented Feb 20, 2024

I thought it was determined that take() would not call return() at all?

It does, but only on the N+1'th call.

That is:

function* nat() {
  try {
    for (let i = 0; true; ++i) yield i;
  } finally {
    console.log('closing');
  }
}

for (let item of nat().take(3)) {
  console.log('got', item);
}

prints

got 0
got 1
got 2
closing

The for of in that snippet is exactly equivalent to nat().take(3).forEach(item => console.log('got', item)), incidentally. It's maybe easier to understand if you do the next calls manually:

let it = nat().take(3);
console.log(it.next()); // prints { done: false, value: 0 }
console.log(it.next()); // prints { done: false, value: 1 }
console.log(it.next()); // prints { done: false, value: 2 }
console.log(it.next()); // prints "closing", then { done: true, value: undefined }

(You can actually try this in Chrome if you're on 122 or later, since iterator helpers are now shipping. Only sync helpers, but async helpers will do the same thing as far as the above is concerned.)

@bergus
Copy link

bergus commented Feb 20, 2024

I guess this is a non-issue when we're doing take() on an iterator, because 7 different things could be iterating off of it besides the take().

I think that sums it up pretty well.

Any iterator (chain) that is iterated by a for … of loop should by default have the non-surprising behaviour of getting closed when the loop ends, after the last evaluation of the loop body. This should hold for generator wrappers (where you yield inside a loop over another iterator) and also for the iterator helper methods, in the simple sequential single-consumer scenario.

Only if you are doing something weird, like multiple consumers, or calling .next() (or .return() or .throw()) on any iterator n
in the chain in addition to the normal loop iteration, you would run into such issues. And then you are expected to handle them yourselves, like when you want to re-use an iterator after .take()ing a few values from it and having it stay open. #71 discusses when a take iterator should be closed or stay open as well.

@benlesh
Copy link
Author

benlesh commented Feb 20, 2024

It does, but only on the N+1'th call.

Yeah... this is a behavior that is more problematic when dealing with the push-based "dual" observable, but only because people are more likely to perform side effects in push-based things

Just a heads up: If you end up adding a do (aka tap) to your iterator helpers, you'll be more likely to see issues with this, as people will be very confused by:

Iterator.from([1, 2, 3, 4, 5])
  .map(n => `${n}.txt`)
  .do(fileName => fs.writeFileSync(fileName, 'content'))
  .take(3)
  .forEach(fileName => console.log(`${fileName} written`))

Which will log three files written, but actually write 4 files.

The only way to avoid that would be to have take call return on the source iterator and force it to disconnect. Which has the unfortunate problem that the source iterator might be implemented without a return method.

And, of course { done: true, value: 'some value here' } isn't going to show up in iterated values, so sending the 3rd value from take as { done: true, value: '3.txt' } right there (when you know it's done) will not work in JavaScript for reasons I mentioned above. So we're stuck with this downside of the single-step iteration design.

@ljharb
Copy link
Member

ljharb commented Feb 20, 2024

Why would that be confusing? You ran the .do before the .take; logging 3 and writing 4 is precisely what i'd expect.

@bakkot
Copy link
Collaborator

bakkot commented Feb 20, 2024

Which will log three files written, but actually write 4 files.

No, that'll only write 3. The contract of x = it.take(N) is that it will only pull from it a total of N times, no matter times you pull from x. Being pull-based instead of push-based makes this work.

You can simulate that today by using map instead of tap; the following code runs in Chrome 122+

Iterator.from([1, 2, 3, 4, 5])
  .map(n => `${n}.txt`)
  .map(fileName => (console.log(`writing`, fileName), fileName))
  .take(3)
  .forEach(fileName => console.log(`${fileName} written`))

and prints

writing 1.txt
1.txt written
writing 2.txt
2.txt written
writing 3.txt
3.txt written

just as you'd hope.

@benlesh
Copy link
Author

benlesh commented Feb 20, 2024

It does, but only on the N+1'th call.

@bakkot Sorry, I was mistaking what was said here to mean that the source would still iterate one more time. You're saying the source would get next() called N times, and then return once after that.

@bakkot
Copy link
Collaborator

bakkot commented Feb 20, 2024

Yup, exactly. (Assuming that the consumer of take calls next on it at least N+1 times, because the return call is only triggered by the take helper the N+1th time you call next on it.)

@benlesh
Copy link
Author

benlesh commented Feb 20, 2024

So the only real problem is just the edge case I demonstrated initially, where a user gets a handle to the source of take() and calls next() on it manually before take(N) tears it down on the N+1 call, and there happens to be a side effect above that take.

I don't think it'll happen often with Iterable, but it might more often with AsyncIterable. The best thing that was saving us before was that people rarely got a handle to an actual iterator, and mostly dealt with iterables, which start from stratch when iterated. I worry that being able to get a handle to an iterator is going to make people more likely to manually call "next" on it.

But I will say, just to be fair, the only people that had this problem with observable were being overly clever. But it did happen enough that we had to do something about it. And the only fix for it was to ensure that the source was finalized before the last value "taken" was emitted.

@benlesh
Copy link
Author

benlesh commented Feb 20, 2024

You know what? I can't find another language that will finalize a generator/iterator before emitting the last value on take.

I'm closing this. Sorry for the noise. Hopefully, at least, when someone comes because they're doing something weird and they run into this, they'll see this discussion and know more about what's happening.

@benlesh benlesh closed this as completed Feb 20, 2024
@benlesh
Copy link
Author

benlesh commented Feb 20, 2024

(Thank you for your time)

@bakkot
Copy link
Collaborator

bakkot commented Feb 20, 2024

No problem, thanks for raising. It's good to talk these things through.

@bergus
Copy link

bergus commented Feb 21, 2024

the only fix for it was to ensure that the source was finalized before the last value "taken" was emitted.

An alternative would be to "lock" the source, and mark it as "consumed", so that only .take() can take values out of it and all other "manual" .next() calls would throw an error. In a language like Rust, this would be checked statically by the compiler with its ownership and borrowing model, but it's also possible to implement this dynamically in JavaScript.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants