[Feature Request] Add a next_observations field to RolloutBufferSamples #1328

euanong · 2023-02-12T18:53:12Z

🚀 Feature

When sampling from a RolloutBuffer, we return RolloutBufferSamples containing tensors of observations, actions etc.

stable-baselines3/stable_baselines3/common/buffers.py

Lines 473 to 479 in 69b94dd

    
    def _get_samples(
        self,
        batch_inds: np.ndarray,
        env: Optional[VecNormalize] = None,
    ) -> RolloutBufferSamples:  # type: ignore[signature-mismatch] #FIXME
        data = (
            self.observations[batch_inds],

It would be nice if RolloutBufferSamples could also contain a batch of next observations (alongside a mask that, for each observation, tells us whether that observation has a successor).
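To make the request concrete, here is a minimal NumPy sketch of how next observations and the accompanying mask could be derived before the buffer is shuffled. It is not the code from the PR. The helper name `compute_next_observations` is hypothetical, and the sketch assumes that observations are stored with shape `(n_steps, n_envs, *obs_shape)` and that `episode_starts[t]` is nonzero when the observation at step `t` is the first one of a new episode.

```python
import numpy as np


def compute_next_observations(observations, episode_starts):
    """Hypothetical helper, not part of stable-baselines3.

    observations:   array of shape (n_steps, n_envs, *obs_shape)
    episode_starts: array of shape (n_steps, n_envs); nonzero where the
                    observation at that step begins a new episode (assumed).

    Returns (next_observations, has_next), where has_next has shape
    (n_steps, n_envs) and is False wherever no successor is stored.
    """
    next_observations = np.zeros_like(observations)
    has_next = np.zeros(episode_starts.shape, dtype=bool)

    # Step t has a successor if step t + 1 is stored in the buffer
    # and does not begin a new episode.
    has_next[:-1] = episode_starts[1:] == 0

    # Shift the observations by one step along the time axis.
    next_observations[:-1] = observations[1:]

    # Zero out entries that have no valid successor.
    next_observations[~has_next] = 0
    return next_observations, has_next
```

Both arrays could then be flattened and indexed with `batch_inds` in the same way as the existing fields, so a custom loss can skip entries where the mask is False.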

Motivation

I'm implementing an RL pipeline in which I extend PPO with a custom loss. For this custom loss, I need access to (observation, next observation) pairs.

In the PPO implementation

stable-baselines3/stable_baselines3/ppo/ppo.py

Lines 192 to 197 in 69b94dd

    
    # train for n_epochs epochs
    for epoch in range(self.n_epochs):
        approx_kl_divs = []
        # Do a complete pass on the rollout buffer
        for rollout_data in self.rollout_buffer.get(self.batch_size):
            actions = rollout_data.actions

each batch of rollout data over which we compute the PPO loss is a RolloutBufferSamples instance. Because each batch is a random subset of the observations in the RolloutBuffer, it does not carry enough information to recover the next observation for each observation in the batch.

Pitch

I have already implemented this feature and submitted it as a PR [to be linked after submission].

Alternatives

Alternatively, we could return the indices of the sampled elements with respect to the original buffer. While this may allow for more general buffer manipulation, this feels less pleasant to use.
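For comparison, a rough sketch of what callers would have to do under the index-based alternative. The helper `next_indices` is hypothetical, and the sketch assumes the flattened buffer stores each environment's steps contiguously, so that flat index = env * n_steps + step.

```python
import numpy as np


def next_indices(batch_inds, n_steps):
    """Hypothetical helper: map flat buffer indices to their successors.

    Assumes flat index = env * n_steps + step (an assumption about the
    flattened layout, not something stated in this issue).

    Returns (next_inds, in_range). in_range is False for the last stored
    step of each environment; in that case the index is left unchanged.
    """
    batch_inds = np.asarray(batch_inds)
    step = batch_inds % n_steps
    in_range = step < n_steps - 1
    next_inds = np.where(in_range, batch_inds + 1, batch_inds)
    return next_inds, in_range
```

The caller would still need to combine `in_range` with the episode-start flags at `next_inds` to exclude pairs that cross an episode boundary, which is the extra bookkeeping that makes this option less convenient.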

Additional context

No response

Checklist

I have checked that there is no similar issue in the repo


araffin · 2023-02-13T08:53:50Z

I have checked that there is no similar issue in the repo

Duplicate of #1273

euanong added the enhancement label Feb 12, 2023

euanong mentioned this issue Feb 12, 2023

Added a next_observations field to RolloutBufferSamples (closes #1328) #1329

Open

16 tasks

araffin closed this as not planned Feb 13, 2023

araffin added the duplicate label Feb 13, 2023
