Models saved periodically do not match the reward in the training window
I have a question about my training results. I am using a custom gym environment and the PPO algorithm from SB3.
During training, I save the model periodically in order to see how it is evolving, and I also set verbose=1 to keep track of the training progress. However, the models I save periodically do not show the same reward as at the time they were saved.
For example, I saved "model_1" at timesteps=10,000 using a custom callback function. At that point, the training window showed ep_rew_mean=366. However, when I test "model_1" individually, its reward is 200. During testing, I call model.predict(obs, deterministic=True). I wonder why this happens, and whether it is caused by my callback function.
Moreover, my final model also does not show the same reward as the training window.
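My test of a saved model is essentially a standard single-episode rollout, sketched below (env is my custom environment, created the same way as in training; "Tempmodel1" stands in for one of the periodically saved files):

from stable_baselines3 import PPO

model = PPO.load("Tempmodel1")  # one of the periodically saved models
obs = env.reset()
done = False
episode_reward = 0.0
while not done:
    # Deterministic action selection during testing
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    episode_reward += reward
print(f"Episode reward: {episode_reward}")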
Here is my code for the custom callback function:
import os

from stable_baselines3.common.callbacks import BaseCallback

class SaveOnModelCallback(BaseCallback):
    """
    Callback for saving the model every ``check_freq`` steps
    (in practice, we recommend using ``EvalCallback``).

    :param check_freq: (int)
    :param log_dir: (str) Path to the folder where the model will be saved.
        It must contain the file created by the ``Monitor`` wrapper.
    :param verbose: (int)
    """
    def __init__(self, check_freq: int, log_dir: str, verbose=1):
        super(SaveOnModelCallback, self).__init__(verbose)
        self.check_freq = check_freq
        self.log_dir = log_dir
        self.save_path = os.path.join(log_dir, 'best_model')

    def _init_callback(self) -> None:
        # Create folder if needed
        if self.save_path is not None:
            os.makedirs(self.save_path, exist_ok=True)

    def _on_step(self) -> bool:
        if self.n_calls % self.check_freq == 0:
            count = self.n_calls // self.check_freq
            print(f"Num timesteps: {self.num_timesteps}")
            # Note: this saves 'Tempmodel<count>.zip' in the working
            # directory, not under ``self.save_path``
            print(f"Saving model to Tempmodel{count}.zip")
            self.model.save('Tempmodel' + str(count))
        return True
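And this is roughly how the callback is attached during learning (a sketch with placeholder values; CustomEnv, the timesteps, and log_dir stand in for my actual setup):

from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor

log_dir = "./logs/"
env = Monitor(CustomEnv(), log_dir)  # custom gym env, wrapped for logging
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000,
            callback=SaveOnModelCallback(check_freq=10_000, log_dir=log_dir))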
Hello,
this is the SB2 repo, not SB3.
Anyway, please provide a minimal code example to reproduce the issue; you can also search this repo for similar issues.
During training, the stochastic policy is used and the return is averaged over the last 100 episodes (to be in the same setting, you should evaluate the stochastic policy on at least 100 episodes; also take a look at the variance).
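For example, to evaluate in the same setting as the logged ep_rew_mean, you can use SB3's evaluate_policy with the stochastic policy over 100 episodes (a sketch; model and env as in your training script):

from stable_baselines3.common.evaluation import evaluate_policy

# Stochastic actions over 100 episodes, matching the training-time statistic
mean_reward, std_reward = evaluate_policy(
    model, env, n_eval_episodes=100, deterministic=False)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")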