
[Question] Cannot reproduce results of "EvalCallback" gathered during training. #2036

Open
felix-basiliskroko opened this issue Nov 7, 2024 · 2 comments
Labels
duplicate (This issue or pull request already exists) · question (Further information is requested) · RTFM (Answer is the documentation)

Comments

@felix-basiliskroko

❓ Question

During training I attach an EvalCallback to my custom Gymnasium environment to record the agent's performance when actions are chosen deterministically:

from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.env_util import make_vec_env

eval_env = make_vec_env(env_id=env_id, seed=42)
eval_callback = EvalCallback(eval_env, best_model_save_path=f"./{check_root_dir}/{run}/{mod}",
                             log_path=f"./{check_root_dir}/{run}/{mod}", eval_freq=20_000,
                             deterministic=True, render=False, n_eval_episodes=10)

...

model.learn(total_timesteps=2_000_000, callback=eval_callback)
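
With best_model_save_path set, EvalCallback writes the best-scoring checkpoint as best_model.zip inside that directory; this is the file loaded in the reproduction snippet below:

model_path = f"./{check_root_dir}/{run}/{mod}/best_model.zip"  # hypothetical path to the saved best model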

During training, the eval/mean_reward converges to approximately -10.0, so I had a look at the _on_step method of EvalCallback to reproduce this score and visualise what exactly the agent has learned:

import numpy as np

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

vec_env = make_vec_env(env_id=env_id, n_envs=1, seed=42)
model = PPO("MultiInputPolicy", env=vec_env)
model.load(model_path, deterministic=deterministic)
episode_rewards, _ = evaluate_policy(model, vec_env, n_eval_episodes=10, render=False, deterministic=True, return_episode_rewards=True)
mean_reward = np.mean(episode_rewards)

I have triple-checked that the model being loaded is the same one saved by EvalCallback, that the same deterministic and return_episode_rewards flags are set, and even that the seed for both environments is the same. But still:

print(mean_reward) -> -500.0

This is so far off the mean_reward evaluated during training that something must be wrong; the gap cannot simply be attributed to stochasticity in the environment or normal deviation from the mean.

I have tried everything I could think of and can't figure out where this difference comes from. Would this indicate that something in my custom environment is causing the discrepancy, or am I missing a crucial detail?


felix-basiliskroko added the question label on Nov 7, 2024
@amabilee

amabilee commented Nov 7, 2024

Hey there!

Given that the discrepancy is so large, it does suggest there might be an issue with your custom environment or the way it's being handled during evaluation.

  1. Ensure that the environment is being reset correctly before each evaluation episode. Any residual state from previous episodes could affect the evaluation.
  2. Verify that the action and observation spaces are identical between the training and evaluation environments. Any differences could lead to unexpected behavior.
  3. Double-check the reward calculation logic in your custom environment. Ensure that it's consistent and correctly implemented in both training and evaluation modes.
  4. Make sure that any randomness in your environment (e.g., initial states, stochastic transitions) is controlled or eliminated during evaluation to ensure deterministic behavior.
  5. If you're using any wrappers in your evaluation environment, ensure they are identical to those used during training. Even subtle differences can lead to significant discrepancies. (A short sketch of a few of these checks follows below.)
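
A minimal sketch of checks 1, 2, and 4, assuming a hypothetical registered env id and the same make_vec_env call as in the question:

import numpy as np

from stable_baselines3.common.env_util import make_vec_env

env_id = "YourCustomEnv-v0"  # hypothetical: your registered custom env id

# Two identically constructed, identically seeded envs.
env_a = make_vec_env(env_id=env_id, n_envs=1, seed=42)
env_b = make_vec_env(env_id=env_id, n_envs=1, seed=42)

# Check 2: observation and action spaces must match exactly.
assert env_a.observation_space == env_b.observation_space
assert env_a.action_space == env_b.action_space

# Checks 1 and 4: with the same seed, two fresh resets should agree
# (with MultiInputPolicy the observation is a dict of batched arrays).
obs_a, obs_b = env_a.reset(), env_b.reset()
for key in obs_a:
    assert np.allclose(obs_a[key], obs_b[key]), f"reset() differs in '{key}'"

If any of these assertions fail, the discrepancy comes from the environment itself rather than from EvalCallback.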

araffin added the duplicate and RTFM labels on Nov 7, 2024
@araffin
Member

araffin commented Nov 7, 2024

Duplicate of #928 (comment) and others
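
One likely culprit, given the snippet in the question: load is a classmethod that returns a new model, so model.load(model_path, ...) does not load weights into the existing instance, and evaluate_policy then scores the freshly initialized (untrained) policy. A minimal sketch of the intended pattern, reusing the hypothetical env_id and model_path from the question:

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

vec_env = make_vec_env(env_id=env_id, n_envs=1, seed=42)
model = PPO.load(model_path, env=vec_env)  # classmethod: assign the return value
episode_rewards, _ = evaluate_policy(model, vec_env, n_eval_episodes=10,
                                     deterministic=True, return_episode_rewards=True)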
