I am implementing a Soft Actor-Critic (SAC) agent and need to evaluate the Q-value network inside my custom environment (to implement a special algorithm, the Wolpertinger algorithm, for handling large discrete action spaces). I have tried to get the Q-values from the SAC class object, but failed. Any method or function like the one in Stable Baselines' PPO implementation (namely, `.value`) would be very helpful.
moizuet changed the title from "Deep Q-value network evaluation" to "Deep Q-value network evaluation in SAC algorithm" on Jul 19, 2022.
I would first suggest moving to stable-baselines3: it is more refined and still maintained. This version is no longer maintained.
To answer your question: there is no convenience function for this, but you can check how SAC does the value prediction in SB3 here, and try to replicate it yourself.
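For reference, here is a minimal sketch of what querying SAC's critic could look like in SB3 (not the TF-based stable-baselines you are using). It assumes SB3's current API, where `model.critic` is the twin Q-network module and `model.policy.obs_to_tensor` handles observation preprocessing; the environment name is just an example, and attribute names may differ between SB3 versions, so please double-check against your installed version.

```python
import torch as th
from stable_baselines3 import SAC

# Example environment; any continuous-action env works the same way
model = SAC("MlpPolicy", "Pendulum-v1", verbose=0)
model.learn(total_timesteps=1000)

# Pick an arbitrary observation/action pair to evaluate
obs = model.observation_space.sample()
action = model.action_space.sample()

# Convert to batched tensors on the model's device
obs_tensor, _ = model.policy.obs_to_tensor(obs)
action_tensor = th.as_tensor(action, device=model.device).float().unsqueeze(0)

with th.no_grad():
    # model.critic returns a tuple with one estimate per Q-network (twin critics)
    q_values = model.critic(obs_tensor, action_tensor)

print([q.item() for q in q_values])
```

Replicating this on top of the TF1-based stable-baselines would mean running the corresponding Q-function tensors through the model's TF session yourself.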
Unfortunately, I have implemented the rest of my RL algorithms, layers, and optimizers in TensorFlow and the stable-baselines (v2) ecosystem. I cannot switch right now, but I will consider using stable-baselines3, and especially RLlib, in the future.
It will also be a great coding exercise for me to implement this Q-value evaluation method.