I am implementing a Soft Actor-Critic (SAC) agent and need to evaluate the Q-value network inside my custom environment (to implement a special algorithm, the Wolpertinger algorithm, for handling large discrete action spaces). I have tried to get the Q-values from the SAC class object, but failed. Any method or function like the one in Stable Baselines' PPO implementation (namely, `.value`) would be very helpful.
moizuet changed the title from "Deep Q-value network evaluation" to "Deep Q-value network evaluation in SAC algorithm" on Jul 19, 2022.
I would first suggest moving to stable-baselines3: it is more refined and still maintained. This version is no longer maintained.
To answer your question: there is no convenience function for this, but you can check how SAC does the value prediction in SB3 here, and try to replicate it yourself.
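For reference, here is a minimal sketch of what querying SAC's critic could look like in SB3 (not the TF-based stable-baselines you are using). It assumes SB3's current API, where `model.critic` is the twin Q-network module and `model.policy.obs_to_tensor` handles observation preprocessing; the environment name is just an example, and attribute names may differ between SB3 versions, so please double-check against your installed version.

```python
import torch as th
from stable_baselines3 import SAC

# Example environment; any continuous-action env works the same way
model = SAC("MlpPolicy", "Pendulum-v1", verbose=0)
model.learn(total_timesteps=1000)

# Pick an arbitrary observation/action pair to evaluate
obs = model.observation_space.sample()
action = model.action_space.sample()

# Convert to batched tensors on the model's device
obs_tensor, _ = model.policy.obs_to_tensor(obs)
action_tensor = th.as_tensor(action, device=model.device).float().unsqueeze(0)

with th.no_grad():
    # model.critic returns a tuple with one estimate per Q-network (twin critics)
    q_values = model.critic(obs_tensor, action_tensor)

print([q.item() for q in q_values])
```

Replicating this on top of the TF1-based stable-baselines would mean running the corresponding Q-function tensors through the model's TF session yourself.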
Unfortunately, I have implemented the rest of my RL algorithms, layers, and optimizers in TensorFlow and the stable-baselines (v2) ecosystem. I cannot switch right now, but I will consider using stable-baselines3, and especially RLlib, in the future.
It will also be a great coding exercise for me to implement this Q-value evaluation method.