CleanMARL (Clean Implementation of MARL Algorithms)

Based on the philosophy of the CleanRL project:

CleanMARL is a Deep Multi-Agent Reinforcement Learning library that provides high-quality single-file implementations with research-friendly features.

Algorithms Implemented

Algorithm | Variants Implemented
--- | ---
Multi-Agent Proximal Policy Optimization (MAPPO) | mappo_mpe.py

Reward on the MPE simple_spread_v3 environment:

Implementation features
  1. Environment state built by concatenating the agents' local observations and used as the critic input (see the sketch after this list)
  2. Huber loss for the critic (value) network
  3. Value normalization
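
A minimal sketch of the first two features, assuming local observations arrive batched as a (batch, n_agents, obs_dim) tensor; the function names and the Huber delta are illustrative and may differ from what mappo_mpe.py actually uses.

```python
import torch
import torch.nn.functional as F

# The "environment state" fed to the centralized critic is simply the
# concatenation of all agents' local observations (illustrative helper).
def build_critic_input(obs: torch.Tensor) -> torch.Tensor:
    batch, n_agents, obs_dim = obs.shape
    return obs.reshape(batch, n_agents * obs_dim)

# Huber (smooth L1) loss for the value network instead of plain MSE,
# making the critic update less sensitive to outlier return targets.
# The delta value here is an assumption, not the repository's setting.
def value_loss(values: torch.Tensor, returns: torch.Tensor, delta: float = 10.0) -> torch.Tensor:
    return F.huber_loss(values, returns, delta=delta)
```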

The MAPPO authors don't elaborate on the math behind value normalization, but it is actually done as follows (clipping at a minimum value to avoid division by zero is omitted). The tracked statistics are

$$\text{mean} = \mathbb{E}[R], \qquad \text{meansq} = \mathbb{E}[R^2],$$

with $\beta$ a debiasing term and $w$ a decay weight. After each minibatch:

$$\text{mean}_t = w\,\text{mean}_{t-1} + (1-w)\,\overline{R}$$

$$\text{meansq}_t = w\,\text{meansq}_{t-1} + (1-w)\,\overline{R^2}$$

$$\beta_t = w\,\beta_{t-1} + (1-w)$$

where $\overline{R}$ and $\overline{R^2}$ are the minibatch means of the returns and of their squares. Value targets are then normalized with the debiased statistics:

$$v_\text{normalized} = \frac{v - \text{mean}_t/\beta_t}{\sqrt{\text{meansq}_t/\beta_t - (\text{mean}_t/\beta_t)^2}}$$
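
A minimal PyTorch sketch of such a running value normalizer under the formulas above; the class and method names and the default decay weight are assumptions for illustration, not the exact ones in mappo_mpe.py.

```python
import torch


class ValueNormalizer:
    """Running, debiased mean/std normalizer for value targets (sketch)."""

    def __init__(self, weight: float = 0.999, eps: float = 1e-5):
        self.w = weight                  # decay weight w for the running statistics
        self.eps = eps                   # minimum clip value to avoid division by zero
        self.mean = torch.zeros(1)       # running estimate of E[R]
        self.meansq = torch.zeros(1)     # running estimate of E[R^2]
        self.beta = torch.zeros(1)       # debiasing term

    @torch.no_grad()
    def update(self, returns: torch.Tensor) -> None:
        # Exponential moving averages of E[R], E[R^2], and the debiasing term.
        self.mean = self.w * self.mean + (1 - self.w) * returns.mean()
        self.meansq = self.w * self.meansq + (1 - self.w) * returns.pow(2).mean()
        self.beta = self.w * self.beta + (1 - self.w)

    def _debiased(self):
        mean = self.mean / self.beta.clamp(min=self.eps)
        var = self.meansq / self.beta.clamp(min=self.eps) - mean.pow(2)
        return mean, var.clamp(min=self.eps)

    def normalize(self, v: torch.Tensor) -> torch.Tensor:
        mean, var = self._debiased()
        return (v - mean) / var.sqrt()

    def denormalize(self, v: torch.Tensor) -> torch.Tensor:
        mean, var = self._debiased()
        return v * var.sqrt() + mean
```

In training, the normalizer would be updated on the return targets of each minibatch, the critic would be regressed toward the normalized targets, and predictions would be denormalized before computing advantages.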
