This repository is the official implementation of "Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation." [IEEE TGRS] [arXiv]
The code has been verified to work with PyTorch v1.12.1 and Python 3.7.
- Clone this repository.
- Change directory to the root of this repository.
- Create a new Conda environment with Python 3.7, then activate it:
```shell
conda create -n FIANet python==3.7
conda activate FIANet
```
- Install PyTorch v1.12.1 with a CUDA version that works on your cluster/machine (CUDA 10.2 is used in this example; a quick sanity check follows this list):
```shell
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=10.2 -c pytorch
```
- Install the packages in `requirements.txt` via `pip`:
```shell
pip install -r requirements.txt
```
- Create the `./pretrained_weights` directory where we will be storing the weights:
```shell
mkdir ./pretrained_weights
```
- Download the pre-trained classification weights of the Swin Transformer, and put the `pth` file in `./pretrained_weights`. These weights are needed to initialize the visual encoder during training.
- Download BERT weights from HuggingFace's Transformers library, and put them in the root directory (see the download sketch after this list).
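Once PyTorch is installed, a quick check that the build actually sees your GPU can save debugging time later. A minimal sketch:

```python
# Minimal sanity check for the PyTorch install above.
import torch

print(torch.__version__)          # expect 1.12.1
print(torch.cuda.is_available())  # expect True on a CUDA-capable machine
```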
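The BERT weights can be fetched with HuggingFace's `transformers` library. A minimal sketch, assuming the commonly used `bert-base-uncased` variant (check the repository's configuration for the exact variant and target path):

```python
# Sketch: download BERT weights and tokenizer, then save them under the repo
# root. "bert-base-uncased" is an assumption, not confirmed by this README.
from transformers import BertModel, BertTokenizer

BertModel.from_pretrained("bert-base-uncased").save_pretrained("./bert-base-uncased")
BertTokenizer.from_pretrained("bert-base-uncased").save_pretrained("./bert-base-uncased")
```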
We perform experiments on two datasets: RefSegRS and RRSIS-D.
We use one GPU to train our model. For training on the RefSegRS dataset:
```shell
python train.py --dataset refsegrs --model_id FIANet --epochs 60 --lr 5e-5 --num_tmem 1
```
For training on the RRSIS-D dataset:
```shell
python train.py --dataset rrsisd --model_id FIANet --epochs 40 --lr 3e-5 --num_tmem 3
```
For testing on the RefSegRS dataset:
```shell
python test.py --swin_type base --dataset refsegrs --resume ./your_checkpoints_path --split test --window12 --img_size 480 --num_tmem 1
```
For testing on the RRSIS-D dataset:
```shell
python test.py --swin_type base --dataset rrsisd --resume ./your_checkpoints_path --split test --window12 --img_size 480 --num_tmem 3
```
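For reference, referring segmentation benchmarks like these are commonly scored with overall IoU (oIoU) and mean IoU (mIoU) over binary masks. An illustrative sketch of those two statistics (not the repository's evaluation code):

```python
# Illustrative only: oIoU accumulates intersection/union over the whole split,
# while mIoU averages the per-sample IoU values.
import numpy as np

def iou_stats(pred_masks, gt_masks):
    """pred_masks, gt_masks: sequences of boolean arrays of matching shapes."""
    inters, unions, per_sample = 0, 0, []
    for p, g in zip(pred_masks, gt_masks):
        i = np.logical_and(p, g).sum()
        u = np.logical_or(p, g).sum()
        inters += i
        unions += u
        per_sample.append(i / u if u > 0 else 1.0)
    return inters / unions, float(np.mean(per_sample))  # (oIoU, mIoU)
```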
If you find this code useful for your research, please cite our paper:
```bibtex
@article{lei2024exploring,
  title={Exploring fine-grained image-text alignment for referring remote sensing image segmentation},
  author={Lei, Sen and Xiao, Xinyu and Li, Heng-Chao and Shi, Zhenwei and Zhu, Qing},
  journal={arXiv preprint arXiv:2409.13637},
  year={2024}
}
```
Code in this repository is built on RMSIN and LAVT. We would like to thank the authors for open-sourcing their projects.