FIANet

This repository is the official implementation for "Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation." [IEEE TGRS] [arXiv]

Setting Up

Preliminaries

The code has been verified to work with PyTorch v1.12.1 and Python 3.7.

  1. Clone this repository.
  2. Change directory to the root of this repository.

Package Dependencies

  1. Create a new Conda environment with Python 3.7, then activate it:
conda create -n FIANet python==3.7
conda activate FIANet
  2. Install PyTorch v1.12.1 with a CUDA version that works on your cluster/machine (CUDA 10.2 is used in this example):
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=10.2 -c pytorch
  3. Install the packages in requirements.txt via pip:
pip install -r requirements.txt
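
After completing the steps above, you can optionally confirm that PyTorch was installed with working CUDA support. This is a minimal sanity check, not part of the original setup instructions:

# Quick environment check: prints the installed PyTorch version and whether a
# CUDA-capable GPU is visible. On the verified setup this should report 1.12.1 and True.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))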

The Initialization Weights for Training

  1. Create the ./pretrained_weights directory where the weights will be stored:
mkdir ./pretrained_weights
  2. Download the pre-trained classification weights of the Swin Transformer and put the .pth file in ./pretrained_weights. These weights are needed for training to initialize the visual encoder.
  3. Download the BERT weights from HuggingFace's Transformers library and put them in the root directory.
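
If you prefer to fetch the BERT weights programmatically, the sketch below uses HuggingFace Transformers. The checkpoint name bert-base-uncased is an assumption; use whichever BERT variant the training scripts expect.

# Hedged sketch: download BERT weights with HuggingFace Transformers and save
# them under the repository root. "bert-base-uncased" is an assumed variant;
# check the paper/configs for the exact one.
from transformers import BertModel, BertTokenizer

checkpoint = "bert-base-uncased"  # assumption
tokenizer = BertTokenizer.from_pretrained(checkpoint)
model = BertModel.from_pretrained(checkpoint)

# Save to a local folder in the repository root so training can load it offline.
tokenizer.save_pretrained("./bert-base-uncased")
model.save_pretrained("./bert-base-uncased")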

Datasets

We perform the experiments on two datasets: RefSegRS and RRSIS-D.

Training

We use one GPU to train our model. For training on the RefSegRS dataset:

python train.py --dataset refsegrs --model_id FIANet --epochs 60 --lr 5e-5 --num_tmem 1  

For training on the RRSIS-D dataset:

python train.py --dataset rrsisd --model_id FIANet --epochs 40 --lr 3e-5 --num_tmem 3  
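
Training assumes a single GPU. If your machine has several, you can pin one with the standard CUDA environment variable before the command, e.g. CUDA_VISIBLE_DEVICES=0 python train.py ... (this is general PyTorch/CUDA behaviour, not a flag defined by this repository).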

Testing

For the RefSegRS dataset:

python test.py --swin_type base --dataset refsegrs --resume ./your_checkpoints_path --split test --window12 --img_size 480 --num_tmem 1 

For the RRSIS-D dataset:

python test.py --swin_type base --dataset rrsisd --resume ./your_checkpoints_path --split test --window12 --img_size 480 --num_tmem 3
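
If a run fails to resume, it can help to inspect the checkpoint passed to --resume. The snippet below is a minimal sketch that assumes a standard PyTorch .pth file; the exact keys saved by train.py are not documented here and may differ.

# Inspect a saved checkpoint before passing it to --resume.
import torch

ckpt = torch.load("./your_checkpoints_path", map_location="cpu")
if isinstance(ckpt, dict):
    print("Top-level keys:", list(ckpt.keys()))  # e.g. model / optimizer / epoch (assumed)
else:
    print("Loaded object of type:", type(ckpt))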

Citation

If you find this code useful for your research, please cite our paper:

@article{lei2024exploring,
  title={Exploring fine-grained image-text alignment for referring remote sensing image segmentation},
  author={Lei, Sen and Xiao, Xinyu and Li, Heng-Chao and Shi, Zhenwei and Zhu, Qing},
  journal={arXiv preprint arXiv:2409.13637},
  year={2024}
}

Acknowledgements

Code in this repository is built on RMSIN and LAVT. We'd like to thank the authors for open-sourcing their projects.
