This repository is the official implementation of "Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation." [IEEE TGRS] [arXiv]
The code has been verified to work with PyTorch v1.12.1 and Python 3.7.
- Clone this repository.
- Change directory to the root of this repository.
- Create a new Conda environment with Python 3.7, then activate it:
```shell
conda create -n FIANet python==3.7
conda activate FIANet
```
- Install PyTorch v1.12.1 with a CUDA version that works on your cluster/machine (CUDA 10.2 is used in this example; a quick sanity check follows this list):
```shell
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=10.2 -c pytorch
```
- Install the packages in `requirements.txt` via `pip`:
```shell
pip install -r requirements.txt
```
- Create the `./pretrained_weights` directory where we will be storing the weights:
```shell
mkdir ./pretrained_weights
```
- Download the pre-trained classification weights of the Swin Transformer, and put the `pth` file in `./pretrained_weights`. These weights are needed to initialize the visual encoder during training.
- Download BERT weights from HuggingFace's Transformers library, and put them in the root directory (see the download sketch after this list).
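Once PyTorch is installed, a quick check that the build actually sees your GPU can save debugging time later. A minimal sketch:

```python
# Minimal sanity check for the PyTorch install above.
import torch

print(torch.__version__)          # expect 1.12.1
print(torch.cuda.is_available())  # expect True on a CUDA-capable machine
```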
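The BERT weights can be fetched with HuggingFace's `transformers` library. A minimal sketch, assuming the commonly used `bert-base-uncased` variant (check the repository's configuration for the exact variant and target path):

```python
# Sketch: download BERT weights and tokenizer, then save them under the repo
# root. "bert-base-uncased" is an assumption, not confirmed by this README.
from transformers import BertModel, BertTokenizer

BertModel.from_pretrained("bert-base-uncased").save_pretrained("./bert-base-uncased")
BertTokenizer.from_pretrained("bert-base-uncased").save_pretrained("./bert-base-uncased")
```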
We perform experiments on two datasets: RefSegRS and RRSIS-D.
We use one GPU to train our model. For training on the RefSegRS dataset:
```shell
python train.py --dataset refsegrs --model_id FIANet --epochs 60 --lr 5e-5 --num_tmem 1
```
For training on the RRSIS-D dataset:
```shell
python train.py --dataset rrsisd --model_id FIANet --epochs 40 --lr 3e-5 --num_tmem 3
```
For testing on the RefSegRS dataset:
```shell
python test.py --swin_type base --dataset refsegrs --resume ./your_checkpoints_path --split test --window12 --img_size 480 --num_tmem 1
```
For testing on the RRSIS-D dataset:
```shell
python test.py --swin_type base --dataset rrsisd --resume ./your_checkpoints_path --split test --window12 --img_size 480 --num_tmem 3
```
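For reference, referring segmentation benchmarks like these are commonly scored with overall IoU (oIoU) and mean IoU (mIoU) over binary masks. An illustrative sketch of those two statistics (not the repository's evaluation code):

```python
# Illustrative only: oIoU accumulates intersection/union over the whole split,
# while mIoU averages the per-sample IoU values.
import numpy as np

def iou_stats(pred_masks, gt_masks):
    """pred_masks, gt_masks: sequences of boolean arrays of matching shapes."""
    inters, unions, per_sample = 0, 0, []
    for p, g in zip(pred_masks, gt_masks):
        i = np.logical_and(p, g).sum()
        u = np.logical_or(p, g).sum()
        inters += i
        unions += u
        per_sample.append(i / u if u > 0 else 1.0)
    return inters / unions, float(np.mean(per_sample))  # (oIoU, mIoU)
```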
If you find this code useful for your research, please cite our paper:
```bibtex
@article{lei2024exploring,
  title={Exploring fine-grained image-text alignment for referring remote sensing image segmentation},
  author={Lei, Sen and Xiao, Xinyu and Li, Heng-Chao and Shi, Zhenwei and Zhu, Qing},
  journal={arXiv preprint arXiv:2409.13637},
  year={2024}
}
```
Code in this repository is built on RMSIN and LAVT. We would like to thank the authors for open-sourcing their projects.