SoundLoCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation

About

This is a demo for our paper 'SoundLoCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation'. SoundLoCD is a LoRA-based conditional discrete contrastive latent diffusion model for text-to-sound-effect generation. Unlike recent large-scale audio generation models, SoundLoCD can be trained efficiently under limited computational resources. A contrastive learning strategy strengthens the connection between the textual condition and the generated audio, yielding outputs that are coherent with their text prompts.
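
To illustrate the kind of contrastive objective described above, the sketch below pairs text and audio latent embeddings with a symmetric InfoNCE-style loss. This is only a minimal illustration under assumed names and shapes (contrastive_loss, text_emb, audio_emb, temperature); it is not the exact loss used in the paper.

import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, audio_emb, temperature=0.07):
    # text_emb, audio_emb: (batch, dim) tensors; matched pairs share an index.
    # Normalise so the dot product becomes cosine similarity.
    text_emb = F.normalize(text_emb, dim=-1)
    audio_emb = F.normalize(audio_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are the positive pairs.
    logits = text_emb @ audio_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy over text-to-audio and audio-to-text matching.
    loss_t2a = F.cross_entropy(logits, targets)
    loss_a2t = F.cross_entropy(logits.t(), targets)
    return (loss_t2a + loss_a2t) / 2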

Citation

If you are interested in our work, please cite it as below:

@INPROCEEDINGS{niu2023soundlocd,
  author={Niu, Xinlei and Zhang, Jing and Walder, Christian and Martin, Charles Patrick},
  title={SoundLoCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation},
  year={2023}
}
