CMSL: Cross-modal Style Learning for Few-shot Image Generation
Abstract
Training generative adversarial networks demands large amounts of data, which limits the development of these models on target domains with inadequate training data. Recently, researchers have leveraged generative models pretrained on sufficient data and fine-tuned them on a small number of training samples, thus reducing the data requirement. However, because these methods lack an explicit focus on target styles and concentrate disproportionately on generative consistency, they perform poorly at preserving diversity, which reflects the adaptation ability of a few-shot generative model. To mitigate this diversity degradation, we propose a framework with two key strategies: 1) To extract more diverse styles from limited training data, we propose a cross-modal module that explicitly captures target styles through a style prototype space and text-guided style instructions. 2) To inherit the generation capability of the pretrained model, we constrain the similarity between generated and source images with a structural discrepancy alignment module that maintains structure correlations across multiscale regions. Extensive experiments and analyses demonstrate that our method outperforms state-of-the-art methods in mitigating diversity degradation.
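The abstract names the structural discrepancy alignment module only at a high level; as a concrete illustration, the following is a minimal PyTorch sketch of one plausible reading, assuming the module matches per-image cosine self-similarity maps (structure correlations) between corresponding multiscale feature layers of the pretrained and fine-tuned generators. The function names and the L1 formulation here are hypothetical illustrations, not taken from the paper.

import torch
import torch.nn.functional as F

def self_similarity(feats: torch.Tensor) -> torch.Tensor:
    # feats: (B, C, H, W) feature map from one generator layer.
    # Returns (B, H*W, H*W) cosine similarity between spatial locations,
    # a per-image "structure correlation" matrix.
    flat = F.normalize(feats.flatten(2), dim=1)   # (B, C, H*W), unit channel vectors
    return torch.bmm(flat.transpose(1, 2), flat)  # (B, H*W, H*W)

def structure_alignment_loss(source_feats, adapted_feats):
    # source_feats / adapted_feats: lists of (B, C, H, W) feature maps from
    # corresponding layers (scales) of the pretrained and fine-tuned
    # generators, produced from the same latent codes.
    loss = 0.0
    for fs, fa in zip(source_feats, adapted_feats):
        # Penalize discrepancy between the two structure-correlation maps;
        # the pretrained generator's features are treated as a fixed target.
        loss = loss + F.l1_loss(self_similarity(fa), self_similarity(fs.detach()))
    return loss / len(source_feats)

In training, a loss of this form would be added to the adversarial objective so that the fine-tuned generator retains the source model's spatial structure while the cross-modal module supplies the target style.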