Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model

Yuxuan Zhang, Lifu Wei, Qing Zhang, Yiren Song, Jiaming Liu, Huaxia Li,
Xu Tang, Yao Hu, Haibo Zhao


Shanghai Jiao Tong University, Xiaohongshu Inc., Peking University, Shenyang Institute of Automation, Chinese Academy of Sciences, National University of Singapore


Our proposed framework, Stable-Makeup, is a novel diffusion-based method for makeup transfer that can robustly transfer a diverse range of real-world makeup styles, from light to extremely heavy makeup.

[Paper]      [Code]     

Abstract

Current makeup transfer methods are limited to simple makeup styles, making them difficult to apply in real-world scenarios. In this paper, we introduce Stable-Makeup, a novel diffusion-based makeup transfer method capable of robustly transferring a wide range of real-world makeup onto user-provided faces. Stable-Makeup is based on a pre-trained diffusion model and utilizes a Detail-Preserving (D-P) makeup encoder to encode makeup details. It also employs content and structural control modules to preserve the content and structural information of the source image. With the aid of our newly added makeup cross-attention layers in the U-Net, we can accurately transfer detailed makeup to the corresponding positions in the source image. After content-structure decoupling training, Stable-Makeup can maintain the content and facial structure of the source image. Moreover, our method demonstrates strong robustness and generalizability, making it applicable to various tasks such as cross-domain makeup transfer and makeup-guided text-to-image generation. Extensive experiments demonstrate that our approach delivers state-of-the-art (SOTA) results among existing makeup transfer methods and exhibits highly promising potential for broad applications in related fields.

Background

As a significant computer vision task, makeup transfer has a wide range of applications, for example in the beauty industry and in virtual try-on systems. It enables the enhancement, modification, and transformation of facial features to achieve the desired effects of beautification, embellishment, and deformation. However, despite its conceptual straightforwardness, makeup transfer poses significant challenges when aiming for a seamless and authentic transformation across a wide range of makeup intensities and styles. Current approaches fall short when confronted with the diversity of real-world makeup styles, especially when translating highly detailed and creative cosmetics, such as those found in cosplay or movie-character imitations, onto real faces. This limitation not only restricts their applicability but also hinders their effectiveness in accurately capturing the essence of personalized and intricate makeup designs. Recognizing this gap, our research introduces Stable-Makeup, a novel approach that leverages diffusion-based methodologies to transcend the boundaries of existing makeup transfer methods.


Approach

Stable-Makeup is built on a pre-trained diffusion model and comprises three key components: the Detail-Preserving (D-P) makeup encoder, the makeup cross-attention layers, and the content and structural control modules. To preserve makeup details, the D-P makeup encoder uses a multi-layer strategy to encode the makeup reference image into multi-scale detail makeup embeddings. The content control module maintains pixel-level content consistency with the source image, while the structural control module introduces facial structure, improving the consistency between the generated image and the facial structure of the source image. To achieve semantic alignment between the intermediate features of the U-Net-encoded source image and the detail makeup embeddings, we extend the U-Net architecture with a makeup branch composed of cross-attention layers. Through our proposed content and structural decoupling training strategy, we can further maintain the facial structure of the source image.
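To make the makeup branch concrete, the sketch below shows how a standard attention block could be extended with an additional cross-attention layer over the detail makeup embeddings. This is a minimal, hypothetical PyTorch illustration: the module names, dimensions, and the plain nn.MultiheadAttention layers are our assumptions for exposition, not the released implementation.

# Hypothetical sketch (PyTorch): an attention block extended with a makeup
# cross-attention layer that attends to detail makeup embeddings.
import torch
import torch.nn as nn

class MakeupTransformerBlock(nn.Module):
    def __init__(self, dim, ctx_dim, makeup_dim, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.text_cross_attn = nn.MultiheadAttention(
            dim, heads, kdim=ctx_dim, vdim=ctx_dim, batch_first=True)
        # Newly added makeup branch: cross-attention over makeup embeddings.
        self.makeup_cross_attn = nn.MultiheadAttention(
            dim, heads, kdim=makeup_dim, vdim=makeup_dim, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.norm3, self.norm4 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x, text_ctx, makeup_ctx):
        # x:          (B, N, dim)         U-Net features of the source image
        # text_ctx:   (B, T, ctx_dim)     prompt embeddings
        # makeup_ctx: (B, M, makeup_dim)  multi-scale detail makeup embeddings
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x)
        x = x + self.text_cross_attn(h, text_ctx, text_ctx, need_weights=False)[0]
        h = self.norm3(x)
        # Makeup cross-attention aligns makeup details with spatial positions on the face.
        x = x + self.makeup_cross_attn(h, makeup_ctx, makeup_ctx, need_weights=False)[0]
        return x + self.ff(self.norm4(x))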




Visual Comparison

Qualitative comparison of different methods. Our results outperform other methods in terms of makeup detail transfer, ranging from light makeup to personalized heavy makeup.

More Applications

Our Stable-Makeup not only achieves unprecedented makeup transfer effects on real faces, but also enables applications that were previously unattainable with traditional makeup transfer methods. For instance, our method can perform makeup transfer on cross-domain faces and can guide text-to-image generation with reference makeup conditions to create artistic images. It can also perform makeup transfer on videos by using effective multi-frame concatenation (see the sketch below). Moreover, our approach supports reference makeup from other domains, such as animal-inspired subjects and animated characters, providing a broader range of creative makeup options for real human faces.
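As a loose illustration of the multi-frame concatenation mentioned above, the following sketch tiles several video frames into one grid image so that a single transfer pass sees all frames together, then splits the result back into frames. The helpers tile_frames and split_frames and the grid layout are hypothetical; the exact concatenation scheme used in our pipeline may differ.

# Hypothetical helpers for multi-frame concatenation (PIL).
from PIL import Image

def tile_frames(frames, cols=2):
    # Paste equally sized frames into one grid image, left-to-right, top-to-bottom.
    w, h = frames[0].size
    rows = (len(frames) + cols - 1) // cols
    grid = Image.new("RGB", (cols * w, rows * h))
    for i, frame in enumerate(frames):
        grid.paste(frame, ((i % cols) * w, (i // cols) * h))
    return grid

def split_frames(grid, n, frame_size, cols=2):
    # Inverse of tile_frames: crop the grid back into n frames of frame_size.
    w, h = frame_size
    return [grid.crop(((i % cols) * w, (i // cols) * h,
                       (i % cols + 1) * w, (i // cols + 1) * h)) for i in range(n)]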



Ablation Study

Qualitative ablation results. Figure (a) explores the classifier-free guidance scale. Figure (b) presents the ablation study of different training and inference settings.
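For reference, the classifier-free guidance scale explored in Figure (a) follows the standard formulation: at each denoising step the unconditional and condition-guided noise predictions are blended, and larger scales adhere more strongly to the reference condition. The minimal sketch below assumes a diffusers-style UNet2DConditionModel call; unet and the embedding tensors are placeholders, not our released API.

# Minimal sketch of classifier-free guidance at a single denoising step.
import torch

@torch.no_grad()
def guided_noise(unet, latents, t, cond_emb, uncond_emb, guidance_scale=7.5):
    # Predict noise without and with the condition (diffusers-style call).
    noise_uncond = unet(latents, t, encoder_hidden_states=uncond_emb).sample
    noise_cond = unet(latents, t, encoder_hidden_states=cond_emb).sample
    # Blend: larger guidance_scale pushes the output toward the conditioned prediction.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)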



More Results

We also present a range of additional results that demonstrate the robustness and superiority of our approach.



BibTex

@misc{zhang2024stablemakeup,
  title={Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model},
  author={Yuxuan Zhang and Lifu Wei and Qing Zhang and Yiren Song and Jiaming Liu and Huaxia Li and Xu Tang and Yao Hu and Haibo Zhao},
  year={2024},
  eprint={2403.07764},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}