UniBioTransfer: A Unified Framework for Multiple Biometrics Transfer

Caiyi Sun1*, Yujing Sun2*, Xiangyu Li2, Yuhang Zheng2, Yiming Ren2,3, Jiamin Wang3, Yuexin Ma3, Siu-Ming Yiu1†
1The University of Hong Kong · 2Digital Trust Centre, Nanyang Technological University · 3ShanghaiTech University
*Equal Contribution · †Corresponding Author
UniBioTransfer teaser figure

The first unified framework to handle four challenging and representative high-level and mid-level deepface generation tasks within a single model, while also generalizing efficiently to novel low-level and cross-/intra-level compositional transfer tasks with minimal fine-tuning.
(Blue border: target image, orange border: reference images, green borders: transferred results)

Abstract

Deepface generation has traditionally followed a task-driven paradigm, where distinct tasks (e.g., face transfer and hair transfer) are addressed by task-specific models. This single-task setting, however, severely limits model generalization and scalability. A unified model capable of solving multiple deepface generation tasks in a single pass is therefore a promising and practical direction, yet it remains challenging due to data scarcity and the cross-task conflicts arising from heterogeneous attribute transformations. To this end, we propose UniBioTransfer, the first unified framework capable of handling both conventional deepface tasks (e.g., face transfer and face reenactment) and shape-varying transformations (e.g., hair transfer and head transfer). Moreover, UniBioTransfer naturally generalizes to unseen tasks, such as lip, eye, and glasses transfer, with minimal fine-tuning. Concretely, UniBioTransfer addresses data insufficiency in multi-task generation through a unified data construction strategy, including a swapping-based corruption mechanism designed for spatially dynamic attributes such as hair. It further mitigates cross-task interference via BioMoE, a mixture-of-experts architecture coupled with a novel two-stage training strategy that effectively disentangles task-specific knowledge. Extensive experiments demonstrate the effectiveness, generalization, and scalability of UniBioTransfer, which outperforms both existing unified models and task-specific methods across a wide range of deepface generation tasks. Our code will be released soon.

Visual Results

Visual comparisons on diverse deepface tasks

Problem Definition

We formulate various deepface tasks as swapping a set of attributes X (e.g., face identity, hair, pose, expression, skin tone) from a reference image Iref onto a target image Itgt, while preserving the remaining attributes Y of Itgt. The desired output Iout is the image realizing the attribute combination Xref ∪ Ytgt, i.e., the transferred attributes taken from the reference together with the preserved attributes of the target.
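This set formulation can be illustrated with a toy attribute-dictionary sketch (the attribute names and the `compose` helper are illustrative, not part of the paper's implementation):

```python
def compose(ref_attrs, tgt_attrs, transfer_set):
    """Attribute dict of the desired output I_out = X_ref ∪ Y_tgt."""
    out = {}
    for name in ref_attrs.keys() | tgt_attrs.keys():
        # X_ref: attributes in the transfer set come from the reference;
        # Y_tgt: all remaining attributes are preserved from the target.
        out[name] = ref_attrs[name] if name in transfer_set else tgt_attrs[name]
    return out

ref = {"identity": "A", "hair": "curly", "pose": "left", "expression": "smile"}
tgt = {"identity": "B", "hair": "short", "pose": "front", "expression": "neutral"}

# Hair transfer: X = {hair} from the reference, everything else from the target.
out = compose(ref, tgt, transfer_set={"hair"})
```

Face transfer, reenactment, and the compositional tasks differ only in which attributes are placed in the transfer set X.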

Method

Limitations of traditional mask-based strategy

Limitations of traditional mask-based strategy for attributes with significant structural changes (e.g., hair transfer). Masking exposes ground-truth geometry (a-top), so models trained on such pairs only learn to inpaint the masked region at inference (a-bottom), instead of performing true shape transfer. Our swapping-based strategy removes silhouette information in the target (b-top), forcing the network to transfer shape from the reference at inference (b-bottom).

Unified data corruption strategy

Our unified data corruption strategy for different attribute types. (a) Relatively-static attributes: the target is constructed by simple masking or data augmentation of the GT image. (b) Spatially-dynamic attributes: we apply our swapping-based corruption strategy, which employs an off-the-shelf generative model to replace specific attributes in the GT with arbitrary novel variations, preventing shape leakage from mask boundaries.
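The two corruption routes can be sketched in a few lines of numpy; the `sample_variation` callable below is a stand-in for the off-the-shelf generative model, and the array-as-image setup is purely illustrative:

```python
import numpy as np

def corrupt_static(gt, mask):
    """Relatively-static attributes: zero out the masked region of the GT.
    The mask boundary still outlines the GT geometry, which is acceptable
    here because the attribute's shape does not change across examples."""
    out = gt.copy()
    out[mask] = 0.0
    return out

def corrupt_dynamic(gt, region, sample_variation):
    """Spatially-dynamic attributes (e.g., hair): replace the region with an
    arbitrary novel variation, so the corrupted target carries no silhouette
    information about the GT shape that the model could simply inpaint."""
    out = gt.copy()
    out[region] = sample_variation(gt)[region]
    return out
```

Training pairs are then (corrupted target, GT), so the network must recover the attribute from the reference rather than from leaked mask geometry.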

UniBioTransfer architecture overview

UniBioTransfer architecture overview. (a) Overall framework. (b) We introduce an MoE-enhanced Feed Forward Network (FFN). (c) Expert selection is guided by a Structure-Aware Router. (d) The entire system is optimized using a two-stage training strategy designed to stabilize routing and promote expert specialization.
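A minimal numpy sketch of the MoE-enhanced FFN dispatch described in (b)–(c), assuming standard softmax routing with top-K expert selection; the actual Structure-Aware Router additionally conditions on semantic structure, which is omitted here, and all function names are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_ffn(x, w_router, experts, top_k=2):
    """Route each token to its top-K experts and blend their outputs.

    x        : (tokens, dim) token features
    w_router : (dim, n_experts) routing projection
    experts  : list of callables (tokens, dim) -> (tokens, dim)
    """
    scores = softmax(x @ w_router)                     # routing scores per token
    top_idx = np.argsort(scores, axis=-1)[:, -top_k:]  # top-K expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        w = scores[t, top_idx[t]]
        w = w / w.sum()                                # renormalize over top-K
        for weight, e in zip(w, top_idx[t]):
            out[t] += weight * experts[e](x[t:t + 1])[0]
    return out
```

The two-stage training strategy in (d) would first warm up the router so its top-K assignments stabilize, then train the experts to specialize; that schedule is not reflected in this forward-pass sketch.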

Structure-aware routing scores

Structure-aware routing scores after softmax and before top-K selection, visualizing how experts specialize to semantically structured regions.

Additional Visual Results

All inputs are from the FFHQ dataset.


Complex Scenes (Extreme Poses, Expressions, Occlusions)

We manually select occluded images from the FFHQ dataset, exaggerated expressions from the AffectNet dataset, and extreme poses from the EFHQ dataset.
In each case, either the target or the reference image features a complex scene, while the other is a normal image from the FFHQ test set.

BibTeX

PLACE_holder