TabSyM: A Generative Pipeline for Small Multi-Cohort Omics Tabular Data
Nengneng Yu
Yuefan Wang
Lindsey Kathleen Olsen
Bing Zhang
Hui Zhang
Zaoxing Liu

Abstract
TabSyM is an end-to-end, modular pipeline that combines tabular diffusion
(TabDDPM), task-aware sample selection, and multi-domain adversarial
alignment (MDAN) to address two problems in omics data: small sample sizes
relative to feature dimensionality, and cross-cohort batch effects. Compared
to state-of-the-art baselines, TabSyM improves AUROC by 30.2% on
gastric-cancer 3-year survival prediction across five cohorts, and achieves
gains of up to +22.1% AUROC and +21.8% F1 on pancreatic-cancer staging.
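For readers unfamiliar with the headline metric, the sketch below shows one
standard way to compute AUROC: the fraction of (positive, negative) pairs in
which the positive sample receives the higher score, with ties counted as
one half. The function name `auroc` and the toy labels and scores are
illustrative assumptions, not code or data from the TabSyM paper.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison (Mann-Whitney U formulation).

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of predicted scores, higher = more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, for illustration only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

The pairwise loop is quadratic in the number of samples, which is acceptable
for the small cohorts discussed here; a rank-based implementation would
scale better for larger datasets.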
Publication
bioRxiv preprint