TabSyM: A Generative Pipeline for Small Multi-Cohort Omics Tabular Data

Nengneng Yu, Yuefan Wang, Lindsey Kathleen Olsen, Bing Zhang, Hui Zhang, Zaoxing Liu
Abstract
TabSyM is an end-to-end modular pipeline for small-sample, high-dimensional omics data affected by cross-cohort batch effects. It combines tabular diffusion (TabDDPM) for data synthesis, task-aware sample selection, and multi-domain adversarial alignment (MDAN). Compared with state-of-the-art baselines, TabSyM improves AUROC by 30.2% on 3-year survival prediction for gastric cancer across five cohorts, and achieves gains of up to 22.1% in AUROC and 21.8% in F1 on pancreatic-cancer staging.
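As a rough illustration of the generative stage, the sketch below shows the standard Gaussian forward (noising) process of a denoising diffusion probabilistic model applied to one row of numeric features. It is not TabSyM's code: the linear β schedule (1e-4 to 0.02 over 1,000 steps) is the one from the original DDPM paper, the function names and the example row are hypothetical, and TabDDPM additionally uses multinomial diffusion for categorical columns, which is omitted here.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_0..beta_{T-1}, linearly spaced (the original DDPM schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s); shrinks toward 0 as t grows."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bar, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) for one numeric row.

    Returns the noised row and the Gaussian noise eps that produced it; a denoising
    network is trained to predict eps from (x_t, t).
    """
    a = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


# Usage on a hypothetical row of standardized omics features.
rng = random.Random(0)
alpha_bar = cumulative_alphas(linear_beta_schedule(1000))
row = [0.3, -1.2, 2.5]
x_early, _ = q_sample(row, 10, alpha_bar, rng)    # still close to the original row
x_late, eps = q_sample(row, 999, alpha_bar, rng)  # almost pure Gaussian noise
print([round(v, 3) for v in x_late])
```

Sampling runs this process in reverse: starting from pure noise, the trained network removes the predicted noise step by step to produce synthetic rows, which a pipeline like the one described can then filter and align across cohorts.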
Publication: bioRxiv preprint