Article

Enabling Performant and Flexible Model-Internal Observability for LLM Inference

A performant, flexible deep model inspector that turns internal LLM observability into a first-class systems primitive, with only 0.4%–6.8% offline and ~6% online overhead, 2×–15× lower …

Nengneng Yu

• May 7, 2026 • 1 min read

Systems for ML

Reliable and Resilient Collective Communication Library for LLM Training and Serving

A fault-tolerant, NCCL-compatible collective communication library that keeps LLM training and serving alive under NIC and link failures, with under 1.1% training overhead and under 3% inference overhead.

wei-wang

• Dec 1, 2025 • 1 min read

ML Algorithms

TabSyM: A Generative Pipeline for Small Multi-Cohort Omics Tabular Data

A diffusion-based generative pipeline for small, high-dimensional, cross-cohort omics tabular data.

Nengneng Yu

• Jul 14, 2025 • 1 min read