Systems for ML

Enabling Performant and Flexible Model-Internal Observability for LLM Inference

A performant, flexible deep model inspector that turns internal LLM observability into a first-class systems primitive — only 0.4%–6.8% offline / ~6% online overhead, 2×–15× lower …

Nengneng Yu

Reliable and Resilient Collective Communication Library for LLM Training and Serving

A fault-tolerant, NCCL-compatible collective communication library that keeps LLM training and serving alive under NIC/link failures with <1.1% training / <3% inference overhead.

Wei Wang