Memory bandwidth as binding constraint on inference
Discussed in 1 analyzed podcast episode across 1 show
Discussed On
Episodes
Dwarkesh Podcast · Apr 29, 2026
Reiner Pope – The math behind how LLMs are trained and served
Roofline analysis for transformer inference
Batch size optimization and latency-cost tradeoffs
Mixture of experts parallelism and communication patterns
Expert parallelism vs tensor parallelism vs pipeline parallelism
KV cache memory requirements and amortization