Batch size optimization and latency-cost tradeoffs
Discussed in 1 analyzed podcast episode across 1 show
Discussed On
Episodes
Dwarkesh Podcast · Apr 29, 2026
Reiner Pope – The math behind how LLMs are trained and served
Roofline analysis for transformer inference · Memory bandwidth as binding constraint on inference · Mixture of experts parallelism and communication patterns · Expert parallelism vs tensor parallelism vs pipeline parallelism · KV cache memory requirements and amortization