Multi-tier memory hierarchies (HBM, DDR, Flash)
Discussed in 1 analyzed podcast episode across 1 show
Discussed On
Episodes
Dwarkesh Podcast · Apr 29, 2026
Reiner Pope – The math behind how LLMs are trained and served
Roofline analysis for transformer inference · Batch size optimization and latency-cost tradeoffs · Memory bandwidth as binding constraint on inference · Mixture of experts parallelism and communication patterns · Expert parallelism vs tensor parallelism vs pipeline parallelism