NVIDIA AI Podcast

Powering the AI Inference Wave with EPRI's Ben Sooter - Ep. 292

32 min
Mar 4, 2026
Summary

Ben Sooter from EPRI discusses how microdata centers can address the coming AI inference wave by leveraging underutilized electrical substations. The conversation explores the shift from centralized training data centers to distributed inference infrastructure, examining energy grid implications and the potential for 80% of AI compute to occur during inference rather than training.

Insights
  • AI inference will consume 80% of a model's lifetime compute capacity compared to 20% for training, creating a massive distributed compute demand
  • Microdata centers positioned near underutilized electrical substations can tap 3-10 megawatts per site, with bundled multi-site projects reaching roughly 25 megawatts, while leveraging existing grid infrastructure
  • Distributed inference networks offer load balancing opportunities and faster deployment by avoiding transmission interconnection queues
  • Agentic AI systems running continuously may fundamentally change expected load patterns from human-driven usage to 24/7 operation
  • Geographic distribution of inference capacity near users reduces latency while creating new opportunities for grid flexibility and energy storage integration
Trends
  • Shift from centralized AI training facilities to distributed inference infrastructure
  • Growing demand for edge computing capabilities to reduce latency for real-time AI applications
  • Integration of AI data centers with existing electrical grid infrastructure rather than building new capacity
  • Rise of agentic AI systems creating continuous compute loads rather than human-driven patterns
  • Convergence of energy storage, renewable energy, and AI infrastructure for grid flexibility
  • Emergence of 20-megawatt microdata centers as optimal size for distributed inference
  • Utility companies proactively planning for AI compute loads to avoid grid strain
  • Real-time AI applications driving need for geographically distributed compute resources
Companies
NVIDIA
Technology partner helping determine compute needs and developing inference-specific chips
Electric Power Research Institute
Ben Sooter's employer, leading research on microdata center integration with electrical grids
OpenAI
Referenced for ChatGPT usage patterns as an example of AI inference workload
Netflix
Used as analogy for content distribution evolution from centralized to geographically dispersed
Groq
Mentioned as example of AI inference service generating compute demand
Google
Referenced for Gemini AI service as example of inference workload
People
Ben Sooter
Director of R&D at EPRI with 20+ years of experience in energy technology research
Noah Kravitz
Host of the NVIDIA AI Podcast conducting the interview
Quotes
"Only about 20% of its lifetime compute capacity and thus its power consumption is in the training side. 80% of it is in the inference side."
Ben Sooter
"If we can get extra usage out of existing assets, then that's sort of a win for everyone."
Ben Sooter
"The training loads would slam hundreds of megawatts of demand nearly instantly within milliseconds."
Ben Sooter
"Now all of a sudden I'm like, well, that completely changes the paradigm because now it's running at night while I'm sleeping."
Ben Sooter
Full Transcript
3 Speakers

Speaker B

Welcome to the NVIDIA AI Podcast.

0:10

Speaker C

I'm Noah Kravitz.

0:12

Speaker B

Today we're talking microdata centers with Ben Sooter, Director of R&D at EPRI, the Electric Power Research Institute. The relationship between AI data centers and energy grids is an increasingly important one,

0:14

Speaker C

to say the least.

0:26

Speaker B

In a moment, we'll talk about how microdata centers can help strengthen that relationship. But first, a quick note about GTC San Jose. Join us at the world's premier AI conference. GTC San Jose is online and in person March 16th through the 19th. From physical AI and AI factories to agentic AI and inference, GTC 2026 will showcase the breakthroughs shaping every industry. Learn more and register at nvidia.com/gtc. Ben Sooter, welcome. Thank you so much for taking the time to join the NVIDIA AI Podcast. Really glad to have you here.

0:27

Speaker A

Yeah, great to be here, Noah. Super excited.

1:03

Speaker B

So, Ben, to kind of set the table before we dive in, for listeners who don't know epri, can you briefly explain, well, first, who you are and

1:06

Speaker C

what you do and as part of that, what EPRI is and what EPRI does?

1:14

Speaker A

Yeah, absolutely. So EPRI is sort of a unique organization. We're a 501(c)(3) not-for-profit. It's an independent institute focusing on R&D. We collaborate with more than 400 companies across more than 40 countries and really drive innovation to ensure the public has reliable and affordable energy. So it's a really awesome mission statement, and it's been a really exciting place to work.

1:17

Speaker B

You've been at EPRI for a couple of decades now.

1:43

Speaker A

Yeah, I've been here a while. I just crossed over the 20 year mark, which I feel like is ancient times in the way the corporate world works now.

1:45

Speaker C

Right.

1:55

Speaker B

Well, congratulations.

1:55

Speaker C

And you kind of, this is exactly why I asked because thinking about data centers and AI, but hearing you talk about things like nuclear and thinking like,

1:56

Speaker B

man, you've like must have seen some

2:04

Speaker C

things and worked on some projects and thinking back, you know, over 20 years and how technology and energy reliance and consumption must have evolved. I don't know if this is a

2:06

Speaker B

fair question to ask, but can you

2:16

Speaker C

kind of place our current moment in context to, you know, sort of what you've seen with how the world uses energy and stuff you've worked on over the years?

2:17

Speaker A

Yeah, yeah, that's a great question, a great way to frame it. And it gets to why I've probably stayed here 20 years, which is that there's just so much change and a lot of different exciting things that have evolved across the sector and the industry that have sort of landed us here today. So lots of stuff going on. I'm sitting here in Knoxville, Tennessee, and behind me we actually have a big laboratory that makes up the back half of the building. And over the years here at EPRI, I've seen all kinds of technologies come through, whether it's, you know, solar energy, battery storage, electric vehicles, all kinds of things. And what's interesting is you see a lot of it several years in advance of when it's cool and it's blown up and it's everywhere.

2:28

Speaker C

Right, right.

3:20

Speaker A

And so it's, it's been interesting to just see all of these technologies come in and evolve and the challenges that come with them. You know, whether it's, you know, how do we, how do we handle the loads of electric vehicles or how do we position the distribution system to handle all of the solar capacity.

3:22

Speaker C

Right.

3:41

Speaker A

All these different issues, and then meeting those challenges. And so it's been an exciting place. I've gotten to have several lifetimes here, because I've been here for 20 years, working through different areas, and now I'm kind of in this AI space, which has obviously just accelerated everything about 10x.

3:42

Speaker B

Right, right. So let's get into that then. Data center. AI.

4:03

Speaker C

You know, when I think of AI and energy consumption, data centers kind of pop to mind sort of immediately. There's more to it than that, obviously. But can you kind of set the stage a little bit? You know, kind of explain what a data center is in this context, and then maybe that can get into what this idea of a microdata center is and how those differ from the kinds of things that people like me usually think of when I hear data center.

4:06

Speaker A

Yeah, so, so good question. And you know, I think there are, there are several flavors of data center at this point. And I think kind of the, the two I'm going to sort of hone in on today, one is the, the data centers that have been really in the news a lot lately, these multi gigawatt behemoths that are being built with the objective of providing platforms to train these really exciting AI models. And so, you know, there's been an enormous push to build those, those types of capacity. Obviously there's been a big crunch for power in order to meet that, that demand. And so lots of exciting research in that area. But all of that's really been directed at making the models that are going to potentially do exciting things for us in the future.

4:35

Speaker C

Right, but the training of the models.

5:25

Speaker A

Exactly, the training of the models. But you mentioned this in the plug for GTC at the beginning: inference. I think people don't realize that while we were so focused on training the models, there's this huge wave that's coming. Once we actually get all these models and we move beyond just chatting with ChatGPT, and we're doing the real-time translation in our AirPods, and we're doing the smart glasses, and we're doing the full self-driving and all these different applications, all those applications fall into inference, sort of using the models. That is going to accelerate this second compute wave that comes along with all this, in order to have the compute capacity to actually do all this stuff. And there's an interesting statistic out there: if you look at the lifetime of a model, a GPT 5.1 or whatever, only about 20% of its lifetime compute capacity, and thus its power consumption, is in the training side. 80% of it is in the inference side.

5:27

Speaker C

80%. Okay, yeah.

6:33

Speaker A

And so the vast majority is actually in the inference side. So if you think about how much capacity we're building for training, we're going to need, you know, a couple of times that to meet the demand for all the inference.

6:35
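To make the arithmetic behind that remark concrete, here is a minimal back-of-the-envelope sketch in Python. The 80/20 lifetime split is the figure cited in the episode; the training-capacity number below is a hypothetical placeholder, not a figure from the conversation.

```python
# Back-of-the-envelope sketch of the 80/20 lifetime-compute split
# discussed in the episode. The training capacity below is a
# hypothetical placeholder, not a number from the episode.

TRAINING_SHARE = 0.20   # share of a model's lifetime compute spent on training
INFERENCE_SHARE = 0.80  # share spent on inference

def implied_inference_capacity(training_gw: float) -> float:
    """Capacity implied for inference if the installed training capacity
    serves 20% of lifetime compute and inference serves the other 80%."""
    return training_gw * (INFERENCE_SHARE / TRAINING_SHARE)

if __name__ == "__main__":
    training_gw = 5.0  # hypothetical installed training capacity, in GW
    print(f"{training_gw} GW of training capacity implies roughly "
          f"{implied_inference_capacity(training_gw)} GW for inference")
```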

Speaker C

And so people start using these things.

6:45

Speaker A

Exactly.

6:47

Speaker C

Or close to full max. Yeah, yeah, yeah.

6:48

Speaker A

And so, so that's going to create

6:50

Speaker B

another challenge thinking about energy consumption.

6:51

Speaker C

Is the distribution of energy consumption during inference, as opposed to training, just massively different and much more spread out? What does that look like from the perspective of energy load and consumption and figuring out how to try to balance things?

6:55

Speaker A

There's a lot of great questions in there, and a lot of.

7:13

Speaker C

I try to throw 12, 13 of them at you at once, you know.

7:16

Speaker A

Yeah. And a lot of them are things that we're looking at as part of this microdata center project.

7:19

Speaker C

Okay.

7:23

Speaker A

So when the world, when we got into these gigawatt-scale training data center loads, nobody really realized or thought about the fact that the way the compute would happen is the training loads would slam hundreds of megawatts of demand nearly instantly, within milliseconds. And they can also fall off once that job is done. And so, huge swings of power. And that created some consternation as you had to solve the technical challenge of meeting those demand peaks and spikes. Compare that to inference. When we got into this, we started down this journey about midway through last year, and I was initially imagining this and thinking, okay, if inference is me using one of these awesome models, I'm using ChatGPT, I'm using Groq, I'm using Gemini, then the compute tasks are being generated by me. So that's going to give it what we call more load diversity. It's going to kind of smooth it out, because it's being randomly generated. That was sort of my initial hypothesis. Then I had a really interesting discussion last week with someone as we started bringing up the whole agents and agentic AI thing that has taken over in just the last few weeks. You know, bringing OpenClaw into the house to take over my world. And I started realizing, like, oh man, that's doing all this work at night now.

7:24

Speaker C

Right.

8:56

Speaker A

So when I originally thought this load was going to sort of look like a normal load curve, just people waking up during the day, putting on lights and, you know, air conditioners and stuff, now all of a sudden I'm like, well, that completely changes the paradigm, because now it's running at night while I'm sleeping. And is it going to do more? And so, to answer your question, that was a really long-winded way to answer it.

8:56
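To picture the load-shape shift Sooter is describing, here is a small illustrative sketch contrasting a human-driven diurnal curve with a flat, always-on agentic load. The curve shape and megawatt numbers are invented for illustration, not measurements from the project.

```python
# Illustrative contrast between a human-driven diurnal load curve and an
# always-on agentic load. All shapes and numbers are made up.

import math

def human_driven_load(hour: int, peak_mw: float = 5.0) -> float:
    """Diurnal curve: low overnight, peaking mid-afternoon (cosine bump
    centered at 15:00, floored at 20% of peak overnight)."""
    bump = math.cos((hour - 15) / 24 * 2 * math.pi) * 0.5 + 0.5
    return peak_mw * max(0.2, bump)

def agentic_load(hour: int, mw: float = 4.0) -> float:
    """Agents working through the night look closer to a flat load."""
    return mw

for h in (3, 9, 15, 21):
    print(f"{h:02d}:00  human-driven {human_driven_load(h):4.1f} MW"
          f"   agentic {agentic_load(h):.1f} MW")
```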

Speaker C

No, this is.

9:18

Speaker B

I want to go deeper and ask

9:18

Speaker C

you what you're doing with OpenClaw, but maybe that's another podcast. So.

9:19

Speaker A

Yeah, that may be another podcast, because that's trying to streamline how you actually survive in the 10x corporate environment.

9:22

Speaker B

Right.

9:32

Speaker A

But all that to say, that paradigm is sort of evolving now, and I'm having to change my hypothesis. And so when we actually start monitoring these data centers, actually building them out and measuring them, it's going to be really interesting to see what they look like. And I have a feeling you're going to see lots of different loads, because something that's very consumer-centric may look different. Yeah, there were some great stories last week of big financial institutions that were very AI-forward and have invested a lot in models and don't have enough compute for their internal models.

9:33

Speaker C

Yeah.

10:13

Speaker A

Which is a whole other thing. You know, it fits very well into what we're looking at here, but it's probably a completely different, you know, shape.

10:14

Speaker C

Yeah, yeah.

10:24

Speaker B

Well, let's dive into what we can

10:24

Speaker C

kind of grasp at the moment or, you know, is concrete, I should say, at the moment.

10:26

Speaker B

And this idea of micro data centers,

10:30

Speaker C

can you, you kind of alluded to it in talking a moment ago, but

10:32

Speaker B

can you talk a little bit more

10:35

Speaker C

about what they are and why now

10:37

Speaker B

and what are some of the problems

10:39

Speaker C

and these may be some of the examples you're mentioning. Are you trying to solve for the power grid as well as for AI users with this idea of microdata centers?

10:41

Speaker A

Yeah, so, great question. So the real thing that we're looking at here, and I mentioned everybody's focused on these big giant training data centers. Now we're thinking about how do we create these data centers for inference. And when you actually look at those data centers for inference, one of the things you start to realize is that having the huge mega data centers that are centrally located doesn't necessarily make sense for the inference data centers, because they are more consumer-centric and user-centric. Positioning them geographically around where the people are tends to make more sense, because they can be more latency sensitive, et cetera. So you don't necessarily want to have them just in one place in the middle of nowhere. Better to have it broken apart.

10:50

Speaker B

I don't know, I may be way

11:43

Speaker C

off here, but it reminds me of when streaming media centers, you know, started popping up kind of in the aughts, I guess. Right. The first dot-com wave, when multimedia became a thing. And yeah, that kind of proximity, because it affects performance, as you said. So.

11:44

Speaker A

Yeah, yeah, exactly. You know, the early years of Netflix, where it started off very centrally, and then they realized, hey, if we put a mirror onto the local networks, it becomes a lot easier to distribute. So yeah. It's another thing. And incidentally, the biggest user of geographically dispersed servers? Game servers. Yeah, right. Through this journey, I learned that little tidbit.

12:01

Speaker B

Can you walk through a little bit what happens in a microdata center in

12:30

Speaker C

terms of sort of, you know, how

12:34

Speaker B

do you design and build for an

12:37

Speaker C

inference load as opposed to a training load? And what does that mean in terms of both the energy usage, but then also the ripple effect of not housing everything in these central giant megawatt data centers that, as you said, at least for training, act differently than other big loads on the grid? They come up super quick and big, you know, and I imagine all kinds of other problems that are beyond my knowledge set. But can you talk a little bit about how they sort of work on that level?

12:38

Speaker A

Yeah, so a couple of things are kind of in that onion to unwrap. The first one, sort of on the underlying construction: it's somewhat similar, in the fact that it's still a very GPU- or TPU-based compute need in order to actually run these models. We're seeing, I think, more chips; NVIDIA has chips designed more specifically for inference versus training now. So there seems to be a little bit of diversification, where it was just sort of one chip initially. Right. And so we're seeing that. So there is some variability, I think, maybe in the underlying chips. But traditionally it's been sort of the same chip for training and inference. And so from that perspective it looks similar, it's just smaller, because I don't need as much. But as a result of "I don't need as much," it's sort of like, how much do I need? And that's been one of the things that we've been looking at, and one of the interesting things we've been working on with technology partners like NVIDIA, to really help us understand the compute needs of the technology companies that are buying these data centers, using these data centers. Is 3 megawatts enough? Is 5 megawatts enough? Do we need 20 megawatts? And we seem to be coalescing somewhere around this idea of 20 megawatts. But, and I hadn't gotten into some of the electrical aspects of all of this, as we're looking at where to place these microdata centers, 20 megawatts can be a not insignificant ask, just dropping a load somewhere onto the grid. And so there's not a lot of opportunities to drop something of that size. Okay. And when EPRI was looking at this, our partners are telling us about this coming compute wave, and we want to do what we can to help our utility members be proactive and get ahead of it: where can we look at opportunities to find power for this type of data center?

13:08

Speaker C

Right.

15:26

Speaker A

One of the things we started looking at was, well, there's substations all over the United States and indeed all over the world. And there's a fair number of them that are actually underutilized. So they've got excess capacity available inside them. And so we started thinking like, well, is there an opportunity there to partner with those substations that have that excess capacity and do something and, you know, put these inference data centers near it and, you know, maybe directly adjacent is maybe ideal, but close by and make sure, you know, we've got everything that is needed in terms of fiber access and.

15:27

Speaker B

Right.

16:05

Speaker A

All the infrastructure, all the underlying infrastructure. And so look at all those things and say, does that work? And we thought that was a good idea, but the answer is you're probably going to find 3 to 5 megawatts, maybe up to 10 megawatts of available capacity in a single substation. And so then we started thinking about, well, how's that going to work?

16:05

Speaker B

Ben, just to interrupt you real quick, sorry.

16:30

Speaker C

Because I keep having a picture in my head of. And this is my own ignorance about

16:31

Speaker B

our electrical grid of how big one

16:35

Speaker C

of these existing substations is and where it might be. Is this kind of like suburban as opposed to Metropolis?

16:38

Speaker A

So, it could be both. Okay, so there's a couple caveats in there. So you're right in thinking that your suburban substation may be more likely to have some of that excess capacity. That said, we have found that there's interest at the Metropolis level, too, in capacity, because there is need. You know, there's people there, so they want to get the compute close to it. And actually, if you see some of the Metropolis environments, there's a lot of real estate that's available right now.

16:46

Speaker C

Right, okay.

17:18

Speaker A

Which equates to load that's not there. So there's opportunity to put load. So that was another hypothesis going in, that there wasn't going to be interest, but actually, it looks like there may be interest and opportunity at that level as well. And so as you're looking at these data centers, you start to say, well, does three megawatts make sense? And does it make sense for the person that wants to buy it? What we realized was maybe there's an opportunity. And this is the distributed part. We initially called this project distributed inference, truthfully. And while distributed inference seemed to be very technically accurate, it did a really poor job of giving anybody a visual image of what it was we were talking about. And so what we realized was, if we go to a regional area, we go to a city, and we say, hey, are there five data center sites that meet this criteria? And each data center maybe has 5 megawatts of capacity. Now we've got 5 data centers at 5 megawatts, and now we've got 25 megawatts of capacity. So instead of looking at it as a single project that's 5 megawatts, we look at it as a 25 megawatt project that just happens to be distributed across five sites. And that helps meet the needs of what the utility grid has available and meet the economics of what the data center companies need in order to actually make it realistic and viable for them.

17:19
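As a rough illustration of that bundling idea, here is a minimal Python sketch: several underutilized substations, each with a few megawatts of headroom, grouped into one multi-site project. The substation names and headroom figures are made up; only the "five sites at 5 megawatts is a 25 megawatt project" framing comes from the conversation.

```python
# Sketch of the "distributed inference" siting idea: bundle several
# substations with spare capacity into one multi-site project.
# All substation data below is invented for illustration.

from dataclasses import dataclass

@dataclass
class Substation:
    name: str
    headroom_mw: float  # spare capacity available for new load

def bundle_project(candidates: list[Substation], target_mw: float) -> list[Substation]:
    """Greedily pick substations (largest headroom first) until the
    bundled capacity reaches the project target."""
    chosen, total = [], 0.0
    for sub in sorted(candidates, key=lambda s: s.headroom_mw, reverse=True):
        if total >= target_mw:
            break
        chosen.append(sub)
        total += sub.headroom_mw
    return chosen

if __name__ == "__main__":
    candidates = [Substation(f"sub-{i}", mw)
                  for i, mw in enumerate([5.0, 3.0, 7.0, 5.0, 5.0, 4.0])]
    project = bundle_project(candidates, target_mw=25.0)
    total = sum(s.headroom_mw for s in project)
    print([s.name for s in project], f"{total} MW bundled")
```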

Speaker B

Right, right. How does this approach affect the way

18:51

Speaker C

the grid functions for just, you know, people in general, the. The city, the region in general?

18:54

Speaker A

So, great question. And we've really seen this as a general win for everyone. Because the existing substations are already kind of a sunk cost. We've invested that capital, we've made the investment, we've built it. And so if we can get extra capacity, if we can get extra usage out of existing assets, then that's sort of a win for everyone. From a societal-cost standpoint, if we're not having to put new steel in the ground, then absolutely, that's helping keep rates lower and things like that. So we really see this as a positive in terms of being able to leverage existing infrastructure. Speed to power, I think, is also a big part of this, where there's a huge scramble for this capability. It also means that you no longer have to deal with interconnection queues, because you're off the transmission grid, and all the things that go along with that. So it definitely speeds up the ability to get to a finished product that's online and serving customers much faster as well.

19:00

Speaker C

That's great.

20:13

Speaker B

Are there clean energy implications?

20:14

Speaker A

It's interesting you say that. So definitely there's opportunities to layer all kinds of things on this. There's opportunities to layer this with DER technology, solar, wind things. And I think there's also a lot of opportunities for energy storage. One of the things we've been looking at, getting sort of into the technical weeds, is flexibility. What you find is that you'll have a substation and it's got excess capacity, but it's actually got quite a bit more capacity except for July 21st, when you have the hottest day of the year. Right. I'm making July 21 up. That's not the hottest day of the year. Somebody fact-check me.

20:18

Speaker C

I was like, wait, what AI breakthrough happened on July 21? That's just been made up, right?

21:04

Speaker B

Super hot day.

21:11

Speaker C

Yeah, yeah.

21:11

Speaker A

So if you can engineer it so that you have the flexibility to reduce your load, you reduce your demand during those peaks, and you actually have a lot more, you know, envelope that you could potentially use. And so pairing it with energy storage, backup generators, just working with the technology partners. One of the other nice things about having sort of a distributed network of these loads is, if there is possibly a peak demand issue, I can run down my compute at one data center and route the calls someplace else.

21:12
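Here is a minimal sketch of that routing idea, assuming a hypothetical fleet of sites with known substation limits. The site names, limits, and the 90% peak margin are invented for illustration; the episode only describes the general ability to run down compute at one site and route calls elsewhere.

```python
# Sketch of peak-aware routing across a distributed inference fleet:
# when one site nears its substation's demand peak, new requests go to
# a sister site with headroom. All names and numbers are made up.

sites = {
    "east":  {"limit_mw": 5.0, "load_mw": 4.8},  # near its substation peak
    "west":  {"limit_mw": 5.0, "load_mw": 2.1},
    "south": {"limit_mw": 5.0, "load_mw": 3.3},
}

def pick_site(fleet: dict, request_mw: float = 0.05, margin: float = 0.9) -> str:
    """Route to the site with the most headroom, skipping any site whose
    load would exceed `margin` of its substation limit."""
    ok = {name: s for name, s in fleet.items()
          if s["load_mw"] + request_mw <= s["limit_mw"] * margin}
    if not ok:
        raise RuntimeError("no site has headroom; queue or shed the request")
    return max(ok, key=lambda n: ok[n]["limit_mw"] - ok[n]["load_mw"])

print(pick_site(sites))  # -> "west": east is skipped, west has most headroom
```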

Speaker C

Right.

21:48

Speaker A

And smooth things out that way. So there's lots of possibilities. And that's another thing that sort of makes this exciting, and a really neat tool that the utilities could use as well.

21:48

Speaker C

Yeah, yeah, no, that's very cool. Continuing sort of along the lines of the applications of all of this, but kind of from the other side of it. And again you talked about this in reference to, you know, building the data centers, these smaller data centers close to where the users are, the consumers are and that performance aspect of it.

22:01

Speaker B

But are there other examples of real

22:20

Speaker C

time applications that, as this infrastructure rolls out, you think will be enabled or maybe just kind of accelerated? Applications that could directly benefit people?

22:23

Speaker A

I think there's all kinds of things, and I'm certainly not going to claim to have a view into all of those options. You know, I mentioned some, like the translation and self-driving and things. But especially as agents develop, as we get smart glasses that can analyze things, there are other exciting things we're looking at just here at EPRI, and these are going to have applications for everybody. Can you use smart glasses to analyze your poles and transformers and things in a substation, and make your line workers smarter, more efficient, and safer all at the same time? There's all these applications that everyone's looking at. Again, grid focused, but can we make the control center of the future smarter? Right. And get smarter about restoration times and all these different things, on and on. Internally at EPRI, there's a few hundred use cases and things we've identified, and that's very grid-centric. So obviously the audience is probably not all utility workers, but I can only imagine that if the electric industry has identified several hundred use cases, then around the world there have got to be just tens of thousands.

22:37

Speaker C

We wouldn't be here having this talk on tape, so to speak, if there weren't. Right.

24:03

Speaker B

Kind of.

24:08

Speaker C

I was just thinking about this as I was listening to you, and you spoke to it with examples of, like, smart glasses, with workers out in the field, you know, analyzing things.

24:10

Speaker B

But are there ways that you've seen,

24:19

Speaker C

you know, whether you're using them now, or maybe things that you kind of see coming that you're excited about. Ways that the energy industry has been using AI, and I don't know if it's to design better battery storage, or to explore, you know, new forms of energy, or maybe something seemingly more mundane but still really important, like reorganizing the way that, you know, companies approach different industries.

24:22

Speaker B

I don't know what. But are there big examples that kind

24:54

Speaker C

of jump out in, you know, your own work or what you've seen of how AI is transforming the industry from the inside?

24:56

Speaker A

Yeah, I mean, I think it's transforming it in all kinds of different ways, and it's one of those things that has been really interesting, because things do seem to move fast. There are lots of memes about how fast things are going, and I already made some comments about 10x-ing things. But the proof is in the pudding. Have we seen that scaled demo? I think there are a lot of proofs of concept popping up around, and really the thing everybody is waiting for is that scaled demo, where there's this application, it's measurable, and we've scaled it out to the entire enterprise. So there's definitely a lot of work to do. But I think there's lots of applications as well. There are just so many different things going through my head. In the utility industry, there's a lot of historical records, and a lot of them predate sort of the digital era. And so current models can make ingesting all of that and structuring it into useful structured data sets, which you can then use to create new models and analysis and digital twins and all these things, much easier. So I think that's some of the places the existing work is already really useful. Obviously, there are all the things we do every day just to accelerate ourselves: understanding emails and figuring out how to have that hard conversation with the problematic coworker. I'm making these up as well.

25:02

Speaker B

No, but it's related. It's that. Well, it's that interesting. Sort of.

26:50

Speaker C

There's two layers. Well, there's many layers. The five-layer cake is the iconic layer at the moment. But there's kind of two layers in what I'm thinking about. There's the layer of, I don't want to call it knowledge work, but that kind of working with information you just described that is part and parcel of many roles in many industries. Right. And then there's kind of the way AI is helping, you know, helps me day to day in ways you were just describing or, you know, kind of making up. But I get you.

26:55

Speaker B

And then there's that layer on top

27:23

Speaker C

which is specific to the kind of work and the industry that you're doing. And the more people like you I get to have these conversations with, the more in my mind I see, like, you know, it's both. Right. And one informs the other. Being able to go back and ingest all that old data. You know, we had a cardiologist or a radiologist on a while ago talking about how much hidden information there is in old analog film scans.

27:25

Speaker A

Oh yeah.

27:50

Speaker C

That you know, AI image analysis is able to extract now. And it's useful. Right. And that kind of stuff is. Yeah.

27:51

Speaker A

Did you see the guy with the microfiche repository?

27:57

Speaker C

It rings a bell, but I don't know that I did.

28:01

Speaker A

This was a few months ago now, which makes it ancient news. But yeah, there was somebody that had access to this huge repository of microfiche. And I'm old enough. Those of us that are old enough on here will remember looking at it under the little magnifying contraption.

28:03

Speaker C

The machine in the library.

28:19

Speaker A

Yeah. To see the news article from 1942. But he had access to tons of this stuff and started using the models to ingest it all, and just created a monster data set. So cool.

28:20

Speaker B

That's amazing.

28:33

Speaker C

I love stories like that.

28:33

Speaker B

All right, Ben, as we get to

28:35

Speaker C

kind of wrapping up here, so I

28:36

Speaker B

can let you go. This is not to put you on

28:37

Speaker C

the spot, because as you mentioned, it's always impossible to predict the future, but when things are moving as quickly as they are, it's even harder.

28:40

Speaker B

Right. But if we look ahead to the

28:49

Speaker C

next year or so loose timeframe, what

28:51

Speaker B

does success look like with microdata centers

28:54

Speaker C

and even more broadly? I guess that's what I'm thinking about, putting you on the spot both for the grid and for everyday users of AI powered services.

28:58

Speaker A

So, great question. I'll start with the microdata center part, since we're talking about it. And, you know, I think hopefully in a year or two we've got a pile of these micro inference data centers built out, and we're monitoring and measuring them, and that's helping educate us on what we need to know so that we can continue to build them out for all the wonderful things that the industry is going to create. So from the microdata center standpoint, that, I think, is what I hope success looks like.

29:06

Speaker C

Yeah.

29:40

Speaker A

And then, you know, just in general, I have no idea. Everything is so exciting. You mentioned GTC at the beginning. I learn something new from those types of conferences every year. There's new things that come out that completely change things. I mentioned agents, which are just weeks old, maybe a couple of months old, that we've really delved into, and it's changing the landscape again. So I don't know what it's going to look like, but I'm hopeful, and it's going to be exciting, and there's going to be compute needs. You mentioned at the very beginning sort of the importance of power. I think there are still going to be challenges to solve to make sure that we can provide all these awesome things to everybody and really move society forward. So, exciting times.

29:40

Speaker C

Excellent. Yeah, well, I, I'm with you. I'm rooting for you and I'm excited to see how it all unfolds.

30:32

Speaker B

Ben, for folks who would like to learn more about the work you're doing, about the work EPRI is doing, where's a good place for them to go online? Website, social media accounts?

30:38

Speaker C

Where should they start?

30:49

Speaker A

Yeah, absolutely. So, website: you can go to epri.com, that's our official website, with lots of great information there. We're also very active on LinkedIn. If you're interested in the latest news about exciting AI and data center updates and their adjacency to the electric sector, there's lots of good stuff over there on LinkedIn. So those are probably the two places to find us.

30:50

Speaker B

Perfect. Ben Sooter, thank you again for joining the AI Podcast, and best of luck

31:19

Speaker C

with everything you and everyone at EPRI is doing.

31:24

Speaker A

Appreciate it. Great to be here. Great talking with you.

31:27