The a16z Show

Inferact: Building the Infrastructure That Runs Modern AI

44 min
Jan 22, 2026
Summary

Simon Mo and Woosuk Kwon, co-founders of Inferact and creators of the open source inference engine vLLM, discuss the evolution of AI inference from a simple optimization problem to one of the most complex challenges in modern computing. They explain how vLLM has grown from a PhD side project to power major deployments like Amazon's Rufus assistant, and outline their vision for building a universal inference layer through open source collaboration.

Insights
  • AI inference has become exponentially more complex due to three key factors: scale (trillion+ parameter models), diversity (varied model architectures and hardware), and agents (stateful, multi-turn interactions)
  • Open source inference infrastructure creates network effects where model providers, hardware vendors, and application developers all contribute for different strategic reasons, accelerating innovation beyond what any single company could achieve
  • The shift from single-turn AI tools to persistent agents fundamentally changes memory management and caching patterns, as systems can no longer predict when conversations end or when requests will return
  • Successful open source projects require treating infrastructure as a horizontal abstraction layer rather than vertical integration, similar to how operating systems abstract CPUs and databases abstract storage
  • The inference layer is becoming as critical as the models themselves, with the potential to become a foundational abstraction for accelerated computing similar to operating systems for CPUs
Trends
  • Multi-trillion-parameter open source models expected in 2026
  • Diverging model architectures moving away from standardization (sparse attention, linear attention)
  • Agent-based AI systems requiring persistent state management
  • Horizontal specialization in the AI infrastructure stack
  • Open source AI inference gaining enterprise adoption at scale
  • GPU diversity requiring specialized optimization per chip architecture
  • Real-time inference becoming a standard expectation
  • Community-driven development outpacing proprietary solutions
  • Vertical stack integration for AI workloads
  • Academic research directly influencing production systems
Quotes
"Our goal is to make vLLM the world's inference engine, really push the capabilities on the open source front and then build a universal inference layer."
Simon Mo
"What if the hardest problem in artificial intelligence isn't training smarter models, but simply keeping them running?"
Matt Bornstein
"I fundamentally believe that open source, especially how vLLM itself is structured, is critical to the AI infrastructure in the world."
Simon Mo
"We're looking at 400k to 500k GPUs running vLLM 24/7, and that's quite a big scale thinking about the global deployment of GPU footprints, and we definitely believe there's a lot more out there."
Simon Mo
"This is the thing we heard over and over again, that people just tell us: we just cannot keep up with vLLM. So that's why we're using vLLM."
Simon Mo
Full Transcript
4 Speakers
Speaker A

Our goal is to make vLLM the world's inference engine, really push the capabilities on the open source front and then build a universal inference layer. That means we'll have the runtime to power any new model on new hardware for new applications, be able to tailor that to extreme efficiency, and support all the AI workloads going forward. I fundamentally believe that open source, especially how vLLM itself is structured, is critical to the AI infrastructure in the world. And what we want to do with Inferact is to support, maintain, steward and push forward the open source ecosystem. It's only when vLLM becomes a standard and vLLM helps everybody achieve what they need to do that our company, in a sense, has the right meaning and is able to support everybody around it.

0:00

Speaker B

What if the hardest problem in artificial intelligence isn't training smarter models, but simply keeping them running? For most of the history of computing, once a system was built, the hard part was over. You wrote the program, pressed run, and the machine behaved predictably. Even early machine learning followed that pattern. Inputs were standardized, workloads were regular, the computer did its job and stopped. Large language models quietly broke that assumption. Every request is different. Prompts can be a sentence or an entire archive. Outputs can end instantly or stretch on indefinitely. Thousands of users can arrive at once, each making incompatible demands on the same hardware. And all of this has to happen in real time on GPUs that were never designed for this kind of unpredictability. Over the last few years, this problem has moved from obscure to essential: as models have grown larger, more diverse, and more deeply embedded into products, the challenge of running AI systems has started to rival the challenge of building them. That's where the tension lies. The public story of AI progress is about better models and bigger breakthroughs. But underneath it is a quieter systems problem. How do you schedule chaotic requests efficiently? How do you manage memory when you don't know when a conversation is actually finished? And what changes when AI systems stop behaving like single-turn tools and start acting like agents that think, pause and interact with the world over time? This episode focuses on that hidden layer. We examine why inference, the act of running trained AI models, has become one of the most complex and important problems in modern computing, and why open source infrastructure is increasingly central to solving it. Matt Bornstein, general partner at Andreessen Horowitz, is joined by Simon Mo and Woosuk Kwon, co-founders of Inferact and creators of the open source inference engine vLLM.
This is a conversation about the infrastructure beneath AI and why it may matter more than the models themselves.

1:01

Speaker C

We are here today with Simon Mo and Woosuk Kwon, lead contributors on the vLLM open source project and co-founders of Inferact, a new AI inference company. Super excited to have you guys on the show today. Thank you. Thank you so much for coming. We're gonna talk a little bit about vLLM, the open source project. We're going to talk a lot about inference and what inference technology really is, and then we'll talk a little bit about Inferact, the new company. So to start, can you talk a little bit about where vLLM came from? What is it, how did you start it, and why is it such an exciting project?

2:55

Speaker A

Thank you for having us. The vLLM project started from Woosuk's prototype project at UC Berkeley during his PhD and grew into today's open source project on GitHub, an inference runtime for everybody. Maybe Woosuk can talk a little bit about the PagedAttention paper.

3:25

Speaker D

Oh yeah. So basically I think it kind of started in 2022 when Meta released the OPT model as open source. I'm not actually sure how many people remember the model nowadays, but it was one of the first open-weight large language models to reproduce GPT-3. And our lab created a demo service to run the model and, you know, demonstrate it for a broader audience. And yeah, it was working, but super slow. So I started a small side project to optimize that demo service. That was kind of the beginning, and initially I was thinking it might only take a couple of weeks to optimize the service end to end. But it turned out to have a lot of open problems inside it, because this autoregressive language model was pretty different from traditional ML workloads, and it was kind of brand new, at least outside the frontier labs back in the day. I started to work on it and it became a research project, and we wrote a paper, and it even became a pretty well-defined open source project as more and more people got interested in it.

3:42

Speaker C

So 2022, this is pre-GPT-4 obviously, this is pre-ChatGPT. Yeah, pre-ChatGPT. Yeah. And you're thinking like, oh, I'll just work on this inference server, this should be a fairly straightforward problem. And four years later, you're actually doing more work instead of less.

4:57

Speaker D

Exactly, exactly. Yeah.

5:14

Speaker C

Why did you think this was a meaningful problem to work on at the time? Because I would say most people in the world at that time saw GPT-3 as a curiosity in some sense. And OPT was kind of like a curiosity attached to a curiosity, in a way. What made you and your lab mates excited to work on this back then?

5:16

Speaker D

I think I also started from curiosity. I didn't really think it was the most important problem in the world back in the day. I just wanted to have hands-on experience with how this actually works. I mean, I was so impressed by the size of the model. The largest OPT model has 175 billion parameters, and that was the largest model available. So it was kind of meaningful for me; it was pretty rewarding to work on such a large model.

5:32

Speaker C

This reminds me of when I was growing up, you know, we would build computers. That was like the coolest thing to do. And each step change in memory capacity was such a big deal. I was like, oh my God, this one has 4 megabytes of RAM; oh my God, this one has 512 megabytes of RAM. Looking back, it's silly, but at the time, maybe it's because we're nerds, but you get emotionally excited about the numbers getting bigger on these systems.

5:56

Speaker D

Right, right, right.

6:20

Speaker A

Yeah.

6:21

Speaker D

I think that was like one of the main motivations, clearly.

6:21

Speaker C

And so you started to say the technical problem is different for autoregressive transformers compared to traditional machine learning. Do you mind explaining a little bit how that is, and even comparing to normal computing workloads, for listeners who are engineers but may not be familiar with AI workloads?

6:24

Speaker D

So basically, compared to the traditional workload, you know, the clear difference is definitely GPUs. Right now all the compute, or most of the compute, happens on the GPU, and we have to optimize for the GPU, which presumably has less memory than the CPU, at least back in the day. Now GPUs have much larger memory, but typically still smaller than the CPU. And all the computation happens on the GPU, so you have to write programs in a different language, with a different type of parallelism in mind. So that's the fundamental difference between the traditional compute workload and the deep learning workload, I would say. But within deep learning, there's actually still a huge difference between the traditional deep learning workload and large language model inference. For traditional workloads, I think the biggest characteristic is that they are pretty static. For example, for image models back in the day, like CNNs, you may have several images with different sizes. What we do is resize them or crop them into the same size, then batch them and put them through the model to run the inference at once. Because of this resizing and cropping, at the end they're all compressed into the same-size tensor, and that actually makes things much simpler for the GPU to handle. All the shapes are regular, static, and well defined. But large language models, if you think about it, are pretty dynamic. Your prompt can be "hello," a single word, or your prompt can be a bunch of documents spanning hundreds of pages. This kind of dynamism exists inherently in the language model, and it puts things in a whole different world. We have to handle this dynamism as a first-class citizen. And back in the day, people didn't have a clear idea about how to handle it.
And yeah, fortunately we were one of the first to see the problem.

6:43
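The static-versus-dynamic contrast Woosuk draws can be made concrete with a toy calculation (the numbers are made up for illustration, not taken from the episode): if you pad a batch of variable-length prompts to the longest one, as the old static approach would, most of the allocated compute is wasted.

```python
# Toy illustration (numbers are made up): static batching pads every
# sequence in a batch to the longest one, so short prompts waste slots.
prompts = [3, 7, 412, 12]                 # token counts of four requests

allocated = max(prompts) * len(prompts)   # slots reserved after padding
useful = sum(prompts)                     # slots doing real work
waste = 1 - useful / allocated

print(f"padding waste: {waste:.0%}")      # most compute goes to padding
```

With one long document in the batch, roughly three quarters of the reserved slots hold padding, which is the kind of inefficiency that motivated treating dynamism as a first-class citizen.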

Speaker C

That's very interesting. So regularizing a batch of inputs sounds like it was one of the first problems you had to solve.

9:00

Speaker D

It's actually more about scheduling and memory management. Yeah, yeah.

9:08

Speaker A

As well, yeah. So the problem we were solving before, in all the serving systems, is what we call micro-batching: to leverage first the CPU's vectorization in the early days before LLMs, and then early GPUs for vision models like ResNet, it's all about micro-batching. You put together four requests that arrive around the same time. But the change in the LLM world is that you always have requests continuously coming in, and each request looks different; you just cannot really normalize them. So that's why you have to have a notion of a step within the LLM engine, to process one token across all the requests at the same time, regardless of each request having different input and output lengths. It's also non-deterministic: the language model itself decides when it stops. Traditional machine learning serving works very much like clockwork, whereas here it's very stochastic; it's always flowing, it's always continuous. That's why scheduling is the first problem to solve, and then memory management, which is where PagedAttention comes in, is the second problem to solve.

9:13
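The step-based engine loop Simon describes can be sketched as follows. This is a hypothetical illustration, not vLLM's actual scheduler: each step generates one token for every in-flight request, new requests join at step boundaries, and requests leave whenever the model happens to emit a stop token.

```python
import random

# Hypothetical continuous-batching sketch (not vLLM's real scheduler):
# one engine step = one new token for every running request; requests
# join and leave the batch independently of each other.
random.seed(0)

waiting = [{"id": i, "tokens": []} for i in range(4)]  # queued requests
running, finished = [], []

for step in range(100):
    if waiting:                        # admit one new request per step
        running.append(waiting.pop(0))
    for req in running:                # one forward pass, one token each
        req["tokens"].append(f"tok{step}")
    still_running = []
    for req in running:                # the model decides when to stop
        if random.random() < 0.25:     # stand-in for sampling an EOS token
            finished.append(req)
        else:
            still_running.append(req)
    running = still_running
    if not running and not waiting:
        break

# Every finished request ends up with its own, generally different, length.
print([len(r["tokens"]) for r in finished])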

Speaker C

So when did you get involved in the project, Simon?

10:22

Speaker A

Well, I got involved around 2023. Woosuk first issued a call in the Sky Lab Slack channel saying, hey, we need someone to work with us on this PagedAttention paper and kernel. Surprisingly, I was on spring break, and I was like, look, someone else can do this, let me just play with GPT for the entire week. So I ended up just playing with prompt engineering, and I actually didn't end up joining Woosuk then.

10:25

Speaker C

And so this is what a vacation looks like in Ion Stoica's lab: playing with models for a week. And he's playing with kernels.

10:50

Speaker A

Yeah, exactly. So he's playing with kernels, and I was trying to do more prompt engineering and explore different kinds of early agentic workflows. And then over the summer, around August and September, we really got to work together. Actually, this is where you come in: we got to work together on our very first vLLM meetup, at a16z. I had experience managing open source projects before, and I was deeply interested in actually building a serving platform into a fully open source project. This is where I started to get involved. I wrote my first lines of code, built out the CI system, built out the performance benchmarking systems, and have worked closely with Woosuk ever since.

10:56

Speaker C

I had forgotten about that. So this was the very first vLLM meetup, right? Yeah, it was in this office.

11:42

Speaker A

In this office, on this exact floor. I think we were anticipating just 10, 20 people; maybe 50 showed up, and the registration was over the anticipated capacity. People were extremely interested in this technology.

11:49

Speaker C

I remember that very well because we run events here for ourselves and it's always very hard to get people to show up. We're always scrambling. And instead I got a call from our security team saying too many people have been approved for this meetup, we need to scale it back, this isn't safe. I'm like, oh, okay. I don't think we ever scaled it back. So don't tell the security team.

12:04

Speaker A

It was quite crowded. The pizza ran out in like the first 10 minutes.

12:24

Speaker D

So.

12:28

Speaker C

But this is a big deal, right? Because this is not like a consumer app that you were building. This is pulling from systems engineers, for the most part, who want to learn how to serve LLMs and contribute. So it's actually a big deal to get so much interest from such a narrow, sophisticated group of people, who don't like meeting other humans in real life that often either, at least speaking for myself. So can you talk a little bit more about the community behind vLLM? How big is it now? How did it come together, and how do you guys manage it as it's gotten big?

12:28

Speaker A

Yeah, so in the beginning of course it was just a few grad students working on it, but over time we started developing this very open-minded, open kind of mindset. As of now, we're looking at 50 or more regular full-time contributors who open up GitHub every single day to work on vLLM. We crossed the 2,000-contributor bar on GitHub, one of the fastest-growing top open source projects as ranked by GitHub itself. And this is really a diverse community. There are folks like Woosuk and I, sort of the team from UC Berkeley from grad student days, as well as Meta and Red Hat pulling their weight behind this open source project. Then there are the people making the models: the Mistral and Qwen teams, and of course anyone who's making open-weight models is participating in our community. And on the hardware side, NVIDIA, AMD, Google, AWS, Intel, they all have their own participation and support the ecosystem. So everyone using vLLM has the ability to choose among different silicons for accelerated computing.

12:59

Speaker C

That's very interesting, though it's a property that many successful open source projects have, which is that people aren't all contributing for the same reason. Right? Some people, I'm sure, just love the technology. But it sounds like you're saying the model providers actually have incentives to contribute to the project because they want their models to run well. The silicon providers want it to run well on their silicon. The infra providers want to have first dibs on running it so they can sell infra, that kind of thing.

14:13

Speaker A

Yeah, this is kind of classic. We're solving the M times N problem, so that as a model provider you don't need to talk to everybody, and as a hardware provider you can just plug into this one system and magically work with all the models out there in the world. And then for applications using vLLM, as well as infrastructure built with vLLM, having a common ground where everybody can participate and innovate together is way easier, and in fact cheaper in the end, to deploy.

14:36

Speaker C

What's your philosophy for managing a pool of contributors this large? Do you tell them what to do? Do they choose themselves? Like how do you maintain high code quality?

15:06

Speaker A

It's a constant iteration, month over month, year after year. For this I have to go back to my previous open source work: I was working on a project called Ray, and later at Anyscale, where I learned this community-driven approach: have a clear requirement, a clear roadmap, clear milestones being set. So we tried to borrow that, but we also really studied the most successful open source projects out there. I went all the way back to Linux, and then studied Kubernetes, studied Postgres: how are these communities operating together? So in vLLM we have a model where we do what any normal engineering organization does: set clear team scope, but also clear objectives, results, and milestones for the different technical features we want to push forward and build. We set our vision every quarter, but also invite the community to contribute. So we say: great, we're working on these things; we also need help on these items that we don't have anyone actively working on; if you are brand new and want to engage with us or with the community, here's what you can work on. Additionally, we keep an extremely open mind to all the GitHub pull requests that people open up: oh, is this a good request, is this a good feature? As well as request-for-comments processes. So it's a blend of all the lessons learned from previous open source projects. And code-quality-wise: code reviews, but also a lot of constant refactoring iterations.

15:15

Speaker D

Yeah, yeah, I do a lot of refactoring, like every six months. And actually, one thing to add is that we do in-person meetups, you know, every two months. And we're expanding globally, actually: sometimes in Europe, sometimes in other places, in Asia. And from that first meetup at a16z we learned that it's actually super, super useful to meet those collaborators and users in person. And yeah, we're continuing to do that.

16:54

Speaker C

It's funny, it's another one of these lessons that Silicon Valley engineers, we've gotten so high up the abstraction stack that we're relearning lessons from a thousand years ago, saying, oh, it turns out in-person communication is high bandwidth and doesn't suffer from consistency problems. So around the time you guys did that first meetup, we also made grant funding to the project through the academic lab. I think it was a small amount of money, but it was actually the very first open source grant that we made. So it's super fun and kind of gratifying for us to see the money was actually put to good use, the project grew massively, and then we even had a chance to invest in a related company later. However, I did hear a rumor that at the time we made the grant funding, you guys put a portion of the money into Nvidia stock. Can you confirm or deny?

17:24

Speaker D

I didn't.

18:13

Speaker A

Not him, someone else in the recipient list.

18:14

Speaker C

So you probably turned our tiny grant into 10 times as much money.

18:17

Speaker A

A lot of this funding for vLLM is set aside for project development, testing, and everything around operating this project. And one thing I'm actually super grateful for is that the first grant kicked off a culture, and nowadays even a tradition, of people really opening up to sponsor open source projects in a quite significant way. Because running vLLM, our CI bill for example is more than $100k a month. That could be tiny for some folks, and it's growing over time. This is where we're at a burn of a million dollars a year.

18:21

Speaker C

And for an academic project it's actually very.

19:06

Speaker A

Yeah, because we want to make sure every single commit is well tested. This is something that people are going to deploy on not thousands, but potentially millions of GPUs across the world in different environments. So we want to make sure it's well tested and reliable. And this infrastructure right now all comes from contributions and sponsorships, from everybody chipping in to help on this project. And of course we also run meetups, and sometimes the expenses associated with meetups directly leverage the grants that you all provided.

19:08

Speaker C

Yeah, I mean it makes sense, you know, for us and for other corporate sponsors of vLLM: it benefits the whole ecosystem. Right. So I think it makes a lot of sense. Let's talk more about the technical aspects of the problem, if that's okay with you guys. Do you mind starting by defining exactly what an inference server or an inference engine is?

19:45

Speaker A

Sure. So an inference engine takes an already trained model. This can be a very small model, like a Qwen 1B, or it could be a very big model like DeepSeek or Kimi K2. It runs it on an accelerated computing device, and its job is to fully utilize that device to generate text and images and videos, all of which get tokenized into individual tokens. So the goal of an inference engine is to run the model at highly efficient speed, to make sure we can produce maximum output at the highest efficiency.

20:03

Speaker C

Just from a high level. Can you explain some of the architecture, how sort of a typical inference engine works? What are just the few most important components that people would be interested to learn more about?

20:49

Speaker A

Maybe one of us can go through the life of a request: if I say "hello," what happens inside vLLM?

20:58

Speaker D

Yeah, yeah. So basically there's a traditional API server, definitely, you know, that gets the request, and once the model generates output, it streams the tokens back one by one. So there's definitely a traditional API server layer. And inside it we have something typically called a tokenizer, right, to transform these inputs into tokens, basically a list of integers that the language model can consume. And inside it we have what we call an engine, and that includes a scheduler, which decides how to batch the incoming requests, and a memory manager to manage something called the KV cache, which is a core part of the transformer for LLMs. And we definitely have some kind of worker, a very generic term, which actually initializes the model, runs the model, gets the output, and does all the preprocessing for the input and postprocessing for the model output. So yeah, that's basically it. In a sense it's not a crazy new architecture, but each piece is highly optimized and specialized for this LLM inference workload.

21:03
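The life of a request Woosuk walks through can be sketched as a toy pipeline. All class and method names here are illustrative, not vLLM's actual API, and the "model" is a stand-in that just reverses the token ids: tokenize the prompt, reserve KV-cache space, run the model, free the cache, detokenize.

```python
# Hypothetical sketch of the components described above: tokenizer,
# KV-cache manager, and a worker-style engine. Names are illustrative.

class Tokenizer:
    def encode(self, text):                  # text -> list of token ids
        return [ord(c) for c in text]
    def decode(self, ids):                   # token ids -> text
        return "".join(chr(i) for i in ids)

class KVCacheManager:
    def __init__(self):
        self.blocks = {}                     # request id -> cached state
    def allocate(self, req_id):
        self.blocks[req_id] = []
    def free(self, req_id):
        self.blocks.pop(req_id, None)

class Engine:
    def __init__(self):
        self.tokenizer = Tokenizer()
        self.cache = KVCacheManager()
    def generate(self, req_id, prompt):
        ids = self.tokenizer.encode(prompt)  # preprocessing: tokenize
        self.cache.allocate(req_id)          # reserve KV-cache space
        output = list(reversed(ids))         # stand-in for the model run
        self.cache.free(req_id)              # request done: release cache
        return self.tokenizer.decode(output) # postprocessing: detokenize

engine = Engine()
print(engine.generate(0, "hello"))           # prints "olleh"
```

The point of the sketch is the flow, not the pieces themselves; as Woosuk says, the real engineering is in making each stage highly optimized for LLM workloads.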

Speaker C

Do you think it's getting easier or harder over time, running inference?

22:20

Speaker D

Yeah, definitely. I think it is definitely getting much more difficult over time, actually. Honestly, maybe a year and a half ago I wasn't thinking of inference as a hard problem at all, to be very honest. But now things have changed; the trend has changed. So I think there are three factors. One is scale, another is diversity, and the last one is agents. For scale, the models are definitely getting larger. Right now we have Kimi K2 with more than a trillion parameters, but we believe we'll see multi-trillion-parameter open source models this year. And I think that's still clearly the trend: people will keep training larger models. And it's definitely much more challenging to deal with such a model compared to the early days of LLMs, when we only dealt with small Llama models.

22:24

Speaker C

And with larger models, presumably you need more nodes working concurrently, you need, you have more memory to manage that may or may not fit in each, you know, chip's available memory. Can you describe some of the challenges from scale?

23:28

Speaker D

Yeah, for these kinds of large models we definitely need to shard, you know, distribute the model across multiple GPUs, multiple nodes. Right. And then there's definitely the problem of how to shard, how to distribute this model. There are actually many dimensions we can use to shard the model, and they have different trade-offs: for example, how much communication we pay to shard the model in a given way. And there's also a trade-off in terms of load balancing: if I shard along this dimension, how significant is the load imbalance? These all need to be taken into account in the final performance estimation to get the best performance. And that's becoming a bigger and bigger problem as the models get larger.

23:40
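The scale side of this has a simple back-of-the-envelope form (illustrative arithmetic, not numbers from the episode): a trillion-parameter model in 16-bit precision needs roughly 2 TB for the weights alone, so per-GPU memory sets a floor on the shard count before any communication or load-balancing trade-off even comes into play.

```python
# Back-of-the-envelope sharding arithmetic (illustrative, not measured).

def weight_memory_gb(params_billions, bytes_per_param=2):
    """Total weight memory in GB for a model in 16-bit precision."""
    return params_billions * bytes_per_param

def min_gpus(params_billions, gpu_memory_gb=80):
    """Smallest GPU count whose combined memory holds the weights alone
    (ignores KV cache and activations, which need more on top)."""
    total = weight_memory_gb(params_billions)
    return -(-total // gpu_memory_gb)        # ceiling division

# A 1-trillion-parameter model: ~2000 GB of weights.
print(weight_memory_gb(1000))   # 2000
print(min_gpus(1000))           # 25 GPUs just to hold the weights
```

Choosing *which* dimensions to split across those GPUs (tensor, pipeline, expert parallelism) is then the communication and load-balancing problem Woosuk describes.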

Speaker C

And what about just cluster scale? I mean, Simon, how many nodes is vLLM running on at any given time?

24:32

Speaker A

Right now, and this is through a very small subsample of our usage statistics, which we use to figure out what features to deprecate, just literally from this one signal we're looking at 400k to 500k GPUs running vLLM 24/7. That's quite a big scale thinking about the global deployment of GPU footprints, and we definitely believe there's a lot more out there. And of course this is a wide diversity of different kinds of GPUs and GPU architectures, as well as model architectures, being deployed. We're not seeing a one-size-fits-all; people aren't using it for just one singular use case.

24:38

Speaker C

I see. And this is sort of your point, your second point was about diversity, sort of making inference a harder problem over time.

25:18

Speaker D

Yeah, chip diversity, hardware diversity, is definitely one factor, and models are also getting more diverse. If you think about, for example, Nvidia: a year ago I think they only released a few series of open source models, but now they're releasing many open source models every month, in different domains. Some are video, some are robotics, some are language. And this open-sourcing trend is expanding: people are training many different kinds of models in many different domains and releasing them every month. So there's model diversity. And even just for text models, they're all transformers, but their detailed architectures are still very diverse, and we see them diverging: DeepSeek V3.2 was using something called sparse attention, while Qwen and Kimi are exploring linear attention, which is a different attention mechanism, and they have different ways to manage the memory. So this model architecture divergence is also getting more significant.

25:24

Speaker C

And so is it up to you, as maintainers of vLLM, to implement all of these, to implement sparse attention, for instance, so that it's available for the models to use?

26:38

Speaker D

Yeah, we basically leverage the open source community, definitely, because we collaborate with these model vendors. We often get help from them: they provide kernels, or at least reference implementations, of these new kinds of operations. And our job is often to leverage this collaboration, making them more mature and available for more diverse environments.

26:47

Speaker C

I remember early on in open source models there was some standardization. Like everyone was kind of using Llama. I think everyone's using sort of like the same tokenizer and the same like input format and you know, and like end of stream token and stuff like that. Is that still the case or is it like, is it different for each provider now?

27:17

Speaker D

It is, yeah. It diverged quite a bit over the last few years, maybe the last couple of years. One thing is that the model architecture itself has changed a lot, especially on the attention side, and also the input and output processing, because different labs have their own ways of forming the conversation and forming the tool calls for their own models. So this has been diverging quite a bit over the last couple of years.

27:35

Speaker C

I see. Okay, so: scale of models, diversity of models and hardware deployment scenarios, and then agents were the third thing you mentioned as getting harder over time.

28:07

Speaker D

Yeah, yeah. For agents, beyond the inference engine we definitely need to set up a whole new environment, a whole new infrastructure, to support all the tool calling and all the multi-agent things. That's becoming a new emerging challenge for inference as a whole.

28:15

Speaker C

Do you think this means there will be more state managed in the inference layer over time?

28:40

Speaker A

Before, the paradigm was text in, text out: a single request and response. But as we evolve into the year, and the decade, of agents, we're seeing multi-turn conversations grow to hundreds and thousands of turns. These turns also involve external tool use: interacting with a sandbox, performing web searches, running Python scripts or other programming languages. You get this long iterative process where the LLM is involved, but external environment interaction is involved too. And this really kicked off a huge wave of co-optimizing agentic architecture with inference architecture. Just to give an example: it's very important for vLLM to understand whether or not a conversation is still happening. If it's no longer happening, we can remove the KV cache, the persistent state associated with each text completion stream. But in agentic use cases, you actually don't know whether the agent thinks it's finished. Previously, the interaction was just a human typing in a text box; now it's an external environment interaction. It could be one second for a single script to finish, ten seconds for a search or a complex analysis, or minutes or hours if there are humans in the loop. With that uncertainty, we don't even know when a request is going to come back, and the uniformity of the cache access and eviction patterns got pretty disrupted by the new paradigm.
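A minimal sketch of the cache-management problem being described (toy Python, not vLLM's actual implementation): with single-turn chat you could free a conversation's KV blocks when the turn ends, but with agents you can't tell whether a request will come back in a second, a minute, or never, so this sketch keeps entries resident and evicts lazily, least recently used first, only when capacity runs out. The class and method names here are hypothetical.

```python
from collections import OrderedDict

class PrefixCache:
    """Toy LRU prefix cache keyed by conversation id."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.entries = OrderedDict()          # conv_id -> num_kv_blocks

    def touch(self, conv_id, num_kv_blocks):
        """Record a (possibly returning) request; returns True on a cache hit."""
        hit = conv_id in self.entries
        if hit:
            self.entries.move_to_end(conv_id)  # refresh recency on reuse
        self.entries[conv_id] = num_kv_blocks
        # Evict the least recently used conversations once over capacity,
        # since we never learn explicitly that a conversation has ended.
        while sum(self.entries.values()) > self.capacity:
            self.entries.popitem(last=False)
        return hit
```

The design choice the sketch illustrates: eviction is driven by capacity pressure and recency rather than by an end-of-conversation signal, because in the agentic setting that signal no longer exists.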

28:45

Speaker C

I see, I see. And so you have to be much smarter about how you manage the cache.

30:36

Speaker A

As one example of that, yeah.

30:40

Speaker C

Yeah, gotcha, gotcha. Which is one of the like unsolvable problems in computer science.

30:42

Speaker A

Cache invalidation.

30:47

Speaker C

Yeah, exactly. So I can see how that would get harder over time. I think I know the answer to this, but are you guys big believers in open source AI compared to closed source? And can you explain how you think about that?

30:48

Speaker A

We're definitely big believers in open source. What we believe is that diversity will triumph over a single anything. That means we believe in diversity in models and diversity in chip architectures, fundamentally because the world is complex. For any application, you're going to need to find and tailor the right model architecture to the right chip architecture for your exact use cases. And the best way to promote diversity is through open source, because with open source everybody knows what everybody else is up to, and can make their own opinionated take on top of the common ground. Finally, if you look at the history of computer science, operating systems, cluster managers, databases: every systems field got better once it had a common standard, with everyone deviating a little bit and innovating on top of each other, versus following a single proprietary, single-source-controlled line of development.

31:00

Speaker C

I see, that's very interesting. So you're almost saying OpenAI will tune their stack very tightly for their use case, which is ChatGPT or whatever other apps they're running. For an enterprise or another tech company, if I want that same level of tuning, I can't just use off-the-shelf closed source models, because I don't have control of the whole stack, and the different participants in the stack aren't paying attention to my use case.

32:03

Speaker A

Yeah. Of course one part is data, and one part is the model architecture itself, which impacts performance. And just on the model architecture: how smart do you want the model to be? Do you want the model to handle millions of tokens of context, or is a shorter context totally fine?

32:27

Speaker D

Right.

32:45

Speaker A

And then you also need to specialize that model to your exact compute architecture. What chip are you using? For Nvidia, for example, the model you design for an H100 chip is very different from one for a B200 chip, different again for a GB200 NVL72 system, and drastically different from the model architecture you'd design for a TPU. And then using it for a vision model, for video generation, for reasoning, math, coding: in the end, if you look at the vertical stack integrations, they're all so different from each other.

32:45

Speaker C

That makes sense. Can you share any stories about live vLLM deployments that you thought were particularly interesting or important?

33:22

Speaker A

I have a few. One is, I think around 2024 we learned that Amazon is running vLLM to power their Rufus assistant bot, which was really surprising to all of us. Of course we believed vLLM could be deployed at scale, but seeing it at this massive scale, a global e-commerce company shipping it as a front page feature, meant that when anybody opens the Amazon app and clicks the bot's suggestion, or even enters a search query, it's going through vLLM. That was one of the first magical experiences in a way: wow, my purchase is going through vLLM right now. Kind of exciting, but also scary.

33:31

Speaker C

And you were, like, PhD students at the time.

34:17

Speaker A

Yeah. And not just Amazon: across LinkedIn and every major deployment of vLLM, we're surprised to find they're always the first adopters of cutting-edge features. One example is a deployment of vLLM at Character AI. When we first made n-gram speculation for speculative decoding available, as just a single PR, a pull request, in vLLM, not even merged, we were still iterating on the feature, and I heard someone from Character AI say, oh, actually we already rolled it out to hundreds of GPUs at scale, from just your first iteration. So everybody is staying on the cutting edge of vLLM, and we're quite excited about that.
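For context on the feature mentioned here: n-gram speculation drafts candidate tokens cheaply by looking them up in the sequence's own history, then lets the target model verify the draft in one forward pass. A minimal sketch of the drafting step (illustrative only, not the code from the vLLM pull request):

```python
def ngram_draft(tokens, n=3, max_draft=5):
    """Propose up to max_draft tokens by finding the most recent earlier
    occurrence of the last n-gram and copying the tokens that followed it.
    The target model then verifies the draft in a single forward pass,
    keeping only the longest prefix it agrees with."""
    if len(tokens) < n:
        return []
    suffix = tokens[-n:]
    # Scan backwards so we prefer the most recent match.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == suffix:
            return tokens[start + n:start + n + max_draft]
    return []
```

This works well for workloads with repetitive text (code editing, retrieval-heavy prompts), where the continuation of a repeated phrase is often literally present earlier in the context.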

34:19

Speaker D

Yeah.

35:03

Speaker C

Okay, should we talk about the company then? Inferact. What is Inferact, and why did you guys decide to start the company?

35:04

Speaker A

So Inferact was created by the creators and maintainers of the vLLM project. Our goal is to make vLLM the world's inference engine: really push the capabilities on the open source front and build a universal inference layer. That means we'll have the runtime to power any new model, on new hardware, for new applications, tailor it to extreme efficiency, and support all the AI workloads going forward.

35:12

Speaker C

And implicit in what you just said is that you're devoting a lot of resources to the open source project. Is that right? And can you expand on that?

35:41

Speaker D

Yeah.

35:49

Speaker A

One thing I fundamentally believe is that open source, especially how vLLM itself is structured, is critical to the world's AI infrastructure. What we want to do with Inferact is support, maintain, steward, and push forward the open source ecosystem. It's only when vLLM becomes a standard, and vLLM helps everybody achieve what they need to do, that our company has the right meaning and can support everybody around it. So open source is definitely the number one, and sometimes the only, priority of our company right now.

35:49

Speaker C

Yeah, you're not supposed to tell your investors that, by the way.

36:29

Speaker A

We do believe the open source project is also kind of a secret weapon, in the sense that with this community all working together on the open source, we have execution beyond what any single entity can have. This is the thing we've heard over and over again: people tell us, we just cannot keep up with vLLM, so that's why we're using vLLM. We have our internal team, our internal fork, our internal inference engine, but open source moves so fast that the only way to stay ahead is to adopt it. That's what we want to make happen, and it's exactly why we're staying all in on open source.

36:34

Speaker C

That's awesome. We mentioned Ion Stoica before, obviously one of the founders of Databricks. He was, I think, both of your PhD advisors at Berkeley, and he's going to be involved in Inferact too. Can you talk a little about how he's going to be involved in the company, and even more importantly, what have you guys learned from him, as his students, about startups and distributed systems and all this stuff?

37:13

Speaker A

Sure, yeah, you're exactly right. Ion is both of our advisors. I've actually worked with Ion since 2017, since I was an undergrad working on my first open source project for serving, and then I worked with him at Anyscale on my second open source project for serving.

37:34

Speaker C

You're just addicted to like Berkeley based open source AI serving companies.

37:49

Speaker A

So at this company, and with vLLM, Ion is quite involved. As a company, he will be a co-founder, and as an open source project, he has been advising vLLM since its inception. Ion knows open source projects, academic projects, industry research, and trends inside and out. Working together, Ion really helps us both with clearly understanding the lessons learned about bringing open source through the final miles of adoption in companies and enterprises, and with what's actually happening in the research world. The Sky Computing Lab has produced amazing infrastructure and new research ideas over the last few years, and Ion continues to explore new frontiers there, so we're quite excited to hear about that and to innovate on the open source together.

37:54

Speaker D

Yeah, and he also helps a lot with recruiting. He's involved in our entire hiring process. He basically teaches us how to tell talent, and where to find talent. All of this is amazingly helpful.

38:49

Speaker C

So, on that topic, what are some of the big problems you need to solve now, and what type of people are you hiring to help you solve them?

39:04

Speaker D

Definitely. Inference at scale is one of the biggest challenges in the field, not only for us but for the field overall. So we're trying to hire very experienced ML infra engineers. For example: what would be the best way to fully utilize a GB200 or GB300 NVL72 rack for a giant open source model? I think that's still an open problem. There are definitely some endeavors in academia and industry, but there's room for improvement. So that's some of our focus at the moment.

39:12

Speaker A

Here's my pitch, from a computer science point of view, when people ask me this question. If you're working at a vertically integrated company that has an end product, say a chatbot or an assistant, you're working on a vertical slice of the problem. At Inferact, you'll be working on a horizontal abstraction layer. This is similar to operating systems, databases, and the other kinds of abstractions people have built over the years: operating systems abstracted the CPU and memory; databases and file systems abstracted storage devices and networking. For accelerated computing, there's a brand new class of physical device, and vLLM abstracts a large part of it for inference-specific workloads. Of course there's training too, but our singular focus is on inference, and that necessitates a software layer that abstracts away GPUs and accelerated computing devices for models. From my point of view, this is as important as the abstractions the community built for operating systems and for databases, which are both fields we were really passionate about as PhD students too. That's why ML systems is fundamentally a new field of systems research and systems deployment. Here at Inferact, you'll be working on this layer, not a vertical slice but a fundamental runtime, impacting all the future generations of software that will run on accelerated computing devices. Your work will span working with different models and different applications, as well as understanding the pros and cons of different chips and their whole integrated data center systems, so you can figure out: actually, for these, we should build the abstraction this way. And we'll constantly remove abstractions, break abstractions, and rebuild them over and over, just like operating systems and databases were innovated over time with the new information at hand.
So come here for the constant exercise of building an actual, widely deployed production system that will be at the frontier of inference.

39:51

Speaker C

And this is what you call the universal inference layer.

42:20

Speaker A

Yeah. It's purposely vague in a way, but what we really focus on is going from PagedAttention, from the serving system, to the whole runtime you need for intelligence.

42:22

Speaker C

Wooseok, Simon, thank you so much for being here today. Thrilled to have you on the podcast, of course, and we're thrilled to be working together in the company. It feels like we've already been working together for a few years, but great to have you here, and congratulations on getting off to a great start.

42:38

Speaker A

Thank you for having us.

42:53

Speaker D

Yeah, thank you.

42:54

Speaker B

Thanks for listening to the A16Z podcast. If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com/a16z. We've got more great conversations coming your way. See you next time. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.

42:57