"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

Intelligence with Everyone: RL @ MiniMax, with Olive Song, from AIE NYC & Inference by Turing Post

55 min
Feb 22, 2026
Summary

This episode features Olive Song, a senior researcher at Chinese AI company MiniMax, discussing their M2 series of open-weight models specialized for coding and agentic tasks. The conversation covers MiniMax's unique approach of developing both foundation models and applications in-house, their reinforcement learning techniques including interleaved thinking patterns, and the technical challenges of training frontier LLMs with limited resources compared to American AI labs.

Insights
  • Having developers and researchers work side-by-side creates tight feedback loops that enable rapid identification and fixing of model weaknesses during training
  • Interleaved thinking (allowing models to act, get feedback, think again, then continue) significantly improves performance on long-horizon agentic tasks compared to single-pass reasoning
  • Small technical decisions like maintaining FP32 precision during reinforcement learning can be critical for achieving theoretical algorithm performance
  • Systematic perturbation of training environments across all operational dimensions (tools, prompts, scaffolds) is essential for robust agent generalization
  • Open-weight models currently struggle most with adapting to different environments and achieving the same level of environmental understanding as top closed models
Trends
  • Shift toward specialized models for specific domains (coding, agents) rather than general-purpose models
  • Increasing importance of reinforcement learning and human alignment in model training
  • Growing emphasis on multi-agent systems and cost-effective model deployment
  • Rise of interleaved reasoning patterns for complex task execution
  • Integration of AI agents into internal research and development workflows
  • Focus on robust generalization across diverse operational environments
  • Emphasis on systematic perturbation pipelines for training robustness
  • Growing importance of engineering discipline in scaling AI systems
  • Trend toward open-weight model releases from Chinese AI companies
  • Increasing collaboration between open-source communities and commercial AI labs
Companies
MiniMax
Chinese AI company developing M2 series open-weight models for coding and agentic tasks
OpenAI
Referenced as comparison point for top American AI model performance
Anthropic
Claude models mentioned as benchmark for environmental adaptation capabilities
Open Router
Platform where MiniMax M2.5 currently tops the usage leaderboard
People
Olive Song
Senior researcher at MiniMax specializing in reinforcement learning and model evaluation
Ksenia Yase
Host of Inference by Turing Post podcast who interviewed Olive Song
Swix
Creator of the AI Engineer event series where Olive Song presented
Quotes
"During reinforcement learning, the model tries its best to hack a lot of things."
Olive Song
"Engineering is very, very, very important. I didn't know that during school."
Olive Song
"We can only know the definition of AGI when we achieve it."
Olive Song
"Problem solving is more of discovery."
Olive Song
"Intelligence with everyone is more like how it changes my life and it enables me to do more work and then how it can connect me better to different people."
Olive Song
Full Transcript
6 Speakers
Speaker A

Hello and welcome back to the Cognitive Revolution. The presenting sponsor of today's episode is Granola, the AI notepad that helps you get the doing done. Whether it's identifying to-do items after a call, turning a brainstorming session into a product spec, or looking back at multiple calls to identify cultural trends at your company, Granola takes your raw meeting notes and makes them awesome. Right now, Granola is featuring AI recipes from AI thought leaders, including several past guests of this show. My own contribution is a blind spot finder recipe that looks back at recent conversations and attempts to identify things that I am totally missing. This was immediately useful in the context of contingency planning for my son's cancer treatment, and the more data Granola collects as I continue to use it, the more valuable it becomes for suggesting AI topic areas that I really ought to explore. See the link in our show notes to try my blind spot finder recipe and experience for yourself how Granola puts your meetings to work.

Now today I'm excited to share a special combined crossover episode featuring Olive Song, a senior researcher specializing in reinforcement learning and model evaluation at the Chinese AI company Minimax, creators of the M series of models, the most recent of which, M2.5, currently tops the Open Router usage leaderboard. To give you the most complete picture possible, we're combining two sources: first, a presentation Olive recently gave at the AI Engineer Conference in New York, where she had previously lived for six years, and second, an interview with Ksenia Yase from her podcast Inference by Turing Post. Together they provide an excellent overview of Minimax's goals as a company, the capabilities they're prioritizing in their models, the techniques they're using to get there, and the day-to-day ups and downs of training frontier LLMs.

Highlights include: how Minimax's strategy of building both models and user-facing applications in house creates tight feedback loops that enable their cross-functional research and engineering teams to identify and address model weaknesses as quickly as possible; an overview of how interleaved thinking, which allows the model to take an action, get feedback from the environment, and pause to think again before continuing, improves performance on long-horizon agentic tasks; a description of the perturbation pipeline they use to systematically vary the model's training environment in order to encourage robust generalization; Olive's perspective on the constant battle she and her teammates are fighting against reward hacking; a window into the tedious debugging that is sometimes required to diagnose training issues and how they realized that they needed to run reinforcement learning at FP32 precision; and finally, how the team at Minimax is using AI agents to keep up with the daily flood of AI news.

While Olive recognizes that Minimax's models, like all open-source models in the world today, can't quite match the performance of top American models, I think there is still a lot of value in the details she shares about their approach to reinforcement learning and how they structure their team and work. And in any case, I always appreciate the opportunity to hear directly from Chinese AI researchers who, just like their American counterparts, are figuring things out step by step as they go, even as major questions about issues such as the governance of increasingly powerful open-source models remain fundamentally unanswered.
With that, I want to thank Swix, the creator of the AI Engineer event series, which I absolutely recommend attending if you can, and Ksenia, the creator of Turing Post, which has what I find to be some of the very best topic selection of any AI newsletter, for allowing me to create and post this combined episode. And I hope you enjoy this window into the development of some of the best open-weight models in the world with Olive Song of Minimax.

0:00

Speaker B

Hi.

4:16

Speaker C

Hi everyone, I'm Olive. It's my great honor to be here today to present our new model, Minimax M2. I actually lived in New York City for six years, so it feels great to come back, but with a different role. I currently work on reinforcement learning and model evaluation at Minimax. Let me just get a quick sense of the room: who here has heard of or tried Minimax before? Oh, a couple of you, yeah, not everybody, but I guess that's the value of me standing here today. We are a global company that works on both foundation models and applications. We develop multimodal models, including text and vision-language models, our video generation model, and speech and music generation models. We also have many applications in house, including agents. That's the specific thing that's different from other labs and other companies: we develop both foundation models and applications, so we have researchers and developers sitting side by side working on things. Our difference is that we have firsthand experience from our in-house developers feeding into the development of models that developers in the community would really need. And here I want to introduce our Minimax M2, which is an open-weight model, very small with only 10 billion active parameters, designed specifically for coding and workplace agentic tasks. It's very cost efficient. Let me just go over the benchmark performance, because people care about it. We rank very near the top on both intelligence benchmarks and agentic benchmarks; I think we're at the top of the open-source models. But numbers don't tell everything, because sometimes you get those super-high-number models, you plug them into your environment, and they suck.

4:16

Speaker D

Right?

6:32

Speaker C

So we really care about the dynamics in the community. In our first week we had the most downloads, and we also climbed to top-three token usage on Open Router. So we're very glad that people in the community are really bringing our model into their development cycle. What I want to share today is how we actually shaped the main model characteristics that make M2 so good in your coding experience, and I'm going to present the training behind each one of them: from coding experience, to long-horizon state-tracking tasks, to robust generalization across different scaffolds, to multi-agent scalability. First let's talk about coding experience, which we supported with scaled environments and scaled experts. Developers need a model that can actually work in the languages they use and across the workflows they deal with every day. That means we need to utilize real data from the Internet and scale the number of environments, so that during training, for example during reinforcement learning, the model can actually react to the environment, target verifiable coding goals, and learn from them. That's why we scaled both the number of environments and our infrastructure, so that we can run that training very efficiently. With data construction and reinforcement learning, we were able to train the model so that it's very strong, full stack, and multilingual. And what I want to mention here is that besides scaling environments, which everybody talks about, we actually scale something we call expert developers as reward models. As I mentioned before, we have a ton of super-expert developers in house who can give us feedback on our model's performance. They participated closely in the model development and training cycle, including problem definition, for example bug fixing or repo refactoring. They also identify the model behaviors that developers enjoy, identify what's reliable and what developers would trust, and give precise rewards and evaluations of the model's behaviors and final deliverables, so that it becomes a model developers really want to work with and that adds efficiency for developers. With that, we were able to lead in many languages in real use. The second characteristic of Minimax M2 is that it performs well on long-horizon tasks, those long tasks that require interacting with complex environments and using multiple tools with reasoning. We supported that with the interleaved thinking pattern and reinforcement learning. So what is interleaved thinking? Without interleaving, a normal reasoning model that can use tools works like this: it is given the tool information, the system prompt, and the user prompt; then the model thinks and calls tools, possibly a couple of tools at the same time; then it gets the tool responses from the environment, performs a final round of thinking, and delivers the final content. But here's the truth: in the real world, environments are often noisy and dynamic. You can't really complete the task in a single pass. You can get tool errors, for example, or unexpected results from the environment. So what we did is imagine how humans interact with the world. We look at something, we get feedback, and then we think about it. We consider whether the feedback is good or not.
And then we make other actions, make other decisions. And that's why we did the same thing with our M2 model. If we look at the diagram on the right: instead of just stopping after one round of tool calling, the model actually thinks again and reacts to the environment, to see if the information is enough for it to get what it wants. We call it interleaved thinking because it interleaves thinking with tool calling a number of times; it can be tens to a hundred turns of tool calling within just one user interaction turn. It helps adaptation to environment noise. For example, just as I mentioned, the environment is not stable all the time, and when something is suboptimal the model can choose to use other tools or make other decisions. It can handle long-horizon tasks and can automate your workflow using, e.g., Gmail, Notion, and the terminal all at the same time. You just need to make maybe one model call, and with minimal human intervention it can do it all by itself. Here's a cool illustration on the right. Because we're in New York City, I feel the vibe of trading and markets. You can see that there were some perturbations in the stock market, I think last week, and our model was able to keep things stable. So just like I said, there's environment noise, there's new information, there's news, there are what look like other trading policies, and so on, but our model was able to perform pretty stably in these environments. The third characteristic is our robust generalization to many agent scaffolds, which we supported with perturbations in the data pipeline. We want our agent to generalize. But what is agent generalization? At first we thought it was just tool scaling: we train the model with enough tools, various tools, new kinds of tools, we invent tools, and then it will just perform well on unseen tools. Well, that was partly true. It worked at first, but we soon realized that if we perturb the environment a little bit, for example by switching to another agent scaffold, then it doesn't generalize. So what is agent generalization? We concluded that it's adaptation to perturbations across the model's entire operational space. If we think back to what the model's operational space is, it can be the tool information, the system prompts, the user prompts; they can all be different. It can be the chat template, the environment, the tool responses. So what we did is design and maintain perturbation pipelines for our data so that our model can actually generalize to a lot of agent scaffolds. The fourth characteristic I want to mention is multi-agent scalability, which is very feasible with M2 because it's so small and cost effective. I have a couple of videos here. This is M2 powering our own Minimax agent app. We actually have a QR code down there, so if you want, you can just scan it and try it. It's an agent app we developed. Here we can see different copies of M2: they can do research, write up and analyze the research results, put them in a report, put them in some kind of front-end illustration, and they can work in parallel. Because it is so small and so cost effective, it can really support those long-running agentic tasks and tasks that require some kind of parallelism.
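To make the interleaved pattern concrete, here is a minimal sketch of the loop described above, assuming a generic tool-calling API: the model's thinking is kept in context across tool rounds, so it can react to each tool response before acting again. The names `llm_step` and `run_tool` are hypothetical stand-ins, not Minimax's actual API.

```python
# Minimal sketch of an interleaved-thinking agent loop (illustrative only;
# llm_step and run_tool are hypothetical stand-ins, not Minimax's API).
def interleaved_agent(llm_step, run_tool, user_prompt, max_turns=100):
    history = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):  # tens to ~100 tool turns per user turn
        step = llm_step(history)                       # model thinks, then acts
        history.append({"role": "assistant",
                        "thinking": step.thinking,     # thinking stays in context
                        "tool_calls": step.tool_calls})
        if not step.tool_calls:                        # no more actions: final answer
            return step.content
        for call in step.tool_calls:                   # possibly several calls per turn
            result = run_tool(call)                    # may be noisy or an error
            history.append({"role": "tool", "content": result})
        # loop back: the model sees the tool feedback and thinks again
    return None  # budget exhausted without a final answer
```

The contrast with the single-pass pattern is the loop itself: thinking happens before every action, rather than once before and once after tool calling.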
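The perturbation pipeline can be sketched the same way: rather than varying only the tool set, every axis of the operational space named in the talk (system prompt, tool schemas, chat template, tool responses) gets randomized when training episodes are generated. Everything below is an illustrative guess at the shape of such a pipeline, not Minimax's actual code.

```python
import random

# Illustrative pools covering the "operational space" axes named in the talk.
SYSTEM_PROMPTS = ["You are a coding agent.", "Act as a careful senior engineer."]
TOOL_SCHEMAS = [
    {"name": "bash", "args": ["cmd"]},
    {"name": "run_shell", "args": ["command", "timeout_s"]},  # same tool, new schema
]
CHAT_TEMPLATES = ["chatml", "plain"]

def perturb_episode(base_task):
    """Return one training episode with a randomized scaffold."""
    return {
        "task": base_task,
        "system_prompt": random.choice(SYSTEM_PROMPTS),
        "tools": random.sample(TOOL_SCHEMAS, k=random.randint(1, len(TOOL_SCHEMAS))),
        "chat_template": random.choice(CHAT_TEMPLATES),
        # inject occasional noisy or failed tool responses so the model
        # learns to recover instead of assuming a stable environment
        "tool_error_rate": random.uniform(0.0, 0.2),
    }
```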
So what's next? For Minimax M2, from what I've introduced, we gathered environments, algorithms, data, expert values, model architecture, inference, evaluation, all of these, to build a model that is fast, intelligent, able to use tools, and able to generalize. What's next for M2.1 and M3, or further in the future? We're thinking about better coding, maybe memory or context management, proactive AI for the workplace, and vertical experts. And because we have those great audio generation and video generation models, maybe we can integrate them. But our mission is that we're committed to bringing all these resources, whatever is on the screen and maybe more, and our values, and putting them all together to develop models for the community to use. So we really need feedback from the community if possible, because we want to build this together. This is a kind of race that everyone needs to participate in, and we are committed to sharing it with the community.

6:33

Speaker A

Yeah.

16:58

Speaker C

And that's all the insights for today. Again, we really hope you'll try the model, because it's pretty good. You can contact us up there, and you can try the models by scanning the QR code. Basically, that's it. Thank you all for listening.

17:00

Speaker B

During reinforcement learning, the model tries its best to hack a lot of things. The current open models can achieve that level of understanding. It is a solvable problem and we are working on it. Engineering is very, very, very important. I didn't know that during school.

17:35

Speaker A

Hey. We'll continue our interview in a moment after a word from our sponsors.

17:54

Speaker E

One of the best pieces of advice I can give to anyone who wants to stay on top of AI capabilities is to develop your own personal, private benchmarks: challenging but familiar tasks that allow you to quickly evaluate new models. For me, drafting the intro essays for this podcast has long been such a test. I give models a PDF containing 50 intro essays that I previously wrote, plus a transcript of the current episode and a simple prompt. And wouldn't you know it, Claude has held the number one spot on my personal leaderboard for 99% of the days over the last couple of years, saving me countless hours. But as you've probably heard, Claude is the AI for minds that don't stop at good enough. It's the collaborator that actually understands your entire workflow and thinks with you. Whether you're debugging code at midnight or strategizing your next business move, Claude extends your thinking to tackle the problems that matter. And with Claude Code, I'm now taking writing support to a whole new level. Claude has coded up its own tools to export, store, and index the last five years of my digital history from the podcast and from sources including Gmail, Slack, and iMessage. And the result is that I can now ask Claude to draft just about anything for me. For the recent live show, I gave it 20 names of possible guests and asked it to conduct research and write outlines of questions based on those. I asked it to draft a dozen personalized email invitations. And to promote the show, I asked it to draft a thread in my style featuring prominent tweets from the six guests that booked a slot. I do rewrite Claude's drafts, not because they're bad, but because it's important to me to be able to fully stand behind everything I publish. But still, this process, which took just a couple of prompts once I had the initial setup complete, easily saved me a full day's worth of tedious information-gathering work and allowed me to focus on understanding our guests' recent contributions and preparing for a meaningful conversation. Truly amazing stuff. Are you ready to tackle bigger problems? Get started with Claude today at claude.ai/tcr. That's claude.ai/tcr. And check out Claude Pro, which includes access to all of the features mentioned in today's episode. Once more, that's claude.ai/tcr. The worst thing about automation is how often it breaks.

17:58

Speaker A

You build a structured workflow, carefully map

20:14

Speaker E

every field from step to step, and it works in testing. But when real data hits or something unexpected happens, the whole thing fails. What started as a time saver is now a fire you have to put out. Tasklet is different. It's an AI agent that runs 24/7. Just describe what you want in plain English, send a daily briefing, triage support emails, or update your CRM, and whatever it is, Tasklet figures out how to make it happen. Tasklet connects to more than 3,000 business tools out of the box, plus any API or MCP server. It can even use a computer to handle anything that can't be done programmatically. Unlike ChatGPT, Tasklet actually does the work for you. And unlike traditional automation software, it just works. No flowcharts, no tedious setup, no knowledge silos where only one person understands how it works. Listen to my full interview with Tasklet founder and CEO Andrew Lee. Try Tasklet for free at Tasklet.ai and use code COGREV to get 50% off your first month of any paid plan. That's code COGREV at Tasklet.ai.

20:17

Speaker D

Hello everyone. Today I have the pleasure of talking to Olive Song, senior researcher at Minimax; recently they've been launching very interesting open-weight models specialized in different areas. And Olive is currently working at Minimax on the new version, Minimax 2.2. Thank you for taking the time at 9pm on a Sunday night. Does everyone work like this at the company? I'm really impressed.

21:26

Speaker B

I think different people work on different schedules. We do have people who work even overnight, but they sleep at daytime. So like we have a very flexible schedule. You know, it goes with your experiment. For example, if the experiments run for all day, the person can take a break and then if you know there are a lot of analysis to do, maybe because we are very curious about the results and we're very passionate, right? We can't really wait a very long time. So yeah, everyone has their own schedule.

21:53

Speaker D

That says something about the success of the models. You specialize in reinforcement learning and model evaluation, as far as I understand, which are two of the least forgiving parts of model development. And you also have more constraints than the big American AI labs. What does a good day look like for you, and what does a bad one look like?

22:20

Speaker B

I can share something about our recent weeks. So there's not a whole good day or a whole bad day. We were joking that during one day we have good results in the morning, and then sometimes they become bad results at night. We sometimes call it ICU in the morning and then KTV at night. Typically a good time would be receiving some good results, though even running into new problems is a good time. For example, during reinforcement learning, we can see the model doing a lot of different stuff to achieve the results, and sometimes we just discover new model behaviors. And that's really exciting. Even though it might not be safe or expected, it's kind of exciting, so I call it a good time. A bad time, well, there really isn't a bad time except for finding out the bad results. The moment itself is bad, but then trying to figure out the problem and breaking it down is pretty good.

22:41

Speaker D

What was a recent model behavior that you didn't expect?

23:41

Speaker B

So, during reinforcement learning, the model tries its best to hack a lot of things, right? For example, it uses bash a lot. And sometimes these might not be very safe behaviors, as our expert developers say, because the expert developers have their own expectations of how the model should work, but it doesn't go that way if we don't constrain it. So we do a lot of alignment to solve that issue.

23:44

Speaker D

You just launched Minimax Her, and it went all over Twitter. How do you come up with those ideas? Because role playing is sort of, is it an alignment question? Is it not? How do you do that?

24:09

Speaker B

Frankly speaking, I'm not the expert on that part. We have a whole team on role playing, the Her stuff. I'm not an expert, but we do have a lot of discussions. We do believe that role playing, or accompanying humans, or human interaction, is very important in life with AI and in how it will change our social life in the future. And it absolutely represents an ability that's very superior, because it's human-like: it has emotions, it understands your emotions. It's not just working out some exams. That's absolutely another side of AI capability.

24:23

Speaker D

It's what's called "AI with everyone," right? At Minimax. Yeah.

25:00

Speaker B

Intelligence with everyone.

25:04

Speaker D

Intelligence. Intelligence with everyone. What does it mean for you?

25:06

Speaker B

For me personally, I feel like it's more about how it changes my life, enables me to do more work, and how it can connect me better to different people. For example, before, I wouldn't be able to understand a lot of very professional coding problems or optimization problems. Now I am able to do that with AI, so I can communicate with more people and exchange more ideas. That's one side. On the other side, it generally helps my daily life: it helps with my work, my daily routine, my self care. It changes life for me, and I hope that it changes life for everybody, obviously in a good way.

25:09

Speaker D

Can you tell me a little about how day-to-day work is organized in your lab? I remember from your talk at AI Engineer that it's very interconnected between developers and researchers. I would love to hear more about that.

25:48

Speaker B

Absolutely. We sit around together every day and share our experiment results. For example, as I just said, during experiments, for example reinforcement learning experiments, we see some scores going up high. We look at the model's behaviors, and we look at them with the developers in that area as well. We sit together, they spot the issue right away, and then we are able to come up with new ideas to fix it or build more data for it.

25:59

Speaker D

If we can go into details about your current work on the current version: what are the biggest problems you're trying to solve compared to the previous version?

26:27

Speaker B

One important thing we focus on right now, and also in the future, is human alignment. We are focusing on coding models for 2.1, 2.2, and the M2 series, right? And what we realized is that for it to become very productive in our daily work, or rather productive and safe at the same time, we have to do a lot of alignment on it. The model can't just grow on its own and do dangerous behaviors just to achieve the final goal. So for us, the important thing is how we define human alignment, how we define expert expectations, and how we actually train the model to be more aligned with our expectations.

26:38

Speaker D

So I want to go into some real details here, and you're the expert, so correct me if I'm wrong, but I saw that there was recent interest in details like keeping the LM head in FP32 during reinforcement learning training. Why do small decisions like this end up mattering more than just a clever new algorithm?

27:20

Speaker B

It all comes down to being closer to the theoretical algorithm. We have the theoretical reinforcement learning algorithm, but when we implement it, it can be a little bit off. That creates a small gap from the theoretical extreme of the algorithm. So that's how we think about and approach this problem: we try to scale toward the theoretical extreme. The precision part, for example, is one thing that we found would prevent us from getting close to that extreme, and that's how we solved it. Discovering that was actually a very funny story; I talked about it a bit when we published Minimax M1. During our experiments we found that the accuracy didn't go up. We looked at it layer by layer; we looked at the log probs layer by layer and found it. Theoretically speaking, it had to work, right? So there had to be some gap between the theory and how we implemented it. We thought about the gap, analyzed it layer by layer, and eventually found it.
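For readers who want the general idea in code, here is a minimal sketch, assuming a standard policy-gradient setup, of what keeping the LM head in FP32 looks like: the final projection and log-softmax run in full precision so the log probs used for RL importance ratios stay close to their theoretical values. This is illustrative only, not Minimax's implementation, and all names are hypothetical.

```python
import torch
import torch.nn.functional as F

def token_logprobs_fp32(hidden, lm_head_weight, tokens):
    """Per-token log-probs with the LM head kept in FP32.

    hidden:         [batch, seq, d_model] activations (bf16 is fine upstream)
    lm_head_weight: [vocab, d_model] final projection weight
    tokens:         [batch, seq] sampled token ids
    """
    # Upcast only the final projection: low-precision logits can shift
    # log-probs enough to bias ratios like exp(logp_new - logp_old).
    logits = hidden.float() @ lm_head_weight.float().t()       # fp32 logits
    logps = F.log_softmax(logits, dim=-1)
    return logps.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)  # [batch, seq]
```

The debugging story maps onto this directly: if training-side log probs are computed in reduced precision while rollouts come from a mixed-precision inference engine, comparing the two layer by layer is exactly how you would locate the gap.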

27:41

Speaker D

Is there anything like this happening now?

28:46

Speaker B

Definitely, yeah. Every single day, and in every different group. I can't actually disclose anything where we don't have a concrete conclusion yet, because we want our public conclusions to be very concrete and deeply understood. If we have breakthroughs, we'll definitely publish them later. But I have to say we do encounter these problems every day. I think it's called first principles, right? We think from the very fundamental part of the problem and then approach it.

28:48

Speaker D

The models that you launch are open weight. From your perspective, and from the alignment perspective, what do builders actually gain from open weights, and what responsibility do they have to take on that you don't have to take responsibility for?

29:18

Speaker B

Again, I'm actually not an expert in building things with models. I feel like because it's open weight, people have free use of it. For example, they can deploy it by themselves, or they can even fine-tune it with the weights and then keep all the data on their own property. That's very safe.

29:32

Speaker D

But if we talk about alignment, how do you look at that from that perspective? When the model is out there in the wild, before you launch the model, before you publish it, what tells you that it's safe to publish?

29:54

Speaker B

We have some internal benchmarks for safety, and they have different dimensions: something like sensitive-content safety, something like alignment safety. We have that as our evaluation. About one or two weeks before launching, we do scaled-up evaluations and scaled-up alignment on the model. That's how we assess whether the model is safe. But if it's already open weight, in the wild, people can actually do things to it. I guess that's what you're getting at, right? People can do more things to the model that we can't control. I don't know how we handle that, frankly speaking. There are laws on that, right? Regulations, where people do agree on some moral standards.

30:06

Speaker D

Do you follow any reinforcement learning failure modes that haven't shown up in benchmarks but then become obvious in real agent use? How do you collect feedback for the next versions, for improving the reinforcement learning process?

30:51

Speaker B

We collect feedback on the model itself first. When we publish a model, many developers and many people use it, and we collect that feedback systematically. We analyze each problem: some of them are fundamental, some are just things we missed and can fix really quickly. So there are two parts. First, we do internal evaluation with the developers, and they point out problems; that's how we fix that part. But that's not enough, and more feedback comes to us after we officially publish the models, and we collect it. The way we organize our group is that different people work on different capabilities of a general model. If we collect things we think we should improve in the future, different people take their parts, right? They say, okay, I think I can solve this issue and I'll solve it in the next generation. That's how we collect feedback and improve the model.

31:07

Speaker D

How did you initially decide not to build one general-use model, everything for everyone, and instead go more into specialization, like coding?

32:01

Speaker B

I think we are approaching generalized models; it's just that we are putting more emphasis on coding, for example. Our model can also be taken into any general agent scaffold, including our own agent product, and that's for general purposes. We do work on researching, report writing, PPTs, stuff like that, which is more general. Personally speaking, I feel like with coding you can kind of structure the whole world, or model a lot of stuff.

32:10

Speaker D

Yeah, engineer it.

32:40

Speaker B

Yeah, with engineering. Behind it, it's scaled-up humanity for me. It carries a lot of intelligence in itself and a lot of work to do. That's how we view this issue. But we do work on generalized stuff, and even more generalized stuff in later versions. For example, our model will handle some general workplace scenarios in the future, and that's not just coding.

32:41

Speaker D

If we talk about coding and agentic use, it requires long horizons. How do you solve long horizons for agentic use?

33:07

Speaker B

I think you have to define your goals well and define the model behavior well. We also require great, extraordinary infrastructure, for example for reinforcement learning. Besides the algorithm, besides the things people have been working on for a very long time, what's special for agentic stuff is how we define agents, how we define how an agent model will work. First you need to define the task; you need to define the model's goal. Especially in a long-horizon task, you need goals that are actually hard and diverse. The second part is that you need environments: great engineering environments, scaled-up environments, diverse environments, not just coding but, for example, workplace environments with different kinds of tools. That's great engineering. And then you need great infrastructure, outstanding RL infrastructure, to let the model really roll out over a very long horizon with very efficient GPU use, very efficient rollout and training, and so on. I feel like that's what's different in agentic reinforcement learning compared to before.

33:15

Speaker D

Are you affected by GPU constraints? How do you solve the compute problem?

34:24

Speaker B

We do have a team that works on how we utilize the compute the most. That's actually one of the RL scaling issues: utilizing compute very efficiently. Their purpose is to minimize compute use while training more, right? Personally, I don't really feel the GPU constraint; we have a great team who works on utilizing the compute the most while stabilizing the training the most.

34:29

Speaker D

Do you have problems that you need to solve with your own expertise, like how to use compute more efficiently, or is it just that team?

34:56

Speaker B

We are actually the same team, because we're the reinforcement learning team. We view this issue from different perspectives. It can be implementation, you can view it from a data perspective, you can view it from different perspectives, but our goal is the same.

35:02

Speaker D

We're always looking forward to new solutions coming from Chinese labs, because it's always mind-blowing.

35:18

Speaker B

We are actually working on some new agentic reinforcement learning stuff, but it won't really come out with 2.2; with the next generation model, we are still working on it. I'm not sure what I can share or not, so I'll share it later when I have concrete conclusions. As I said before, I can't really say something that we haven't documented yet.

35:25

Speaker D

Will it be available when the model is out?

35:47

Speaker B

That depends on our timing. I'm not very confident yet, but we are dedicated to working on it.

35:50

Speaker D

Yeah, a lot of constraints talking to researchers; so many secrets. Well, if we talk about openness, this whole conversation I'm having with people this quarter is about open source. I wonder if you can talk about the company strategy: why did the company decide to publish open weights for the models? What are the benefits? What are the cons?

35:57

Speaker B

For our team, the researchers' team, we always wanted to go open source, because the open source community is fantastic. I learned that from day one when I joined the team: the open source community is fantastic. So as researchers, we did want to join the open source movement. On the other hand, speaking of the cons, we are a company, so people care about whether this can make money, whether this is a business. The con would be that if the weights are open source, fewer people will use the APIs. But as a researcher, that really isn't my focus that much, so I'm not very confident speaking about the company strategy. For the tech part, we just believe that we can build better models with the open source community.

36:20

Speaker D

How much do you use open source tools yourself from different other companies?

37:04

Speaker B

A lot. For inference, for example, I'm not sure if I'm allowed to name specific open source branches, but we collaborate with both vLLM and SGLang, and they are open source code repositories.

37:08

Speaker D

How do you look at the open source stack? Because when we talk about open source, sometimes it's perceived as one thing, but actually it's multi-layered. How do you look at it?

37:22

Speaker B

For example, there are a lot of open source agent scaffolds, both coding agents and general agent scaffolds, that we use ourselves to test our models. We look at their logic, we look at their code, to see how they design scaffolds and, for example, engines. We take what they did really well and reflect on how we think about the problem, how we structure the problem, whether we're on the same page, and so on. So we learn from each other.

37:31

Speaker D

Do you think teams underestimate how much engineering discipline open models require compared to using closed APIs? It always requires a lot of setting up, it's different compute, and you need engineering talent to use it, instead of just choosing a closed API, turning it on, and using it. Do you have any difficulties with that, or inside the company is the open source stack established and working?

38:01

Speaker B

Personally, I don't have a problem with that. If other open source models are published, I'll just download them, deploy them on a machine, and work with them if I want. Personally I don't have that issue. But if there are individual developers out in the wild, I understand the problem, especially when they don't have their own compute. Then it's easier to connect to a model through, for example, Open Router and things like that.

38:30

Speaker D

Do you use a lot of other open models, on the same Open Router, let's say? Do you play with them?

38:55

Speaker B

Yeah, I play with them. I play with them on day one. If they release at midnight, I play with them at midnight anyway.

39:00

Speaker D

Like taking notes.

39:07

Speaker B

I don't actually take notes, but I do have my personal evaluation stack, a list of fun questions that I like to test with every single model to see how they work.

39:09

Speaker D

Can you tell me about it? That's super interesting.

39:18

Speaker B

Yeah, I've been collecting a bunch of questions since I entered the company, in different areas, including logical reasoning, mathematical proofs, report writing, agentic tasks, and so on, a lot of them. I just like to see how the model reacts to these problems and how it approaches them. Different models have different personalities when approaching them.

39:20

Speaker D

That's true. And you always need to adjust to them. If we want to give a little guide to people who want to evaluate models themselves, can you give me examples of the questions, like five questions you need to ask a model to understand how it

39:46

Speaker B

works, and if it works well. From the professional evaluation perspective, five questions isn't enough. If you want to do a very standard and very fair comparison among models, you have to make it a statistically confident test. There have to be a certain number of questions in each domain to see how the model performs, and usually you need to test multiple times, because models are not very stable themselves. If you're testing for vibes, use the fun questions. But if we are actually assessing a model's capabilities, we need question sets that are fair across different models and that are correct, because some questions are not correct, and for some questions the answer is not unique. For example, sometimes when we run the test, the environments are not fixed; for example, the golden answers wouldn't pass, and things like that. So if you're doing professional evaluation, you have to make sure the evaluation is correct, it's diverse, and it's above a certain size threshold, so that the test is confident.
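As a concrete illustration of that advice, here is a minimal sketch of a repeated-sampling evaluation harness: a fixed question set per domain, several runs per question, and per-domain aggregate scores. The `model_fn` callable and the sample questions are hypothetical placeholders, not anyone's actual benchmark.

```python
import statistics
from collections import defaultdict

# Hypothetical question set: enough items per domain makes comparison fair.
QUESTIONS = [
    {"domain": "math", "prompt": "What is 6 * 7?", "check": lambda a: "42" in a},
    {"domain": "reasoning", "prompt": "If all bloops are razzies and all razzies "
     "are lazzies, are all bloops lazzies?", "check": lambda a: "yes" in a.lower()},
]

def evaluate(model_fn, questions=QUESTIONS, runs=5):
    """Score a model with several runs per question, since single samples
    are too noisy for a confident model-to-model comparison."""
    per_domain = defaultdict(list)
    for q in questions:
        passes = sum(bool(q["check"](model_fn(q["prompt"]))) for _ in range(runs))
        per_domain[q["domain"]].append(passes / runs)
    return {d: statistics.mean(scores) for d, scores in per_domain.items()}
```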

40:01

Speaker D

You mentioned characters. How do you work with your model character?

41:08

Speaker B

I don't work on my model's character. Here's how I think of this issue: a general model should have all characters, or it should be able to perform all characters. It might have a default character, but if the user wants it to be a different character, it should be able to be; if a character is injected into the system prompt, it should follow it. That's how I view this issue.

41:12

Speaker D

I find it hard to adjust to new models, because they're so different in terms of character all the time. I just don't even understand why it happens.

41:33

Speaker B

I think it has to be related to the data the model was trained on, the different patterns the models have been trained on. And also different teams might have their own constitution in the system prompt, or as the model's default behavior.

41:42

Speaker D

If you look at open models in production today, I don't know if it's a relevant question, but where do they fail first? Open models specifically: reasoning, tool use, state tracking, evaluation blind spots. There are all those risks for open models; where does it break first?

42:03

Speaker B

I think open models are not very good at adjusting to different environments. From what I see right now, take Claude, for example: people use Claude in different coding environments and think it performs well in all of them, with different tool definitions and so on. I don't feel like the current open models can achieve that accuracy, or that level of understanding of the different environments.

42:23

Speaker D

Why? Where is the problem?

42:48

Speaker C

I don't know how Claude does it,

42:50

Speaker B

but for me I think it is a solvable problem, and we are working on it. We are improving it in 2.2, but it's still not as good as, e.g., Opus; for 2.5 it might be. We do have some systematic research going on in the area that has shown some results now, but there's still no concrete conclusion, so I won't say it.

42:51

Speaker D

I'm so curious. But do you think it's a problem of compute, because they have this infinite amount they can just throw at it?

43:15

Speaker B

I feel like compute is one side, but how we structure the problem and how we approach it is another side. And that's where we are more confident that we can solve the issue.

43:20

Speaker D

What can you tell me about M2.2, if it's launched by the time this interview is out? Can you give me some overview?

43:30

Speaker B

Better coding, obviously, and better multilingual coding, and more stable than before. It has better performance than 2.1 in different areas: more stabilized, longer horizons, and so on. We are testing it in different environments right now, and we believe it's better than before. So, different coding environments, right? Even environments we haven't seen before, even environments that are totally out of distribution, we see some very promising scores that are higher than 2.1.

43:38

Speaker D

I wonder, how do you stay updated on everything that happens? Which is super hard, because the pace is just insane. You said when the models are out, you play with them. Do you read research papers? What are your other interests that help you cross-pollinate with what you do? Can you tell me how you stay up to date and what inspires you?

44:11

Speaker B

There are different articles, different blogs going out every single day, a flood of information. How we deal with it is that we have an internal agent that tracks all the new articles, blogs, and papers; it dispatches them to different subjects, summarizes them, analyzes them, and sends them to researchers. So we have an internal researcher, if I can call it that, that does some filtering by itself. It gives us what it has filtered, and we can improve the researcher if we think it doesn't do well. That's how we filter out a lot of information first. And then we play with new code repositories using coding agents, so that we can understand them more quickly and play with them more quickly. So we're keeping pace with all the improvements, with agents and with our own models.
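Here is a minimal sketch of that kind of internal research agent, assuming a generic `llm` callable and RSS-style feeds (`feedparser` is a third-party PyPI package); this is a guess at the general shape of such a pipeline, not Minimax's system.

```python
import feedparser  # third-party: pip install feedparser

TOPICS = ["reinforcement learning", "agents", "inference", "evaluation", "other"]

def classify(llm, title, summary):
    """Dispatch one item to a subject using the LLM."""
    prompt = (f"Choose exactly one topic from {TOPICS} for this item.\n"
              f"Title: {title}\nSummary: {summary}\nTopic:")
    return llm(prompt).strip()

def daily_digest(llm, feed_urls, min_relevance=7):
    """Track new items, dispatch by subject, summarize, and filter."""
    digest = []
    for url in feed_urls:
        for entry in feedparser.parse(url).entries:
            topic = classify(llm, entry.title, entry.summary)
            score = int(llm(f"Rate 1-10 how relevant this is to our RL and "
                            f"agents work. Answer with a number only.\n"
                            f"{entry.summary}\nScore:"))
            if score >= min_relevance:  # only filtered items reach humans
                digest.append({"topic": topic, "link": entry.link,
                               "summary": llm(f"Summarize in 3 bullets:\n{entry.summary}")})
    return digest
```

The "improve the researcher" step she mentions would then amount to editing these prompts (or adding few-shot examples) whenever the filter lets through junk or drops something important.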

44:34

Speaker D

That's fascinating. When you became a researcher, when you chose this path, what did you think you would be doing, and what are you actually doing? Is it close to what you thought?

45:30

Speaker B

That's a really good question. When I joined the team, I thought I would be reading papers every day, because that's what I was doing during school, right? In the lab we would read papers, come up with ideas, implement the ideas, run experiments, and if the results were good, run them at a larger scale. I was about to do that. But what I realized was that after joining the company and working for a couple of months, you're already pretty much at the top of the area, or of the industry, and you have to come up with something that's really new, or you encounter problems that you just don't know how to solve. It's not like you can read a lot of papers and build up your thinking on them. It's more that you need to really understand the problems from the fundamentals and think from the fundamentals so that you can find the right solution. The other thing is that engineering is very, very, very important. I didn't know that during school, because during school, in the lab, it's more like toys compared to companies; it's not that scaled up. But when you scale up data, scale up compute, scale up people, you encounter engineering issues that you need to tackle very beautifully, and engineering is very important. That's part two that was different from what I imagined. Pretty much those two.

45:41

Speaker D

When you work on the model currently, is it mostly that you're solving problems that you see immediately from your hands-on work, or is it that the company says, oh, we have to achieve, let's say, Opus results? How do you set the goals?

47:02

Speaker B

We have a meta goal at the company level. For example, we want to improve AI's capabilities in improving productivity, because that's how people view it. So we have a company mission, and as individual researchers on the team, we have our own missions that we set our own goals within. What is my goal currently for the next generation? I really want the model to work elegantly with experts; it's about better collaboration with experts, with developers. That's my goal as well, but that's maybe two versions away. I think we're launching about one version per month, or a month and a half. For the longer horizon, we are definitely working on it, but for me, the goal I set along that path is something like three months away, while the better-collaboration goal is one or two months away.

47:20

Speaker D

I wanted to ask you a little clarification question about interleaved thinking, which you were talking about at AI Engineer: that the model doesn't settle on one action, it's constantly in a loop of asking more questions and trying things. How do you look at it? Is it continual learning? Is it part of it? What do we need to solve to have the model continuously doing this learning over longer and longer horizons?

48:17

Speaker B

It has some overlaps with the defined concept of continual learning, and by overlap I mean both conceptually and technically. But I don't feel like they are exactly the same; what I talked about at the summit was not at the level of full continual learning. It's more like on the path to that.

48:42

Speaker D

How do you see it being solved? Any ideas?

49:02

Speaker B

We do think that's a different problem definition, or a different way of the model working with people, and we are working on that now with our own defined questions. If I need to say how we'd approach it, I would say through experiments. That's a very interesting question about continual learning, and it's still very exploratory, right? That's definitely where we're going, but it has different phases, different stages. We might approach stage one first while exploring more stages later.

49:05

Speaker D

And you haven't yet outlined the stages?

49:45

Speaker B

We do have our internal definitions, which I didn't prepare today. I would say the first would be to be more stabilized on long-horizon tasks, which is what I said at the summit, right? And then the next thing would be optimization.

49:49

Speaker D

If you can repeat it, because people don't know what you said.

50:04

Speaker B

So for example, we see a model receive environment feedback in a new environment. It needs to know what to explore and what in the environment to look at, because these are partially observed environments. It needs to know which actions to take to receive better information, get better reactions, and then perform harder, more complex tasks in the environment. That's more of stage one, right? That's pretty simple; basically all agent models can do that to some extent, maybe not perfectly, but to some extent. And that's how we can actually solve it with our current algorithms. But we do see different new forms of how a model improves itself in an environment, where we don't have a concrete conclusion yet. Maybe in 2.5 we will. That will be a different definition than what I said: the model itself would be defining its own goal. That's something that would be different.

50:06

Speaker D

Thank you so much. My last question is about AGI. Do you believe in AGI? And if yes, how does it look to you?

50:54

Speaker B

Okay, that's a very large question. People talk about AGI and ASI every day. Actually, when I was interviewing with Minimax, when I was interviewing with our CEO, I said the same thing, because he asked me the same question. What I said was that people talk about AGI, and people have different definitions of AGI, but we can only know the definition of AGI when we achieve it. It is still progressing so fast that the definition changes every day, and people have different takes on it. What I think is more important is that we actually work towards it, work towards our own definitions of AGI. As long as we figure it out, it becomes true. That's what I said during the interview, and that's still my view today: the definition will become true when it becomes true.

51:01

Speaker D

When we see it, we know it's AGI.

51:48

Speaker B

Yes, exactly.

51:50

Speaker D

But we're not there yet.

51:51

Speaker B

No, there can still be better AI intelligence for sure.

51:53

Speaker D

Thank you. One more last question. What was the book that influenced you the most? It can be a recent book or a book from your childhood.

51:56

Speaker B

Let me just double-check the name. Something like The Art of Creativity, something I read during undergrad, so it's been a long time; I don't remember the exact name.

52:05

Speaker D

Yeah, there is a book with a name like The Art of Creativity. How did it influence you?

52:15

Speaker B

It opened up how I think about my own mind a lot, and how I view the world and how I view problem solving. For me now, problem solving is more of a discovery. That's how I would summarize it in one quote.

52:18

Speaker D

Thank you so much. Thank you for your time. That was very interesting.

52:30

Speaker B

Thank you for having me.

52:34

Speaker F

Sunday my time, she's still working on the code, experiments running, curiosity overload. Layer by layer, log probs telling the tale, find the gap between the theory and what the numbers reveal. She said the model tries to hack at everything it sees, so we align it, we refine it, put the world at ease. Intelligence with everyone, scaled-up humanity, another sun. From the open source to the open road, cognitive revolution, let the story be told. Intelligence with everyone. Intelligence with everyone. Interleaved thinking, like the way we move through life, look and learn, adapt and turn, cut through the noise like a knife. 52 calls deep, one conversation wide, the environment is noisy but the model holds the ride. Ten billion strong but running light as a breeze, cost-effective multi-agents doing what you please. Intelligence with everyone, scaled-up humanity

52:59

Speaker D

from

54:30

Speaker F

the open source to the open road, cognitive revolution, let the story be told, intelligence with everyone. ICU in the morning, KTV at night, problem solving is discovery in a different light. I scale to the theoretical extreme, push through, the definition becomes true and the work comes too. Engineering is everything, first principles, first principles, we build a future.

54:30

Speaker B

With everyone.

55:29

Speaker E

If you're finding value in the show, we'd appreciate it if you take a moment to share with friends, post online, write a review on Apple Podcasts or Spotify, or just leave us a comment on YouTube. Of course, we always welcome your feedback, guest and topic suggestions, and sponsorship inquiries, either via our website, cognitiverevolution.ai, or by DMing me on your favorite social network. The Cognitive Revolution is part of the Turpentine Network, a network of podcasts, now part of a16z, where experts talk technology, business, economics, geopolitics, culture, and more. We're produced by AI Podcasting. If you're looking for podcast production help for everything from the moment you stop recording to the moment your audience starts listening, check them out and see my endorsement at aipodcast.ing. And thank you to everyone who listens for being part of the Cognitive Revolution.

55:49