AI Trends 2026: OpenClaw Agents, Reasoning LLMs, and More with Sebastian Raschka - #762
Sebastian Raschka discusses the evolution of LLMs in 2025-2026, focusing on three key areas: reasoning capabilities through post-training techniques, inference scaling methods, and agentic applications. The conversation covers recent model releases, practical implementation strategies, and predictions for continued innovation in the reasoning and tool-use paradigms.
- Most LLM R&D focus has shifted from pre-training to post-training optimization, particularly reasoning capabilities, as pre-training is already sophisticated but post-training still has low-hanging fruit
- Verifiable rewards in math and coding enable infinite answer generation for training, providing more reliable feedback than human evaluation and driving major reasoning improvements
- The biggest practical LLM benefits come from creating custom workflow automation tools rather than using sophisticated agentic wrappers or interfaces
- Current reasoning models excel at automatically determining appropriate effort levels, reducing the need for users to manually specify high-effort modes for most tasks
- Multi-agent systems face compounding failure rates as more models are chained together, making single-model optimization more impactful than complex agent architectures
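The compounding-failure point in the last bullet is easy to make concrete. Assuming, purely for illustration, that each chained model succeeds independently with probability p, the end-to-end success rate of an n-step chain is p to the power n (the 95% per-step reliability below is an invented example number, not a measurement from the episode):

```python
# If each model in a chained multi-agent pipeline succeeds independently
# with probability p, the whole chain succeeds with probability p**n.
def chain_success(p: float, n: int) -> float:
    """End-to-end success rate of n chained steps, each reliable with probability p."""
    return p ** n

# Even fairly reliable components compound quickly:
for n in (1, 3, 5, 10):
    print(n, round(chain_success(0.95, n), 3))
# 1 step: 0.95, 5 steps: ~0.774, 10 steps: ~0.599
```

This is why improving the single model lifts the whole system: raising p helps every link at once, while adding links only multiplies risk.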
"The R&D focus of the research teams, I think, is more on post-training nowadays, getting more performance out of that, because it's the newer paradigm and there are still low-hanging fruits to be picked, whereas pre-training is already pretty sophisticated."
"My hypothesis is that if you took the best open-weight LLM and put it into, let's say, a ChatGPT or Gemini or Claude interface, you would get almost the same quality and performance. A lot of use cases nowadays revolve around the tool wrapper around the LLM."
"It's almost wasteful to even ask an LLM what one plus one is. You can use a calculator. So I think it's still important to recognize what the nature of the problem is and what the best tool for that problem is."
"The more models you add, the higher the risk that one of them fails if they depend on each other. And I think improving the model itself will also help improve the whole system, basically, as the main way to improve performance."
The R&D focus of the research teams, I think, is more on post-training nowadays, getting more performance out of that, because it's the newer paradigm and there are still low-hanging fruits to be picked, whereas pre-training is already pretty sophisticated. You will still get better results if you use more data, optimize the data mix, maybe add multi-token prediction and these types of things. But most of the interesting things are happening now on the post-training front, in the reasoning realm. So I think we will see more there.
0:00
All right everyone, welcome to another episode of the TWIML AI Podcast. I am your host, Sam Charrington. Today I'm joined by Sebastian Raschka. Sebastian is an independent LLM researcher. Before we get going, be sure to take a moment to hit that subscribe button wherever you're listening to today's show. Sebastian, welcome back to the podcast. It's been a little while.
0:45
Yeah, thank you for inviting me back, Sam. I'm happy to be back and to chat about LLMs, AI and whatever you have in mind. I had a lot of fun last time so I hope we can make it fun and interesting again.
1:06
You know, my joke around this is getting a bit old, but the last time we spoke was three years ago. Not much has changed, right?
1:19
Well, good things come in threes. I think there's a saying, right.
1:28
And in fact a ton has changed, and we're going to be focusing on the most recent and most important of those changes, in particular what's new with LLMs and what to expect from LLMs in 2026. This is an area you spend a lot of time on with your research and education work. Maybe we can start with what's top of mind: very big picture, where are we now compared to where we were a year ago? What is your broad reflection on the evolution of the space?
1:33
Looking at today compared to one year ago, it's almost the anniversary of DeepSeek, the big DeepSeek-V3 model accompanied by the R1 model, the reasoning revolution, in quotation marks. It's still LLMs, it's still the same base model, but we now have more techniques on top of that to make the models smarter in terms of solving more complex problems. So I would say architecture-wise, LLM architectures still look relatively similar, but the reasoning training is one of the new things if we compare today to last year. And then I also think there's a heavier focus on tool use. Back when ChatGPT was launched, and in the first iteration of LLMs, the focus was mainly on general-purpose tasks, having the LLM answer all the things we are curious about from memory. If we asked it a math question or a knowledge question, the LLM would basically draw from its memory and then write the answer. But that's not always the most effective or accurate thing to do. It's similar for us humans. I mean, LLMs are different from how humans think, but as a human, if you asked me a complicated math question, like multiplying two large numbers, I would pull out my calculator and work it out there. I wouldn't do it in my head. Maybe I could, but it would take a long time and be more error-prone, and there's no need to do that. It's the same with LLMs. With more modern tooling, it becomes more and more popular to have the LLM use tools too. It requires training the LLM to use those tools, but with that, I think we can reduce hallucination rates, not completely get rid of them, but reduce them, and also make answers more accurate. And then with reasoning capabilities, it's essentially giving the LLM more time, in quotation marks, to think through a problem.
So these are, I think, the two main knobs that we can tune and make progress on, particularly if we look at the difference between last year and now.
2:11
Yeah, we'll dig into the technical aspects of how we've evolved in reasoning and in tool use, among other things. But before we do that, I thought it might be interesting to talk a little bit from a practical perspective about how where we are today has shifted. And it's super interesting: we're talking in the second week of February, and already this year, in 2026, there's been a ton of news and new models: Opus 4.6, OpenAI's Codex 5.3, and the whole OpenClaw, formerly Moltbot, story. Talk a little bit about what we've seen already this year, in the context of where you see LLMs from a practical perspective.
4:35
Yeah, that's a good point. We are just in the second week of February, and that means the Chinese New Year hasn't even occurred yet, where I think there will be another batch of releases on the open-weight front. But I think that is a separate thing: you now have companies developing the tooling around LLMs, which is becoming more and more mature, and then you have better LLMs themselves. I would almost separate those two. My hypothesis is that if you took the best open-weight LLM and put it into, let's say, a ChatGPT or Gemini or Claude interface, you would get almost the same quality and performance. A lot of use cases nowadays revolve around the tool wrapper around the LLM.
5:29
That's the idea that was popularized toward the end of last year as harness engineering.
6:22
So I think that is also something that has changed in how we use LLMs. Before, it was simply a very basic chat interface, and then it became more sophisticated: you could upload files and PDFs and so on. For my personal use case, I use LLMs mostly for, it sounds weird, but proofreading and checking things. Just before recording here, I was finishing writing a chapter and I wanted to update the table of contents, so I uploaded the PDF to the ChatGPT interface and said, hey, can you give me the headers so I don't have to pull them out myself? And then you can double-check that it's correct. Little convenience tasks, making work a bit simpler, these tedious things. But then, like you said, there was also the new Opus model, and then OpenAI released Codex 5.3 and a macOS app with it. And I think that is yet another leap in terms of what these models are capable of. There were coding LLMs before, and it became more popular to use LLMs for coding, but it keeps getting better and better. Before, I used Visual Studio Code, just because I've used the Visual Studio Code editor for years, maybe five or ten years now, and before that I was using Vim and other things. So I'm very familiar with the UI: I have my Git tree, I know where I have a terminal inside, and that stuff. And I actually liked having the LLM as a plug-in there, where you can sometimes say, okay, I have a bug, can you just double-check? It's just another layer of tools you add to your workflow. The LLM doesn't have to be front and center; it can also be this little helper. I still debug things myself, but often it's actually quite nice and fast to ask the LLM to double-check things.
And what I like about it is that it's a second pair of eyes, but it's not completely taking over and doing everything; it's making your work better in a sense. You have additional checks, and you can ask, hey, can you suggest improvements to make my code more performant? You as the person still have to ask the right questions, and you still have to actually run the experiments to see whether it really makes the code faster. So it doesn't mean the LLM does everything for you, but it suggests useful things. I know a lot of people also use it for coding agents, and that works with the new Codex plug-in and also the Codex app. What's new is that a year or two ago, people were uploading code files to ChatGPT or Gemini or Claude, getting some feedback, and then having to manually incorporate it, and now it's more inline.
6:28
I think it's been a while since folks have been doing that.
9:37
Yeah, right. So that is now more native: you can see the file diff, and you don't have to leave your coding environment. On top of that, when you run these tools locally, you can give them access to your whole folder, say your whole Git folder, and then they can see the context of all the files; you don't have to manually upload anything. And beyond that, the LLM can nowadays use tools itself: you can give it permission to run certain commands, to run a unit test by itself, these types of things. Taken together, I wouldn't say there is a single thing that is groundbreaking or a game changer, but all these little things add up to make the models more capable, because the tooling keeps getting more sophisticated. And I think that's what we have been seeing in recent months, maybe the last few quarters: people develop these types of capabilities instead of just making the model better. There's a lot of performance we can get from the LLM by making the interface better, basically.
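The permission model described here can be sketched very simply: the assistant may only run commands the user has explicitly granted, and everything else is refused. This is an illustrative toy, not how any particular coding tool actually implements its sandbox; the allowlisted command names are example grants.

```python
# Toy sketch of permission-gated tool use: only allowlisted executables run.
import shlex
import subprocess

ALLOWED_COMMANDS = {"ls", "pytest"}  # hypothetical user-granted permissions

def run_tool(command: str) -> str:
    """Run a shell command only if its executable is on the allowlist."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not permitted: {command!r}")
    completed = subprocess.run(argv, capture_output=True, text=True)
    return completed.stdout

print(run_tool("ls ."))  # permitted: lists the current directory
```

Real harnesses layer more on top (per-directory scopes, interactive confirmation), but the core idea is the same gate between the model's requested action and the shell.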
9:40
Have you found yourself surprised by some new capability in either of these new models? You just said there were no breakthrough changes there, but was there anything surprising, or is it very much incremental to what you were already doing?
10:51
For me, it's personally more incremental. It's more about convenience; they're just getting more robust and better. I wouldn't say there's a wow effect for me, like, oh, my previous model was not able to do XYZ. It's just a bit better, more robust, and then I also develop a bit more trust in the results. It's more of a gradual improvement. One thing we still have is the distinction between the different reasoning efforts. It's like a slider for how much time the LLM should spend on getting you the results, with settings from low or no reasoning effort up to high reasoning effort, and that changes the time it takes for the LLM to generate results. I remember that half a year or a year ago, if you wanted good results, you almost always had to use the highest settings, the high reasoning modes, which took forever. Nowadays, even the lower modes feel pretty good, where for most tasks it's sufficient to use the medium or high reasoning efforts instead of the extra-high ones, and then you get results faster. I think that's also a quality-of-life improvement for these models: before, you ran them maybe occasionally, because you don't want to wait five minutes, but now it becomes more routine for them to be part of your workflow, basically.
11:15
Yeah. I would expand on that and say that the LLMs have gotten really good at knowing themselves how much effort is required to provide a good answer to a query. So in the vast majority of cases, I find myself just typing my prompt into ChatGPT, for example, not specifying a model or level of thinking, and letting it figure it out. If I want more, I'll tell it I want more. But it does a fairly good job of determining when to just give me a quick answer, when to use a search tool, when to do more thinking, that kind of thing.
12:44
I agree. I have my setting on ChatGPT on the auto mode, where it decides by itself whether it should use more or less thinking effort, same thing. The only context where I still use the Pro mode is, coming back to the chapter I mentioned, when I have a chapter written, like a 40-page PDF. I upload it there and say, hey, can you check for any inconsistencies, incorrect numbering, and all that type of stuff. Then I set it to the Pro mode, the one that takes 20 minutes, go have lunch or dinner, come back, and look at the results. But it's a rare thing that I do that, maybe once a month when I finish a chapter, or if I write something important where I want the maximum quality check. Like you said, for most tasks it's sufficient to use the light effort, or the automatic one where it decides by itself, essentially.
13:28
Yeah, right. And I mentioned Moltbot and the release of that tool. Have you spent much time digging into it?
14:23
Well, yeah, Moltbot, I think it's now called OpenClaw. It changed quite a bit. It's interesting. It's this local agent that people can now run on their own computers, and what I find interesting about it is that it gets people excited. It's almost like back when DeepMind had AlphaGo, the model that played the board game Go. That got really exciting because, in the grand scheme of things, not many people, at least in my circles, had played Go before, but it got people like my family really excited to see this type of progress when it was playing against the world champion. I think with Moltbot it's kind of similar: it gets people interested in checking these things out, and excited. I think there are also a lot of genuine use cases around it, like organizing your calendar and emails. For me personally, that's something I have not done. Maybe I have a little bit of a trust issue; I don't know if I trust it enough to do my finances or my calendar. I'm still a bit hesitant to adopt something like that. But I think it's a cool demonstration to show someone who isn't developing LLMs what these LLMs can do and what the purpose of them is. In that sense, I think it's actually quite cool.
14:31
Yeah. Are there any other tools or services, largely wrappers around LLMs, that you have come to depend on? Or do you find yourself mostly turning to the models themselves or the dev environments?
16:12
Yeah, for my workflows it's mostly still the case that I don't have anything super automated, where I need to run something incrementally or in an agentic type of setting. What I've been doing a lot, though, is developing my own apps, like productivity apps. Back in the day I grew up as a coder, using Bash, the terminal, and Python, writing myself scripts for all kinds of things to automate them. And now, with LLMs, that has shifted a bit toward developing native macOS apps. I always wanted to learn coding in Swift and never had the time, because there were so many other more important things to do, and this was an opportunity to say, hey, I want this, but as a native macOS app instead of a script, because it's just more convenient. For example, just the other day: my wife also has a podcast, a book club podcast, and I help her with the episodes, uploading everything, editing, just the workflow in general, because she's not a tech person. I had a script to add chapter marks to the podcast, and the other day I made a native macOS app where you can just enter the timestamps, click a button, and it adds the chapter marks to the audio file. Simple things like that. And I can share it with her and she can use it now. It's these little quality-of-life things in your everyday life: instead of doing things manually, you can automate them now. This is not running the LLM, but using the LLM to develop something that behaves deterministically, in a sense. I'm more that kind of person. For example, when I read social media feeds, as a researcher I'm mostly interested in papers.
So I often end up bookmarking a lot of arXiv links, links to arXiv PDFs or the abstracts, and I have a markdown sheet where I keep a lot of these links. Now I've written myself a native macOS app where I just put in these links and it pulls out the title, the date, the author names, and the link in a nice format. It just makes my life easier: I don't have to click on them individually, I get a nice list and see the titles. And I think for little things like that, LLMs are super cool for developing tools that I would not have time to develop otherwise, basically.
16:30
Yeah, that parallels my experience quite a bit. I think some of the most benefit I've gotten out of LLMs in the past year or so has come from writing custom workflow tools, primarily around the podcast. One of the things we do when we work with sponsors is pull analytics reports, and it was repetitive and time-consuming. So I created a web-based tool that hits the API where we get the analytics, pulls information about episodes, lets you choose an episode, pulls a bunch of data into pandas, does some analysis, and then generates a spreadsheet, like a Google Doc. That tool isn't using an LLM, but an LLM was used to create it. And it's one example of probably half a dozen fairly significant tools that have had a big impact on our workflow.
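The core of a tool like that is a few lines of pandas once the API data is in hand. The sketch below is hypothetical: the column names and numbers are invented for illustration, and a CSV string stands in for both the analytics API response and the Google Doc output:

```python
# Sketch of a sponsor-report tool: per-episode analytics in, summary out.
import io
import pandas as pd

# In a real tool this would come from the podcast host's analytics API.
raw = io.StringIO(
    "episode,downloads,completion_rate\n"
    "760,12000,0.71\n"
    "761,9500,0.68\n"
    "762,14200,0.74\n"
)
df = pd.read_csv(raw)

# Simple aggregates a sponsor report might include.
summary = pd.DataFrame({
    "total_downloads": [df["downloads"].sum()],
    "avg_completion": [df["completion_rate"].mean().round(2)],
})

# A real tool would push this to a spreadsheet; CSV stands in here.
print(summary.to_csv(index=False))
```

The point of the anecdote stands either way: none of this runs an LLM at report time, but an LLM can write the glue code in minutes.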
19:10
Yeah, and that's a good point: in these cases the LLM is not doing the regular work, the task itself; it's developing the tool that does the task. And I think that's an important point. The LLM is very useful and very capable, but there are tasks where it's almost wasteful to use an LLM. It's an "if all you have is a hammer, everything becomes a nail" type of situation. If you have a deterministic task, it still makes sense to develop a deterministic tool, and you can use an LLM for that. But it's almost wasteful to even ask an LLM what one plus one is. You can use a calculator. So I think it's still important to recognize what the nature of the problem is and what the best tool for that problem is.
20:23
Basically, I've also built some tools where I use LLMs almost like a classifier, a very simple use case. I have one where I take the name of the guest, pull a bunch of recent directories from the Google Docs API, and say, find the directory that corresponds to the project for this particular guest. A regex or text-pattern match doesn't always work, because the names can be formatted differently, but an LLM can do it pretty easily, with a very high level of repeatability and a low error rate.
21:16
Well, it's the kind of case where you need almost a human, or some less structured approach, and LLMs are great for that. I actually had a similar project as a college student. I was doing sports prediction as a side project, just for fun: daily fantasy sports, like predicting which player scores a goal in Premier League soccer on the weekend. For that, I developed this very sophisticated thing which pulled information about the players from different websites and looked at who was injured, who was in good form, these types of things. And now I've revived that project, just for fun, using an LLM. It's the same problem you mentioned with the names: some players' names are spelled slightly differently, there are accents over certain letters, some databases include a middle name and others don't, and just getting them lined up in the database is really hard with regex or deterministic code. So that is actually a great use case for an LLM: these unstructured, almost vague dataset-parsing tasks that also depend a bit on context, basically.
22:10
Yeah, yeah. So maybe pulling back to where we are with LLMs from a practical perspective, I think two main things came out of this. One, I was going to caveat this by saying "if you have a development mindset," but I think we've seen with vibe coding that even less technical or non-technical people can get a lot of value by creating custom tools to automate specific parts of their workflow. So that's a huge thing that I think has been very impactful for both of us over the past year or so. And otherwise, it's just taking advantage of the improvements in models. For me, I can't really articulate a rule set, but if I'm confronted with a particular thing, I have a soft mental model for whether I'll start with ChatGPT here, or start with Claude for this or that, you know.
23:25
24:40
So I think the takeaway is that neither of us is using OpenClaw or any particularly slick LLM-wrapper agentic tools with any regularity. Maybe the caveat for me would be something like Circleback or Granola for meeting summaries. But beyond that, it's mostly, like you described, use cases through the native chat interfaces and the development-oriented use cases.
24:41
I would maybe add one more thing to what you mentioned. It's mostly a slider: you can not use LLMs at all and still do everything manually, or you can use only LLMs. I know some people who build, let's say, even a company just on LLM-generated code. People call it vibe coding, but I think vibe coding doesn't even do it justice anymore: not doing any manual coding, just having the LLMs build the website, the product, everything. So those are the two extremes, and I think we are more in the middle, where we adopt LLMs but are not going full LLM. And I think there's still a question for people who are learning how to program nowadays: is it worthwhile? I think it is actually still worthwhile to learn math and coding, even though there are LLMs that can do that, because it makes your life more efficient and it makes you better at using these LLMs. As an example, I was using an LLM on my website to add a dark mode. That's something I always wanted to do. I wrote the website myself about 12 years ago, and I knew HTML, CSS, and JavaScript much better back then than I do now. I always procrastinated on adding a dark mode button because I knew it would take me maybe a month to do it well, and it's not my main job, so I couldn't spend that much time on it. But then I thought, hey, let me try using an LLM for that. And it did a really good job adding it, but it was not perfect: the button was misaligned and so on. So I kept saying, hey, make it a bit higher, make it a bit lower, move it to the left. And then I realized this was actually very inefficient. Why don't I just go into the HTML or CSS file and adjust the settings there?
And because I still knew a bit about CSS files, it was more effective to make these adjustments myself, instead of having the LLM do everything and brute-force telling it, oh, move it that way, move it this way. I could just change them myself, refresh the page, and see. In that sense, I think it does still make sense to have an understanding of how these things work, because there are cases where it's just more efficient to do things yourself rather than prompt the LLM to redo everything. So what I wanted to say is that there's a middle ground, basically, and I do think there's still value in learning how things work.
25:21
I wonder what your experience is. Around these new model releases, I often see on social media, "oh, I one-shotted this, I one-shotted that." I'm trying to remember the last time I had that experience: I'll go and try to one-shot the same thing, and the results I get are horrible, nothing like what's reported on social media. So, is it me, or are people reporting these successes for engagement when they're not really there, or they're fake? What's your sense? Do you experience similar things?
27:55
Yeah, I would say so. I mentioned my native Mac apps. I have a Mac app where I just put in a PDF and it exports PNG, WebP, and PDF versions at a certain resolution, and even that took multiple tries, back then with Codex 5.2, to get everything, all the buttons, working correctly. Like you said, it was not one-shot at all; it took multiple iterations to get it to work, even for something simple like that. And then I sometimes wonder: are my instructions bad, or was I not clear? Maybe you have to say, please test everything thoroughly and make sure everything works. Maybe you have to be super explicit about that, and we are not, because we kind of assume it would make sure everything works. Or maybe the cases we see are just lucky; sometimes, on certain things, it just happens to work very well. So I don't know for sure, but I agree with you that it's not all what it seems when someone shows you "oh, I one-shotted this." I don't think that's reflective of how things work today.
28:35
So let's switch gears a little bit and talk through some of the key areas where you expect to see continued innovation around LLMs in the upcoming year. Then, for each of them, we'll dig in and talk a little bit about the recent history and where you expect to see things going. What are the big themes for the year?
29:42
I would say it's still going to be reasoning; we can go into more detail there, because it's a very broad topic. So, pushing more on the reasoning front, on post-training. The second one, I would say, is inference scaling: more sophisticated techniques that are partly related to training, but mostly about how to use the LLM after training. And then I also think we will see more of this agentic type of use. Right now, LLMs are mostly used turn by turn, and I think people and companies will double down on this loop, running the LLM in a loop like Moltbot does, and optimizing for that. I think these three things will be the biggest focus areas for companies.
30:06
So let's dig into reasoning to set the stage for where you think we'll be heading in 2026. What do you think were the big advancements in 2025 around reasoning?
31:00
So yeah, the biggest advancement was, first, OpenAI o1, which got everyone excited. OpenAI o1 was using inference scaling and, no one knows for sure because there's no paper, but likely also training techniques. But then with DeepSeek-R1, they published their reasoning pipeline, and that was really something that took off, where a lot of other companies also doubled down on it. But it's still very new in the grand scheme of things, just about a year old, and there have been so many improvements to the algorithm. I was recently working on a chapter on reasoning, and just the other day I compiled a list of 15 different tweaks and improvements, from basic things like changing sequence-level log probs to token-level, to GDPO by NVIDIA. There's lots of progress there, and I think we will see more of it. One reason is that with pre-training, we have seen that it still works, and I think it's still the biggest part of the whole training pipeline, because it's just so much data and very expensive. But the R&D focus of the research teams, I think, is more on post-training nowadays, getting more performance out of that, because it's the newer paradigm and there are still low-hanging fruits to be picked, whereas pre-training is already pretty sophisticated. You still need a lot of data and a lot of compute, but there's not much more you can do there, compared to post-training, in terms of changing up the algorithms to get more performance. Of course you can still do that, and you will still get better results if you use more data, optimize the data mix, maybe add multi-token prediction and these types of things.
But most of the interesting things are happening now on the post-training front, in the reasoning realm basically. So I think we will see more
31:12
there on the reasoning front. The one topic that I heard come up quite a bit last year is the idea of verifiable rewards. And I think that led to, or contributed to, a lot of the advancements that we saw in terms of coding models. Can you talk about that as a paradigm and some of the big milestones that we've seen there over the past year?
33:12
Yeah, thank you for the question. That's actually a really, really important point. So the reasoning training is essentially mainly based on verifiable rewards, which means there are tasks where you can verify the answer. For example, in DeepSeek R1 the verifiable rewards were coding and math. With math, you ask the model to output the final answer in a boxed format, it's a LaTeX command, boxed, and then you can have deterministic code, like a regex, to extract the answer. And then you can use something like Wolfram Alpha or SymPy to compare this answer symbolically to a reference answer: 4/6 matches 2/3, it's essentially the same answer, but you can symbolically double-check it and get a reward signal for whether it's correct or not. And this is great, because you can evaluate an essentially infinite number of answers. Before, with reinforcement learning from human feedback, and it's still an important technique, you need human feedback. You can train a reward model to approximate that, and as part of training you get a score for each answer, but it's not quite as accurate as a truly verifiable answer. In math there's an absolute: it's either correct or not. And if you can verify the answer deterministically and cheaply, you can have the LLM generate as many answers as you like. You can say, okay, generate 60,000 answers for this problem, and then calculate the reward on all of them in a very short time. It's still expensive to generate these answers, but you don't have vagueness and you don't need humans evaluating them. And I think that helps with scaling these things.
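The boxed-answer extraction described above can be sketched in a few lines. In practice you would use SymPy or a symbolic engine for the full equivalence check; this minimal sketch uses the standard library's `fractions` just to show the idea that 4/6 and 2/3 count as the same answer. The function names are illustrative, not DeepSeek's actual pipeline.

```python
import re
from fractions import Fraction

def extract_boxed(text):
    """Pull the contents of the last \\boxed{...} in a model output."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1] if matches else None

def math_reward(model_output, reference):
    """1.0 if the boxed answer equals the reference (compared here as
    exact rationals; a real pipeline would compare symbolically)."""
    answer = extract_boxed(model_output)
    if answer is None:
        return 0.0
    try:
        return 1.0 if Fraction(answer) == Fraction(reference) else 0.0
    except (ValueError, ZeroDivisionError):  # unparseable answer -> no reward
        return 0.0

print(math_reward(r"...so the answer is \boxed{4/6}", "2/3"))  # 1.0
```

Because the check is deterministic and cheap, you can score thousands of sampled answers per problem, which is exactly what makes this reward scalable.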
And the same with code, where in the DeepSeek R1 paper the original approach was to make sure that the code compiles correctly, and you can also use a code interpreter for that. I think both are great, but this is just the beginning. We will probably see this extended beyond just the correctness reward. There are already other types of rewards being added, for example a formatting reward. It's not required, but some companies prefer to have the thinking inside think tags, so there's an opening think token and a closing think token, like the opening and closing tags in HTML. It can be helpful, because then you can parse out the intermediate reasoning and do something with it, so you train the model to output this structure, and that's called a format reward. So you can have multiple types of rewards added on top of the correctness reward, and I think we will see interesting things there, where people come up with formatting rewards or auxiliary rewards that help the overall model learn. One thing they also tried in the DeepSeek R1 paper was to evaluate the answer explanation, not just the final answer: evaluating whether the reasoning, the explanation, is correct or not. A process reward, exactly. Yeah, this is called a process reward model: basically another model that you train to give a score for the explanation. But I remember, it's been a while since DeepSeek R1 came out, in the paper they had a section where they listed that as a failed or unsuccessful attempt. They tried it, but they found it increases the chance of reward hacking.
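A format reward like the one described can be a simple deterministic check. This is a sketch assuming the model is supposed to emit exactly one `<think>...</think>` block before a non-empty final answer; the exact tags and rules vary by model family.

```python
import re

def format_reward(output: str) -> float:
    """Return 1.0 if the output wraps its reasoning in a single
    <think>...</think> block followed by a non-empty final answer."""
    text = output.strip()
    # exactly one opening and one closing tag, in that order
    if text.count("<think>") != 1 or text.count("</think>") != 1:
        return 0.0
    match = re.match(r"<think>.*?</think>\s*\S", text, flags=re.DOTALL)
    return 1.0 if match else 0.0

print(format_reward("<think>2 + 2 = 4</think> The answer is 4."))  # 1.0
print(format_reward("The answer is 4."))                           # 0.0
```

During RL training this would typically be added to the correctness reward, e.g. `total = correctness + 0.1 * format_reward(output)`, with the weighting being a design choice.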
And then it was just not worth it. It's more expensive, and it resulted in reward hacking, the model exploiting the setup, because it's easier for the model to mislead the model that evaluates it. So it is still tricky to do. But in recent months there have been some more interesting success stories, like DeepSeek Math version 3.2. They use something like that, where they evaluate the whole answer with a rubric, have another model for that, and then another model that evaluates the model with the rubric, and so forth, multiple levels. But that seems to work, and they had ablation studies showing this is actually helping. I think we will see more of that. It's just a very new paradigm, making the reasoning training more sophisticated essentially. Yeah.
33:44
Right now the verifiers are focused on math and coding, because for a given response there's a concrete ability to verify. Do you see this verification paradigm expanding beyond math and code? I think the focus on math and code is successful in part because, even though not all LLM responses are about math and code, those domains have an inherent logic or reasoning structure, so the model's ability to reason generalizes to non-math, non-code problems. But do you see a focus on expanding this idea of verification beyond math-and-code types of problems?
38:47
Yes, it's actually a very interesting and important point. You mentioned that if you train the model on reasoning in math, it will also become better at reasoning in general. But it would be even better if you have a target domain and train the model specifically on reasoning in that domain. I think you're right, there will be more of that. Right now I lack the creativity to come up with many examples of problems that can be verified, but I would say maybe something biology-related, like pharmaceutical drug design or protein structure modeling, where you have physical constraints: the angles between atoms can only take certain values, and so forth. You could probably have a physics-type equation that double-checks whether the generated molecule adheres to those constraints, and use that as a form of reward when training the model. This is maybe not a typical case of reasoning, because what is the reasoning explanation when you're generating a molecule, right? But in general, something like that for other fields. And in the worst case, as a rougher approximation, you can always train another model that provides the correctness reward. This is more challenging, though, because it's susceptible to reward hacking. It goes back to generative adversarial networks back in the day, where it's easy for the generator to collapse. You have the discriminator, which says whether an image is real or generated, and you train the generator to fool the discriminator while the discriminator gets better at distinguishing. You have almost a similar setup here: you can use a model to give a reward or not.
But then the model may exploit it: at some point it learns a trick, like, if I only generate this one word, then I fool the evaluator. But I think we'll see more of that, developing AI-based reward models that can then be used in other fields to train better reasoning models.
39:43
Beyond increased focus and tweaks to the verification models, are there other areas that you see as contributing to stronger reasoning going forward?
42:10
Yeah, the training is one part, but the other one is inference scaling: you can get much better performance if you use simple, in quotation marks, techniques after training, because nothing is really simple. The definition of inference scaling is essentially spending more compute after training, during inference, when someone uses the model to generate the answer, and you can do it in multiple ways. Reasoning models themselves are already a form of inference scaling, because they generate more tokens than regular models. The explanation is longer than what a regular model provides, but it often helps the LLM reach the correct answer. That's sequential inference scaling. You can also have parallel forms of inference scaling, where you just generate multiple answers, and that's called self-consistency. For example, for a math problem, you can have the LLM answer the question multiple times with different temperature settings and then take a majority vote. There are different ways to do it: different scoring methods, or other LLMs that look at all the answers and give you the most likely correct one. With that you can also boost the performance of the model. It's more expensive, though, so it's not one-size-fits-all: you don't want to use it all the time, you use it when you need it. I think what will be interesting is improving the way we tell when it's needed. When ChatGPT, was it 5.1 or 5, launched, they had that automatic setting we talked about in the beginning. It was very bad at first, but I think it got much better over the following months.
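The parallel form just described, self-consistency, is simple to sketch. Here `generate` is a hypothetical stand-in for sampling an LLM at a nonzero temperature; in the sketch it cycles through a fixed list of answers so the behavior is deterministic.

```python
import itertools
from collections import Counter

def self_consistency(generate, prompt, n=8):
    """Parallel inference scaling: sample n answers and return
    the majority-vote winner."""
    answers = [generate(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Hypothetical sampler: a model that usually, but not always,
# lands on the right final answer.
_samples = itertools.cycle(["42", "41", "42", "42"])
def generate(prompt):
    return next(_samples)

print(self_consistency(generate, "What is 6 * 7?"))  # 42
```

The best-of-N variant mentioned later replaces the majority vote with a separate scorer that picks the highest-rated answer; the sampling loop stays the same.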
And I'm not quite sure we have anything like that in the open-source, open-weight ecosystem, listeners may correct me here, but I can see something like that becoming more important. On the one hand we are developing these very expensive models that can solve very hard problems, like Math Olympiad problems, but we don't want to use them all the time because they are slower and more expensive. At the same time, there's also a focus on cheaper models. For example, just the other week, Qwen3-Coder-Next came out. Qwen 3 is one of the most widely used open-weight model families, because they have a lot of really high-quality models in all different sizes. The Next model is essentially a hybrid, it's not a pure transformer anymore; it's inspired by state space models to make things cheaper. So there's always this trade-off: people are developing higher-accuracy models, and people are developing cheaper models. One way is changing the architecture to control quality and price; the other is inference scaling. But right now, in the open-weight ecosystem, it's not quite as popular yet, so I think we will also see more of that in local tools and so forth.
42:25
I don't know of an open-source project or model that incorporates this. But from conversations I do get the sense that a lot of companies building around the Qwen models, for example, and other open-weight models commonly have a router component in their architecture that tries to assess the complexity or category of a prompt and routes it to the model and prompt that is either most economical or maybe post-trained for better responses, that kind of thing. My sense is that that's the common approach to addressing the challenge you're describing.
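A router like the one described can be as simple as a small classifier in front of two models. Everything here is a hypothetical sketch; in particular, the word-count heuristic stands in for what would normally be a small trained difficulty classifier.

```python
def route(prompt, cheap_model, strong_model, difficulty):
    """Send easy prompts to the cheap model, hard ones to the strong one.
    `difficulty` is any callable returning a score in [0, 1]."""
    return cheap_model(prompt) if difficulty(prompt) < 0.5 else strong_model(prompt)

# Toy difficulty heuristic and stand-in models, for illustration only.
difficulty = lambda p: min(len(p.split()) / 50, 1.0)
cheap = lambda p: f"[cheap] {p}"
strong = lambda p: f"[strong] {p}"

print(route("What is 2 + 2?", cheap, strong, difficulty))  # [cheap] What is 2 + 2?
```

The economics come from the fact that most traffic is easy: if 80% of prompts route to a model that costs a tenth as much, average cost drops by roughly 70% while hard prompts still get the strong model.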
45:51
Now that you mention that, another example comes to mind: the gpt-oss model, the open-weight model by OpenAI, which came out last summer. With that model, even if you use a very simple tool like Ollama or any comparable tool, you can set the reasoning effort in the system prompt, low, medium, or high, and inference scales based on that reasoning effort. But I don't think any other technique, like self-consistency or self-refinement, is really incorporated automatically. You mostly have to do it yourself as the researcher.
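To make the system-prompt knob concrete, here is a sketch of what such a chat request might look like. The message structure follows the common chat-message convention; the exact field names and the phrasing of the effort instruction vary between tools, so treat this as illustrative rather than any tool's exact API.

```python
def build_messages(question, effort="high"):
    """Build a chat request for a gpt-oss style model where the
    reasoning effort is set through the system prompt."""
    assert effort in ("low", "medium", "high")
    return [
        {"role": "system", "content": f"Reasoning: {effort}"},
        {"role": "user", "content": question},
    ]

msgs = build_messages("Is 97 prime?", effort="low")
print(msgs[0]["content"])  # Reasoning: low
```

The appeal is that this is a zero-infrastructure form of inference scaling: the same weights spend more or fewer thinking tokens depending on one line of the prompt.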
46:39
Can you talk a little bit more about self-refinement and self-consistency and how folks use those techniques?
47:19
Yeah, so self-consistency and self-refinement are two examples of inference scaling. The biggest difference between the two is that self-consistency is a parallel technique: it generates multiple answers and you choose the final answer based on a majority vote. Or you can have a scorer that assesses the answers, and then people call that technique best-of-N.
47:27
Best-of-N, or quorum, or that kind of thing.
47:54
Yeah, it's essentially an ensemble technique, almost like classic ensembling. The other one is self-refinement, where you have the LLM generate the answer and then feed the answer to another LLM, or back to itself, and say: here's the question, here's the answer, write a summary of whether the answer is likely correct and what its weaknesses are. You almost provide a rubric with certain things the LLM should check, and it gives you back a report: this could be better, this is likely incorrect, the explanation doesn't match the final answer. Then you feed that report back to the original LLM and say: look at this report and refine your original answer based on it. Often this can lead to the LLM improving its own answer. It's almost like that phenomenon where you ask ChatGPT something, say when a certain model was released, and you see the year is totally wrong, so you tell ChatGPT it made a mistake, and it says, oh yeah, you're right, I made a mistake, and it tries again and does better next time. It's the same mechanism, except the model refines its own answers. Based on my experiments, it can also sometimes make answers worse: it will overthink, or the original answer was correct but the feedback is bad and it turns the answer incorrect. So it's not a foolproof technique, it comes with caveats. But in the DeepSeek Math version 3.2 paper, where they had self-refinement in a more sophisticated way, with a third model evaluating the evaluator, they showed a nice plot of how much the accuracy can improve. I don't know the numbers off the top of my head, but
when they cranked up the self-refinement and self-consistency, they were able to reach gold-level performance in certain math competitions, which was very impressive given it was still the same model as before; they just cranked up the inference scaling.
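The critique-and-revise loop described above can be sketched as follows. Here `llm` is a hypothetical completion function (a real one would be an API call), and the prompt wording is illustrative rather than any paper's actual rubric.

```python
def self_refine(llm, question, rounds=2):
    """Sequential inference scaling: generate, critique against a
    rubric, then revise, for a fixed number of rounds."""
    answer = llm(f"Question: {question}\nAnswer the question.")
    for _ in range(rounds):
        report = llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "Check: does the explanation match the final answer? "
            "List likely mistakes and weaknesses."
        )
        answer = llm(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Critique: {report}\nWrite an improved answer."
        )
    return answer

# With a stub model the loop just runs end to end.
print(self_refine(lambda prompt: "stub answer", "What is 6 * 7?"))
```

Note the cost: each round adds two extra model calls, which is exactly the compute-for-accuracy trade-off being discussed, and as mentioned, more rounds can also make answers worse.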
47:56
One thing that's interesting, reflecting on these themes, is how interrelated they all are. Reasoning is a key theme; reasoning is enabled by inference scaling; and a lot of what we hear when we talk about inference scaling is loops and recursion. Those are also key ideas in the third theme you mentioned, agentic uses of LLMs. With that as a segue, talk a little bit about what you've seen thus far around agentic systems and what you think is exciting in that space.
50:22
I would say, yeah, the agentic use cases. Even simple, in quotation marks, things like Codex or Claude Code, where the model runs multiple iterations to solve a problem. It's not just one shot; it's doing a task rather than just providing an answer. Moltbot would be another example of an agentic system. Agentic is, I would say, almost a not-well-defined term, because people use it differently, but for this podcast maybe we can think of agentic as something that runs in a loop. And that is something we will see more of. Recently, Claude Code and the GPT-5.3 Codex app added tasks where you can schedule something and it runs on a recurring basis, for example. I think we will see more of that; it's just the beginning, and it will be more like plugins. It's still the same LLM; it's about how we use the LLM and how to get the most out of the context, feeding the context back in. There has not been that much focus on this in the open-weight, open-source community, where the focus is more on developing the LLM itself, whereas companies like OpenAI and Anthropic are more like: let's build these tools so we can do more and more impressive, bigger things with these LLMs. Maybe by the end of the year we will have systems that can reliably book a trip to some holiday vacation destination. There were already tools that promised to do that; I think one was called Devin, something like that, and it might still exist. Oh yeah, Manus, right, yeah. But I think this is just the beginning. And most people, I don't think they need a full-blown thing that can do everything.
They just maybe need a plugin for Excel that at certain intervals updates certain things, where the spreadsheet goes to the Internet and pulls the recent stock price or something like that, but in a loop-type setting essentially.
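The "something that runs in a loop" framing can be made concrete with a minimal sketch. The `tool: argument` / `done: answer` text protocol here is invented for illustration; real agent frameworks use structured tool calls, but the shape of the loop is the same.

```python
def agent_loop(llm, tools, task, max_steps=10):
    """Run the model in a loop: each turn it either calls a named tool
    or finishes with an answer; tool results are fed back as context."""
    context = [f"Task: {task}"]
    for _ in range(max_steps):
        action = llm("\n".join(context))
        if action.startswith("done:"):
            return action[len("done:"):].strip()
        name, _, arg = action.partition(":")
        result = tools[name.strip()](arg.strip())  # run the tool
        context.append(f"{action} -> {result}")    # feed the result back
    return None  # gave up after max_steps

# Scripted stand-in for an LLM, for illustration only.
script = iter(["calc: 2 + 2", "done: the result is 4"])
tools = {"calc": lambda expr: eval(expr)}  # toy calculator tool
print(agent_loop(lambda ctx: next(script), tools, "add 2 and 2"))  # the result is 4
```

The `max_steps` cap and the growing `context` list are the two knobs that matter in practice: the first bounds cost, and the second is the context engineering problem discussed later in the conversation.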
51:08
Yeah. One of the things we've heard a lot about in the context of agentic uses of LLMs over the past year or two is the idea of multi-agent systems: decomposing a problem into independent agents with their own personas and that kind of thing. And the whole OpenClaw idea, even today I'm seeing a lot of, hey, I created my AI team, my AI employees, and there's this employee and that employee, and they talk to each other using Slack or Moltbook or whatever. What have you seen, from a concrete builder or technical perspective, with this multi-agent
53:35
use case,
54:42
Are you finding folks getting a lot of value out of that?
54:43
To be honest, I wish I had a really good or interesting answer, but this is something I've not explored personally. Most of my experience is with single use cases, where one LLM provides solutions or tackles a specific task, but mostly doesn't interact with other agents. I see it more as a context engineering problem, though: the LLMs themselves, I don't think they are the bottleneck; it's more about how you get the results from one LLM and provide them to another. In that sense it's almost like image or video generation, where you have one model parsing or improving the text input and then passing that to the part of the model that generates the output, the diffusion part or the transformer-based diffusion part. Multi-agent systems are a more sophisticated form of that: how do we provide the right context to the different agents? It could be anything from basic databases to using Slack, where one model posts something and the other model ingests it via the API. That is, I think, something that is just getting started, also with Moltbot and OpenClaw, and I think we'll be seeing a lot more of it. But that's all I can say, because I personally don't have concrete experience; I haven't worked on this myself yet.
54:46
Do you have a sense for where we'll see focus and innovation around these agentic uses in the upcoming year? Or maybe what the gaps are, what really needs to be worked on for them to come into their own?
56:24
I do think each LLM still has its own failure rate. Progress is usually measured by how long the LLMs can work autonomously, how long they can work until they fail. And the more models you add, the higher the risk that one of them fails if they depend on each other. So I think improving the model itself will also be the main way to improve the whole system. But I can also see, as far as I know from what is publicly available, that these are still the vanilla LLMs in Claude or other APIs; they're not specifically trained to interact in a multi-agent setting. If you prepare data for training these agents in a multi-agent setting, like a fine-tuning type of situation, I think you can get more performance out of them. We have seen that even for simpler cases: GPT-5.2 Codex or GPT-5.3 Codex is not the same as GPT-5.2 or GPT-5.3; those are models that were forked off and then specifically trained to work with the Codex app, basically. I think we will see something like that for these agent models too. It's just harder for the consumer to do, because we don't have access to these models, so we depend on whoever owns and hosts the LLMs to do this type of training. I can see companies developing something like this. If I had to bet, Anthropic and OpenAI have really paid attention to what Moltbot or OpenClaw is doing, and they may come up with their own version that is maybe even more capable, because they control the model and can fine-tune it for these interactive multi-agent environments.
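The compounding-failure point is just multiplication: if each agent in a dependent chain succeeds with probability p, the whole chain succeeds with probability p to the power k, which falls off quickly. A quick illustration:

```python
def chain_success(p, k):
    """Probability that a chain of k dependent agents, each succeeding
    independently with probability p, all succeed."""
    return p ** k

# Even fairly reliable agents compound badly when chained.
for k in (1, 3, 5, 10):
    print(k, round(chain_success(0.95, k), 3))
```

With p = 0.95, a single agent succeeds 95% of the time, but a ten-step dependent chain succeeds only about 60% of the time, which is why improving the base model's per-step reliability lifts the whole system.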
56:45
Yeah. One of the things that's interesting looking back is that a lot of the things we might see as big advancements over the past year or two are, from an architecture perspective, relatively incremental; the fundamental core architecture has been fairly stable. There have been a handful of proposals for where we might go beyond LLMs, but the core hasn't changed much. Do you agree with that? How do you think about the future of LLM architecture?
58:49
Yeah, that's an interesting question. I would say everything I'm saying here comes with an asterisk, because DeepSeek V4 is not out yet; it might change everything completely. But if we just look at 2025 up to the second week of February, I don't think there were any fundamental changes in the state-of-the-art architecture. One thing we have to distinguish is that there are architecture changes geared towards doing the same thing more efficiently, and there are architecture changes geared towards getting more modeling performance, more accuracy, out of the model. If we look at the models that push the state of the art in modeling performance, there haven't been that many changes recently. Looking at 2025, mixture-of-experts models have been making a comeback. There were other models like Mixtral and DeepSeek-MoE before, but they really became popular after DeepSeek V3 came out, and DeepSeek V3 became popular because of DeepSeek R1, which is basically a post-trained version of DeepSeek V3. A lot of companies then adopted this architecture. I think Kimi straight up used that architecture and scaled it from 671 billion to 1 trillion parameters. Even the European company Mistral AI used the DeepSeek V3 architecture. So a lot of people are not gambling, in the sense of, let's try something completely different; they take something that works and try to make progress through changes in the data and the algorithms. But that doesn't mean there are no new ideas. DeepSeek V3, besides MoE, the mixture of experts, had multi-head latent attention; I think it was also in one of their previous papers.
Multi-head latent attention is essentially a tweak of the attention mechanism where you keep an intermediate, smaller, compressed state of the keys and values. The keys and values are the important ones to compress, because then your KV cache becomes smaller: you don't store the full keys and values in the KV cache, just a compressed form, and then you reconstruct the keys and values from that compressed form during inference. So you are basically trading compute for memory. Maybe to explain this a bit better, you can think of it like LoRA, low-rank adaptation: you project down into a compressed space and then project up again. That's basically multi-head latent attention, an interesting tweak in 2025 and 2026 that people adopted. And then it was again DeepSeek V3.2 that had another tweak, sparse attention. Sparse attention is also not new; there has always been research on making attention cheaper, because it scales quadratically with sequence length, and there have been hundreds if not thousands of papers. But with papers I'm always a bit careful; the ideas are interesting, but I always wait to see them in production, in quotation marks. What I mean is seeing it in a flagship model, because an idea might work well in a small model but fall apart once you scale to 500 billion, 600 billion, 1 trillion parameters. DeepSeek is a nice case study here, because they have this flagship model, and if they use something in that flagship model, you basically know it works at scale. And they have their own version of sparse attention; I think they literally call it DeepSeek Sparse Attention.
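The LoRA-like down-/up-projection view of multi-head latent attention can be sketched with plain matrices. The dimensions are illustrative, and this ignores details of the real DeepSeek design (per-head splits, decoupled positional components); the point is only the cache-size trade-off.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, seq_len = 512, 64, 10

W_down = rng.standard_normal((d_model, d_latent)) * 0.02  # compress
W_up_k = rng.standard_normal((d_latent, d_model)) * 0.02  # rebuild keys
W_up_v = rng.standard_normal((d_latent, d_model)) * 0.02  # rebuild values

h = rng.standard_normal((seq_len, d_model))  # hidden states
c = h @ W_down   # the KV cache stores only this compressed latent
k = c @ W_up_k   # keys reconstructed during inference
v = c @ W_up_v   # values reconstructed during inference

# Cache cost per token: d_latent floats instead of 2 * d_model.
print(c.shape, k.shape, v.shape)  # (10, 64) (10, 512) (10, 512)
```

In this toy setup the cache shrinks from 1024 floats per token (full keys plus values) to 64, at the cost of two extra matrix multiplies per decoding step, which is exactly the compute-for-memory trade described above.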
They have a lightning indexer, a small, cheap model in a sense: instead of one token paying attention to all the previous tokens, it's more selective, it selects which tokens to pay attention to. It's like a mask: you calculate a mask over all the tokens to select a subset, to make attention cheaper, to make it scale sub-quadratically, basically. There have been these types of tweaks, but they don't fundamentally change how attention works; it's still the same attention mechanism, just made cheaper. So people are honing in on what works at the moment, but maybe in 2026 one of the flagship models will have a fundamentally different approach. Little changes have also been made in terms of alternative architectures. We mentioned Qwen 3 earlier; Qwen 3 is one of the flagship models, though maybe not at the top anymore because it's a bit older, it came out in the summer, but when the Qwen models come out they are usually at the top of the leaderboards. They also had a parallel version of their model, which they called Qwen3-Next, and that one tried something different: a hybrid attention mechanism with Gated DeltaNet, basically more of a state-space-model approach where attention is more linear. So people are trying things, but not necessarily in their flagship model; they try other things in parallel. And I think this makes sense: you don't want to put all your eggs in one basket. You want to have a good model, and then maybe try something on the side and scale it up later if it works well. Yeah.
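The selective-attention idea can be sketched as a top-k mask over an indexer's scores. This is a toy version: the real DeepSeek Sparse Attention indexer is a learned component and the selection rule differs, but the shape of the computation is the same.

```python
import numpy as np

def top_k_causal_mask(scores, k):
    """For each query position i, keep only the k highest-scoring
    past (and current) positions; attention cost then grows with
    seq * k instead of seq squared."""
    seq = scores.shape[0]
    mask = np.zeros_like(scores, dtype=bool)
    for i in range(seq):
        visible = scores[i, : i + 1]     # causal: only tokens 0..i
        keep = np.argsort(visible)[-k:]  # indices of the top-k scores
        mask[i, keep] = True
    return mask

scores = np.arange(16.0).reshape(4, 4)  # stand-in for indexer scores
print(top_k_causal_mask(scores, 2).sum(axis=1))  # [1 2 2 2]
```

Each query then attends to at most k tokens regardless of sequence length, which is where the sub-quadratic scaling comes from; the indexer itself must be cheap enough that scoring all positions doesn't reintroduce the quadratic cost.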
59:27
What about continual learning? That comes up frequently as an opportunity, particularly before we got really good at incorporating tools and the ability to do searches, because models would get stale very quickly. But there's still this interest in having a model whose training data we can keep updated: we can delete things, we can incorporate new knowledge. Do you foresee significant innovation in that area?
1:05:32
Yeah, I think this is maybe the biggest dream, in the sense of: how can we make the model improve itself? It's maybe the biggest achievement that could be made, if someone figures out a way to make it work. But right now there's not even a pathway to this; there's nothing where you would say, that's the thing that will give us reliable continual learning. That being said, there are already forms of continual learning, I would say, though they're more controlled: instead of the model automatically updating itself, people collect data from the recent Internet or recent tasks and then carefully update the model. So it's not that we don't update models, but we don't do it fully automatically; it's an almost semi-automatic type of thing. And that's not only because it's more reliable, since it's risky to just update a model on new data, but also because of resource constraints. I don't know how many copies of the model OpenAI has, but you definitely can't have a single copy per user; that would be way too expensive. Everyone would have to have a little supercomputer at home, like a hundred-thousand-dollar computer, to run a big flagship model. So companies can't just update everything on the fly for each user; that would be infeasible. Unless we have models that run only on the personal device, I don't think we can have really good continual learning. And the other thing is, you have to be really careful how you update it. You don't want to make the model worse.
Because it's such an important, expensive product: just think about feeding the data back to OpenAI, and then OpenAI automatically updates the model, and maybe there's a bad update and it disrupts everything for everyone. So I think it's more of an infrastructure and security type of issue. But otherwise, if you look at the reasoning training we talked about, reinforcement learning with verifiable rewards, if you run this on correct answers and just keep it running, it is a form of continual learning in a sense. You can technically just keep running it; you just want to be more selective, basically.
1:06:11
And do you think that longer contexts alleviate some of the pain, or the need for continual learning, in your case of personalized models? One approach is to take new information and continually learn against it. Another that I think folks have played around with is to create personal LoRA adapters for a model. But a third is to just put that new information into the context and use it at inference time.
1:09:05
I would say yes and no. I do think long-context LLMs have enabled so much recently. Before, people were building RAG systems, the retrieval-augmented generation systems, and now, I wouldn't say they're obsolete, they're still very useful if you have a fixed, big database or document set that you use repeatedly. But if you're a regular user, even with a fairly long document, you often don't need them. A thousand-page PDF may be stretching it a bit, but a 200-page PDF you can have in context. You don't need to fine-tune the LLM on that data, you don't need a RAG system; you can do a lot in context. And like you said, the same is maybe true for new information, where you could technically just provide all the relevant new information in context. But I think that only gets you so far, because you as a user also have to know what information to provide. Then if you couple that with tool use: for example, if the data cutoff is 2025 and you ask about a 2026 historical event, the LLM can still use a web search, it can still use a tool and look it up on the web. So you don't necessarily need to update the LLM for that particular historical event. But if the event has a lot of ramifications and affects a lot of things around it, that might be missed: you get certain facts from a tool call but not the whole interaction with other data points. So it's not fully replacing the updating, but it makes it less necessary, or at least not necessary quite as often.
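The in-context-versus-RAG tradeoff described above can be sketched roughly as follows; everything here (`build_prompt`, the 4-characters-per-token estimate, the word-overlap chunk scoring) is an illustrative assumption, not a real library:

```python
# Sketch of the decision: if the document fits the context window,
# just put it in the prompt; otherwise fall back to naive retrieval.

def build_prompt(question: str, document: str, context_limit: int = 128_000) -> str:
    # Rough token estimate: ~4 characters per token for English text.
    approx_tokens = len(document) // 4
    if approx_tokens <= context_limit:
        # A ~200-page PDF usually fits a modern long-context window,
        # so no fine-tuning and no retrieval index is needed.
        return f"{document}\n\nQuestion: {question}"
    # Fallback: naive retrieval — score fixed-size chunks by word overlap
    # with the question and keep only the top few.
    chunks = [document[i:i + 2000] for i in range(0, len(document), 2000)]
    q_words = set(question.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return "\n---\n".join(scored[:3]) + f"\n\nQuestion: {question}"
```

A production RAG system would use embeddings and a vector index instead of word overlap, but the branching logic, context when it fits and retrieval when it doesn't, is the point being made above.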
1:09:46
So yeah, your kind of big-picture thoughts on where the field will be focused over the next year: again, reasoning, inference-time scaling, agents. Any other thoughts or predictions that come to mind for you?
1:11:48
Yeah, I will be curious to see. It's a little thing, but like we talked about, there is no big alternative to the transformer architecture yet. There are, though, things like text diffusion models, and Google, for example, has had a waitlist page up; they're planning to launch a text diffusion model, not a small one, but an alternative that I'm really curious about. It's more something I want to see: maybe that's going to be replacing, say, the free tier of LLMs, and that would be really interesting. The main reason I'm interested is that there's been a lot of research on text diffusion models. It's a different take: instead of generating the text sequentially, it's more like a BERT model, where you have masks and then gradually denoise, replacing the masks with text. I just want to see how it performs at scale, because right now most of these are research models. It's nothing I think people should get excited about in terms of cutting-edge performance, but it will maybe be cheaper and faster, and maybe that brings everyday improvements, even for things like the Google Search summaries, which are also LLM-based but not the best. Little quality-of-life improvements like that. Also, when we're recording this, it's before the Chinese New Year, and historically there have always been a lot of open-weight model releases around the Chinese New Year. So maybe there's a little surprise in there; maybe we'll see DeepSeek version 4, and maybe there's a bigger change. So I'm interested in following that and seeing what happens. But yeah, off the top of my head, I think we covered pretty much everything.
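The masked-denoising idea behind text diffusion models can be illustrated with a toy loop. The "model" below is just a random token picker over a tiny vocabulary; a real system would use a trained BERT-style transformer to predict each masked position:

```python
import random

# Toy illustration of text diffusion: start from an all-[MASK] sequence
# and fill in tokens over several steps, instead of generating
# left-to-right one token at a time.

VOCAB = ["the", "cat", "sat", "on", "mat"]
MASK = "[MASK]"

def denoise_step(tokens: list[str], rng: random.Random) -> list[str]:
    """Replace roughly half of the remaining masks with predicted tokens."""
    masked_positions = [i for i, t in enumerate(tokens) if t == MASK]
    for i in rng.sample(masked_positions, max(1, len(masked_positions) // 2)):
        tokens[i] = rng.choice(VOCAB)  # stand-in for the model's prediction
    return tokens

def generate(length: int = 8, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    tokens = [MASK] * length
    while MASK in tokens:
        tokens = denoise_step(tokens, rng)
    return tokens
```

The speed appeal mentioned above comes from this structure: each denoising step fills many positions in parallel, so the number of model calls can be far smaller than the sequence length.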
1:12:08
Let's maybe switch gears a little bit and update us on what you've been working on personally. You've referenced chapters of the book; talk a little bit about your current book and where folks can learn more about it.
1:13:58
Yeah. So I think last time I was on your podcast we talked about my Build a Large Language Model from Scratch book. It's basically the whole journey from building the architecture to pre-training a model and then doing instruction fine-tuning. The goal of that was not to build your personal assistant that does all the things at home for you, because that would cost fifty thousand or a hundred thousand dollars and be a lot of work. Even though it's simpler nowadays to train your own LLM, it's not something you can do routinely on a weekend. The goal of the book was to teach people how that workflow works, to understand how LLMs work, because that helps you use LLMs better: to understand what the context is, what the limitations of the context are, how attention works, and why it's more expensive if the input gets longer. If you build the LLM yourself, you get a really clear understanding compared to just explaining it in a more free-form way. A lot of people liked that, and it's now a very popular textbook for teaching, too. Since it's only one book, it could only cover so much, so I was really excited to work on the sequel. Right now I'm working on Build a Reasoning Model from Scratch. There's no overlap between the books; it can be read as a standalone book, but it's mainly focused on the reasoning techniques we talked about: the reinforcement learning with verifiable rewards, the GRPO algorithm, inference scaling, all the techniques you apply once you have a pre-trained LLM. So the book starts from a given pre-trained LLM, the smallest Qwen 3 model, and then adds inference scaling and the reinforcement learning.
The first 360 pages are already in early access, and I'm hoping to finish by April; there's only one more chapter left. But each chapter is a lot of work because you have to run all the experiments. I've been running a lot of experiments, especially for the GRPO algorithms, because there have been so many different papers and improvements, and trying them out in practice has been a lot of fun, but also a lot of work. So I've mostly been running experiments over the last couple of weeks and months, and it's quite exciting, actually.
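The group-relative advantage at the heart of GRPO can be sketched in a few lines. This is a simplified illustration, not the book's implementation: sample a group of answers per prompt, score each with a verifiable reward, and normalize the rewards within the group.

```python
# Sketch of GRPO's group-relative advantage: completions scoring above
# their group's mean reward get a positive advantage (and are
# reinforced); those below get a negative one.

def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four sampled answers to one math question, checked against "42".
answers = ["42", "41", "42", "7"]
rewards = [1.0 if a == "42" else 0.0 for a in answers]
advs = group_advantages(rewards)
```

Unlike PPO, this needs no learned value model as a baseline; the group mean plays that role, which is part of why the algorithm is comparatively cheap to experiment with.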
1:14:14
And so can folks pick up the second book and run with that or do you expect folks to have read the entire first book before they start with the second?
1:16:41
I would say either way works. You don't have to read the first book. The second book uses a pre-trained LLM, so you don't have to pre-train your own, and you don't need the first book to train the LLM for the second book; it's independent like that. But the second book doesn't explain the pre-training or the architecture in as much detail. I have an appendix explaining the architecture, but it's not quite as detailed as the first book. So if people want to understand the whole life cycle of an LLM, from pre-training to post-training, I think it would make sense to read them sequentially. But you could also start with the second book, learn about inference scaling and reasoning, and then, if you're interested in the pre-training, fill in the gaps later on. I think either way works, basically.
1:16:51
Very cool. Sebastian, it's been great catching up with you, and we need to do it more often than every three years. But thanks so much for jumping on and sharing a bit of your perspective on where things are and where things are going.
1:17:40
Yeah, thank you so much for the invitation, Sam. I had a great time. I love talking about LLMs and AI, so this was a treat. Thanks for having me on.
1:17:55
Thank you.
1:18:04