Evals, Feedback Loops, and the Engineering That Makes AI Work
Ankur Goyal, CEO of BrainTrust, discusses how successful AI companies focus on engineering around models rather than just using the smartest models. The conversation explores the tension between AI's continuous nature and systems' discrete approach, the economics of frontier vs. open source models, and why proper evaluation frameworks are crucial for AI product development.
- Companies shipping successful AI products aren't using the smartest models - they're using the best engineering around models with proper evals and feedback loops
- Chinese AI models show high token volume usage but low dollar-weighted spend, suggesting cost-driven adoption with quality trade-offs
- The AI industry may hit demand-side limitations before supply-side constraints, as enterprises struggle to implement AI systems despite unlimited apparent demand
- Frontier labs can raise capital faster than they're limited by engineering complexity, creating a unique dynamic where money directly translates to model capability
- SQL significantly outperforms Bash for agent tasks across accuracy, efficiency, and speed metrics, challenging the popular 'give agents a Unix environment' approach
"AI is continuous and systems are discrete. Humans fundamentally think a little bit more in terms of systems and you know, predictability and reliability, consistency than they do non determinism."
"Right now we are kind of building, you know, like God. And so it's possible and probably economically viable to keep throwing capital at the problem to make God 1% smarter."
"If you're building an agent and you're using pre trained models or whatever, it's a fool's errand to think that the agent that you're building, the way that you're providing context to it, like all that stuff that is not engineering and it shouldn't be engineered and it should be bitter lesson pilled."
"I think evals are like the scientific method applied to software engineering with, you know, non deterministic systems like AI systems."
"In the past, large companies that received lots of money were basically naturally rate limited by engineering speed. These frontier labs don't have that problem. They can literally just raise money and build a model based on the money."
AI is continuous and systems are discrete. Humans fundamentally think a little bit more in terms of systems and you know, predictability and reliability, consistency than they do non determinism.
0:00
In the past, large companies that received lots of money were basically naturally rate limited by engineering speed. These frontier labs don't have that problem. They can literally just raise money and build a model based on the money. They just throw more compute, more data at it. So you kind of have to ask the question: what is going to end up limiting them?
0:13
If you're building an agent and you're using pre trained models or whatever, it's a fool's errand to think that the agent that you're building, the way that you're providing context to it, like all that stuff that is not engineering and it shouldn't be engineered and it should be bitter lesson pilled, meaning you should build that system in a way that you can throw it away tomorrow. Right now we are kind of building, you know, like God. And so it's possible and probably economically viable to keep throwing capital at the problem to make God 1% smarter. But when you can't make God 1% smarter, there is like an insane opportunity to engineer God to be more efficient.
0:34
The AI industry keeps reaching for brute force. Frontier labs throw compute at training runs instead of optimizing. Developers give agents a Unix environment instead of structured tools. Teams chase the latest model instead of engineering the one they have. But the pattern Ankur Goyal keeps seeing is the opposite: the companies shipping AI products that actually work aren't using the smartest models. They're the ones with the best engineering around the models, the evals, the feedback loops, the testing harnesses. This conversation covers where that discipline matters and where it doesn't, the cycle between open source and closed source models, why Chinese models show high token volume but low dollar spend, and a benchmark comparing Bash versus SQL for agents, with results Goyal calls comical. Martin Casado speaks with Ankur Goyal, founder and CEO of BrainTrust.
1:15
I think people watching this will know you, but let's just very quickly go through background, mostly just to set the stage because I want to talk a lot about whether AI is actually a systems problem or not.
2:04
Great.
2:12
So do you mind just kind of giving the rough sketch?
2:12
Yes. Nice to meet you, Martin. I'm Ankur. Prior to BrainTrust, back in ancient history, I used to work on relational databases. Way before LLMs, I saw deep learning come out and become a thing, and I started to get excited that the way we query and work with data, which was primarily SQL, is not as powerful as what we could do. And so I started a company almost 10 years ago now called Impira, where we did.
2:14
It's been that long?
2:44
Yeah, time flies.
2:45
Wow.
2:47
Where we did AI-powered document extraction, and that turned out to be one use case that worked pretty well, although we were using computer vision models and stuff, because way back then they were way more powerful than language models. And we had a bunch of customers who were doing different use cases, and we might make our invoice stuff better and then make the bank stuff worse, or make the bank stuff better and the invoice stuff worse. And we had to get really good at avoiding that. That's when we built internal tools to do evals, and then get good data to do evals, and sort of run this feedback loop pipeline. Didn't think too much of it at the time, but then we got acquired by Figma and I led the AI team there, and we had exactly the same problem, but this time building on top of LLMs.
2:47
By the way, you know what's interesting? Everybody says the word eval. You say the word eval, I say the word eval. I found out very few people actually know what an eval is, like, specifically what an eval is.
3:33
Yeah, I think evals are like the scientific method applied to software engineering with, you know, non-deterministic systems like AI systems. So you come up with a hypothesis. Let's say I'm going to try out a new model, or I'm going to tweak my prompt, or I'm going to throw more context into this agent by fetching from whatever API. And I suspect that this is going to improve the quality of my agent, or I suspect that it will make it faster, or whatever. So you come up with a hypothesis, and then you essentially simulate running the system on a set of inputs and you observe the outputs. You might have ground truth, you might not have ground truth, and you try to measure what the difference is, and then you quantitatively look at the difference. And I think what's really important is you also qualitatively understand it: okay, it says it got better, but let me look at it with my eyes and see if it actually was better or not. By reconciling that, not only do you double check things with your intuition, but you also give yourself the opportunity to improve the next eval.
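The loop Goyal describes, hypothesis, fixed inputs, scored outputs, then eyeballing the results, can be sketched in a few lines. This is a minimal illustration, not BrainTrust's API; the names (`run_agent`, `cases`, `evaluate`) are invented for the example, and the toy "agent" just stands in for a real model call.

```python
# A minimal eval loop: run the system under test over fixed cases and score it.
# run_agent is a hypothetical stand-in for the real agent you are testing.

def run_agent(prompt: str, model: str) -> str:
    # Swap in a real model call here; this toy version only "improves"
    # when the candidate model is used.
    return prompt.upper() if model == "candidate" else prompt

# (input, expected) pairs; expected could be None when there is no ground truth.
cases = [
    ("refund policy?", "REFUND POLICY?"),
    ("order status?", "ORDER STATUS?"),
]

def evaluate(model: str) -> float:
    scores = []
    for inp, expected in cases:
        out = run_agent(inp, model)
        # With ground truth, score exact match; without it you would plug in
        # a heuristic or LLM-as-judge scorer here instead.
        scores.append(1.0 if expected is not None and out == expected else 0.0)
    return sum(scores) / len(scores)

baseline = evaluate("baseline")
candidate = evaluate("candidate")
print(f"baseline={baseline:.2f} candidate={candidate:.2f}")
```

The quantitative diff tells you whether the change helped; reading the individual outputs with your eyes, as Goyal stresses, tells you why, and feeds the next eval.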
3:42
Awesome. So we're going to get back to that in just a second, but that's just so everybody knows what we're talking about with evals. So your background: before that you were at MemSQL? Maybe talk about that very quickly.
4:42
Yeah, MemSQL was started right around the time that people realized old-school SQL databases were not really built for the web. And I think there was this really big NoSQL thing that happened for a while.
4:53
The head fake. The entire industry got head faked.
5:07
Well, I think the same thing is happening right now with agents and bash.
5:09
Yes, let's go. Yeah, let's go into that. I agree.
5:12
You know, the agents-and-Bash thing of its era was Hadoop back then. And I think what happened is, many enterprises who were trying to use NoSQL were struggling because the end users and all these legacy systems they had built didn't understand how to speak NoSQL. They understood how to speak SQL.
5:15
Well, you think that was the issue? I mean my experience with the issue was like it didn't provide the guarantees you needed and so you had to implement it at the app layer.
5:32
Well, of course, I feel like you're.
5:38
Like re-implementing an RDBMS.
5:40
Most enterprises weren't even there, though. I think that's what the smart developers who were actually paying for the NoSQL products realized. But the enterprises couldn't even do a proof of concept. Hey, I have Crystal Reports, okay, I have it pointed at Oracle. You're trying to sell me some NoSQL database. How do I point this at you?
5:42
So I think the more pernicious version of this problem was: you got sold on this. Oh, the web giants are doing this stuff that's eventually consistent, I can too. And then you end up building this into a system that actually requires strong consistency.
6:04
Right, right.
6:17
And then all of a sudden this is like the wrong thing. And then like that's where the rubber meets the road.
6:18
Yeah, yeah.
6:23
And I would say that cost companies in the industry years.
6:23
I remember there was this weird moment. MemSQL was based in the Bay Area, so we would constantly try to get the cool tech companies to use us, but we had a lot more traction with traditional enterprises. So I would often fly to New York and meet with banks and stuff, and they were using our product. People who had a relatively basic understanding of SQL were doing very complex financial analysis using our product, with SQL queries that are this long. And then I'd go to a tech company of your choice and visit them, and they had 50 engineers working on writing a MapReduce job that does a really dumb version of the same query, just trying to figure it out.
6:27
And also, a workflow like that is totally different than database queries. I mean, these are almost like two different things. A lot of the Hadoop use case was MapReduce, which is process a bunch of documents, versus something where you would actually need SQL.
7:09
And it makes sense for that use case. Yeah.
7:22
Which actually brings me to what I want to talk about next, which is, from my perspective, working with a lot of founders in this AI space, they tend to come from one of two backgrounds. They're either AI people or they're systems product people. Yeah. And you're definitely a systems product person that's becoming an AI person. Do you find that there's in any way a tension between these two things?
7:25
Yeah, I mean AI is continuous and systems are discrete. And so the way that we think about stuff tends to be quite different. Systems people are not necessarily trying to optimize over a large aggregate. In systems, if you're building SQL systems, every SQL query needs to be correct. And so here's where the tension is: we always want to build compilers, and I think AI people always want to build optimizers. Compilers have optimizers, but they're algebraic optimizers, so they're kind of different. And I think that humans fundamentally think a little bit more in terms of systems and, you know, predictability and reliability and consistency than they do non-determinism. What has worked well for us is understanding as much about AI as we do, which is certainly not as much as the hardcore researchers, but quite a bit, so we're able to provide tools that help AI make sense for people who like to think in terms of reliability and performance guarantees.
7:45
Yeah. Let me just try this on for size to see if this makes sense, which is, it feels like a lot of the way that the AI industry has evolved is around this vague notion of the bitter lesson. I know people use that term freely, but here's what I mean: these models are universal function approximators, and it's distribution in, distribution out. And so they just throw a bunch of data and a bunch of compute at it, and then you get this thing, and it's almost like anti-engineering.
8:49
Right.
9:17
I'm going to sit down, I'm going to have a team for four months that's just going to throw a bunch of data and compute at this stuff, and what comes out is basically the thing, as opposed to some clever way. If you read the original Sutton bitter lesson essay, it's basically: don't do engineering, just do the thing that scales. And so I feel like you're a database guy, database systems product, which is all about engineering and trade-offs and incrementalism and abstraction, versus, you know, here is the God-model type stuff. And you're in the position of actually having to rein in the complexity of this stuff with things like evals and products or whatever. So I'm just wondering, do you think that over time the bitter lesson wins out here, or do you think that we're going to have to use a systems approach, or can these things ever be reconciled?
9:18
Well, I think the. Even though the thing that we're all fascinated by is this bundle of weights that gets produced and it's kind of like a mini God or whatever, there is a shit ton of engineering surrounding that thing, especially around capturing, cleaning, preparing, distributing, shuffling data.
10:03
You mean the training part, or just using them in an app?
10:22
The training part, yeah, all of that. Even though the artifact is in some ways not engineered, there's a ton of engineering around it to make the artifact engineerable. And I think the same thing is true in application use of AI. Mikayla from Replit said this once to me, which made a ton of sense: a new model comes out, we want to throw away our entire code base. Yeah, and I heard a very similar.
10:25
One from somebody that worked at Cursor, and he said, imagine writing an operating system for a chipset every time a new version comes out, like you had an entirely different machine code or instruction set.
10:52
Exactly. I mean, by the way, I think that happened today with the new models that came out. But what do you mean? Minimax 2.5 and GLM 5. But we'll get into that in a second. I think if you're building an agent and you're using pre-trained models or whatever, it's a fool's errand to think that the agent that you're building, the way that you're providing context to it, all that stuff, that is not engineering, and it shouldn't be engineered, and it should be bitter lesson pilled, meaning you should build that system in a way that you can throw it away tomorrow. The thing that actually matters, and the difference between the teams that build products that you and I would say work, like Cursor's product, and products that just feel shitty, is all the engineering work that goes around it to make sure that you can build a feedback loop from what's happening in production to what you're actually testing. There's one, I can't say which yet because they haven't shipped it, but one of our customers, which is relatively sophisticated, is already testing the new Chinese models in their agent harness, and they have very precise insight into where it works and where it doesn't. That harness, the testing harness, is very well engineered. The innards of the system that they're testing are not engineered at all. And that's very much on purpose.
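The "engineer the harness, not the innards" idea can be made concrete: the agent logic is a throwaway function, while the harness (the fixed cases, the scoring, the per-case failure report) is the durable part. This is a hedged sketch; every name here is hypothetical, and the two lambdas stand in for the same prompt pointed at different models.

```python
# The harness is the engineered artifact: it runs ANY agent through the same
# fixed cases and reports exactly where it fails. Swapping the agent (e.g. a
# new model behind the same interface) is a one-line change.

from typing import Callable

def harness(agent: Callable[[str], str], cases: list[tuple[str, str]]) -> dict:
    """Run an agent over fixed cases; return its score and failing inputs."""
    failures = [inp for inp, expected in cases if agent(inp) != expected]
    return {"score": 1 - len(failures) / len(cases), "failures": failures}

# Two disposable "agents" -- placeholders for different models in the same slot.
old_model = lambda q: q.strip().lower()
new_model = lambda q: q.strip()

cases = [("  Hello ", "hello"), (" WORLD ", "world")]
print(harness(old_model, cases))
print(harness(new_model, cases))
```

Because the harness gives a per-case failure list rather than a single number, you get the "precise insight into where it works and where it doesn't" that Goyal describes, without the agent internals ever needing to be stable.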
11:04
I'm going to get to the Chinese models in a second, but I actually want to push on this in one more direction. At some level, you pour a billion dollars into a training run and out pops this thing, and then you hope that you've got a good harness that manages it and understands it. That's kind of what BrainTrust does as a company: it reins in the complexity. But I just have this kind of weird doubt coming from traditional software, which is, what if you had a team that spent five years and a billion dollars building an operating system, and then they show up and they just give that to you? You just wouldn't have any hope from the outside. So in some way, is this the same type of problem, or is the problem reduced a little bit just because the way these function is more scoped? It just feels to me like a billion dollars of compute and data, and I'm just using that as a rough number, the current training runs are of that size, and that thing kind of shows up. How can we have any hope of reining in that complexity?
12:17
I mean there's like a hilarious, a hilarious metaphor here for vibe coding. Right? You know how people talk about the comprehension debt that you build up while you're vibe coding something and then eventually you need to look at it.
13:14
Yeah, yeah.
13:27
So, you know, it's a good question. It's really hard to know what the current models actually are like. I don't actually know, and I have no idea what the margin structure is. But it feels to me like what happens is every N days we stop making rapid progress on the quality of the models, and then people engineer them like crazy to make them incrementally more efficient. I don't actually understand whether that means having a deeper inductive understanding of what's happening inside the weights or not; I sort of doubt that it does. But I do think that right now we are kind of building, you know, like God, and so it's possible and probably economically viable to keep throwing capital at the problem to make God 1% smarter. But when you can't make God 1% smarter, there is an insane opportunity to engineer God to be more efficient. And I do think that these things are in constant tension. There are a lot of people, I feel like, who want to be nerd sniped into making these systems more efficient, but the bitter lesson is sort of preventing that.
13:31
My somewhat naive view of this is that it's really a capital flow issue: as long as these companies can continue to raise a ton of money, they'll always just do the dumb thing. There's no reason not to do the dumb thing because it's a lot faster, right? You can't scale engineering. And as soon as they can't raise capital, they'll have to do engineering.
14:42
Yeah.
14:58
I mean, I think specifically with BrainTrust, as a user of BrainTrust, the way that I think about it is, you have these things that landed from space, so there's no way you'll actually understand them.
14:58
Yeah.
15:08
So literally the goal is not to understand them. Because a lot of people, when they think of evals, they think, I'm understanding this thing. But that's not what's happening. It's like you're almost protecting your app from them. You're building whatever machinery is needed to use exactly what you need, but with enough guardrails on that you can actually tie it to a traditional state machine.
15:08
I mean, just like the scientific method, I think evals are actually almost entirely about understanding the problem that you're solving. I think product managers should only be working on evals. And I think evals are the natural evolution of a PRD. By creating a really good eval, you are making a declarative representation of what your product should be. Yeah, yeah.
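If an eval is "a declarative representation of what your product should be," then a PRD can literally be written as data: each requirement becomes a checkable case. A small sketch, with all field names invented for illustration:

```python
# A product spec expressed declaratively: requirements a PM could own,
# checkable automatically against any response the product produces.

product_spec = [
    {"requirement": "greets the user by name",
     "must_contain": "Ada"},
    {"requirement": "never promises a refund outright",
     "must_not_contain": "guaranteed refund"},
]

def check(response: str, case: dict) -> bool:
    # A requirement passes if its contain/not-contain constraints hold.
    if "must_contain" in case and case["must_contain"] not in response:
        return False
    if "must_not_contain" in case and case["must_not_contain"] in response:
        return False
    return True

# A hypothetical product response, checked against every requirement:
response = "Hi Ada, we'll look into your order."
results = {c["requirement"]: check(response, c) for c in product_spec}
print(results)
```

The point of the design is that the spec stays readable by a non-engineer while still being executable, which is what makes the eval the natural evolution of the PRD rather than a separate artifact.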
15:27
No, this is my sense too. And there's another thing along these lines that I think a lot about, which is that these models, as soon as they're released, just sublimate intelligence. If you actually look at what's going on, a lot of the AI companies that are not one of the two big foundation model companies are training these smaller models and doing all this RL, but they always use the SOTA models to do it. Whatever OpenAI does, a billion-dollar training run, that model's out there, and then that's just kind of bleeding out to these other models.
15:52
Which is a form of engineering.
16:26
Yeah, of course, 100%. But as an industry, we're in a way very lucky that these things bleed intelligence so quickly. But then it kind of begs the question: where does this intelligence go? If it becomes an engineering solution, then you'll just say, well, now people will engineer solutions until the next model comes. But one place it's definitely going is other models, and in particular it seems to be going to Chinese models.
16:27
Right.
16:52
Like in a way, you could just wait for three months and it'll show up in somebody else's model and you can use that. However, on an economically weighted basis, we don't see a lot of use of Chinese models. Yeah, right. So I would love to hear your thoughts. Are the Chinese models good? And if so, do you have a sense of why we're not seeing a lot of economically weighted use? And do you think this is going to change?
16:53
It's quite interesting. I think what we see is in terms of number of use cases or number of logos, usage of the Chinese models is very low. In terms of number of tokens across our customer base, usage of the Chinese models is very high.
17:17
Really? What about dollar weighted?
17:31
Dollar weighted? It's low.
17:33
Yeah. This distribution is so interesting because I think this is probably the most important distribution in the entire industry. It's just very hard to get this data.
17:34
I mean, just to put this in perspective with the US open source inference providers today. This is as of the day that we're recording; by the time this comes out, this information will probably be invalid. I ran a benchmark. We wrote a Bash versus SQL benchmark, which turns out to be a pretty good agent benchmark.
17:41
Yeah, we need to talk about it. That was super cool.
17:58
I got, yeah, I got nerd sniped.
18:01
Really hard. Actually, we're gonna talk about that next. That was so good.
18:03
What's interesting about this benchmark is, among other things, Sonnet 4.5 onwards scores 100%, with caveats. The caveats are cost, error rate, latency, et cetera. But I think there's sort of this saturation point where, okay, the model is actually just smart enough for this use case, and there are trade-offs between the different models that are worth considering. Kimi K2.5 came out last week, or maybe two weeks ago, very exciting model. Still in the 73 to 75% score range for this use case; it would frequently get lost and confused. GLM 5 came out, and I just benchmarked Minimax 2.5 right before coming here, on their API directly. They both score 95%, and the cases they messed up are very small, arguably grading errors rather than real errors. And so these two models are actually as good as the models that have essentially saturated this specific use case. And GLM 5 is three times cheaper than Sonnet. And then on Minimax's inference, Minimax 2.5 is three times cheaper than GLM 5.
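The saturation point described here flips the model choice from "which is smartest" to "which trade-off is best": once several models clear the quality bar, pick the cheapest one that clears it. A sketch using the rough scores from the conversation; the price numbers are invented placeholders that only preserve the stated 3x ratios, and the threshold is an assumption.

```python
# Once multiple models saturate a benchmark, selection becomes a
# cheapest-above-threshold problem rather than a max-score problem.

models = {
    # name: (benchmark score, relative price; Sonnet normalized to 9.0)
    "Sonnet":      (1.00, 9.0),
    "GLM 5":       (0.95, 3.0),  # "three times cheaper than Sonnet"
    "Minimax 2.5": (0.95, 1.0),  # "three times cheaper than GLM 5"
    "Kimi K2.5":   (0.74, 2.0),  # price here is purely illustrative
}

THRESHOLD = 0.90  # "smart enough for this use case" -- an assumed bar

viable = {m: price for m, (score, price) in models.items() if score >= THRESHOLD}
cheapest = min(viable, key=viable.get)
print(f"viable: {sorted(viable)}; cheapest viable: {cheapest}")
```

In this framing the 73 to 75% model is out of the running regardless of price, while the two 95% models compete purely on cost, error rate, and latency.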
18:06
Do you think the reason these things are, dollar weighted, a small part of the market is self-cannibalization? Or is it that right now the frontier labs are able to release the next model just in time to retain the majority of the user base, and as soon as that stops, we're going to see more of a shift towards the open source models? Because it just doesn't make economic sense to me why we're not seeing more usage of these.
19:21
I think.
19:43
Or wait it again.
19:44
Yeah, yeah. No, well, first, I think the people who use these models get worse APIs and higher error rates. None of the open source providers give you good rate limits unless you beg them. Anyone can sign up for OpenAI right now and get really good rate limits. I don't know how they do it, but they're just very good at managing rate limits and stuff at unbelievable scale. If you try to use any open source model, you'll immediately be hit by rate limits, and you have to beg the CEO to get access to higher ones. So I think they're just not as well delivered right now.
19:44
You're saying it's a systems problem.
20:21
Very much so. A second thing is, people reach for these models specifically to save cost.
20:23
That's not the self-cannibalization issue; it's that, by definition, they just don't want to pay as much.
20:29
Yes, exactly. And not only are the serverless prices cheaper, but if you actually work with the inference provider, or you run on your own GPUs, you can get it even cheaper, because you're trading predictability for cost. So if you have a continuous workload. There's a use case we're going to ship pretty soon, and the rate we're getting is almost an order of magnitude cheaper than GPT-5 nano. Wow. And so I think that's another really, really big component.
20:34
What do you think? I mean, what do you think happens over time? Do you think that the frontier models slow down and then the distribution shifts to.
21:03
Well, I'm a systems person, not an AI person, so instead of making a prediction.
21:10
This is such a biased conversation.
21:16
Instead of making a prediction, I'll describe, you know, the system within which I suspect the thing will play out.
21:18
And we know with adaptive feedback loops, you can't predict convergence and divergence. So we'll leave that to the listener. Okay.
21:25
Hopefully a smart listener will write to us about this. But I think that what we see is, let's say that there's some cadence at which truly innovative models come out. I would say that the last time that happened was December. Like GPT-5.2 and Opus 4.5.
21:30
They were remarkable.
21:44
Exactly. There are step-function changes. And what happens is, as soon as one of these things comes out, the entire industry forgets about open source models, and no one talks about them, and everyone says, why would you waste your time using them, blah, blah, blah. Except we have a few very shrewd customers who've observed that certain high-volume use cases just don't change over time, and they've specifically instructed their staff not to get caught up in this stuff. Some of these customers are literally still using Llama 3.1, which is ancient news, but they're getting really good results out of it.
21:45
This is super interesting. So is this performance, price, predictability, or all of the above?
22:17
It is the team that is solving the problem. Remember, up until two years ago, engineering teams would spend years solving a problem, to your point. So the team solving the problem, in this case customer service, is familiar with the quirks of the model and understands how to eke performance out of it for this use case.
22:23
But what does performance mean in this context? Is it like latency?
22:47
Correct inference, latency, and accuracy, basically. It's, you know, like an RDBMS. It's a widget, or an alien thing, that they understand how to wield. They understand how to wield it so well, and the use case isn't really changing. Imagine a high-volume consumer use case for customer service that doesn't change. Why change it when you can just make this thing get more and more optimized? But anyway, what happens is the new model comes out, everyone forgets about the open source models, and then the new models stagnate, and the open source models approximate the performance of the closed source models in the sort of three-month timeframe that you described. And oh wait, December is three months ago. And then everyone says, oh my God, what's the point of the closed source models? We'll just use the open source models. Historically that's happened with pretty consistent regularity, except I think there was maybe a six-month stretch before o1 was announced where things felt kind of dead, and then DeepSeek came out and a bunch of interesting things happened.
22:49
Yeah, but I mean, isn't DeepSeek coming out soon?
23:50
The 14th, maybe, they announced. Should be exciting. But with relative regularity, the commercial model comes out before the open source models have a chance to really disrupt enterprise mindshare. And so it's kind of this push-and-pull thing, where if there is a period of time when there's not enough innovation in the closed source models, and the open source models replicate the performance, and there's enough time, then I think it's just bound to happen.
23:53
It's just so interesting. We've just got a different equilibrium state here, where in the past, large companies that received lots of money were basically naturally rate limited by engineering speed. It would take two years to get the next Windows out or whatever it was. It's just very hard to scale a large software project. These frontier labs don't have that problem. They can literally just raise money and build a model based on the money. They just throw more compute, more data at it. So you kind of have to ask the question: what is going to end up limiting them? If it's not engineering complexity, maybe there's engineering complexity on the data cleaning side.
24:28
I don't think we've reached the point of market inefficiency that justifies true engineering investment.
25:04
I mean, the specific question is, let's imagine the following. Call it Frontier Lab X. Frontier Lab X is able to raise a billion dollars, and then it can train a model based on that. The model on that billion dollars does really well, and it only has to do well for a couple of months to get to the next raise. Then it raises $5 billion and trains the model, and that does really well. And then it raises the next one, $10 billion. The question is, at what point, in that world, it's able to raise more money than every company that's downstream of it put together. Anthropic's current raise is probably more money than the entire ecosystem that depends on it can raise. That's how much money we're talking here. And so in that world, when do you ever get to the point of engineering? I think you just have to assume either the money rationalizes, or there is some fundamental scaling limit in building these models, and I don't know what that would be.
25:12
And what happens if neither of those things happen?
26:10
I mean, it's AGI, man.
26:13
Yeah.
26:14
I mean, there is this. Sometimes I think to myself, either we're on this one path where one company can keep raising 10 times more money, and at that point money doesn't matter anyways, because it just becomes arbitrarily good at whatever.
26:15
Money is an artificial construct, Right. To begin with.
26:29
Yeah.
26:31
What is money?
26:32
Yeah, yeah. Or, on the other side, at some point, as the sun gets larger, the surface has to fragment, just because now you're competing against absolutely everybody, and then you're going to see fragmentation. It feels like we're still pretty early and the sun can grow for quite a while.
26:33
I think so, yeah. I mean, I'd probably guess the speed at which companies outside of the coding domain, which is itself a first derivative of real consumers actually taking advantage of AI, but the speed at which enterprises are deploying and can make use of it, and consumers. I mean, most consumers I talk to are using ChatGPT or Gemini, not one of the many interesting consumer AI products that have been started. But I feel like the speed at which the planets or whatever surrounding the sun can actually ingest the heat is potentially the first limiting factor.
26:52
Oh, that's a very interesting point. So you're basically saying that these models will get to a point where they just can't even be consumed. So it's actually on the consumer. It's something demand side, because right now it feels like there's unlimited demand.
27:36
Exactly. Take this eval example I was telling you about, where I think Sonnet 4.5, you know, it scores a hundred percent on this benchmark, and it's going to keep scoring 100%, I'm pretty confident. It's hard to come up with a question that it's going to really struggle to answer, but it's actually very difficult to integrate that into an enterprise setting and actually make use of that kind of system to begin with.
27:45
This is great, actually. This actually reframes my thinking. I always kind of assumed unlimited demand, because the demand has been so historically strong for this. But at some point in time we won't know the difference between AGI and non-AGI.
28:06
Right.
28:17
It's smart.
28:18
You know, in my observation, working with a lot of enterprises, I think the demand is unprecedented in that every enterprise is willing to spend a lot of money. But the implementation of AI systems is still very early.
28:19
Yeah. And do you think this is like endemic to the technology or is it just the enterprise being the enterprise?
28:32
There's a natural rate limit at which, you know, human political systems can ingest new things. Right.
28:37
But it's a very secular phenomenon, like the Internet was. That may be the case for the organization, but every individual is using it.
28:43
Every individual is using ChatGPT and Gemini. So I think that's the. If you ignore the enterprise for a second and think about ChatGPT and Gemini, the question, or the framing of the question, is a little bit different. Which is: are ChatGPT and Gemini going to become the everything app for everything possible or whatever, or maybe one of them wins, or, you know, whatever it is. Yeah, it really feels like a two-horse race there. And if that happens, is there sort of a limit, or does that just become the unlimited.
28:49
Yeah, this is a great point. Like, for individuals, there's a natural saturation point, because, whatever, there's only so much I need to do with one of these apps, where the enterprise would be a different saturation.
29:17
But there could be things that. I mean, ChatGPT could start doing your laundry and cooking your food, and I certainly don't know what these companies are cooking up. Yeah. But there's plenty of consumer stuff that, you know, these. Although.
29:26
Although, ironically, if you look at solution spaces dollar weighted, the ones that actually require automation are not really working at the same level as things that are just basically, you know, text prediction. True.
29:41
Yeah, yeah.
29:51
I mean, the state spaces feel very different.
29:51
Yeah.
29:53
So the things where you feel like you're mapping to a constrained manifold, like code or language or whatever, they do a very good job. And then as soon as it's open ended. Yeah, we'll see.
29:53
I mean, even when you're writing code, if you can constrain the environment, like what's happening right there? You just get a.
30:03
What do you mean, what's happening right there? Do you actually have an agent going in your laptop?
30:11
I do, yeah. I'm trying to rewrite something, and I wrote a very specific test, and Codex is doing its best to try to. Actually, there are two things that it's doing.
30:14
What's it doing?
30:25
One thing we have this, just so.
30:26
You know, for those that are watching this, Ankur has a laptop on a chair over there that's half open. So it's not closed, because it's churning away at a problem. Dude, that's a fucking sign of the times.
30:28
There's one problem which is, if we batch a bunch of LLM calls, we can be way more efficient, like almost linearly more efficient. And there's another problem where we have this really cool feature in our SDK, but we have to implement it in every language. So I'm trying to work on a way of doing it across all languages and then have a sort of smaller bootstrap thing in every language. Both of these problems are actually really good for agents, because we have a bunch of tests that test both of these features, and you can sort of define the very specific constraints of what you want the solution to look like. But both of these things are just, you know, very hard technical problems. So it's just churning for a while.
30:41
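The batching idea described here can be sketched roughly like this. Everything in the sketch is invented for illustration; it is not the BrainTrust SDK, just a minimal picture of why batching helps:

```python
# Hypothetical sketch of batching LLM calls: instead of paying the
# fixed per-request overhead once per call, group pending prompts
# and send each group as a single batched request.

def call_llm_batched(prompts, batch_size=8):
    """Amortize per-request overhead by sending prompts in batches."""
    results = []
    for i in range(0, len(prompts), batch_size):
        batch = prompts[i:i + batch_size]
        # One round trip for the whole batch instead of len(batch) trips.
        results.extend(fake_batch_endpoint(batch))
    return results

def fake_batch_endpoint(batch):
    # Stand-in for a real batched inference endpoint.
    return [f"response:{p}" for p in batch]
```

With a fixed overhead per request, total overhead shrinks roughly linearly in the batch size, which matches the "almost linearly more efficient" observation.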
Going back to our previous topic: you know, a billion dollars goes to create a model. They create the model, and then the value goes down.
31:23
Right.
31:30
And then maybe, you know, these Chinese models are gaining from it, and there is some kind of peace dividend of intelligence for the ecosystem. One of my observations has been that the companies that grow the fastest tend to be on that token path. Like, they go: a billion dollars creates this thing, and then it just kind of leaks money. If you're on that kind of stream of leaking money, then you get growth. Now, unfortunately, downstream of the big model, margins are actually very tough, because it's not even an economics-fundamentals thing. It's just that, as an organization, you'll always go towards wherever top line is greater. Yeah, you're early on. So everybody goes ahead and does that.
31:31
And so I think people also just perceive all the value to be in the model.
32:14
Yeah, yeah, I just meant more: let's say that you're doing a company and you're on the token path. You could decide to add value on top of that, and you could decide to have margin or not. You can actually make that decision, and you can add enough value to have margin, but it'll come at the expense of growth.
32:17
Oh, yeah.
32:33
And I've yet to find a startup founder who, given the ability to cheat a little bit on margins in order to get growth, won't go for growth.
32:34
Right.
32:43
And I've yet to find an investor, independent of what investors say, who will give you credit for the margins as opposed to the growth. And so I think we're all trained to just go for growth, and it's so easy to do if you're on the token path. It's easy because you just sell the token at cost.
32:43
Right, right, right.
32:57
Like you're selling electricity at cost. So I'm just wondering, as you build out your company, how do you think about being on the token path versus framing around it? Or is this something that you consider?
32:58
When we first started, I spent some time talking to Ollie from Datadog, who I really admire and who is also an investor in BrainTrust, just to absorb some of his wisdom. And he described his framework for doing this with cloud. He said something like: our goal was to be some percentage of cloud spend, and then try to figure out how to price our products so that it makes sense. I mean, people have all kinds of comments about their pricing, but whatever, the company's doing really well, and I think most people who use them would say the product's very valuable.
33:07
Yeah, Datadog is awesome.
33:39
And I think what I took away from that is: even if we're not reselling tokens, we need to figure out a way to make our product feel valuable on the order of the number of tokens you're consuming. And so that's why, with our pricing, we don't charge you directly for the number of tokens, but we charge you a very cheap rate in terms of the number of gigabytes that you ingest. And guess what? A token is roughly 4 bytes. And so that's helped us do some value alignment there with customers. And it also resonates because, hey, our cost at the end of the day can ultimately be attributed that way too.
33:40
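The value-alignment arithmetic here is simple enough to sketch. Only the "a token is roughly 4 bytes" rule of thumb comes from the conversation; the dollar figure in the example is made up:

```python
# If a token is roughly 4 bytes, a per-gigabyte ingest price implies
# a rough per-million-token price, which is how gigabyte pricing can
# be made to feel aligned with token consumption.

BYTES_PER_TOKEN = 4  # rough average mentioned above

def price_per_million_tokens(price_per_gb):
    tokens_per_gb = 1_000_000_000 / BYTES_PER_TOKEN  # 250M tokens per GB
    return price_per_gb / tokens_per_gb * 1_000_000

# e.g. a hypothetical $1.00 per GB ingested works out to
# $0.004 per million tokens
```

This is the conversion a buyer can sanity-check against what they already pay per token to a model provider.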
It's so interesting. This mirrors very closely what happened when we went from perpetual licensing to recurring.
34:16
Oh yeah.
34:23
And everybody had this conversion where they're like: I can sell you this thing in perpetuity, or I can sell you recurring. And basically the recurring price was something that would pay the in-perpetuity thing off over a period of time, assuming some depreciation.
34:24
Yeah.
34:41
So you have to have this equivalency, because the entire world goes to the new model. So right now it's token-based usage. So whatever your pricing model, it feels like you have to have some equivalency to the token path, and you have to know what that conversion is, based on whatever the customer is using it for, or your understanding of it.
34:41
Right.
34:57
It's very disruptive to change a pricing model, right? You know, if you're going from, like, seat-based to usage-based. Right. I mean, in your case, we did that. But even the billing systems and stuff, it's a tough problem.
34:58
It's a very tough problem. Yeah.
35:11
Yeah.
35:12
I think it's hard to change billing that often. And then billing on the order of tokens, which are super granular, is also very hard. And then of course everyone's trying to abuse things all the time.
35:12
So is there a fraud issue, by the way? Do you see this at BrainTrust? Because, I mean, for a lot of these companies that are basically reselling tokens, that's a large part of their business. I mean, it's a huge issue, but.
35:23
Our customers deal with it constantly. So we help.
35:32
But you're more B2B, so it's probably less of a. Yeah, I mean.
35:35
We also, this is going to change, but we also have a relatively high entry price point. You know, just the set of folks that are attracted to our product or end up signing up tend to be sophisticated. Yeah. But we do have people that create ABC1, ABC2, ABC3 or whatever free plans and try to rack them up. But I would say that the flavor of usage that we offer is at this point relatively well understood. So it's not that easy to abuse.
35:38
Yeah, yeah. Listen, we don't have too much more time, and I want to get into the Bash-versus-SQL thing, because I think that was so funny. So maybe frame up how you got nerd sniped and what happened there.
36:04
I think a number of people online have made this qualitative observation that, boy, you know, Opus is so good at using Bash. And hey, when I use Opus with a CLI, it feels way better than when I use Opus with an MCP server. And maybe that's the root of why people are making this comment. But the thinking goes: if only I could map every problem to Bash, then because the model understands how to use Bash, it will perform better on this problem. And honestly, I think that's bad engineering thinking.
36:14
And let me be clear, you're saying Bash, but my understanding is people are saying: I could give it kind of API instruction data, or I could literally put it in any Unix environment and let it decide to do whatever it wants in that Unix environment.
36:48
There's many variants.
37:00
Now, like, that's the one that I'm familiar with.
37:01
No, yeah, yeah. And we actually benchmarked multiple of these things, but what some people started doing is literally depositing things as files on. Yeah, exactly.
37:02
Like, I'll put it in a Unix environment. I'll give it curl. Yeah, it can download whatever files it wants. I'll give it access to the Internet.
37:12
And that is like, that's layer one.
37:19
Okay.
37:21
Layer two is: I'm going to create, like, a FUSE-type thing where I create these. I'm not going to. Hey, I have too many customer support tickets to download every customer support ticket. So I'll create a fake file, one per customer support ticket.
37:22
Okay.
37:38
And then of course it gets even more complex. People have now created, like, virtual-environment-type things in every programming language, so you can write, you know, like a Python-based system or something. And all of this is basically saying: okay, great, let's assume that the models are really good at Bash, and then let's engineer our thing to meet them where they are. And we have an agent inside of BrainTrust called Loop, so we benchmark this stuff constantly. We now support SQL directly in BrainTrust, so we've of course been benchmarking that. And, you know, going back to what I was saying about hypotheses: to me, intuitively, it didn't seem like models would be fundamentally worse at writing SQL than writing Bash to solve problems. In fact, I think models are really good at writing code, which is harder than writing SQL. And so they have all the facilities to express something as SQL that they could express as a series of Bash statements. And the problem is, how do you organize the data in a SQL environment that a model can actually access? And then of course, if SQL happens to be a more efficient way of reading the data for the problem, then it might be more efficient. Part of the reason I think people are going crazy about Bash is that if you're working with code, the most granular abstraction that you can use to manipulate and read code is essentially Bash, right? Because the abstraction that is provided to you is a bunch of files. If you're working with, like, customer support tickets, the metadata row that describes the customer support ticket has a lot of information. So if you can run a SQL query across those to filter down the set that you should look at more granularly, then you should. And so we ran this eval, and the results are just comical. SQL is more accurate, it's more efficient, it's more token efficient, it's faster. The worst models perform better with SQL than they do with Bash on, like, everything.
37:38
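The contrast is easy to see in miniature. A minimal sketch using Python's built-in sqlite3 with invented ticket data: the metadata filter that a Bash-style agent would perform by listing and grepping one fake file per ticket becomes a single declarative query.

```python
import sqlite3

# Invented ticket metadata, standing in for the customer-support
# example: one row per ticket instead of one fake file per ticket.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER, status TEXT, priority TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [(1, "open", "high"), (2, "closed", "low"), (3, "open", "low")],
)

# One query narrows the working set before any expensive per-ticket
# reading, which is where the accuracy and token-efficiency wins
# described above come from.
rows = conn.execute(
    "SELECT id FROM tickets WHERE status = 'open' AND priority = 'high'"
).fetchall()
# rows is now [(1,)]
```

For an agent, the difference is that the filtering happens in the database engine rather than in its context window, one file read at a time.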
It feels like this is another dichotomy that's shaping up, which is: there's a whole percentage of the population that seems to be, like, give it a computer and let it do its thing.
39:27
Right?
39:36
Which is kind of this brute force: just give it a computer. And there's another, which is: let's give it computer science fundamentals. I mean, I had a conversation this morning with some of the top systems people I know, and they're like, now when I use agents, I use strong typing, I use referential transparency.
39:37
You're really underestimating the intelligence of the model if you force it to do the brute force thing.
39:53
Yeah. And it almost feels to me like the brute force ones are people that aren't quite comfortable with the complexity of true engineering and computer science. They understand files, and they understand curl and whatever, so they go with that. Whereas I feel like this is the potential for a golden age in CS, where you actually understand: what are the tools to make a system reliable and safe and provably correct?
39:58
I'll give you an example. So at BrainTrust we have a folder called Type Specs, and it has all the type specs for the API, the UI, Brainstore, and how all these things relate to each other. Now, when I write code, I hand-write the type specs, because I sort of pace around our office, and I have a bouncy ball. I throw the bouncy ball, I think really deeply about what the type system for BrainTrust should be, and I hand-write the type specs. And then a bunch of tests fail, and that's when the agent starts running, and it just goes. When I review other people's code, because now people are sending a lot more PRs, I just read the type specs, and then usually we debate about what the type specs should be. And although I still scan the rest of the code, I have a pretty high level of confidence that if me and the person who's submitting the PR agree on the type specs, then it's highly likely that the rest of it will be implemented appropriately.
40:21
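A loose analogue of this hand-write-the-types-first discipline, sketched in Python rather than whatever BrainTrust actually uses, with all names invented: the types and their invariants exist before any implementation, so tests can fail against them and reviewers can agree on them first.

```python
from dataclasses import dataclass

# Hypothetical hand-written type spec: entities and the relationship
# between them, pinned down before any implementation code exists.

@dataclass(frozen=True)
class Experiment:
    id: str
    project_id: str  # every experiment must point back at its project

@dataclass(frozen=True)
class Project:
    id: str
    experiments: tuple["Experiment", ...]

def project_is_consistent(project: Project) -> bool:
    # The kind of invariant a type spec encodes: reviewers can debate
    # and agree on this before reading any of the implementation diff.
    return all(e.project_id == project.id for e in project.experiments)
```

The review workflow described above then reduces to: agree on the types and invariants first, and treat the rest of the PR as implementation detail.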
What about state guarantees or like do you include that in your specification?
41:20
Well for us it's challenging because people self host the product and there's so much data that you can't migrate anything ever.
41:25
Yeah, so just saying: I can see you doing the types, and that makes a lot of sense. The other thing seems to be, you know, state management is hard, and there are a bunch of fundamental trade-offs that you need to articulate in a formal way for state management.
41:32
As a database person, state is also just types, and I think defining the right types that allow state to flow performantly, you know, that's where a lot of the challenge is. So for me.
41:46
Like, there's the simple question of, you know, here's an index, and that would be part of a schema, so I'm assuming that's under your type system. But there's other stuff, which is like: it's okay if you read this and it's invalid; you can optimistically update this thing; this requires strong consistency. Oh yeah, yeah, that sort of stuff, which you need. I mean, you've got to process a lot of stuff. So I'm just wondering, at what level do you have to get involved for that?
41:58
Yeah, well, personally I tend to get involved in all that kind of stuff, maybe more than is needed, just because I enjoy it. But I think that the best database systems are the ones that make consistency also declarative, and reason about these types of challenges as part of the type system.
42:23
Okay, so your type system really is a declarative type system, like in the database sense?
42:42
Oh yeah.
42:46
It includes all of that.
42:46
Okay. Yeah, yeah. Now, we're not perfect, but my stance is always: hey, we're debating this thing. How do we formulate the problem so that we can express the trade-off in the type system? Very cool.
42:47
Great. Well, listen, this has been great to catch up. We've got to do it again. Any last things you want to cover?
42:58
No, thanks for having me.
43:03
Okay, awesome. Thank you.
43:05
Thanks for listening to the A16Z podcast. If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com/a16z. We've got more great conversations coming your way. See you next time. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.
43:10