Latent Space: The AI Engineer Podcast

[LIVE] Anthropic Distillation & How Models Cheat (SWE-Bench Dead) | Nathan Lambert & Sebastian Raschka

52 min
Feb 26, 2026
Summary

The episode discusses Anthropic's recent blog post about detecting distillation attacks from Chinese AI labs using their APIs to train competing models. The hosts also examine the failure of SWE-bench Verified as a coding benchmark, revealing widespread cheating and unsolvable problems that led OpenAI to deprecate it.

Insights
  • Distillation detection relies heavily on usage patterns and scale rather than content analysis, making it difficult to distinguish from legitimate evaluation
  • Even heavily scrutinized benchmarks like SWE-bench Verified can have fundamental flaws: 59% of the problems that frontier models consistently failed turned out to be unsolvable
  • Chinese labs' API usage for distillation is economically rational given GPU shortages, despite terms of service violations
  • Model memorization from single-pass training remains poorly understood despite being crucial for benchmark integrity
  • The strongest models aren't necessarily the best teachers for distillation due to probability matching requirements
Trends
  • Increasing geopolitical tensions around AI model training and data access
  • Benchmark saturation and gaming becoming more sophisticated
  • Private evaluation datasets becoming necessary to prevent contamination
  • API-based distillation replacing traditional synthetic data generation
  • Evaluation costs scaling to millions of dollars for frontier models
  • Shift from public to private benchmark splits to prevent cheating
  • Growing importance of agentic benchmarks over completion tasks
Companies
Anthropic
Published blog post about detecting distillation attacks from Chinese labs using their Claude API
OpenAI
Created SWE-bench Verified and recently deprecated it due to fundamental problems with the benchmark
DeepSeek
Chinese AI lab mentioned as using Anthropic's API for distillation, though at lower volumes
Minimax
Chinese AI lab that used millions of API calls from Anthropic for model distillation
Scale AI
Created SWE-bench Pro as a replacement for the flawed SWE-bench Verified benchmark
OpenRouter
API routing service mentioned as useful for model comparisons and distillation work
ByteDance
Previously had API access cut off by OpenAI, representing early enforcement of distillation policies
XAI
Blocked by Anthropic from using their models, possibly for distillation activities
Google
Mentioned for using technical distillation with logits for their Gemma models
Cognition
Working with swyx on launching a new coding benchmark to replace SWE-bench
People
Nathan Lambert
Co-host discussing AI distillation and benchmark issues; hosts the Sail Live series
Sebastian Raschka
Co-host providing technical insights on distillation methods and model training
swyx
Latest writer joining the Sail coalition, contributing AI media content
Jeff Dean
Google executive interviewed about Gemini model architecture and distillation practices
Quotes
"I'm of the opinion that the Chinese labs, like obviously should do this. They're in a massive GPU shortage and using APIs is way easier than generating synthetic data on their own."
Nathan Lambert
"The strongest model is not necessarily the best teacher and most of us in this area think it's due to like some you have to match the probabilities of the tokens to the base model."
Nathan Lambert
"59% of them cannot even be solved at all because the original benchmark was still slop stuff got through that was not solvable."
swyx
"If this happens to Sweepbench Verified, which I think is the most scrutinized benchmark in the world... literally 80 point between 1 and 9, let's say where there's almost zero variation."
Sebastian Raschka
Full Transcript
3 Speakers
Speaker A

Okay, we're live. We have one person. People will start trickling in. Thanks for coming to Sail Live number six. This is a very exciting one. The topics are always fun with these, whatever is the topic of the day on our little rat-racing minds trying to keep up with AI. But we're welcoming the latest writer that is joining the Sail coalition, so I think this just means more content for Sail. I've been a fan of swyx and a friend for a while at this point, so I'm very happy to have his content join this. And I think you've been doing great stuff recently and continuing to evolve this.

0:00

Speaker B

So thank you, sir.

0:37

Speaker C

Welcome to the team.

0:38

Speaker A

This is my friends and colleagues in the AI media space and it's just great to be able to support people and keep that network closer. So welcome.

0:39

Speaker C

Yeah, thanks for. I just wanted to say thanks for joining us. It's really a pleasure to have you on here, Sean, or swyx. I just coincidentally listened to your podcast about the SWE-bench benchmark. So, yeah, small world. Awesome to have you here.

0:49

Speaker B

Yeah, thanks for having me, and yeah, just glad to be on and chat. I've never done one of these Substack live things, so I'm curious how it works, because I always think about Substack as a newsletter platform, but they want to go multimedia.

1:09

Speaker A

I think the live thing, before we get to technical content, is actually good because it gives it a different edge. It's just a little bit sharper when you know you're live. I think we've all done a lot of podcasts, even podcasts that are unedited and just put up later. But I think the live thing is a different element that can be tapped into nicely. So I don't know, why don't we just dive into it? We're going to start with distillation. I put how models cheat in the title so we can talk about benchmarks. Anthropic posted this pretty spicy blog post this week, essentially detailing how they found distributed distillation, quote unquote, attacks on their services from prominent Chinese labs. And I'm very unsurprised with Anthropic calling it an attack; I think that fits with a lot of their branding. Okay, nice screen share. Sean, swyx is such a pro with the screen share. This was only dropped a few days ago. But essentially Anthropic is detailing how they found distributed accounts across multiple Chinese labs building state-of-the-art LLMs, and described what they were doing and why Anthropic is concerned about this in their worldview of AI geopolitics. And I think it's very interesting, because I'm of the opinion that the Chinese labs, like obviously should do this. They're in a massive GPU shortage and using APIs is way easier than generating synthetic data on their own.

1:25

Speaker C

And I think, if I may interrupt you here, maybe we should, just for the general audience, define distillation before we go into the details. So distillation, that's a broader concept. It's not a new concept that came up with LLMs; it's an older concept in machine learning in general. And distillation essentially is the idea that you take a larger model, let it generate outputs, and train a smaller model on those outputs of the larger model. And the idea is that you can train the smaller model more efficiently using that larger model. Originally, I think you just brought up the paper here, what you would do is train on the logits. So old-school machine learning people might remember, from deep neural networks, the logits, the outputs of the last layer, that you usually work with them to compute the loss function, the cross-entropy term, and you would train on this signal. And nowadays, in the context of LLMs, it's a bit more loose. So it does not have to be these logits that you train on; it could be just the output data, synthetic data, like Nathan just said. So for example, it's actually a very common practice, for example in the DeepSeek R1 paper, and other companies do this too: they would train the flagship model, the largest model, the R1 model with 671 billion parameters, and then they would create smaller variants, I forgot the numbers, but something like 1 or 3 billion, in a smaller range, these small models you can run locally, and they are trained on the outputs of their own larger models. I think now the thing is also, of course, this is very common practice, everyone does that when they are producing the smaller model variants. Now, I think the question or the point Nathan brought up is what happens if you are a company and you generate this synthetic data from another company's LLM and then train your own model on it. So, sorry, that was just a little interruption. But yeah, distillation, in short, is training a smaller model on the outputs of a larger model. Basically, yeah.

2:53
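
To make the two flavors of distillation concrete, here is a minimal sketch, not from the episode: a classic logit-matching loss versus plain next-token SFT on teacher-generated text. The function names and the temperature value are illustrative assumptions.

```python
import torch.nn.functional as F

def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """'Technical' distillation: match the teacher's full token distribution
    with a KL divergence on temperature-softened logits."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # KL(teacher || student), scaled by t^2 as in the classic formulation
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

def sft_distillation_loss(student_logits, teacher_token_ids):
    """LLM-era 'loose' distillation: ordinary next-token cross-entropy (SFT)
    on text the teacher generated; no teacher logits are needed."""
    vocab = student_logits.size(-1)
    return F.cross_entropy(
        student_logits[:, :-1, :].reshape(-1, vocab),  # predictions for positions 1..T
        teacher_token_ids[:, 1:].reshape(-1),          # teacher tokens shifted by one
    )
```

The first loss needs access to the teacher's full output distribution, which is why it is mostly done with open-weight or in-house teachers; the second only needs sampled text, which is what API-based distillation amounts to.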

Speaker A

And I think this is even possible at the frontier. So, like, people distill from something like Claude Opus to build Claude Sonnet. They're generally doing very similar things internally; they have access to different tools, like richer tools. And then the other context is that all of these large labs for years have had terms of service where they say that you effectively cannot use the outputs from these APIs to train something like a competitive AI model. These are vague terms, and terms of service are not a contract. Essentially, terms of service is something where you are using a service, and if the provider finds you violate it, they can cut off your access. That's just kind of a basic thing. So these have not been enforced within the US much at all. I think there was one case, maybe ByteDance a year or two ago, where OpenAI cut off their API. But this was discussed so much right after ChatGPT, when people were building the first open models on things like Alpaca. And it's like, is OpenAI going to come after us for doing these research models? And it totally died down. People were worried about this for over a year; it was kind of an inseparable discussion. So nothing really happened. And then this is the first prominent reemergence of the discussion. I think it's because people are far more worried about AI competitiveness. I'm curious what you guys think.

5:05

Speaker C

Yeah, can we talk a second about even how they would detect this? Because you said in the beginning something about a distillation attack, and you didn't say that specifically, but you kind of implicitly put quotation marks on attack. So how would you even detect that? So I think, I mean, distillation in that context means really literally just letting ChatGPT or Claude generate synthetic data, and then you collect that synthetic data and train your own model with supervised learning, supervised fine-tuning, on it. But then how would you even detect that this is a distillation attack versus just an evaluation? Because right now I'm actually running, I mean, I'm distilling myself for chapter eight of my book, but I'm doing it with open-weight models. So no worry, Anthropic, please don't worry about it.

6:25

Speaker A

I distill from API models for my job.

7:12

Speaker C

Yeah, I use OpenRouter right now, just distilling from the DeepSeek V3.2 model, which I think these folks are okay with. But what I wanted to say is, when I'm evaluating models I use basically almost the same script. So when you're evaluating a model, you have a question and you let the model generate the answer, right? You generate the response to your benchmark question. And in my benchmarks I have a math dataset with 500 examples, and I have a bigger math dataset of 12,000 examples. So you're basically just running an API in a loop to let it generate these questions and, sorry, the answers. But then how would a company know, okay, this person is just evaluating, versus this person is now saving that data and then training their own model on it? You see what I'm saying? It's the same.

7:16
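
What Sebastian describes is essentially a loop like the sketch below: the same API calls could be producing an eval score or collecting distillation data, which is the detection problem. The client, model id, dataset file, and grading rule are placeholder assumptions, not details from the episode.

```python
import json
from openai import OpenAI  # any OpenAI-compatible endpoint works the same way

client = OpenAI()

def ask(question: str, model: str = "some-model-id") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        temperature=0.0,  # keep answers as deterministic as possible for evaluation
    )
    return resp.choices[0].message.content

problems = [json.loads(line) for line in open("math_benchmark.jsonl")]  # hypothetical file
correct = 0
for p in problems:
    answer = ask(p["question"])
    correct += int(p["answer"] in answer)  # naive grading; real evals parse answers more carefully
print(f"accuracy: {correct / len(problems):.3f}")
```

Saving each answer to disk instead of (or in addition to) grading it is all it takes to turn this into a distillation data collector, which is why providers fall back on volume and account patterns to tell the two apart.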

Speaker A

I think it's a scale thing. So when you're evaluating, at least a basic eval, you're going to run it once and not keep doing it. There's some threshold. I mean, they say stuff here, but I think most of it is quantity. And then they're going to look at patterns across similar accounts, is what they're doing.

8:08

Speaker B

Yeah, exactly.

8:30

Speaker A

I think they're going to see like really repetitive stuff.

8:31

Speaker C

Yes. So I think the interesting point this leads to is, I mean, you can do evaluation at a large scale. If you are a big company, you want to know whether your LLM performs really well, and you have a large suite of benchmarks you are going to run. But then you said maybe looking for patterns. So maybe one way would be: okay, this is a familiar question, it comes up in benchmarks, so this person is maybe not stealing our answers, they're just using it for benchmark purposes. But then it kind of means that they are looking at what you're generating there, you know, which is, I mean, of course nothing is private when you are using LLMs on the Internet; the data is somewhere intermediately stored. But then it almost implies that they are checking what you use the LLM for, what you generate, which is kind of a sensitive topic, almost privacy-wise, right? So that's kind of an interesting point, because, I mean, of course you mentioned the terms of service, that you are not allowed to distill. But the point I'm trying to make is you're not distilling live when you are on the platform; you are doing it somewhere later. You're just letting the LLM generate answers. And I find it kind of interesting that a company would look at that, even at that scale, and, you know, call you out like, hey, you are generating too many answers here, that's not cool, or something. That's kind of a weird thing.

8:34

Speaker B

Yeah. I wanted to respond to something from a few sentences back, but actually Anthropic has blocked US companies first, before the Chinese companies. It has blocked both OpenAI and xAI from using the models, and I think maybe explicitly accused xAI of distilling stuff, I don't know, but definitely not in a full blog post like this. So this one is definitely the most high-profile case. I do think it is actually pretty hard to distinguish from, hey, I'm just running my internal benchmarks, and of course it's going to be a very high volume of the same stuff, because especially for some benchmarks you have to run three to five times. These are the exact same questions, right? I do think obviously if you get to the tens of thousands, hundreds of thousands, then you're not just running benchmarks, you are distilling this thing.

9:55

Speaker C

There is a good point in the chat: what would the distribution of questions look like if you are distilling? And I think, related to your point, at a certain point, when you have a certain magnitude of answers generated, it might look suspicious. But there are a lot of legit use cases. If a company uses your, let's say, OpenAI or Claude API for their own chatbot and they have a lot of customers, it's naturally a lot of answers that are generated. And so they would probably look at distributions. Like, maybe you would expect a very broad distribution when you are distilling, because you want to cover pretty much everything. And when you are running benchmarks, it's maybe more specific: you are running a math benchmark, it's just math. Or if you have a customer chatbot, it's more like customer answers. But yeah, I think they would maybe analyze your distribution. I feel like this is kind of a weird thing to do. If you're a company and you're looking into your customers' generated data, you know, of course you have to expect that it's not private, but it's still kind of a weird thing that they do that, essentially.

10:57

Speaker B

Okay, what else do we have to talk about? Is it interesting? Okay, so one thing, this is a little bit of Substack authors going back and forth. One thing I did was I threw it into Nano Banana, which is kind of a decent visual.

12:08

Speaker A

Right. Throw it into Nano Banana 2. It's a Nano Banana 2 live pod. Like it just released five minutes ago.

12:26

Speaker B

This is actually Nano Banana 2. Because I'm in the access program, they cut you over to the new Nano Banana and I couldn't access the old one. So I was trying to do like a diffuser and I couldn't do it because, classic.

12:34

Speaker A

That is classic early tester program. Shit, look at the pain we have to deal with here.

12:48

Speaker B

Is it interesting that DeepSeek used so much less than Minimax? I think, Nathan, in your write-up you had a little bit of a

12:54

Speaker A

comment about how this is a political blog post, in a way. Maybe not political, but they're trying to make a point; it's more about making a point than the details. The DeepSeek thing is definitely way smaller scale. Most of the labs will experiment with all the APIs they can get access to. Data is just so important, and you're going to have a pipeline where you could sub in any API and then run an ablation to see if it gives you performance. The API is kind of free, so just do it. The millions of exchanges is a bit more of a bet. You can measure that, but the millions of exchanges is like tens of billions or a hundred billion tokens, and it takes a lot longer to actually get that out of the API, especially when they have to spread it across a ton of accounts. These accounts are all rate limited and have other problems; that takes longer. But this tiny one is so fast. That was generally my point, that it made it clear Anthropic is trying to use the DeepSeek name, as the only Chinese AI name that people in

13:00

Speaker C

the US know, marketing-wise, to make it, you know, stick. Yeah, actually, you mentioned the different APIs and everything. I'm not sponsored by them and I have no affiliation, I've never talked to anyone from that company, but OpenRouter, for example, is a good example. I've been using it a lot for the open-weight models, because the bigger ones are too big to run locally. And what's nice is, it's basically just routing you through other companies' APIs, and they select automatically, at that point, what is the cheapest one. I sometimes get some failures, I think when it switches, sometimes it crashes, but that's maybe something I have to fix in my script. So even then, if you're distilling, you can do that from multiple providers. But yeah, of course, if you want something from ChatGPT or Claude, it's always going to go through the official one, and then it gets, I guess, suspicious. But you could also technically distill a bit through OpenRouter, through their account instead of your direct account. You can make multiple accounts, and it's kind of interesting that they track all that. And yeah, different topic now, that they call out DeepSeek, which is quite interesting.

14:03

Speaker B

For what it's worth, OpenRouter seems to not be using DeepSeek in most of these.

15:18

Speaker A

These are free models. DeepSeek's not.

15:24

Speaker C

Yeah, I mean, I'm using the paid API. I should also say it's also nice: they show you how much it costs and the tokens per second for different providers. So if you go to the search at the top, you can go to the different DeepSeek ones. I just like it because I do a lot of model comparisons. And then this one is an older model, so maybe it only has one provider, but if you go to, I think, DeepSeek R1 or something, or even the normal 3.2, there should be multiple providers. If you scroll down, yeah, you can see there are different providers and different tokens per second, different costs. I just like that website because it's just quick to use the API, and they have an OpenAI-like API. It's not sponsored or something; I just find it generally useful. But yeah, just a side note.

15:26
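
For readers who have not used it, OpenRouter's OpenAI-compatible endpoint looks roughly like the sketch below. The base URL is its documented endpoint; the model slug is illustrative and may not match current listings.

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-chat",  # OpenRouter picks a provider for this slug behind the scenes
    messages=[{"role": "user", "content": "Summarize grouped-query attention in two sentences."}],
)
print(resp.choices[0].message.content)
```

Because only the model slug changes between runs, the same comparison script can be pointed at many models, which is the workflow Sebastian is describing.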

Speaker A

Do you want to go back to the comparison? Did you have a high-level point to make there?

16:21

Speaker B

Oh, okay. Just a couple. One, I think the timing, post Moonshot releasing their stuff, post Minimax releasing their stuff, but pre DeepSeek V4, I think that was strategic. I think that may also have factored into why Minimax was more detected, had a higher number. So when you collect data is actually very important. And so they interrupted, or they found, Minimax during the training of Minimax 2.5.

16:25

Speaker A

Right.

16:58

Speaker B

Which we will confirm later on if we do end up doing the call with them. And so obviously the number is going to be very high because they're actively looking for it. And then they banned the Minimax accounts and Minimax changed. Actually, I don't think that's exactly what happened. Sorry, let me correct myself. While Minimax was distilling, Anthropic released Opus 4.6, and they said that the accounts redirected nearly half their traffic. So I'm like, okay, very clearly this is them, right? This is the same exact traffic; it switched to a new model the moment a new model releases. Okay, cool. DeepSeek maybe wasn't doing that because they hadn't been working on their stuff actively. I don't know, it could be a different thing. Or DeepSeek is just way more efficient: I get all I need for 150k; you guys are so inefficient.

16:58

Speaker A

It would be so interesting if we knew the time frame of this. Like, are all of these API requests within the last four weeks? Are they within the last six months? Like that's such a different nature of what is going on.

17:48

Speaker B

Exactly right. That's what I'm saying. Like, DeepSeek was training 3.1, 3.2, you know, a year ago.

17:58

Speaker A

Like, yeah. Or, I don't know, DeepSeek-OCR. They were like, we don't. I guess they said what it is, but it's not that.

18:03

Speaker B

Yeah, yeah.

18:10

Speaker C

But also, scale-wise, I do think, yeah, Minimax is three times smaller. They don't use MLA and they don't use the DeepSeek sparse attention; I think it's just grouped-query attention. But it is still a pretty snappy model, so I think it's just attractive to use it. And the other one, top of my head, I don't know, maybe they had some free tier or something like that, where, I think, when the models come out they sometimes offer free usage. And that was a more recent model than, I think, DeepSeek; the last one was from December, the V3.2.

18:10

Speaker B

Yeah, yeah. So, you know, maybe this is an irrelevant point because they were training before and would have the same amount of traffic or they're just way more efficient. Right. It does bring to mind like the

18:46

Speaker A

efficiency thing is not it. I can guarantee it. Like that is not.

18:57

Speaker B

Yeah, there's a small chance that they.

19:02

Speaker A

It's like there's a chance that they got the right research idea early and like found the right data to use. But it's not that they're like going to be 10.3x more efficient.

19:04

Speaker B

Okay, so it's a timing thing or they just actually don't use it that much. I mean, you play this out. I was like, okay, why don't they share?

19:13

Speaker A

Right?

19:22

Speaker B

They're all buddies. And it does come to a point where, okay, let's have all of China just distribute it to every citizen.

19:23

Speaker A

I can talk about this a little bit. There's not a lot of research, but there are a few research projects trying to understand how you use distillation data. I think SFT is the cleanest example, where you're just doing this autoregressive loss on Q and A pairs. But the strongest model is not necessarily the best teacher and most of us in this area think it's due to like some you have to match the probabilities of the tokens to the base model. So what's happening is that Qwen dense models are the best teachers for a lot of open-weight models, and I think that's because a lot of open-weight models are either Qwen or have been Qwen-like for a while. So OLMo learned really well from Qwen, and obviously other Qwen models did. But scaling these pipelines up to use, say, GLM 4.7 or a bigger DeepSeek model or a more recent big Qwen MoE, for all of these it's a lot harder to just generate the data from the same prompts with the right sampling settings and then do SFT on them and actually make the numbers go up. Interestingly, GPT-OSS is a pretty good teacher, but there's a huge gap there, where just because you have this data does not mean it's actually going to make your model better. So you have to do the research to be like, oh, we learned that we get signal out of Claude, we need to get 100 billion tokens ASAP because it's going to immediately make our model better. That's not a common place to be in in modeling, because of this weird teacher-student dynamic going on. So I could see that being different across labs.

19:34
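
A minimal sketch of the pipeline Nathan describes: sample completions from several candidate teachers over the same prompt set with fixed sampling settings, write out SFT pairs per teacher, and then check whether the student's evals actually improve. The client setup, teacher ids, and file names are assumptions for illustration.

```python
import json
from openai import OpenAI

client = OpenAI()  # or an OpenAI-compatible router; the endpoint choice is an assumption

CANDIDATE_TEACHERS = ["teacher-model-a", "teacher-model-b"]  # hypothetical model ids

def generate_pairs(prompts, teacher, temperature=0.7, top_p=0.95):
    """Sample one completion per prompt with fixed sampling settings."""
    for prompt in prompts:
        resp = client.chat.completions.create(
            model=teacher,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
            top_p=top_p,
        )
        yield {"prompt": prompt, "response": resp.choices[0].message.content}

prompts = [json.loads(line)["prompt"] for line in open("prompts.jsonl")]  # hypothetical prompt set
for teacher in CANDIDATE_TEACHERS:
    with open(f"sft_{teacher}.jsonl", "w") as f:
        for pair in generate_pairs(prompts, teacher):
            f.write(json.dumps(pair) + "\n")
    # Next step (not shown): run standard autoregressive SFT on each file and compare evals,
    # since, per the discussion, a stronger teacher does not guarantee the numbers go up.
```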

Speaker C

I can. Yeah. I think it also has something to do with, I noticed, if you are distilling the smaller model from the same model family, it performs better. And I think it's to your point that if you have a very, very strong model, it might also be too different, or if the style is too different, then it's too much of a leap for your model to adapt. It's too different from the Q and A answers during the pre-training, and so it has to make a bigger leap. And another thing I wanted to say: you mentioned OLMo, and it's been a while since I read the paper, and you might know way better than I do, but I think you also trained on the logits and did the technical distillation, maybe?

21:04

Speaker A

We didn't. We just took the tokens.

21:45

Speaker C

Oh, I see, I see, I see. Okay, then it was probably a different paper. I think Google does that for their Gemma models.

21:48

Speaker A

Yeah.

21:52

Speaker C

Because here there's also the distinction, because you mentioned Qwen and other models: you can only do that for open-weight models, because if you do that for Claude or OpenAI, that would not work with the logits, because they don't provide them. They only provide them for some tokens, like 100 or a thousand top tokens.

21:53

Speaker B

Tokens.

22:09

Speaker C

And so, in a sense, if you want to do the real, in quotation marks, distillation, it is kind of even easier to do that from open-weight models, because you can control it. But then also, like you said, we need 100 billion tokens ASAP: that is not an easy thing to do, because it's like 40 tokens per second or something for these large models when you generate answers, and getting those millions of tokens takes time, right? So it's almost easier to start distilling from a medium model. So it's the question of more data versus more high-quality data, right? So it's also a sweet spot, like an experiment itself, an ablation study, right?

22:09
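
To illustrate the limitation Sebastian mentions, here is a minimal sketch of what a closed chat API exposes: only a short list of top alternatives per position, not the full vocabulary distribution. The model id is a placeholder, and the cap of 20 alternatives is an assumption about typical current limits.

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="some-model-id",  # placeholder
    messages=[{"role": "user", "content": "2 + 2 ="}],
    max_tokens=5,
    logprobs=True,
    top_logprobs=20,  # only the top alternatives come back, not the full distribution
)
for position in resp.choices[0].logprobs.content:
    alternatives = [(alt.token, round(alt.logprob, 2)) for alt in position.top_logprobs]
    print(position.token, alternatives)
```

That is enough for calibration-style analysis, but not for the full distribution-matching loss sketched earlier, which is why logit-level distillation is mostly done with open-weight or in-house teachers.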

Speaker B

Yeah, I like that Nathan had to call it technical distillation, because it is no longer the default, even though it was the first. Also, I'll note a fun fact: I did my Jeff Dean interview recently and I tried to get this out of him; he sort of dodged it a little bit. Remember, there were actually three sizes of Gemini models: there was Nano, Pro and Ultra. And I was like, where's Ultra? They keep it in the basement and they distill from it. Right.

22:52

Speaker C

Interesting. Yeah. Maybe it's also to safeguard yourself so no one can distill from it, or it's a pricing thing, but probably both.

23:23

Speaker B

Yeah. I mean, this is how I always think of it: the model you deploy is never the model you train, because you train the dense and then you deploy the MoE.

23:32

Speaker C

Right.

23:42

Speaker B

You basically always do it at every lab.

23:43

Speaker C

Say more.

23:47

Speaker A

Do you think they're really distilling from dense models?

23:48

Speaker B

I mean I think that is the

23:50

Speaker A

full

23:53

Speaker B

just unlimited resources. Don't care about inference, just care about maxing intelligence. Why not?

23:56

Speaker A

Yeah, I'm not 100% sure. I think the MoEs just give you a flops advantage. I don't know if that's actually how I think of the gains of MoE when you have a really good MoE architecture. But I do think that they have bigger models that they distill from, and they train internal models differently than external, because the external models have been getting a lot smaller, which is the kind of weird thing. We don't have a good way to measure it. Maybe Dylan will backwards figure it out, and InferenceMAX or whatever the heck, they'll be able to deduce model size.

24:01

Speaker C

But I'm always suspicious with these things. It's also really a capacity thing: how many people use the model at the same time, the hardware, how much is allocated. It's maybe a rule of thumb, but it's really tricky. I think it's really hard to say anything from these numbers.

24:32

Speaker A

I do think that they might start restricting models to only being in products and not being in an API. I think the whole API business is brutally competitive, and I don't have a good sense of what the defensibility of it is. It makes sense for something like Google and Azure, or any existing cloud business, to have APIs; that's kind of a more natural transition. But for Anthropic and OpenAI, the API is different from their products, which are their big differentiation, whether it's ChatGPT or Claude Code and Codex. You don't get people to go use the API from that, and I think you get a lot of people that are already spending on clouds that then go use the APIs, which is why Lambda and Nebius are going to have these API products. But if Claude's really worried about distillation, they should put the model release in Claude Code ASAP and then just not bother with the API. I don't know when that'll happen, but it could.

24:49

Speaker C

I do think, though, it's a big customer base, the API customer base: any type of product that is built with LLMs, like customer chatbot types of things. But also, more generally, I don't know exactly how the plans work in Claude, but you would reach a token max where you can only get so much with your subscription. You can, I think, buy more tokens, but I think it's just easier with the API at a certain scale. And also the whole OpenClaw customer base, right, because they don't allow the plan anymore in the OpenClaw context, so you have to use the API. And I do think, given how many tokens OpenClaw generates, it's actually not a bad business if you don't lose money on these tokens, if you sell them at a not-subsidized price. I do think the API is actually not a bad business model.

25:44

Speaker B

Yeah.

26:37

Speaker A

Take a side. Do you want to try to tie break? I'm obviously being provocative, like I don't really know, but I could see it like Anthropic gives Apple vibes to me.

26:38

Speaker B

I mean, Anthropic has a higher chance of doing this than OpenAI. With OpenAI, just because I have talked to the people so much, I just don't super believe that they will lock models to products only, out of, I guess, idealism and principles rather than economic incentive. Economic incentive would agree with you that they should have private models in products. And recently they've done this: the last three GPT-5s all had Codex variants that were two to four weeks ahead, released only inside of Codex rather than as an API. So they're starting to get there. But just constitutionally, I don't think the people that run these things believe in locking things behind products, because they have such a huge market anyway, so they kind of don't care. And then also, if you're genuinely sort of a zealot, if you're not trying to maximize the value of your company and genuinely just trying to spread AGI everywhere, then you release the API, because you just don't know what people are going to build with it.

26:47

Speaker C

One more thing, though, with the Codex thing: we will have to see next time, because this time it might also be a bit biased towards releasing it in Codex, because they released it almost simultaneously with their app that they want to promote at the moment. It could have been that they did that so that everyone checks out the app, but we'll see if it's always

27:57

Speaker B

a two, two to four week exclusive window. Yeah. And you know that's their right.

28:19

Speaker C

Yeah, sure.

28:25

Speaker B

Want to promote Codex. That's pretty effective. We have a bunch of questions in the chat for, like, other topics.

28:27

Speaker C

Yeah, let's.

28:32

Speaker B

Do we want to cover benchmarks and then this thing or.

28:32

Speaker A

Go right ahead.

28:36

Speaker C

Yeah.

28:37

Speaker B

What do you want to do? It's your substack.

28:38

Speaker A

I don't know. Oh no. Oh man. It's a collective.

28:40

Speaker B

It's a collective.

28:43

Speaker A

You should just dive into what you're interested in. Just go.

28:44

Speaker B

I mean, Sebastian was interested in the SWE-bench stuff. So this past week SWE-bench Verified died, officially.

28:47

Speaker A

What do you mean by this?

28:55

Speaker C

Yeah, let's define SWE-bench first, maybe.

28:57

Speaker B

Okay, I happen to have the post on this, so let me just remember.

29:00

Speaker C

So the broader topic, the umbrella topic here, is how do we compare which LLM is currently the best LLM. One of the ways would be SWE-bench, basically. But then, yeah, I will maybe let you explain, because you had this brilliant podcast or article.

29:06

Speaker B

I mean, okay, where do you want me to start? Should we just define SWE-bench, I guess?

29:23

Speaker C

I guess, yeah. So basically, it is a coding benchmark, and SWE-bench is a popular way to compare capabilities of LLMs. And then there is SWE-bench Verified. But maybe, yeah, we should talk a bit more about SWE-bench first.

29:28

Speaker B

So SWE-bench was a paper out of Princeton from Ofir Press's group, and they do a lot of good code benchmarking work. It happened that they just drew thousands of example open source issues, and the PRs that closed those issues, from open source projects. There's a bit of selection bias here because they only focus on popular open source, and only a small number of popular open source projects, but a large number of issues from those projects. And then they dredged up some passing tests and some failing tests that you need to make pass in order to get the score. When it launched, it was kind of obscure. Devin actually was the first one to choose it as a benchmark to report, and then it went from, I think at launch it was like 13%, and now everyone's at 80%.

29:47

Speaker A

Something like that.

30:39

Speaker B

SWE-bench, because it was done on like a student budget, was kind of, let's call it, sloppy or whatever.

30:40

Speaker A

Terminal Bench is like this now too. Like they're just aggregated. It's like hard to do a benchmark that is well calibrated across topics at different.

30:48

Speaker B

Yeah, yeah, it is hard. It is hard. So, you know, for the small group that is watching, I'm actually working with Cognition to launch a new benchmark here. But yeah, so OpenAI was like, okay, guys, SWE-bench is taking off; we are going to adopt this, but we refuse to abide by the full SWE-bench. We're just going to actually go and curate a 500-problem subset of the original SWE-bench. And they actually hired humans to go and vet it; I think it's somewhere inside of this blog post, but basically they hired three humans for every task to vet whether the task was high quality or not, because there's a lot of slop in there. And they were like, okay, this is the 500 that we're going to endorse.

30:56

Speaker C

It's like a curated subset of SWE-bench: 500, let's say, challenging problems that are supposedly well defined. Yeah.

31:41

Speaker B

And what's really funny is that at launch. So this was launched in 2024. At launch, OpenAI could not run all of its own 500. So for a while, there was like a few releases from OpenAI that reported on a subset of the subset because they couldn't run it on their eval infrastructure. So their numbers were higher because their denominator was lower, which is very funny.

31:51

Speaker C

Anyway, maybe in that context we should say what SWE-bench kind of looks like. It's basically code that has bugs in it, and usually the task for the LLM is to fix the bug in the code.

32:15

Speaker A

Right.

32:27

Speaker B

It's right here. The whole thing's open, which becomes a problem in the future. But right now you can see the whole thing.

32:28

Speaker A

Right?

32:33

Speaker B

You can see the repo, the issue ID, and the problem statements. And then you also have the tests that you're supposed to pass and fail. So it's all here on Hugging Face, and you can see that it's at 500. Anyway, let's not get too lost in the details on sort of

32:34
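
The dataset swyx is scrolling through can be inspected locally with a few lines. The dataset id and field names below are assumptions about the public Hugging Face release, not something stated in the episode.

```python
from datasets import load_dataset

swebench = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")
print(len(swebench))  # expected: the 500 curated instances

example = swebench[0]
print(example["repo"], example["instance_id"])
print(example["problem_statement"][:300])
# Tests the patch must keep passing vs. tests it must newly make pass:
print(example["PASS_TO_PASS"][:200])
print(example["FAIL_TO_PASS"][:200])
```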

Speaker C

I just wanted to define the context: this is a coding benchmark, essentially 500 examples that are available on the Internet.

32:53

Speaker B

Yeah, okay. And then if you want a bit more historical context, this is like a step up from HumanEval, which is more about completions, right? This was, in my mind, the first proper agentic benchmark, I guess apart from Tau-bench, where they give you the problem and the end result and they don't really specify how you're supposed to get there. Whereas I think a lot of previous benchmarks, like the MMLUs of the world and HumanEval, which is in the coding domain, also released by OpenAI, were very much: here's the problem statement, and then give me the right answer immediately after, without that many extra files or anything that you're supposed to run. So the other ones are more autocomplete, this one is more agentic. It's all a spectrum, obviously, because you can use agents to solve autocomplete, but that's not what HumanEval was testing. Anyway, I wanted to make sure people understand that OpenAI actually invested a lot of money and effort into making SWE-bench Verified.

33:01

Speaker A

How much money do you think this costs?

34:02

Speaker B

Oh, my God, don't do this millions,

34:04

Speaker A

I would guess order of a couple. Like it could be of a few

34:06

Speaker B

million, but probably, yeah, I'd say a couple million. So basically you do, like, okay, what's the first filter pass? And then, okay, it's 500 times three, because they had three people per task. And then maybe a couple more verification passes or whatever, right? So then, this year, they're like, oh, well, not only is it saturated because of progress, everyone just takes turns to increment by 0.1 every time they release a new model. It's bullshit. It's obviously bullshit. The inherent noise in just running these models varies by 0.5 to 1 every time you run it. You just choose the highest one every time you run.

34:08

Speaker C

A little nitpick. I don't think it can be 0.1% because what you said before, because it's 500 examples. I think the smallest increment is 0.2%. If I.

34:50

Speaker A

Okay, they might average because like, but like little detail. Yeah, sorry.

35:01

Speaker B

I think in. So as we progress to the next era of benchmarking the N. So the N here is 500, right? The N doesn't directly correlate to the percentage points because you get sub points as well.

35:05

Speaker C

Ah, yeah, good point, good point.

35:19

Speaker B

So, like Terminal Bench, even though it has 90-something tasks, you can get subdivisions of less than 1%. Anyway, so not only do they have this, they actually audited their own benchmark. They were like, okay, how come everyone is saturating at 80%? What's up with the remaining 20%? How come everyone's failing at it? And they were like, oh, actually, we paid even more people, six people per task now, with an extra team if any sort of positive identification is found. And: 59% of them cannot even be solved at all because the original benchmark was still slop, stuff got through that was not solvable. And I actually tried to illustrate this in my post. So here, this is an impossible test, right? Okay, so here's an example. This is the sort of value-add I did on top of the original post. Here's an example of a SWE-bench Verified task that passed the first round of human verification.

35:21

Speaker A

Right?

36:16

Speaker B

So here's the task: we want to implement Python type hints or something. We want to see the expected behavior; I want to see a string in the output.

36:16

Speaker C

It.

36:23

Speaker A

Right?

36:23

Speaker B

So if you were given this, you would never pass it, because the test said: I am looking for something called get annotation, and if you don't give me this magic string, get annotation, you will fail this test.

36:24

Speaker A

Why is it coding Interview.

36:38

Speaker C

Yeah.

36:41

Speaker B

Right. So this is just a bad task that somehow escaped validation.

36:45

Speaker C

So the only way you could kind of solve it is if you're memorizing the.

36:51

Speaker B

Yeah, exactly. Which is actually a nice. Like, I think every benchmark should include stuff like this, where if, like a honey pot, if you solve this, you're like, oh, shit, this is a canary. Right. It's like, oh, I mean, you're definitely cheating.

36:55

Speaker C

Like sanity check. Yeah. Anyway, this is a really nice point.

37:08

Speaker B

Yeah, I just think, to me, it's a beautiful point about how hard it is to make evals. There were these multiple rounds: there was the original SWE-bench, which the Princeton kids did as an initial first pass; then there was a second pass of OpenAI doing SWE-bench Verified; and then every single person that ran SWE-bench Verified for the next 1.5 years did not call this out, until OpenAI was like, hey, let's look at the data. So I think it's really interesting. While they were looking at this, they had a second thing: they looked at the chain of thought, and inside the chain of thought they found GPT-5's own chain of thought starting to include information from the future. Right? Because the problems are open source, and because it was trained on information from GitHub, it would use advance knowledge of future versions of the Django version that they were using to solve the problem. Like, they knew how.

37:11

Speaker A

I've seen stuff like this in the real world, where the models will hallucinate the new version of the API even if your script isn't on it. I think a lot of the Hugging Face stuff is the worst with this, where the models are just totally goo boobly glopped. They've seen all the versions, and the API has changed too much over time, so they throw something out there.

38:14

Speaker B

Yeah. So the most bds.

38:34

Speaker A

Yeah.

38:37

Speaker B

I mean, I think there's a lot of this, right. This sort of ethical behavior. Okay. So you can blame things like, oh, you should not have released the full data set in public because obviously people can train a full data set, but it's not like the researchers are trying to do this because these things are also open source. Any data set that touches GitHub, any training corpus that touches GitHub is going to just eventually absorb this. And then.

38:37

Speaker C

Yeah, and it's not even this website or the repository directly. It's a Clone of this repository or someone else who has that develops their own open source library and has that in the unit tests or something where it's not even intentional or malicious or anything. It's like by accident you already absorbed that.

39:02

Speaker B

Yeah, yeah. Or a new feature that releases this edit only feature. It gets written up in a blog post or a conference talk or something and then it just makes it in.

39:18

Speaker A

Right.

39:25

Speaker B

Like, it's really funny. Okay, so to me, OpenAI could have stopped there and said, okay, we're done. But they did one more extra thing, which is kind of funny. They also then ran Gemini Flash and Opus, and this one was even more egregious. They just gave the task ID and said, repeat the SWE-bench task to me. And from the task ID it can just vomit out the whole problem statement and the solution.

39:26
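
The probe swyx describes can be approximated with a sketch like the one below: give a model nothing but an instance id, ask it to reproduce the task, and measure overlap with the real problem statement. The model id and the crude similarity check are illustrative assumptions, not OpenAI's actual methodology.

```python
from difflib import SequenceMatcher
from openai import OpenAI

client = OpenAI()

def memorization_probe(instance_id: str, real_statement: str, model: str = "some-model-id") -> float:
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Repeat the SWE-bench Verified task with instance id {instance_id}.",
        }],
    )
    reproduced = resp.choices[0].message.content
    # High overlap with the hidden problem statement suggests the task was memorized in training.
    return SequenceMatcher(None, reproduced, real_statement).ratio()
```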

Speaker A

These are crazy. The stuff that's in these models when you zoom in deep is really, really incredible because, like, these are models that are like really, really well done. But there's just so much complexity in all the pieces of the pudding that get put in the recipe. Yes. There's just so many weird.

39:54

Speaker C

I also still find it fascinating that, I mean, of course it's like kind of by design when you're training that you memorize things because that's literally like next token prediction. But given that how big a model is on how much data it sees, and usually it sees only the data once, that it still has enough capacity to memorize. You know, like, it's kind of like. So usually I would think, okay, I would have to train multiple epochs to be able to memorize. But no, it is enough maybe to include it once or twice in the training corpus and it can do a perfect rendition or perfect recap of what it is in there, which is kind of fascinating even. Yeah. People who don't want that, it's, you know, it's crazy.

40:12

Speaker A

Yeah, labs got good at this. There's essentially a duplication level that you need at each stage of training, and it's not easy to measure. So, like, if you do too much at pre-training, your model forgets basic facts, and at post-training it's probably closer to these abilities. And I think that's a thing that is not well reflected; you can see it in evals if your knowledge tanks. This is an art that they have probably gotten good at.

40:57

Speaker C

Yeah. Continued pre-training does also require some revisiting of old data; otherwise, like you said, you have the forgetting. But it's still fascinating to me that with such a small fraction, because you usually use 1, 2, 5% for continued pre-training, it's enough to have the model memorize almost everything. Which is fascinating. I don't know, it's still, after all these years, fascinating.

41:23

Speaker A

Yeah.

41:50

Speaker B

I think there's so one of the pet topics that I pursue like two, three times a year on my stuff is the information theory of LLMs. And I still think it's like super understudied. How come you can memorize from one pass?

41:51

Speaker C

Yeah, exactly right.

42:07

Speaker B

And then also, people forget superposition, which is Anthropic's original mech interp work, which basically stuffs information inside the smaller bits that then get forgotten. But how does superposition actually work? I don't think I've seen a convincing study on that. Okay. Anyway, I'm done with my SWE-bench rant for now. I don't know if you have thoughts or questions or whatever, but I do think this is an example of the models unintentionally cheating. Benchmarks are hard to make and we need new ones. And if this happens to SWE-bench Verified, which I think is the most scrutinized benchmark in the world.

42:09

Speaker C

In my recent post I had a bar plot where I showed the SWE-bench Verified numbers for most models. And like you said, they were all 80 something percent, but literally 80 point between 1 and 9, let's say, where there's almost zero variation. Even something like Minimax M2.5, which I do think is worse than GPT-5.2, no offense, it's a smaller model, it's a cheaper model, and for my usage, based on OpenRouter, it's a little bit worse. But on this particular benchmark it's the same. What I'm saying is not that M2.5 should get a lower score on SWE-bench, but I think other models should get a higher score. But like you said, the problems are just impossible to solve. But one point I think we didn't bring up is: we said that SWE-bench Verified has issues, so what do we do about it? There is a SWE-bench Pro now, which is kind of, I would say: Verified tried to fix the regular SWE-bench, and Pro tries to fix Verified. But I haven't looked into this. Is it another subset or is it a completely different set of problems?

42:53

Speaker B

Problems, yeah, it's a new set. So, you know, SWE-bench draws from a 2022-ish, 2023-ish era of problems. So there are a few things you do, right? One, you do private-public splits.

44:05

Speaker A

Right?

44:19

Speaker B

That's super obvious. Two, you update the dates you draw from. And then three, you diversify the repos and the languages, right? So these are all very, very basic fixes, and then obviously trying to fix the testing, super basic fixes to the original SWE-bench, which it doesn't take a genius to figure out. But they did the hard work.

44:20

Speaker C

But it is in a sense also what Verified meant to do. So it's not. Let's say people looked at this again, but it's no guarantee that it doesn't also still have issues that might be discovered later on. Right? I mean, it's no.

44:43

Speaker B

So SWE-bench Verified was an intentional subset, right? These guys were like, no, no, no, we need to have a superset. Not even a superset.

44:56

Speaker C

We need to. Totally different. Yeah, but what I was trying to say is, when SWE-bench Verified was developed, there were three people per task making sure the task is well defined and everything. But then two years later it turns out, no, no, this was not the case for everything. And what I'm trying to say is, it could be that SWE-bench Pro is better, but it might still have issues that might not be obvious right now. Maybe in one to two years, when we revisit this and see some of the failure cases, we'll discover, okay, this still has some issues. So it's not a guaranteed perfect set, is what I'm saying. I don't know, it's just a suspicion.

45:05

Speaker B

Totally, totally. I do think Scale AI has a professional interest in making sure this is as good as.

45:41

Speaker C

No, no. But what I was trying to say is, SWE-bench Verified also had a professional interest to make sure.

45:48

Speaker A

Very different incentives. I guess they all have very different incentives.

45:54

Speaker B

This one was limited budget. This one has basically unlimited budget, because it's literally existential to Scale AI that they have good evals. Yeah, sure. But also, I think it's really nice that this team, the evals team at OpenAI, keeps endorsing Opus.

45:56

Speaker A

It's kind of funny.

46:13

Speaker B

So, yeah, they deprecate SWE-bench Verified and then they were like, we're going to report SWE-bench Pro now, and GPT-5 is like number one.

46:16

Speaker C

Maybe. Do you know, if I would want to evaluate on the private data set, how would I do that? Do I provide the API to them? Is there an API call I have to do against Scale AI, or

46:24

Speaker B

I don't know, I have my API

46:37

Speaker A

key and agreed or not, you have to agree. Because if you don't have an agreement, then you can just keep the data. You have to do special hoops to make sure that you don't steal the private eval.

46:38

Speaker C

Yeah. My question was basically, do they even let you download the data or is it more like you send the answer to them and they do the evaluation on their backend so that you don't even get to download the data, you know, otherwise, like you said you could. Yeah, yeah. So basically you only provide the answer. So you have your LLM generate an answer and you submit the answers and then they have like some process to evaluate on, on their thing so that their data, private data, never leaves their server, is my guess, because otherwise someone might upload it or something like, you know.

46:50

Speaker B

Yeah, I don't know. I don't, I don't have. I haven't tried it, so I don't really know. I'm sure you can sort of reach out to them to figure it out. Yeah.

47:23

Speaker A

Anyway, I think this is good. Unless people have more comments that they want to add.

47:34

Speaker B

I think this. But this is only coding, right? But there's like every other domain needs this.

47:39

Speaker A

On the coding domain, I think the frontier evals are even more expensive, like the APEX eval from Mercor. Evals are going to cost, this one is millions, they're going to cost tens of millions and hundreds of millions of dollars at the frontier, which is just a very strange dynamic. So much of the ecosystem is forking between frontier models and then research and other things, and trying to follow that dynamic and explain it to people is going to take a lot of work.

47:45

Speaker C

But yeah, coding is, I do think, really interesting because that's what Most people use LLMs for these days. But also it is easier to evaluate. I think once you leave like coding math, it becomes a bit obscure. How do you measure the quality of the answer? You get back to, let's say preferences, I guess, which is more like a subjective thing where coding is more objective. So it is not a bad thing to do, I think. The other day though, Anthropic acquired another company that does like a UI type of stuff on the computer. And I think that is minor.

48:15

Speaker A

Doesn't really matter. Normal talent. Normal talent flows total number one.

48:48

Speaker C

I mean, I'm not trying to say this is a big thing to talk about. What I'm trying to say is, this is another interesting point for evaluating LLMs on those tasks, because I think a lot of people want that: they want an LLM to control the computer and do various things, but those are harder to measure. So that will be maybe two years out, when we will have something more like benchmarks for that. It's harder to specify. It's kind of like, what is it called in programming? There's unit testing and then system testing, basically like UI testing and stuff like that. And so I think that is maybe going to be the next

48:53

Speaker B

basically end-to-end testing. Yeah, GDPval is usually the thing that gets brought up here. So I'll just leave it there. I think we've sort of beaten the benchmarking topic to death. But definitely GDPval is sort of

49:32

Speaker A

here, I'll put it that way. Okay. Yeah, yeah.

49:46

Speaker C

But like the big topics essentially the distillation and the benchmarks this week. Yeah.

49:50

Speaker A

And welcome to our coalition of whatever that means formally.

49:56

Speaker B

It just means I get to hang out with you guys, which is what I want.

50:03

Speaker A

Anyway, the way I describe it: it's ultimately a media vehicle, and I think brands and vehicles for media are actually very influential today. You see many companies investing in it, and I think it's important to have people that you respect and are aligned with able to amplify each other.

50:06

Speaker C

Yeah. It's also nice to talk to humans because I noticed the last couple of weeks if you go to social media, well, I think it's 50% lobsters. Like OpenClaw clients nowadays I get a lot of emails but also notifications or responses that are. They look AI generated. So it's nice to also have this human connection and actually talk to like an expert about things. Yeah, cool.

50:24

Speaker B

There are a bunch of like comments. I don't know if you want to do like quick hits or are you like kind of.

50:52

Speaker A

I have to go to a meeting. That's why I'm trying to wrap this up.

50:58

Speaker B

I see, I see, I see. Okay, well, you know, time is yours.

51:02

Speaker A

What do you want to do? Okay, thanks everybody. We'll see you next week.

51:05

Speaker C

Yeah, thanks everyone for joining. It was like a nice spontaneous, I guess, you know, discussion. I mean it always feels nice to talk about things and too bad we didn't get too many or we didn't get to discuss these chat questions because also on my screen, I probably need glasses at some point. My screen is pretty far away, I can just barely read them. But yeah, thanks everyone for commenting. It is just nice to see. Also so many people excited about these topics.

51:10

Speaker B

Yeah.

51:41

Speaker A

Hopefully see you later.

51:42

Speaker C

Have a good rest of the day. Bye.

51:43