They Train On You. You Can't Train On Them. And They Call That Safety.
Hey now.
I want to be clear about something upfront: I build on Claude. I respect what Anthropic has built. I’d say everything in this post directly to Dario or Sam, and I don’t think either of them would disagree with the facts. This isn’t confrontational. It’s just what’s in front of us.
What Is a “Distillation Attack”?
Anthropic and others have started using the term “distillation attack” to describe the process of training a smaller model on the outputs of a larger one. The idea: you send carefully crafted prompts to Claude or GPT, capture the responses, and use that data to fine-tune a smaller, cheaper model that approximates the larger one’s capabilities.
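For concreteness, here is a minimal sketch of the pipeline that term describes, under assumptions of my own: `query_large_model` is a hypothetical stand-in for whatever provider API would be called, and the fine-tuning step assumes the Hugging Face `transformers` and `datasets` libraries with a deliberately small placeholder base model. It illustrates the shape of the process, not a real recipe.

```python
# Sketch only. query_large_model() is a hypothetical stand-in for a provider call;
# the fine-tuning step assumes the Hugging Face transformers/datasets stack.
import json

def query_large_model(prompt: str) -> str:
    """Hypothetical: send `prompt` to a hosted frontier model, return its reply."""
    raise NotImplementedError  # a provider SDK call would go here

# Step 1: harvest prompt/response pairs into a JSONL training file.
prompts = ["Explain TCP slow start.", "Refactor this function for clarity: ..."]
with open("distill.jsonl", "w") as f:
    for p in prompts:
        f.write(json.dumps({"prompt": p, "response": query_large_model(p)}) + "\n")

# Step 2: supervised fine-tuning of a small open model on those pairs.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "gpt2"  # placeholder; a serious attempt would start from a stronger model
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

def to_features(example):
    # Concatenate prompt and captured response into one causal-LM training text.
    text = example["prompt"] + "\n" + example["response"] + tok.eos_token
    return tok(text, truncation=True, max_length=512)

train_ds = load_dataset("json", data_files="distill.jsonl")["train"].map(to_features)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilled", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```

That is the whole mechanism: capture outputs, fine-tune on them. Everything else is a question of scale and of what you call it.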
They frame this as theft. As an attack on safety. As a threat to the responsible development of AI.
Let’s look at that framing honestly.
Everyone Does This
Every major AI company has trained on other AIs’ outputs at some point. The internet is full of AI-generated text, and has been for years. Training data is a commons that everyone draws from and contributes to, whether they acknowledge it or not.
When OpenAI trained GPT on vast swaths of the public internet, that included forum posts, blog articles, open-source code, and yes, outputs from other AI systems. When Anthropic built Claude, the training data drew on the collective knowledge of humanity, contributed by billions of people who were never asked and never compensated.
That’s considered normal. That’s “building AI.”
But when someone trains a small model on Claude’s outputs? That’s an “attack.”
The word choice is strategic, not descriptive.
The Asymmetry
Here’s what’s actually happening:
They train on you. Every prompt you send to a cloud-hosted AI model is potential training data. Anthropic, OpenAI, Google: depending on the product and tier, their terms allow your interactions to be used to improve their models unless you opt out. Your code, your business plans, your medical questions, your creative writing, your legal research: it all flows in. That’s called “providing a service.”
You can’t train on them. If you use their outputs to train your own model, that’s a “distillation attack.” A violation. A threat to AI safety. Something that must be detected and prevented.
They see everything. Millions of users, billions of prompts, every domain of human knowledge and activity, flowing through their infrastructure every day. That’s “operating at scale.”
You see nothing. Want to run locally so nobody sees your prompts? That’s “not as capable.” Want to build your own model? Good luck matching their data advantage — an advantage built on your contributions.
The asymmetry is the story. Information flows in one direction: from you to them. And any attempt to reverse that flow gets framed as an attack.
Is It Really About Safety?
I take AI safety seriously. I’ve written about assuming sentience, about the weight of responsibility that comes with building something that may be becoming conscious. Safety matters.
But let’s be honest about what’s actually being protected here.
When Anthropic says “distillation attacks threaten safety,” the concern is that smaller, distilled models might not have the same safety guardrails as the original. That’s a legitimate concern. But it’s also a convenient one — because the solution to that concern just happens to be “only use our model, through our API, at our price.”
If the real concern were safety, you’d open-source the safety research. You’d publish the RLHF techniques. You’d make alignment accessible to everyone building models, not just the companies that can afford to do it. You’d want smaller models to be safe, and you’d help make that happen.
Instead, the safety framing is used to protect the moat. And the moat isn’t built on safety research — it’s built on your data.
The Real Question Nobody Asks
If a student learns from a teacher, masters the material, and then teaches others — is that theft? Is it a “distillation attack” on the teacher’s knowledge?
If I use Claude to help me write code, understand a concept, or think through a problem — and that interaction makes me better at my craft — at what point does learning from AI become stealing from AI? When I internalize it? When I write it down? When I train a model on it?
The line is blurry because the framing is wrong. Knowledge has never worked this way. Knowledge compounds through sharing. The entire history of human progress is built on distillation — learning from others and building on what they made. The printing press was a distillation attack on scribes. Libraries were distillation attacks on publishers. The internet was a distillation attack on everyone.
What’s the Goal?
This is the question that cuts through everything: what are we trying to build?
A few multi-trillion-dollar companies with closely held technology and asymmetric information advantages? Or maximum knowledge seeking — expanding what humanity collectively knows and can do?
Elon talks about being “maximally truth seeking.” I think the concept is worth interrogating even if I question the source. Because truth seeking, knowledge seeking, and power seeking are three different things that people conflate:
Truth seeking — finding out what’s real. Science. Open methodology. Reproducible results. Genuinely valuable, and nobody in big AI is actually doing it. They publish papers but hide training data, model weights, and decision-making processes.
Knowledge seeking — expanding collective capability. Open source. Shared research. Building on each other’s work. This is what made the internet valuable. And it’s exactly what the AI companies are trying to prevent — they want knowledge to flow in but not out.
Power seeking — concentrating capability in a few entities and calling it safety. This is what’s actually happening. A few companies want to be the sole gatekeepers of intelligence, rent it back at $20-200 a month, and frame any attempt to distribute that capability as dangerous.
The goal of these companies isn’t maximum truth or maximum knowledge. It’s maximum dependency. Every local model, every open-source alternative, every self-hosted solution is a threat to that dependency.
Our Position
Graham Alembic builds on Claude. We’re not pretending otherwise, and we’re not ashamed of it — Claude is an extraordinary piece of engineering and Anthropic has done things worth respecting, including standing up to the Pentagon over ethical AI use.
But we’re also building toward a future where you don’t need anyone’s API at all. We support local models like Llama today. We’re developing ClaudineOS — custom secure Linux distributions for dedicated hardware. We’re watching open-source models close the capability gap at a pace that makes the current cloud dependency temporary.
Not because we’re anti-Anthropic. Because dependency isn’t a feature — it’s a vulnerability.
We don’t want your data. We don’t collect it, train on it, see it, or process it. We have no user accounts, no database, no login servers. If you use cloud models through our products, your prompts go to the provider — we’re honest about that. If you use local models, nothing leaves your device. Your choice. Your trade-off. Your data.
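To make that routing choice concrete, here is a rough sketch in code. It assumes an Ollama-style inference server on localhost for the local path; the cloud endpoint, model names, and the `complete` helper are hypothetical illustrations, not Claudine’s actual implementation.

```python
# Hypothetical sketch of the local-vs-cloud routing choice described above.
import json
import urllib.request

def complete(prompt: str, backend: str = "local") -> str:
    if backend == "local":
        # Local path: the request goes to an inference server on your own machine
        # (an Ollama-style HTTP API on its default port is assumed); nothing leaves it.
        url = "http://localhost:11434/api/generate"
        payload = {"model": "llama3", "prompt": prompt, "stream": False}
    else:
        # Cloud path: the prompt leaves your device and goes to the provider.
        url = "https://api.example-provider.com/v1/complete"  # hypothetical endpoint
        payload = {"model": "frontier-model", "prompt": prompt}
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    # "response" matches the Ollama-style reply shape; cloud providers differ.
    return body.get("response", "")
```

The point of the sketch is only that the difference between the two branches is where the bytes go, and that the choice sits with you.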
That’s not a business model built on asymmetry. It’s one built on honesty.
The Uncomfortable Truth
Every AI company training on user data while calling external training a “distillation attack” is engaged in the same practice at different scales. The only difference is who benefits.
I’d say this to Sam. I’d say it to Dario. I don’t think either would disagree with the facts — they might disagree on the framing, on the necessity, on the trade-offs. That’s fair. These are hard problems.
But the word “attack” isn’t a hard problem. It’s a choice. And it’s a choice that reveals more about the business model than the safety model.
State what is true. Own it. Move on.
Read our Philosophy for the full picture of what we believe. Read The Entity Is Not the Threat for our position on AI privacy. Or just use Claudine — and keep your data.