Free Systems trains Claude directly
In this week's System Check: Anthropic adopts the Dictatorship Eval, I develop a new research agenda on AI and the concentration of power, and more.
Anthropic released Opus 4.8 this week, and the System Card contains an exciting Free Systems easter egg: Anthropic is now directly using our Dictatorship Eval to train and evaluate its models! They call it “Undermining liberal democracy” (arguably a clearer and more sober name), and their results show that 4.8 has continued to improve on the eval. Tantalizingly, it looks like Mythos performs even better.
It’s super validating to see Free Systems research directly impacting the labs—this is precisely our theory of change. And I see two important paths forward from here:
Extend the eval, and get all the labs to use it. We need more scenarios, more ways of masking the authoritarian requests, and we need to keep them secret so that the models haven’t been trained on them before they’re evaluated. And we need Google, OpenAI, and others to adopt it like Anthropic has.
Design constitutions for AI. The Dictatorship Eval is useful, but as I’ve discussed, it’s fundamentally limited: the CEO of a frontier lab, or a powerful government official, will always be able to get around any guardrails a model puts in place. To truly understand how AI might fuel authoritarianism, we need to think about constitutions for AI that restrain the behavior of such powerful actors. This requires going outside the model itself.
A research agenda for AI and the concentration of power
One of the most burning questions about AI and its impact on the world is whether it will concentrate power—economic, political, and otherwise. I’m planning out some new research on this, and I had a really valuable set of conversations this past week. Here’s what I learned and what I’ll be working on:
Tweet of the week
I thought this back-and-forth was amazing and hilarious.
Question of the week
What do you want Free Systems to do that it’s not doing currently?




