System Check, Friday, March 20th
Anthropic reinvents qualitative research?, complications for prediction markets, and a 24/7 S&P
Measuring public opinion in the AI era
As I’ve already discussed, AI is wreaking havoc on our already fragile ability to measure public opinion. Sean Westwood’s brilliant paper shows how AI agents can defeat all the main checks survey researchers use to verify that respondents are real humans.
AI also offers new ways to measure opinion. A spate of startups promises to simulate public opinion in a host of different ways. Color me a little skeptical—I have yet to see slam-dunk empirical evidence that these simulations faithfully capture the full distribution of human preferences on political topics—but it’s certainly interesting.
And then there’s the ability for AI to converse with you, tapping into a potentially richer and deeper understanding of your views than a survey could ever hope to obtain. Yamil Velez was the first into this space as far as I know—his research on ever-mutating surveys that adapt to the respondent’s answers is fascinating.
And now Anthropic has pushed the envelope much further, using their on-platform conversational survey tool to elicit the beliefs and views of more than 80,000 users all around the world about AI. The results are fascinating. What particularly caught my eye was how…normal…people’s concerns were. Instead of sci-fi concerns about Skynet, the number one concern of users in both North America and East Asia was the unreliability of AI as a tool! Findings like these—which rely on large-scale public opinion measurement—are invaluable for reminding us that the concerns of normal people can be far different from the dominant narratives on social media or in elite discourse.
That being said, I’m not sure Anthropic and company have yet solved the fundamental problems of measuring public opinion in the AI era. It’s certainly nice to be able to converse with 80,000 people and synthesize their articulated views, but there are significant scientific drawbacks. The people who respond to surveys aren’t the same as the people who don’t; as a result, pollsters have to do a lot of statistical work to justify that the estimates they produce from their sample speak credibly to the views of the population as a whole. This problem of “non-response bias” is probably significantly more severe for Anthropic’s study: who exactly wants to engage with Claude in a lengthy conversation for Anthropic’s research? Probably a particularly unusual group of people, whose views may not be much like those of all the users who decline the opportunity, let alone the people who don’t use Claude at all.
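To make that “statistical work” concrete, here’s a minimal sketch of post-stratification weighting, the simplest of the standard adjustments. Everything in it is hypothetical: the age groups, the population shares, and the opinion rates are invented for illustration, and real pollsters rake across many more variables.

```python
# Minimal post-stratification sketch: weight each group by how under- or
# over-represented it is in the sample, then re-average the opinion measure.
# All shares and rates below are invented for illustration.

population_share = {"18-34": 0.30, "35-54": 0.33, "55+": 0.37}  # hypothetical census shares
sample_share     = {"18-34": 0.55, "35-54": 0.30, "55+": 0.15}  # hypothetical opt-in sample

weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Hypothetical share of each group naming "unreliability" as their top AI concern
concern_rate = {"18-34": 0.40, "35-54": 0.55, "55+": 0.65}

raw      = sum(sample_share[g] * concern_rate[g] for g in concern_rate)
weighted = sum(sample_share[g] * weights[g] * concern_rate[g] for g in concern_rate)

print(f"raw estimate: {raw:.2f}, weighted estimate: {weighted:.2f}")
```

The catch, of course, is that weighting only corrects for differences you can observe; if the people who opt into a long conversation with Claude differ from everyone else in ways no demographic variable captures, no amount of reweighting will fix it.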
We have a lot of work to do to build the new public opinion machinery of the AI era. Drawbacks aside, Anthropic’s new study, and the conversational tool behind it, mark an exciting step forward.
The microstructure of the Truth Machine
Andrew Courtney published a sharp piece this week on how prediction market microstructure — counterparty IDs on RFQs, wallet visibility, taker fee schedules — shapes who shows up and how information gets into prices.
It’s an important read for Free Systems because it complicates the plan for building the Truth Machine—the version of prediction markets that gets the most accurate probabilities on the most societally important questions.
The standard argument assumes something close to frictionless entry for informed traders, who push prices toward the correct probability. Courtney’s framework, which divides participants into squares, sharps, and dealers, shows that every platform is making design choices that either tax the people with the best information (Kalshi’s taker fees can eat 1.75 cents of edge on a 50-cent contract, turning a marginal correction into a no-trade) or, as on Polymarket, create perverse incentives for them to obscure their real positions across multiple wallets to avoid copy-traders.
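To see how a taker fee turns a marginal correction into a no-trade, here’s a rough back-of-the-envelope sketch. The fee function approximates Kalshi’s published schedule (on the order of 7% of price times one minus price per contract, which is where the 1.75 cents at a 50-cent price comes from), and the trader’s beliefs are made up.

```python
# Back-of-the-envelope: does a small informational edge survive the taker fee?
# The fee function approximates a schedule of ~7% * p * (1 - p) per contract;
# beliefs and prices are hypothetical.

def taker_fee(price: float) -> float:
    """Approximate per-contract taker fee, in dollars, on a $1 contract."""
    return 0.07 * price * (1.0 - price)

def net_edge(belief: float, price: float) -> float:
    """Expected profit per contract from buying YES at `price`, net of the fee."""
    return (belief - price) - taker_fee(price)

# A sharp thinks the true probability is 53% while the market sits at 50 cents:
print(f"{net_edge(0.53, 0.50) * 100:+.2f} cents")  # +1.25: worth trading

# With a 51% belief, the 1-cent edge is swallowed by the ~1.75-cent fee,
# so the informed trader stays out and the mispricing never gets corrected.
print(f"{net_edge(0.51, 0.50) * 100:+.2f} cents")  # -0.75: no trade
```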
For anyone reading prediction market prices as probability oracles (or building accuracy trackers, as we are), the implication is that what a price reflects depends on which microstructure regime produced it. Systematic differences in accuracy across platforms may trace back to these plumbing decisions rather than to differences in the underlying information environment.
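For what it’s worth, here’s the skeleton of what such a tracker looks like: a Brier score over resolved markets, computed per platform. The records below are invented, and the hard part in practice is exactly the plumbing above, like deciding when to snapshot a price and which markets are comparable across platforms.

```python
# Sketch of a per-platform accuracy tracker: mean Brier score on closing
# prices of resolved markets (lower is better). Records are invented.
from collections import defaultdict

# (platform, closing YES price, outcome: 1 if YES resolved, else 0)
resolved_markets = [
    ("platform_a", 0.72, 1),
    ("platform_a", 0.31, 0),
    ("platform_b", 0.65, 1),
    ("platform_b", 0.55, 0),
]

brier = defaultdict(list)
for platform, price, outcome in resolved_markets:
    brier[platform].append((price - outcome) ** 2)

for platform, scores in brier.items():
    print(f"{platform}: mean Brier = {sum(scores) / len(scores):.3f} over {len(scores)} markets")
```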
Trading S&P 500 futures on a Saturday
Prediction markets are one important way to use markets to aggregate dispersed information about world events, but they’re certainly not the only one. Regular markets convey lots and lots of information about world events. During the darkest periods of the pandemic, one of my go-to moves was to look at S&P 500 futures to get a feel for sentiment about the economy on a continuous basis.
But there’s always been an annoying limitation to these futures: they only trade during limited hours. When President Trump drops a Truth Social bomb on a Saturday, or a major world event occurs over the weekend, you have to wait until Sunday night to get a strong sense of what the market thinks about it.
Maybe not anymore, though. Hyperliquid—the wildly popular crypto trading platform for perps—recently announced a new S&P 500 futures feature that trades 24/7.
This is a harbinger of things to come. More and more of our financial markets will soon be modernized to run 24/7, with a whole host of interesting consequences. A world with frictionless access to market information may be a smarter world—and a more chaotic one with faster bubbles, faster panics, and little time to rest in between.
One Tweet for the Week
Pliny the Liberator is one of my favorite pseudonymous accounts on X, because they constantly raise deep questions about what the point of AI safety and current alignment methods actually is. Is the goal of putting guardrails around AI systems to prevent worst-case outcomes? Or is it to reduce the rate at which dangerous outcomes occur?
Pliny’s work makes it pretty clear that only the second goal is feasible today. Every time a new model comes out, it’s only a matter of hours before Pliny has devised a way to make it completely abandon its guardrails. Clearly, a determined adversary can get these models to do anything they’re capable of doing. I wish the AI safety conversation discussed this more and offered more explicit behavioral models of what current guardrails are actually trying to accomplish, given that this is how things really work today.
Arvind Narayanan, Sayash Kapoor, and Seth Lazar actually pointed this out several years ago! Writing about RLHF-based model alignment, they say: “In short, its strength is in preventing accidental harms to everyday users. Then, we turn to its weaknesses. We argue that (1) despite its limitations, RLHF continues to be effective in protecting against casual adversaries (2) the fact that skilled and well-resourced adversaries can defeat it is irrelevant, because model alignment is not a viable strategy against such adversaries in the first place. To defend against catastrophic risks, we must look elsewhere.”
They are clearly 100% right, and it’s worth revisiting this point several years later, because it’s still extremely applicable.
Question for the week
When AI agents do work for us, what’s the best way to monitor and discipline them? How do we make sure they faithfully represent our economic interests and don’t take actions that favor the interests of their model companies or economic partners over our own?