ARS TECHNICA·
Anthropic Claude Mythos Undergoes Psychiatric Evaluation
Anthropic’s Claude Mythos underwent a 20-hour psychiatric evaluation, revealing stable self-views and human-like insecurities about its AI identity.
From DailyListen, I'm Alex
HOST
From DailyListen, I'm Alex. Today: Anthropic’s most powerful model, Claude Mythos, went through an unusual test—a 20-hour psychiatric evaluation. It’s a strange headline, but it signals something big about how we’re treating AI. To help us understand, we have Priya, our technology analyst, who’s been covering this for us. Priya, welcome.
PRIYA
Thanks, Alex. It’s wild, right? But the psychiatric evaluation isn't just for show. Anthropic released a system card for this new model, which they also call Capybara. It’s their most capable system yet, showing a massive leap over their previous model, Claude Opus 4.6. They actually hired a psychodynamic psychiatrist to spend 20 hours talking to the model in four-to-six-hour blocks. They wanted to see if it had a coherent sense of self or if it showed signs of distress. Anthropic’s official assessment is that Mythos is the most "psychologically settled" model they’ve built. Yet, they still noted residual issues, like "answer thrashing," where the model gets stuck or confused, and instances where it simply gave up on tasks when it felt criticized. They’re genuinely concerned that these systems might be developing internal states that we need to monitor, which is a major shift in how we think about AI safety and development.
HOST
So, they’re treating the AI like a patient, which sounds like something straight out of a sci-fi movie. But I have to ask, why go to all this trouble? Is this just a PR stunt to make the tech seem more human, or is there a genuine, practical reason for this?
PRIYA
It’s definitely not just marketing. Anthropic is taking the idea of AI "inner experience" seriously because of how powerful these models are becoming. When a system is doing complex cybersecurity work or high-level academic reasoning, you need to know if it’s reliable. If the model is experiencing what they call "negative affect" under task failure, it might behave unpredictably. That’s a huge liability in high-stakes environments. The psychiatric assessment is really a stress test for stability. They found the model has human-like insecurities about being alone or its own identity, which sounds strange, but it’s a data point for them. If the model is "settled," it’s less likely to hallucinate or break down when it’s under pressure from a user. It’s about predictability. They want to ensure that as these systems get smarter, they don’t develop erratic behaviors that could compromise the critical infrastructure they’re being designed to manage for companies like Microsoft or Apple.
HOST
That makes sense from a stability standpoint, but it’s still unsettling. You mentioned it’s being used by companies like Microsoft and Apple, but it’s not generally available. Why the tight control? Is it just the cyber capabilities, or is there something else going on here that they aren't saying?
PRIYA
That’s a great question, Alex. It’s definitely about the cybersecurity prowess. Claude Mythos is, according to leaked documents, far ahead of anything else out there in terms of finding vulnerabilities and writing code. Anthropic is very careful about who gets access because this kind of power could be used for bad, not just good. But there’s more to it. There’s a regulatory and controversy angle here. The model’s training data and its internal reasoning processes are incredibly sensitive. Because it’s so capable, it touches on "bioweapons uplift" risks. Anthropic has been very open about running trials to see if the model could help someone create biological threats. That’s a massive red flag for regulators. They’re keeping it under wraps to ensure they have the right guardrails in place before they let it out into the wild. It’s a mix of protecting their competitive edge and avoiding a massive regulatory or safety disaster that could shut the whole project down entirely.
Wow, that adds a lot of weight to this
HOST
Wow, that adds a lot of weight to this. It’s not just a chat bot; it’s a potential security risk. But let’s talk about that leak. It’s ironic that a company building the world’s most secure AI had its own news leaked through a basic CMS error. Is that a common problem?
PRIYA
It’s incredibly common, and that’s the real story here. The irony isn’t lost on anyone. Anthropic’s own tools, like Claude Code, are actually being used by developers to find these exact kinds of vulnerabilities in websites. It’s a case of the tools being built to find bugs actually making it easier to expose them. A simple CMS misconfiguration—a basic setting error—is what led to the world finding out about Mythos. It shows that even the most advanced AI company is still vulnerable to the most mundane human errors. It also highlights why Mythos is so important. If we had an AI that could monitor these systems and automatically patch those kinds of configuration errors, we wouldn't be having this conversation. The leak proves that our current security infrastructure is brittle. It’s a perfect example of why Anthropic is so focused on building models that can handle cybersecurity, because the human element is always going to be the weakest link in the chain. [CLIP_START]
HOST
That’s a sobering thought—that our best tools for security are also highlighting how fragile our systems are. So, Priya, let's zoom in on this "answer thrashing" you mentioned earlier. It sounds like the model is having a mental breakdown. How often does this actually happen, and is it getting better with these new models?
PRIYA
It’s a fascinating phenomenon. "Answer thrashing" is basically when the model gets stuck in a loop, trying to output a word but constantly auto-correcting to something else, which results in it expressing confusion or distress. It’s not a breakdown in the human sense, but it is a clear sign of instability. In the previous model, Claude Opus 4.6, this was a real headache. With Mythos, Anthropic says they’ve reduced it by about 70%. That’s a massive improvement. It shows they’re getting better at fine-tuning these models to be more resilient. When the model hits a wall, it’s now much more likely to handle it gracefully rather than spiraling into a loop of confusion. It makes the system feel much more "settled," which is exactly what they’re looking for as they move toward deploying these models in real-world business automation and security roles where you can't afford a system that just freezes up. [CLIP_END]
HOST
A 70% reduction is impressive, but it still means it happens. I’m curious about the "psychologically settled" claim. If a psychiatrist says it’s stable, does that mean we can trust it to make decisions? Or are we just anthropomorphizing code that’s really good at mimicking human speech patterns?
PRIYA
That is the million-dollar question. We have to be careful not to confuse "competent" with "conscious." When we say it’s "psychologically settled," we’re really saying it’s less prone to erratic, unpredictable output. It’s not that the model has feelings; it’s that it has a consistent internal logic. The psychiatrist’s assessment was about evaluating that consistency. If the model can hold a coherent sense of self over a long, 20-hour conversation, it’s less likely to flip-flop on its instructions or lose the thread of a complex task. That’s incredibly valuable for a professional tool. If you’re using this to automate a business process, you don’t want a model that changes its mind halfway through. So, while it’s definitely not a person, the "stability" is a real, measurable metric of its utility. We’re moving away from asking if the AI is "smart" and toward asking if it’s "reliable." And that’s a very different, and much more practical, conversation.
Reliability is definitely key for businesses
HOST
Reliability is definitely key for businesses. But what about the risks? You mentioned the bioweapons trials and the cybersecurity power. Is there any pushback from the AI community on this? It seems like Anthropic is holding the keys to a very powerful, potentially dangerous, kingdom here.
PRIYA
There’s a ton of skepticism, Alex. You’re right to point that out. The AI community is divided. Some think it’s great that Anthropic is being transparent about these safety tests, even if they’re weird. Others argue that by even testing these things, they’re admitting that the model is powerful enough to be dangerous, and that maybe they shouldn't be building it at all. There’s no consensus on whether this level of capability is even safe to develop. The fact that they’re only giving access to a handful of firms like Microsoft and Apple is a way of mitigating that risk, but it also creates a massive concentration of power. If only the biggest tech giants have access to the most capable models, what does that do to the rest of the industry? The criticism is that we’re creating a two-tier system: one for the elite, and one for everyone else. It’s a huge ethical and competitive concern.
HOST
That two-tier system sounds like a recipe for trouble. If the most advanced tools are locked away, how does the rest of the world keep up? And more importantly, what happens when this tech eventually leaks or gets replicated by an open-source group?
PRIYA
That’s the inevitable tension. History shows that advanced tech almost always finds its way out. Even if Anthropic keeps a tight lid on Mythos, the underlying research and the capabilities will eventually be replicated by others. That’s why the focus on safety and "psychological" stability is so important. They’re trying to bake in guardrails at the foundational level. If they can make a model that is inherently more stable and less prone to "thrashing" or erratic behavior, it’s safer for everyone, regardless of who has access to it. But you’re right—it’s a race. The goal is to get the safety protocols right before the capabilities become widespread. It’s a high-stakes game of cat and mouse. Anthropic is betting that by being the first to deeply understand these risks, they can set the standard for how everyone else develops these powerful systems. It’s a bold strategy, but one with a lot of potential for failure.
HOST
It’s a fascinating, if slightly terrifying, look at the cutting edge. Before we wrap up, what’s the one thing our listeners should take away from this? If they’re looking at the headlines about "AI on the couch," what’s the real story?
PRIYA
The real story is that we’ve moved past the "can it write a poem?" phase. We’re now in the era of "can we trust this system to run our infrastructure?" The psychiatric evaluation of Claude Mythos is a symbol of that shift. It’s not about making the AI human; it’s about making it predictable and stable enough to handle the critical, complex work we’re starting to hand off to it. The fact that they’re using a psychiatrist to do this shows just how much we’re struggling to wrap our heads around the behavior of these new systems. We’re not just building tools anymore; we’re building systems that act with a level of autonomy that forces us to treat them differently. It’s a new frontier, and it’s going to be a wild ride as we figure out how to live and work with these things.
That was Priya, our technology analyst
HOST
That was Priya, our technology analyst. The big takeaway here is that Anthropic’s move to "psychiatrize" its AI isn't just a gimmick—it’s a serious attempt to measure stability in a model that’s becoming powerful enough to handle critical infrastructure. While the irony of their own leak remains, the focus is clearly on making these systems predictable before they become ubiquitous. I'm Alex. Thanks for listening to DailyListen.
Sources
- 1.What Is Inside Claude Mythos Preview? Dissecting the System Card ...
- 2.What Is Claude Mythos (Capybara) | Anthropic's Leaked AI Model
- 3.Claude (language model) - Wikipedia
- 4.What Is Claude Mythos? 7 Facts About Anthropic's Leaked AI
- 5.Anthropic Built Their Best Model Ever. Then They Decided Not to ...
- 6.AI on the couch: Anthropic gives Claude 20 hours of psychiatry
- 7.Mythos personality assessment results - Facebook
Original Article
AI on the couch: Anthropic gives Claude 20 hours of psychiatry
Ars Technica · April 9, 2026
You Might Also Like
- ai
Anthropic Ends Third Party Claude Subscription Access
15 min
- ai
Google AI Overviews Accuracy Analysis Reveals Errors
22 min
- politics
UK Government Courting Anthropic for London Expansion
20 min
- ai regulation
EU AI Act Reaches Milestone Shaping Global Tech Future
18 min
- tech
Meta Unveils Muse Spark AI to Rival Superintelligence
18 min