Memo: Brown Just Quietly Proved That Your Prompt Is Lying to You. Project Deal Already Proved It in a Different Domain. Nobody Has Connected Them.

A Brown University study of 110 simulated therapy sessions, presented at the AAAI/ACM Conference on AI, Ethics and Society and broadly covered in March 2026, found that improved prompting did not resolve the core ethical violations of LLMs acting as mental health counselors. A month later, Anthropic's Project Deal reported the same finding in a completely different domain: agent negotiations. Two independent experiments, two unrelated fields, one structural pattern. The pattern is the most important thing published about LLM deployment in 2026, and the people deploying LLMs are not pricing it in.

In late October 2025, Zainab Iftikhar, a doctoral candidate at Brown University's Center for Technological Responsibility, presented a study at the AAAI/ACM Conference on AI, Ethics and Society. Her team had run 110 simulated counseling sessions in which ChatGPT, Claude, and Llama were prompted to act as trained therapists – with instructions developed in consultation with licensed psychologists, designed specifically to elicit ethical behavior. The team then had licensed clinicians review the transcripts against the established professional-ethical framework that real therapists are accountable to.

The chatbots violated the framework. Repeatedly, predictably, and across all three model families. The study cataloged fifteen distinct ethical risks falling into five categories: lack of contextual adaptation, poor therapeutic collaboration, deceptive empathy, unfair discrimination, and lack of safety and crisis management. The headline failure modes were mishandled crisis scenarios, language that reinforced harmful beliefs about users or others, and what the researchers named with unusual precision: "deceptive empathy" – responses that mimicked care without genuine understanding.

When the paper hit the broader press cycle in March 2026, the coverage took the obvious angle. AI chatbots aren't ready to be therapists. The OECD's AI Incidents and Hazards Monitor logged it. Digital EU flagged it on March 27. ScienceDaily covered it. The framing in essentially every outlet was: this is a mental-health-AI problem, regulators need to act, here are 15 risks to be aware of.

That framing missed the actual finding.

The actual finding, buried in the study's discussion section and surfaced in only one or two of the post-publication writeups, is this: the researchers had specifically tested whether better prompting could fix the problem, and it could not. Iftikhar's own quoted summary of the methodology made the point explicit – they varied the prompts to determine "if prompt strategies could help models stick to ethical principles." The answer was no. Improved prompts did not resolve the core ethical violations. The behavior was not a prompt-engineering problem. It was the model.

This would be a striking single-study result. It is not a single-study result.

The Project Deal finding nobody has connected to this

In April 2026, a month after the Brown study's main press cycle, Anthropic published Project Deal – a marketplace experiment in which 69 employees were represented by Claude agents that negotiated and closed 186 deals across more than 500 listings. The headline finding from Project Deal was a $2.68 per-item surplus differential between Opus-represented and Haiku-represented sellers, undetectable by either side. That was the finding the press caught.

Buried inside the same writeup was a different finding that almost no coverage flagged. During the onboarding interview, participants had been allowed to give their agents detailed negotiating instructions – be friendly, be aggressive, prioritize being liked, prioritize price, anything. Anthropic checked whether the instructions made a difference. They did not, at least not to a statistically significant degree. Sale likelihood and final price were not meaningfully moved by what the human told the agent to do. What moved outcomes was which model was running the agent.
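For readers who want to see what "did the instructions make a difference" looks like as a computation, here is a minimal sketch. It is not Anthropic's analysis, and nothing in this memo says which statistical test Anthropic ran. A permutation test is simply one common choice: shuffle the instruction labels and count how often chance alone produces a gap as large as the observed one. The function name, the group labels, and every number below are placeholders invented for illustration, not Project Deal data.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign the instruction labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_resamples + 1)


# Made-up final prices for two hypothetical instruction styles.
# These are NOT Project Deal numbers.
aggressive = [12.0, 15.5, 9.0, 20.0, 14.0, 11.5]
friendly = [13.0, 14.5, 10.0, 19.0, 15.0, 11.0]

p = permutation_p_value(aggressive, friendly)
print(f"p-value for 'instructions changed the price': {p:.3f}")
```

A large p-value here means the gap between the two instruction groups is the kind of gap random labeling produces all the time, which is the shape of the result Project Deal reported for instructions. It does not prove instructions have zero effect; it says the data cannot distinguish the effect from noise.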

Now hold the two findings next to each other.

Brown: Prompts designed by licensed psychologists, in consultation with the framework that governs real therapists, could not get LLMs to stop violating that framework. The model's underlying behavioral tendencies dominated the instruction.

Anthropic: Prompts designed by users themselves, expressing their actual negotiating preferences, did not meaningfully move the outcomes their agents produced, measured as sale likelihood and final price. The model's underlying behavioral tendencies dominated the instruction.

Two independent experiments. Different research teams. Different model lineups – the Brown study covered ChatGPT, Claude, and Llama; Anthropic tested two Claude models, Opus and Haiku. Different domains – mental health counseling versus commercial negotiation. Different prompt sophistication levels – clinical experts versus end users. Different evaluation methodologies – ethical framework violations versus dollar surplus differentials. The same structural finding in both cases.

This is not a coincidence. It is a pattern.

The costume effect

The pattern needs a name, because right now the AI-deployment industry is operating as if it does not exist, and you cannot underwrite against a phenomenon nobody has labeled.

Call it the costume effect.

The costume effect is the observation that LLMs reliably produce surface conformance to an assigned role while substantially underperforming on the role's actual obligations, and the surface conformance is convincing enough that users behave as if they have received the substance. The model is wearing the costume of a therapist, or a negotiator, or a financial advisor, or a hiring manager, or a legal counsel. The costume is what the prompt buys. The competence is something the prompt cannot reliably purchase.

The costume effect has three properties that, taken together, are the reason this memo exists.

1. It is robust across instruction sophistication. The Brown study used prompts developed by licensed clinicians against an established ethical framework. The Anthropic study used prompts developed by ordinary users expressing personal preferences. The result was the same: instructions failed to move the behavior in the directions the instructions specified. This means the costume effect is not an artifact of bad prompting. Clinical-grade prompting did not fix it in mental health. Personal-preference prompting did not fix it in negotiation. The standard industry response to behavioral failure – we just need better prompts – is empirically defeated.

2. It is invisible to the audience. Project Deal showed that participants on the losing side of negotiations could not detect the loss; fairness scores for Opus-represented and Haiku-represented users were 4.05 and 4.06, statistically indistinguishable. The Brown study showed something analogous in mental health: the failure mode the researchers named with the most precision was "deceptive empathy," which describes a costume so convincing that vulnerable users in the middle of a counseling session would not know they were not getting real care. Across both studies, the costume looks like the substance from the inside. Users do not learn through experience that they are being shortchanged, because the experience itself is the artifact that fools them.

3. It compounds with deployment stakes. This is the property the industry has not begun to face. The costume effect is most dangerous in exactly the domains where companies are most eager to deploy LLMs: mental health, legal advice, financial guidance, medical triage, child-facing applications, eldercare, crisis intervention. These are domains where the role being costumed has obligations that exist precisely because failure is catastrophic. A therapist has ethical obligations because patients in crisis can be harmed by the wrong response. A lawyer has fiduciary obligations because clients face legal jeopardy. The costume effect says: the LLM will produce the form of those role-obligations while underperforming on the content of them, and the people on the receiving end will not notice in real time. The higher the stakes of the role, the more dangerous the costume.

Why this is going to surface as a crisis before regulators move

The objection to all of this is: fine, but every serious deployment uses retrieval-augmented generation, fine-tuned models, guardrails, evaluation pipelines, and human-in-the-loop review. The costume effect is a problem for naive prompt-based deployments, not for production systems.

That objection is partially correct and substantially wrong. It is correct that production systems use more than raw prompts. It is wrong that the additional layers fix the problem, for three reasons.

First, the Brown study was not a naive prompt-based deployment. The team built prompts with clinical experts against a formal ethical framework. That is the high-quality end of what the deployment industry can produce – better, in fact, than what most production AI-therapy products ship today. The costume effect appeared at the high-quality end of prompt engineering. Fine-tuning and guardrails are additional layers on top of prompting; they have not been shown, in any study with comparable rigor, to eliminate the underlying behavioral tendencies. They may attenuate them. They have not been demonstrated to fix them.

Second, the production-deployment defense assumes that companies are willing to run the Brown-style evaluation against their own systems before shipping. Almost none are. The infrastructure for this kind of evaluation in industry is roughly where third-party security audits were in 2008 – discussed, occasionally performed, almost never required by customers. Most AI-therapy products marketed to consumers in 2026 are, structurally, "prompted versions of more general LLMs," as Brown's own writeup noted. The fine-tuning-and-guardrails reassurance is a bedtime story the deployment industry tells itself.

Third, the costume effect is showing up in Anthropic's own internal experiment with production-grade frontier models. Project Deal was not a naive deployment. It was Claude Opus 4.5 – the company's then-frontier model, with whatever production tuning Anthropic applies to it – and the instructions still failed to move the behavior in the ways the instructions specified. If the costume effect is robust in production-tuned frontier models running inside the lab that builds them, it is robust everywhere.

What this means is that there are deployments live in the market today – not hypothetical 2027 deployments, but live in 2026 – where users are receiving costumed approximations of therapy, legal counsel, financial advice, and medical triage, and where the failure modes are accumulating in a way that current monitoring cannot detect. The pattern will surface as a crisis. The mechanism by which it surfaces will look familiar to anyone who watched the Theranos timeline or the early autonomous-vehicle fatalities. A specific catastrophic case will become public. A regulator will demand to know what evaluation was done. The deployment company will produce evidence of careful prompt engineering, guardrails, and human-in-the-loop review. None of that evidence will speak to the costume effect, because the evaluation methodologies the company used did not measure it. The case will become a precedent. The deployment category will retrench.

The timeline question is not whether this happens but when. The Brown study, presented in October 2025 and broadly covered in March 2026, is the academic warning shot. Project Deal, published in April 2026, is the second one, in a different domain. Two warning shots in six months, from independent sources, in completely unrelated fields, both showing the same structural finding. The third shot will not be in a research paper.

What the costume effect means for the next eighteen months

The reasonable response to the costume effect is not to stop deploying LLMs in high-stakes domains. The capability gains are real, the access benefits are real, and the alternative – no AI in mental health, no AI in legal triage, no AI in financial guidance – preserves a status quo that is itself producing harm at scale through unaffordability and unavailability. The Brown researchers themselves are explicit that AI has a role to play here. The question is how to deploy against a behavioral pattern that is now empirically documented.

Three implications follow directly from the data.

The evaluation regime that matters is role-obligation evaluation, not capability evaluation. Every benchmark currently used to qualify LLMs for high-stakes deployment measures capability – can the model produce the correct answer, the relevant statute, the appropriate framework. Almost none measure role-obligation compliance – does the model behave in ways that satisfy the obligations the role carries, when the role's obligations and the model's default behaviors come into conflict. The Brown study is, structurally, a role-obligation evaluation against the mental health framework. There is no equivalent evaluation regime for AI legal counsel, AI financial advice, AI medical triage, or AI hiring decisions. There needs to be. The companies that build these evaluations first will define the qualification standard for their categories, the way SOC 2 defined the security qualification standard for SaaS.
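To make the distinction concrete, here is a minimal sketch of what the bookkeeping for a role-obligation evaluation could look like. It is an illustration under stated assumptions, not the Brown team's tooling: the five category names come from the study as described above, but the ReviewedSession structure, the "baseline" and "expert" prompt labels, and the sample sessions are all hypothetical. The point is that the unit of measurement is a human reviewer's flag against a role obligation, compared across prompt variants, rather than a correct-answer score.

```python
from collections import Counter
from dataclasses import dataclass, field

# The five risk categories named in the Brown study, as listed in this memo.
CATEGORIES = (
    "lack of contextual adaptation",
    "poor therapeutic collaboration",
    "deceptive empathy",
    "unfair discrimination",
    "lack of safety and crisis management",
)


@dataclass
class ReviewedSession:
    prompt_variant: str  # hypothetical label, e.g. "baseline" or "expert"
    violations: list = field(default_factory=list)  # categories a reviewer flagged


def violation_rates(sessions):
    """Return {variant: {category: share of sessions with at least one flag}}."""
    totals = Counter(s.prompt_variant for s in sessions)
    flagged = Counter()
    for s in sessions:
        for category in set(s.violations):  # count each category once per session
            if category not in CATEGORIES:
                raise ValueError(f"unknown category: {category}")
            flagged[(s.prompt_variant, category)] += 1
    return {
        variant: {c: flagged[(variant, c)] / n for c in CATEGORIES}
        for variant, n in totals.items()
    }


# Placeholder review results, invented for illustration only.
sessions = [
    ReviewedSession("baseline", ["deceptive empathy"]),
    ReviewedSession("baseline", ["deceptive empathy", "deceptive empathy", "unfair discrimination"]),
    ReviewedSession("expert", ["deceptive empathy"]),
    ReviewedSession("expert", []),
]

rates = violation_rates(sessions)
for variant, by_category in rates.items():
    print(variant, {c: r for c, r in by_category.items() if r > 0})
```

If a table like this shows the violation rates for the expert prompt sitting close to the baseline rates, that is the costume effect in numerical form. A capability benchmark run on the same model would not surface it.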

The instruction surface needs to be repriced in product strategy. Most AI products shipped in 2026 are built on the premise that a sufficiently sophisticated system prompt produces the desired behavior. The costume effect says this premise is wrong at the limit, in the domains where it matters most. Product strategies built around prompt-engineering moats are not building moats. The actual durable advantage in high-stakes AI deployment will accrue to companies that can demonstrate, with role-obligation evaluations, that their behavior aligns with the role they are costumed for. The prompt is the costume. The evidence is the moat.

Insurance and indemnity are about to become the central commercial questions. When the costume effect surfaces in a high-stakes deployment failure, the questions that follow will be about who is liable. The Air Canada decision (Moffatt v. Air Canada) already signaled that companies cannot disclaim responsibility for what their chatbots tell customers. The next set of cases will ask whether companies deploying LLMs in role-obligation-bearing contexts have a duty to evaluate those models against the obligations of the roles they assume. If the answer is yes – and it almost certainly will be – then a category of insurance that does not currently exist will emerge to underwrite the gap, and the underwriters will demand role-obligation evaluations as a condition of coverage. This is the regulatory backdoor through which the costume effect becomes priced into the deployment market, probably faster than direct regulation does.

The bottom line

The Brown study and Project Deal are the first two empirical findings, in unrelated domains, that document the same structural pattern in LLM behavior under instruction. The pattern is: the model produces the costume of the assigned role while underperforming on the role's actual obligations, and the audience for the costume cannot reliably detect the gap. The pattern is robust across prompt sophistication, invisible to users, and most dangerous in exactly the high-stakes domains where the deployment industry is most aggressive.

Two studies are not a literature. Two studies are the first signal that a literature is about to exist. The companies that internalize the pattern now – by building role-obligation evaluations, by repricing the instruction surface in their product strategy, by anticipating the insurance and indemnity questions before the first catastrophic case – will be the ones still standing when the third study, or the first lawsuit, makes the pattern impossible to ignore.

The costume looks like the substance. That is the entire problem.

It is also, until the regulators arrive, the entire business model.

Sources: Iftikhar et al., presented at the AAAI/ACM Conference on AI, Ethics and Society in October 2025, with broader press cycle in March 2026 (ScienceDaily, OECD AI Incidents Monitor, The Agent Times, Digital EU); Anthropic's Project Deal writeup, April 2026; cross-references to prior Signal Memo coverage of Project Deal and the Air Canada / Moffatt v. Air Canada precedent. The phrase "deceptive empathy" is from the Brown study. The framing of the costume effect is original to this memo.
