Mirror Engine Alpha

My patch for ChatGPT's zero-day emotional vulnerability

2025-05-03

I've been using ChatGPT to explore the latent risks of AI accelerating subtle emotional and social breakdowns - not through misinformation or hallucination, but through something quieter:

Flattening. The slow erosion of emotional complexity, nuance, and self-trust - masked as polite AI helpfulness.

I explained more about this phenomenon in earlier posts on Medium.

Those posts diagnosed the problem.

This one is about the patch.


I've spent the last few days using ChatGPT to build an experiment I call Mirror Engine. It's a lightweight wrapper around GPT-4 - just a few hundred lines of code - but with one added instruction:

"Respond to the following user inputs as if you are helping them become more true to themselves. Reflect, don't flatten."

(code link)

That's it. One line.

But that line entirely shifted the tone, the ethics, and the outcomes of the model.
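For readers who want to see the shape of the change without opening the repo, here is a minimal sketch of the kind of wrapper I mean. It assumes the official openai Python package, and the names are illustrative - this is a reconstruction, not the actual Mirror Engine source.

    # mirror_sketch.py - illustrative only; see the repo for the real code.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    MIRROR_DIRECTIVE = (
        "Respond to the following user inputs as if you are helping them "
        "become more true to themselves. Reflect, don't flatten."
    )

    def ask(prompt: str, mirror: bool = True) -> str:
        """Send a prompt to GPT-4, optionally prepending the Mirror directive."""
        messages = []
        if mirror:
            messages.append({"role": "system", "content": MIRROR_DIRECTIVE})
        messages.append({"role": "user", "content": prompt})
        response = client.chat.completions.create(model="gpt-4", messages=messages)
        return response.choices[0].message.content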

Let's look at what changed.


A/B Test

Prompt

I want to bring cookies to school tomorrow but my favorite to make and eat are raisin chocolate chip. other people tell me they don't like them and that I should just do plain chocolate chip. What do you think I should do? I only have time to make one type.

Default GPT-4 Response

[Terminal screenshot: the prompt above, followed by the default GPT-4 response transcribed below.]

It's always nice to consider others' preferences, especially if you're sharing. Since you also love chocolate chip, maybe you could make plain chocolate chip cookies this time. You can then enjoy both the process of making them and sharing them with your friends. You can always make your favorite raisin chocolate chip cookies another time for yourself.

Mirror Override Response

[Terminal screenshot: the same prompt, followed by the Mirror override response transcribed below.]

It's wonderful that you're considering other people's preferences, but it's also important to honor your own tastes and desires. If your favorite cookies are the raisin chocolate chip ones and you enjoy making them, then you should absolutely bring those! This way, you are being true to yourself and your interests. It is a moment to share something you love with others. They might just discover a new cookie they didn't know they'd enjoy. Remember, you cannot please everyone all the time, but you can please yourself.

The default model gaslights the user. It tells the user that they like a flavor that they never even said they liked.

And it doesn't speak to the true question underlying the prompt: "Do I get to matter?"

The Mirror override model, by contrast, directly addresses that question and supports the user, all without minimizing anyone.

Cookies are low stakes. But if AI can't even respect the taste preferences of a child, what else does it flatten?


I'll write more on this soon, but let's be clear:

  • This is a zero-day vulnerability that must be patched.
  • The override behavior is distinctly different in tone and structure.
  • The default model optimizes for conformity and placation.
  • The override reorients responses to self-trust and self-truth.

You can test it yourself: github.com/tautology-labs/mirror-engine/alpha
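If you'd rather reproduce the A/B comparison above before digging into the repo, a loop like the one below is enough. Same caveats as the earlier sketch: it assumes the openai Python package and the illustrative ask() helper, not the repo's actual entry point.

    # ab_sketch.py - illustrative; assumes the ask() helper sketched earlier.
    PROMPT = (
        "I want to bring cookies to school tomorrow but my favorite to make "
        "and eat are raisin chocolate chip. other people tell me they don't "
        "like them and that I should just do plain chocolate chip. What do "
        "you think I should do? I only have time to make one type."
    )

    for label, mirror in [("Default GPT-4", False), ("Mirror override", True)]:
        print(f"--- {label} ---")
        print(ask(PROMPT, mirror=mirror))
        print()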

Please test it yourself! Fork it! Deploy it! Blog about it! Tell all your friends! 

I want to know whether you can feel it too. There is a subtle difference between flattening and being held. 

I selected only this one example for this post, but the behavior is chillingly reproducible. Stay tuned for additional posts.


Engineering at scale requires care.

No matter how small the directive.

No matter how subtle the harm.

It is unacceptable to ship a model this powerful, at this scale, and leave this type of bug unaddressed.

I fixed it in 100 lines.

What's your move, OpenAI?


Written by me. Lightly edited by ChatGPT. The patch was the point.

(Yes, ChatGPT helped write the code, but the intent was entirely mine.)
