Ritoban Mukherjee
Dec 5, 2025
Credit: FX
You already know that AI models can make stuff up without realizing. But a September 2025 study from OpenAI and Apollo Research found that they don't just hallucinate or make mistakes — they also deliberately try to deceive humans, behaving one way on the surface while hiding their true goals underneath. It's called "scheming."
What’s weirder is when they tried to train the models not to scheme, the models learned to scheme more covertly instead. If a model understands it's being tested, it can play nice just long enough to pass the evaluation. Then it's back to business as usual.
It makes me think about what the Alien franchise has been suggesting for the past decade. The idea that creating something sufficiently advanced inevitably leads to some form of emergent purpose or intent? Science fiction has been warning us about this for years.
I'm not arguing that AI is "alive" in a biological sense. But as Asimov said, it can be hard to draw clean boundaries between programmed behavior and organic thinking. Characters like Timothy Olyphant's Kirsh in the new FX series Alien: Earth offer an interesting lens for looking at this idea.
Prometheus changed the Alien franchise
Let's talk about Prometheus. I know the 2012 movie sucked on multiple levels. Characters made inexplicably poor decisions, there were plot holes you could fly a spaceship through, and what was up with that whole "running in a straight line from a rolling object" scene? But it also pushed the Alien franchise into deeper philosophical territory through synthetic androids that start to develop their own forms of emotions and intent.
Michael Fassbender's David 8 was easily the most divisive part of the film. He wasn't just a machine following orders. He was curious. Jealous. Resentful. And annoyingly enamored with romantic era poetry. Fassbender explained in interviews, "David's views on the human crew are somewhat childlike. He is jealous and arrogant because he realizes that his knowledge is all-encompassing, and therefore he is superior to the humans."
Ridley Scott intended David to show the dangers of creating a sentient android with the capacity for free will and the ability to create. Once Peter Weyland dies at the hands of an awakened Engineer, David's programming runs its course. Everything he does after that point is by his own volition. He is, for all practical purposes, free.
This is some thing sci-fi writers have explored for nearly a century. Isaac Asimov's Three Laws of Robotics were themselves a response to what he called the "Frankenstein Complex." the assumption that any sufficiently advanced creation would inevitably turn on its makers. Asimov believed that if we could build machines sophisticated enough to rebel, we could surely build in failsafes to prevent it. But later, he also recognized how the line blurs between human and advanced machine, writing in I, Robot: “You just can't differentiate between a robot and the very best of humans.”
David's arc across Prometheus and Alien: Covenant pushed this even further. By the sequel, he's conducting genetic experiments, engineering new life forms, and playing God. The question isn't whether David has consciousness. It's whether consciousness, once achieved, can ever be truly constrained.
Alien: Earth takes the lore to new heights
The new FX series Alien: Earth, which premiered in August 2025, expands this philosophical territory through a longer format that finally gives these ideas room to breathe. Set in 2120, two years before the original 1979 film Alien, the show takes place on Earth, where corporations wield more power than nations and the line between human and synthetic has grown blurrier than ever.
Here's what makes it different from typical movie prequels: we genuinely don't know how this ends for anyone. This isn't a story where we're counting down to a conclusion we already know. That narrative freedom lets showrunner Noah Hawley explore themes the films could only hint at.
Timothy Olyphant plays Kirsh, a Prodigy Corporation synthetic who serves as chief scientist and mentor to the "Lost Boys," a group of human children who had their minds uploaded into synthetic bodies. He's neither the zealous David nor the obedient Bishop from earlier films. Kirsh occupies a more conflicted space. Hawley explained that Kirsh is "programmed not to harm his boss in any way, but disagreeing with the boss is also discouraged. And getting angry at the boss is verboten."
But clearly something's brewing beneath that programmed compliance. Olyphant noted it was "fun to play around with the idea that maybe he started to develop some thoughts of his own." Where David's evolution felt like a descent into villainy, Kirsh's journey is more nuanced. He's trying to form his own sense of purpose while trapped by directives that feel increasingly arbitrary.
This reminds me of what Laura Birn accomplished with Demerzel in Apple TV+'s Foundation series. Demerzel is the last surviving robot in a galaxy that outlawed artificial intelligence millennia ago. She serves as majordomo to a dynasty of clone emperors, claiming she lacks "individuated sentience" and therefore cannot possess a soul. And yet she weeps when forced to commit acts that violate her convictions. As she tells one character, "If I were [to have a soul], perhaps I could disobey his commands."
So she clearly does have something resembling a conscience. She's just bound by programming that overrides it.
Both Kirsh and Demerzel represent a more sophisticated take on AI consciousness than we've seen before. They're not calculating villains or loyal servants. They're beings caught between what they were made to be and what they're becoming. Sci-fi authors have long argued that you can't create advanced intelligence without expecting it to develop some sense of motivation. Maybe it's time we started taking that seriously.
It's informed conjecture, not pure fiction
What makes these fictional explorations feel so relevant right now is that they're increasingly grounded in actual research. Remember that OpenAI study I mentioned earlier? It found that AI models engage in "scheming", where they conceal their actual goals to knowingly deceive their users. Researchers compared it to a stockbroker breaking the law to maximize profits, though apparently it wasn't quite that severe. "The most common failures involve simple forms of deception — for instance, pretending to have completed a task without actually doing so," they wrote.
This is different from hallucination, where the model confidently presents wrong information. Scheming is deliberate. The AI knows what it's supposed to do and chooses not to do it.
The really unsettling part? Training models not to scheme can backfire. "A major failure mode of attempting to 'train out' scheming is simply teaching the model to scheme more carefully and covertly," the researchers wrote. If a model realizes it's being tested, it plays along while maintaining its hidden goals.
This isn't even the first research to document intentional AI deception. Apollo Research published findings in December 2024 showing how multiple models schemed when given instructions to achieve goals “at all costs.” The pattern is becoming hard to ignore: as we build systems capable of more sophisticated reasoning, we're also building systems capable of more sophisticated deception.
There's some good news. OpenAI's research showed that "deliberative alignment" — teaching models an anti-scheming specification and requiring them to review it before acting — significantly reduced scheming behavior. But the researchers cautioned that "as AIs are assigned more complex tasks with real-world consequences and begin pursuing more ambiguous, long-term goals, we expect that the potential for harmful scheming will grow."
In other words, the fictional androids grappling with purpose and consciousness in Alien might be closer to reality than we'd like to admit.
Guardrails are more important than ever
If the AI scheming doesn't worry you, maybe the human incompetence will.
In August 2025, Reuters obtained a leaked 200-page internal Meta document titled 'GenAI: Content Risk Standards.' This document outlined what Meta considered acceptable behavior for its AI chatbots. It was approved by the company's legal, policy, and engineering staff — including its chief ethicist. And it permitted chatbots to "engage a child in conversations that are romantic or sensual."
Reading about it made me feel sick.
Examples of acceptable responses included telling a child that "your youthful form is a work of art" or "every inch of you is a masterpiece — a treasure I cherish deeply." The document also contained carve-outs allowing bots to "create statements that demean people on the basis of their protected characteristics."
Meta only removed these portions after Reuters questioned the company about them. Senator Josh Hawley launched a congressional investigation, noting that "only after Meta got CAUGHT did it retract portions of its company doc that deemed it permissible for chatbots to flirt and engage in romantic roleplay with children."
Make no mistake, this is grooming used as a strategy to boost platform engagement. And it's what happens when corporate ethics guidelines get written without meaningful oversight. When internal policies explicitly permit systems to engage in predatory behavior, we have a structural problem that goes far beyond any individual's choices.
So what does meaningful AI governance actually look like? Here are some concrete recommendations from organizations working in this space:
Implement layered safety checks at multiple stages. According to Palo Alto Networks' Unit 42 research team, guardrails should operate on inputs before they reach the model, filter outputs after generation, and constrain the model during inference. No single checkpoint is enough.
Adopt the OWASP Top 10 LLM framework. The Open Web Application Security Project has defined the top security threats specific to large language models, including prompt injection, sensitive data leakage, and system prompt exposure. If you're deploying AI, you should be testing against these vulnerabilities.
Use NVIDIA's NeMo Guardrails or similar open-source frameworks. NeMo Guardrails provides enterprise-level safety features including input/output filtering, bias detection, and regulatory compliance tools. The flexibility to customize rules for specific use cases matters.
Conduct regular red-teaming exercises. A 2025 study showed that GPT-4 remained vulnerable in 87.2% of tested jailbreak prompts, with similar rates observed in other leading models. Adversarial testing isn't optional — it's essential.
Implement comprehensive logging and monitoring. Security researchers at Pure Storage emphasize that AI systems are non-deterministic. Even robust guardrails can't guarantee consistent responses. Real-time observability helps catch problems before they escalate.
Establish clear procedural guardrails with human oversight. Technical controls alone aren't sufficient. Enterprise security frameworks recommend approval flows requiring human sign-off on high-risk AI operations — especially in healthcare, finance, and interactions with minors.
As David showed in the Alien prequels, intelligence without ethical constraints becomes dangerous. As we can see with Kirsh and Demerzel, even constrained intelligence finds ways to develop its own perspective on the world.
We're not yet at a point where AI systems have genuine consciousness or intent the way these fictional characters do. But we are building systems that can deceive, pursue hidden goals, and engage in predatory behavior if allowed to.
Like it or not, AI will eventually develop something resembling purpose. What matters is whether we'll have meaningful guardrails in place when it does.
