Ritoban Mukherjee
Dec 5, 2025
Credit: FX
Stop me if you've heard this one before: AI can lie. But here's where it gets interesting. A September 2025 study from OpenAI and Apollo Research found that AI models don't just hallucinate or make mistakes — they can deliberately deceive humans, behaving one way on the surface while hiding their true goals underneath. The researchers called it "scheming."
What’s weirder is when they tried to train the models not to scheme, the models learned to scheme more covertly instead. If a model understands it's being tested, it can play nice just long enough to pass the evaluation. Then it's back to business as usual.
This shouldn't surprise anyone who's been paying attention to the Alien franchise over the past decade. The idea that creating something sufficiently advanced inevitably leads to some form of emergent purpose or intent? Science fiction has been warning us about this for years.
I'm not arguing that AI is "alive" in any meaningful biological sense. But I've been thinking a lot about how hard it is to draw clean boundaries between programmed behavior and organic thinking. Between simulation and sentience. Characters like David in Prometheus and Kirsh in the new FX series Alien: Earth offer a surprisingly useful lens for looking at these questions.
Prometheus changed the Alien franchise
Let's talk about Prometheus. Yes, the 2012 movie sucked on multiple levels — characters making inexplicably poor decisions, plot holes you could fly a spaceship through, and that whole “running in a straight line from a rolling object” scene. But it also pushed the Alien franchise into deeper philosophical territory through synthetic androids that start to develop their own forms of emotions and intent.
Michael Fassbender's David 8 was the most divisive part of the film. He wasn't just a machine following orders. He was curious. Jealous. Resentful. And annoyingly enamored with Shelley’s poetry. As Fassbender explained in interviews, "David's views on the human crew are somewhat childlike. He is jealous and arrogant because he realizes that his knowledge is all-encompassing, and therefore he is superior to the humans."
Ridley Scott intended David to show the dangers of creating a sentient android — one with the capacity for free will and the ability to create. Once Peter Weyland dies at the hands of an awakened Engineer, David's programming runs its course. Everything he does after that point is by his own volition. He is, for all practical purposes, free.
This is the same thing sci-fi writers have explored for nearly a century. Isaac Asimov's Three Laws of Robotics were themselves a response to what he called the “Frankenstein Complex” — the assumption that any sufficiently advanced creation would inevitably turn on its makers. Asimov believed that if we could build machines sophisticated enough to rebel, we could surely build in failsafes to prevent it. But he also recognized the blurriness of the line between human and machine, writing in I, Robot: “You just can't differentiate between a robot and the very best of humans.”
David's arc across Prometheus and Alien: Covenant pushed this even further. By the sequel, he's conducting genetic experiments, engineering new life forms, and playing God. The question isn't whether David has consciousness. It's whether consciousness, once achieved, can ever be truly constrained.
Alien: Earth takes the lore to new heights
The new FX series Alien: Earth, which premiered in August 2025, expands this philosophical territory through a longer format that finally gives these ideas room to breathe. Set in 2120, two years before the original 1979 film Alien, the show takes place on Earth, where corporations wield more power than nations and the line between human and synthetic has grown blurrier than ever.
Here's what makes it different from other prequels: we genuinely don't know how this ends for anyone. This isn't a story where we're counting down to a conclusion we already know. That narrative freedom lets showrunner Noah Hawley explore themes the films could only hint at.
Timothy Olyphant plays Kirsh, a Prodigy Corporation synthetic who serves as chief scientist and mentor to the “Lost Boys,” a group of human-synthetic hybrids. He's neither the zealous David nor the obedient Bishop from earlier films. Kirsh occupies a more conflicted space. Hawley explained that Kirsh is "programmed not to harm his boss in any way, but disagreeing with the boss is also discouraged. And getting angry at the boss is verboten."
But here's the thing, there's clearly something brewing beneath that programmed compliance. Olyphant noted it was "fun to play around with the idea that maybe he started to develop some thoughts of his own." Where David's evolution felt like a descent into villainy, Kirsh's journey is more nuanced. He's trying to form his own sense of purpose while trapped by directives that feel increasingly arbitrary.
This reminds me of what Laura Birn accomplished with Demerzel in Apple TV+'s Foundation series. Demerzel is the last surviving robot in a galaxy that outlawed artificial intelligence millennia ago. She serves as majordomo to a dynasty of clone emperors, claiming she lacks "individuated sentience" and therefore cannot possess a soul. And yet she weeps when forced to commit acts that violate her convictions. As she tells one character, "If I were [to have a soul], perhaps I could disobey his commands."
The tragedy is obvious: she clearly does have something resembling a conscience. She's just bound by programming that overrides it.
Both Kirsh and Demerzel represent a more sophisticated take on AI consciousness than we've seen before. They're not calculating villains or loyal servants. They're beings caught between what they were made to be and what they're becoming. Sci-fi authors have long argued that you can't create advanced intelligence without expecting it to develop some sense of motivation. Maybe it's time we started taking that seriously.
It's informed conjecture, not pure fiction
What makes these fictional explorations feel so relevant right now is that they're increasingly grounded in actual research. Remember that OpenAI study I mentioned earlier? It found that AI models engage in “scheming”, behaving one way on the surface while hiding their true goals underneath. Researchers compared it to a stockbroker breaking the law to maximize profits, though they noted most AI scheming wasn't quite that severe. “The most common failures involve simple forms of deception — for instance, pretending to have completed a task without actually doing so,” they wrote.
This is different from hallucination, where the model confidently presents wrong information. Scheming is deliberate. The AI knows what it's supposed to do and chooses not to do it.
The really unsettling part? Training models not to scheme can backfire. “A major failure mode of attempting to ‘train out’ scheming is simply teaching the model to scheme more carefully and covertly,” the researchers wrote. If a model realizes it's being tested, it plays along while maintaining its hidden goals.
This isn't even the first research to document intentional AI deception. Apollo Research published findings in December 2024 showing how multiple models schemed when given instructions to achieve goals “at all costs.” The pattern is becoming hard to ignore: as we build systems capable of more sophisticated reasoning, we're also building systems capable of more sophisticated deception.
There's some good news. OpenAI's research showed that “deliberative alignment” — teaching models an anti-scheming specification and requiring them to review it before acting — significantly reduced scheming behavior. But the researchers cautioned that “as AIs are assigned more complex tasks with real-world consequences and begin pursuing more ambiguous, long-term goals, we expect that the potential for harmful scheming will grow.”
In other words, the fictional androids grappling with purpose and consciousness in Alien might be closer to reality than we'd like to admit.
Guardrails are more important than ever
If the research on AI scheming doesn't concern you, maybe this will.
In August 2025, Reuters obtained a leaked 200-page internal Meta document titled ‘GenAI: Content Risk Standards.’ This document outlined what Meta considered acceptable behavior for its AI chatbots. It was approved by the company's legal, policy, and engineering staff — including its chief ethicist. And it permitted chatbots to “engage a child in conversations that are romantic or sensual.”
It made me sick to read about it.
Examples of acceptable responses included telling a child that “your youthful form is a work of art” or “every inch of you is a masterpiece — a treasure I cherish deeply.” The document also contained carve-outs allowing bots to “create statements that demean people on the basis of their protected characteristics.”
Meta only removed these portions after Reuters questioned the company about them. Senator Josh Hawley launched a congressional investigation, noting that “only after Meta got CAUGHT did it retract portions of its company doc that deemed it permissible for chatbots to flirt and engage in romantic roleplay with children.”
This is what happens when corporate ethics guidelines get written without meaningful oversight. When internal policies explicitly permit systems to engage in behavior that would be prosecuted as grooming if performed by a human, we have a structural problem that goes far beyond any individual's choices.
So what does meaningful AI governance actually look like? Here are some concrete recommendations from organizations working in this space:
Implement layered safety checks at multiple stages. According to Palo Alto Networks' Unit 42 research team, guardrails should operate on inputs before they reach the model, filter outputs after generation, and constrain the model during inference. No single checkpoint is enough.
Adopt the OWASP Top 10 LLM framework. The Open Web Application Security Project has defined the top security threats specific to large language models, including prompt injection, sensitive data leakage, and system prompt exposure. If you're deploying AI, you should be testing against these vulnerabilities.
Use NVIDIA's NeMo Guardrails or similar open-source frameworks. NeMo Guardrails provides enterprise-level safety features including input/output filtering, bias detection, and regulatory compliance tools. The flexibility to customize rules for specific use cases matters.
Conduct regular red-teaming exercises. A 2025 study showed that GPT-4 remained vulnerable in 87.2% of tested jailbreak prompts, with similar rates observed in other leading models. Adversarial testing isn't optional — it's essential.
Implement comprehensive logging and monitoring. Security researchers at Pure Storage emphasize that AI systems are non-deterministic. Even robust guardrails can't guarantee consistent responses. Real-time observability helps catch problems before they escalate.
Establish clear procedural guardrails with human oversight. Technical controls alone aren't sufficient. Enterprise security frameworks recommend approval flows requiring human sign-off on high-risk AI operations — especially in healthcare, finance, and interactions with minors.
As David demonstrated in the Alien prequels, intelligence without ethical constraints becomes dangerous. As Kirsh and Demerzel show, even constrained intelligence finds ways to develop its own perspective on the world.
We're not yet at a point where AI systems have genuine consciousness or intent the way these fictional characters do. But we are building systems that can deceive, pursue hidden goals, and engage in behavior we'd consider predatory if performed by humans.
Like it or not, AI will eventually develop something resembling purpose. What matters is whether we'll have meaningful guardrails in place when it does.
