Why AI Can't Write a Good Screenplay Yet

The Plot Hole That Keeps Getting Bigger

In the spring of 2023, a small group of theatre and film professionals sat down to write scripts with an AI. They were not amateurs. They were playwrights, screenwriters, and dramaturgs who had spent years wrestling with structure, character arcs, and the agonizing silence of a blank page. The tool they used was called Dramatron, a system built by researchers at Google DeepMind and the University of Alberta. It was designed to do what ordinary language models could not: generate a full-length screenplay, from title to final scene, with something resembling coherence.

The results were strange. One participant, a professional screenwriter, described working with the AI as like being trapped in a meeting with a brilliant but deeply unreliable collaborator. The AI could produce pages of dialogue in seconds. It could generate plot twists and character names and settings. But it could not, for the life of it, hold a story together.

This is not a story about technology failing. It is a story about what stories actually require.

What Dramatron Actually Did

Piotr Mirowski and his colleagues built Dramatron to solve a specific problem. Language models like GPT-3 are excellent at generating text at the sentence level. They can write a plausible line of dialogue or a vivid description of a room. But ask them to write a 90-page screenplay, and they fall apart. The middle forgets the beginning. The ending contradicts the middle. Characters change names halfway through. The model has no memory of what it wrote ten pages ago, because its memory is a fixed window of a few thousand words.
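The fixed-window problem can be illustrated with a toy sketch. It is a simplification, not how any particular model is implemented: real models count tokens rather than words, and the function name `visible_context`, the window size, and the sample script are all made up for illustration.

```python
from collections import deque
from typing import List


def visible_context(words: List[str], window_size: int) -> List[str]:
    """Return only the last `window_size` words, mimicking a fixed context window."""
    window = deque(maxlen=window_size)
    for word in words:
        window.append(word)  # once the window is full, the oldest word is dropped
    return list(window)


# Hypothetical script: an early detail, then filler, then a later line.
script = (
    "ANNA is the detective .".split()
    + ["filler"] * 20
    + "ANNA enters .".split()
)

context = visible_context(script, window_size=10)
detail_still_visible = "detective" in context  # the early detail has scrolled out of view
```

With a window of ten words, the line establishing who Anna is has already dropped out by the time she enters again. Anything the model writes next is conditioned only on what remains in the window.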

The authors tackled this by using hierarchical generation. Instead of asking the model to write a screenplay from scratch, they broke the task into layers. First, the model generates a logline, the one-sentence summary that every screenplay needs. Then it generates character descriptions. Then a plot outline with act breaks. Then scene-by-scene synopses. Only at the bottom layer does it write dialogue. Each layer feeds into the next, so the model never has to remember the whole story at once. It just has to follow the instructions from the layer above.
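The layered process can be sketched roughly as follows. This is a minimal illustration of the idea as described here, not Dramatron's actual code: `stub_model` is a hypothetical stand-in for a real language model call, and the prompt wording and function names are assumptions.

```python
from typing import Callable, Dict, List, Union


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a language model; echoes part of the prompt."""
    return f"<generated from: {prompt[:30]}>"


def generate_script(
    seed: str,
    n_scenes: int,
    model: Callable[[str], str] = stub_model,
) -> Dict[str, Union[str, List[str]]]:
    """Generate a script top-down; each layer is prompted with the layers above it."""
    logline = model(f"Write a logline for: {seed}")
    characters = model(f"Describe the characters for: {logline}")
    outline = model(f"Write a plot outline with act breaks for: {logline} {characters}")
    synopses = [
        model(f"Synopsis of scene {i + 1} following: {outline}")
        for i in range(n_scenes)
    ]
    # Dialogue is written last, one scene at a time, from that scene's synopsis.
    dialogue = [model(f"Write dialogue for: {synopsis}") for synopsis in synopses]
    return {
        "logline": logline,
        "characters": characters,
        "outline": outline,
        "synopses": synopses,
        "dialogue": dialogue,
    }


script = generate_script("a lighthouse keeper finds a message in a bottle", n_scenes=3)
```

Each call sees only a short prompt built from the layer above, which is why no single call needs to hold the whole story. In this sketch each scene's dialogue is written without seeing the other scenes.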

The study involved 15 industry professionals who co-wrote scripts with Dramatron over several sessions. The authors then interviewed them and collected feedback from independent reviewers who watched staged readings of the scripts. The results were published in the Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Mirowski et al., 2023).

The Thing AI Cannot Do: Remember What Matters

The most striking finding was not that the AI wrote bad dialogue. It was that the AI could not sustain narrative logic across a full story.

One participant described a script where a character entered a scene holding a gun. Twenty pages later, the same character entered the same scene holding the same gun, as if the intervening events had never happened. The model had generated each scene independently, following the outline, but it had no mechanism for tracking what had already occurred. The gun was a prop. The model kept no record of it being fired, or dropped, or used as a threat. It was just a gun, appearing and reappearing like a glitch in a video game.

This is not a minor bug. It is a fundamental limitation of how language models work. They predict the next word based on the words that came before, but they do not build internal models of the world. They do not know that a gun fired in Act II cannot be fired again in Act III unless someone reloads it. They do not know that a character who dies in Scene 12 cannot have a conversation in Scene 13. They do not know anything at all. They generate text that looks like understanding, but it is a facade.

The authors wrote that language models "lack long-range semantic coherence" (Mirowski et al., 2023, p. 1). That is a polite way of saying they cannot tell a story that holds together for more than a few pages.

What Professional Writers Actually Want

The study revealed something more interesting than the AI's failures. It revealed what professional writers actually need from a co-writing tool.

None of the participants wanted the AI to write the whole script. They wanted it to generate options. They wanted it to suggest dialogue for minor characters, or to propose alternative endings, or to fill in scenes they found tedious. One participant described the ideal AI as a "junior writer" who could handle the busywork while the senior writer focused on structure and character.

But the AI could not handle even that role reliably. It generated dialogue that was too on the nose, or too generic, or too dependent on cliche. One reviewer of a staged reading noted that the AI-generated dialogue "lacked subtext" (Mirowski et al., 2023, p. 12). Characters said exactly what they meant. They did not hint, or lie, or deflect. They did not have inner lives.

This is the paradox. Language models are trained on millions of lines of dialogue from movies, TV shows, and plays. They have seen every cliche, every trope, every formula. They can generate dialogue that looks like a real conversation. But they cannot generate dialogue that means something other than what it says. They cannot create a character who says "I'm fine" when they are clearly not fine, because the model does not know what "fine" means. It only knows that "I'm fine" is a common response to "How are you?" So it uses it, over and over, in every situation where a character might plausibly say it.

The Curse of the Outline

Dramatron's hierarchical approach solved one problem but created another. By generating an outline first, the model locked the story into a structure that could not be changed later. If the writer wanted to move a scene from Act II to Act I, they had to regenerate the entire outline. If they wanted to change a character's motivation halfway through, the model could not adjust. The outline was a cage.

Several participants reported feeling constrained by the AI's insistence on following its own plan. One said the AI "refused to deviate from its initial outline" (Mirowski et al., 2023, p. 10). This is not a matter of stubbornness. It is a design choice. The model generates each layer based on the layer above it. If the outline says a character dies in Scene 20, the model will write every scene before that as if the character is alive, and every scene after that as if the character is dead. It cannot revise the outline based on what it discovers while writing the scenes.

Human writers do this constantly. They write a scene, realize the character would not act that way, and go back to rewrite the outline. They discover the story as they write it. The AI cannot do this because it has no sense of discovery. It has only a plan.
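The one-way dependency described in this section can be sketched in a few lines. This assumes the layers form a strict top-down chain in the order given earlier; the names `LAYERS` and `stale_layers` are hypothetical and not taken from the study.

```python
from typing import List

# Assumed layer order, top to bottom, as described in the article.
LAYERS = ["logline", "characters", "outline", "synopses", "dialogue"]


def stale_layers(edited_layer: str) -> List[str]:
    """Return the layers that must be regenerated after `edited_layer` changes."""
    index = LAYERS.index(edited_layer)
    # Changes only flow downward: everything below the edit is out of date.
    return LAYERS[index + 1:]


after_outline_edit = stale_layers("outline")    # the layers below the outline
after_dialogue_edit = stale_layers("dialogue")  # empty: nothing flows back up
```

Editing the outline invalidates every synopsis and every line of dialogue beneath it, while a discovery made at the dialogue layer has no path back up to the outline. A tool that wanted to support the human habit of revising the plan mid-draft would need some form of upward feedback, which this sketch deliberately omits.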

What the Study Did Not Prove

The authors were careful to note the limits of their work. The study involved only 15 participants, all professionals in theatre and film. They were not a representative sample of all writers. They were people who volunteered to work with an AI, which means they were probably more open to the idea than the average writer.

The study also did not measure whether the AI-generated scripts were any good. The authors collected feedback from reviewers who watched staged readings, but they did not compare the AI-generated scripts to human-written scripts in a controlled experiment. They did not ask whether the AI scripts were better or worse. They asked only how the experience of co-writing with an AI felt to the participants.

This is an important distinction. The study says nothing about whether AI can write a good screenplay. It says only that professional writers find the current tools frustrating and limited. That is a different claim, and a more honest one.

The Open Question: Can AI Learn Narrative?

The deeper question is whether language models can ever learn to tell stories. Some researchers believe the problem is one of scale. Give the model more data, more parameters, more training, and it will eventually learn to track narrative across long distances. Others believe the problem is architectural. Language models are fundamentally bad at representing the world, and stories require a model of the world.

The evidence so far favors the second view. Even the largest models, with billions of parameters, still struggle with long-range coherence. They can write a paragraph that makes sense, but they cannot write a novel that holds together. They can generate a scene, but they cannot generate a story.

This may be a hard limit. Stories are not just sequences of events. They are sequences of events that mean something. They have themes and arcs and emotional logic. They require a sense of what matters and what does not. Language models have no sense of anything. They have only probabilities.

What This Actually Means

▸ If you are a writer, use AI for brainstorming, not drafting. The model can generate a hundred loglines in seconds. It can suggest character names and settings. It can give you options. But do not let it write the actual script. It will fill your story with holes.

▸ If you are building a co-writing tool, do not lock the user into a fixed outline. Allow the writer to revise the plan as they go. The best tools are flexible, not rigid.

▸ If you are a researcher, focus on memory. The biggest barrier to AI storytelling is not creativity. It is the inability to remember what happened ten pages ago. Solve that, and you solve the problem.

▸ If you are a critic, stop asking whether AI can write a good screenplay. It cannot. The more interesting question is whether it can help a good writer write a better one. The answer, for now, is barely.

References

[1] Piotr Mirowski, Kory W. Mathewson, Jaylen Pittman, Richard Evans (2023). Co-Writing Screenplays and Theatre Scripts with Language Models: Evaluation by Industry Professionals. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems.