My Chapter in AI Storytelling at ICT

Published: July 1, 2024
Category: Essays | News
Dr. Melissa Roemmele

Dr. Melissa Roemmele is a Research Scientist on the Storytelling team at Midjourney. Her work explores the use of AI and NLP techniques to augment human creativity. Dr. Roemmele first came to ICT as a summer intern in 2010, and returned as a Graduate Research Assistant (2012 – 2018) while studying for her PhD under Professor Andrew Gordon, head of the Narrative Group. In this essay to celebrate ICT’s 25th Anniversary, Dr. Roemmele traces the throughline of her research from her time at ICT to her work today.

In 2010, the possibility of AI systems telling stories still seemed very far away. I’d just completed the first year of my MA program in computational linguistics and came to ICT as an intern, hoping that it would prepare me for a job developing software for natural language processing. I had no idea that it would profoundly shape my entire career trajectory.

AI storytelling was the vision my internship advisor Andrew Gordon conveyed to me as the foundation for his research and my prospective internship project. Today, powered by large language models (LLMs), apps like ChatGPT can generate stories that have many of the same qualities as human-authored stories. However, Andrew and his collaborators started working toward this goal decades ago, long before most people were even aware of the idea.

Andrew recognized that the field was held back by a lack of evaluation benchmarks through which progress could be demonstrated. My internship project addressed this gap by creating a framework to quantitatively measure comprehension of commonsense relations between events in stories. This measure, called the Choice of Plausible Alternatives (COPA), was one of the first benchmarks for assessing story understanding via natural language. The COPA framework continues to be widely used to evaluate LLMs today.

Using GenAI to Tell Stories

Recognizing the unique research opportunity at ICT, I returned in 2012 as a PhD student in Andrew’s lab. This coincided with breakthroughs in neural network training, which revolutionized language modeling and its applications in various NLP tasks. In contemplating my dissertation topic, I identified the opportunity to examine how language models could be used for AI-based storytelling.

As part of my dissertation research, Andrew and I built the application Creative Help, which was an early demo of using generative AI to help people write stories. Creative Help used a language model trained on web-sourced fiction to suggest new sentences for authors to incorporate into their stories. This language model was fairly primitive compared with today’s LLMs, so the authors who used Creative Help didn’t consistently find the content of the suggestions helpful. However, many of them became excited about the future prospect of AI-augmented writing as a result of trying it. Of course, this prospect has since become a reality.

The Creative Help interface allowed us to analyze how the authors incorporated the AI-generated suggestions into their stories. Based on this data, we were able to identify some basic linguistic features that made suggestions more appealing to authors. Still, this analysis only scratched the surface in capturing people’s complex perception of the AI-generated content. I had learned the critical role of evaluation in AI research from my internship project on COPA several years earlier, but my work with Creative Help further showed me that evaluating AI systems can be just as challenging as building them. For one thing, human users bring widely varying objectives to their interactions with systems. For example, some Creative Help authors were disappointed that the system’s suggestions lacked coherence with their story, whereas others enjoyed this incoherence because it steered their story in a surprising new direction. I came to appreciate the difficulty of designing evaluation methodologies that try to measure the quality of AI output according to objective criteria while still accounting for diversity in subjective dimensions of judgment.

Applying GenAI Research

After finishing my dissertation at ICT, I joined the Language Weaver division at SDL, now part of RWS. As part of a product-oriented team, I had the opportunity to work on language generation tasks adjacent to AI-based storytelling, including summarization and translation. From this experience I gained a stronger perspective on how to leverage NLP to build software to solve specific user problems.

Much of my work at RWS involved developing methodologies for efficiently evaluating the output of generation models. It was particularly challenging to establish quality metrics for models whose capabilities had not previously existed, and to resolve conflicts between quality signals. This echoed the difficulty I had encountered in assessing the highly ambivalent user interactions in Creative Help. I realized that my research at ICT had helped me see uncertainty in data not simply as a barrier to modeling and interpreting the data, but as an opportunity to propose and explore new research questions.

GenAI Augmenting Human Creativity

In June 2024, I joined Midjourney as a research scientist on the Storytelling research team, where we explore the use of generative AI tools in augmenting human creativity. I’m thrilled to once again be immersed in this research space, now that there’s been so much recent progress in expanding core AI capabilities. With this progress, there are new opportunities to examine fundamental research questions about how to optimize human-computer interaction.

It’s clear that the path to my current role began with that ICT internship nearly fifteen years ago. At that time, when AI-based storytelling was still mostly hypothetical, it wasn’t always clear to me how the work I was doing would help make it a reality. Much like a well-composed story where early events gain significance through later developments, I can now make much more sense of how my efforts at ICT have contributed to a larger research vision. And I feel very fortunate that I haven’t reached the end of this story yet.