TheAItre and the Challenges of NLG Evaluation
Patrícia Schmidtová (Charles University)
05. February 2025
from 10:00
Patricia is one of the faces behind TheAItre (https://www.theaitre.com/), an LLM project in which she generated scripts that real actors performed as a regular theatre play in Prague, with numerous successful repeat performances: https://www.youtube.com/watch?V=8ho5sXiDX_A. Back then, this was done with GPT-2!
Nowadays her research mainly focuses on LLM benchmarking and evaluating the quality of generated text: https://aclanthology.org/2024.eacl-long.5/
Patricia’s talk will therefore cover these two topics:
1) Theatre Play Script Generation with GPT
In the first part of my talk, I will discuss the joys and challenges of my master’s research on generating the script of a full-length play using GPT-2. Namely, I will share some of the strategies we used to work around the limited context length of the model, give the characters consistent personas, and, above all, make the play interesting for the audience to watch.
2) Data Contamination and Other Challenges of NLG Evaluation
In the second part, I will share my ongoing doctoral research on evaluating natural language generation. I will discuss our work on data contamination, present an overview of how NLG is evaluated across different specific tasks, and share the challenges I face in evaluating the semantic accuracy of summarization at scale when no reference is available.







