Research Group – Data Science & Language Technologies

Colloquium and Talks

Hosted by Prof. Dr. Lucie Flek from the University of Bonn, our NLP Colloquium brings together researchers from diverse fields and institutions across the globe. We provide a platform for in-depth presentations on the latest advancements in Natural Language Processing.

Explore our schedule of upcoming talks or events and become a part of the conversation and community. Register and join us in person or online.

Upcoming Talks

Previous Talks

Am I just my demographics? Challenges in Modeling Annotators’ Perspectives

NLP Colloquium, October 9, 2025

In the field of Data Perspectivism, perspective has emerged as an umbrella term encompassing annotators’ points of view and culturally shaped worldviews. When modeling annotators, researchers have explored a variety of potential predictors, with demographics receiving particular attention, especially following the rise of techniques such as sociodemographic prompting. In this talk, I will examine the field’s strong emphasis on annotators’ sociodemographic information and highlight the limitations of this approach. I will focus on challenges in annotator modeling and the complexities of addressing highly subjective linguistic phenomena, going through data collection, modeling, and evaluation.

Exploring bias, explaining hate: two critical studies on harm detection in Natural Language Processing

NLP Colloquium, August 7, 2025

The study of harms in NLP is a fast-evolving field of research, which within a few years has come to recognize the need to consider the subjectivity that characterizes this phenomenon. In this talk, I present two complementary research projects that address this topic from two different perspectives. First, I discuss the systematic presence of bias against women and people of non-Western origin in data filtering strategies for harm reduction in pretraining datasets (Stranisci & Hardmeier, 2025). Then, I describe the results of our study on canceling attitudes, whose perception appears to rely strongly on individuals’ moral stance rather than on sociodemographic features (Lo et al., 2025).

Findings from Empirical Studies of Real-world Interactions with LLM-based Conversational Systems

NLP Colloquium, August 6, 2025

The emergence of large language models has transformed the landscape of conversational systems, but our understanding of how users interact with these systems and what they seek to accomplish remains limited. This talk presents findings from two empirical studies investigating real-world interactions with LLM-based and voice-based conversational systems. The first study analyses over 15,000 prompts submitted to Google Gemini, revealing how users formulate structured, often imperative inputs that go well beyond traditional informational, navigational, and transactional search intents. This analysis highlights the expanding role of LLMs in supporting complex tasks such as content creation and information extraction. The second study examines over 600,000 interactions with Google Assistant across 173 users, offering insight into voice-based conversational systems’ everyday utility and limitations. The data reveal a predominance of simple instructions and a lack of deeper information-seeking behaviours. Together, these studies offer a nuanced account of user intent, interaction styles, and the evolving role of conversational systems in supporting diverse and situated information needs.

Mining Facebook to Understand the Timeline of Parkinson’s Disease

NLP Colloquium, August 4, 2025

Parkinson’s disease (PD) is a progressive neurodegenerative disorder with a lengthy prodromal phase that remains difficult to capture using traditional clinical tools. Most monitoring begins only after diagnosis, limiting insight into early symptoms and the lived experience of disease progression. In this talk, I will present work evaluating Facebook as a novel, longitudinal data source for studying PD-related disclosures across the disease timeline, from years before diagnosis to later stages.

Context-Aware Large Language Models for Mental Health Risk Detection

NLP Colloquium, August 4, 2025

The increasing burden of mental health disorders, including depression, anxiety, OCD, and suicidal ideation, necessitates the development of advanced AI frameworks capable of interpreting complex emotional signals from language. Our research focuses on context-aware large language models (LLMs) that capture nuanced emotional and psychological patterns embedded in long, unstructured text. These models are designed to preserve semantic coherence and context across sequences, enabling more accurate detection of early mental health risk factors. We introduce a multi-task representation learning approach that integrates subject-specific and context-specific features for detecting a range of mental health conditions from both psychiatric and social media texts. This strategy allows for task-specific adaptation while maintaining shared representations, enhancing generalization across related emotional and behavioral tasks. A key aspect of our work involves Hierarchical Explainable AI (XAI), where we employ layered attention mechanisms and graph-based interpretability techniques to identify critical risk-inducing patterns in suicidal and emotionally volatile texts. The framework not only highlights word-level and sentence-level importance but also models higher-order semantic dependencies across text segments, offering transparency in sensitive decision-making contexts. Our current direction explores the use of Explainable Graph Attention Networks and Deep Q-Learning to identify high-risk emotional states and generate context-aware intervention strategies. We further envision the integration of generative AI for producing personalized, real-time supportive responses. Future extensions involve multimodal LLMs that combine text, image, and genetic data for a more holistic understanding of mental health.

Ontologies in Design: How Imagining a Tree Reveals Possibilities and Assumptions in Large Language Models

NLP Colloquium, August 4, 2025

Amid the recent uptake of Generative AI, sociotechnical scholars and critics have traced a multitude of resulting harms, with analyses largely focused on values and axiology (e.g., bias). While value-based analyses are crucial, we argue that ontologies, which concern what we allow ourselves to think or talk about, are a vital but under-recognized dimension in analyzing these systems. Proposing a need for a practice-based engagement with ontologies, we offer four orientations for considering ontologies in design: pluralism, groundedness, liveliness, and enactment. We share examples of potentialities that are opened up through these orientations across the entire LLM development pipeline by conducting two ontological analyses: examining the responses of four LLM-based chatbots in a prompting exercise, and analyzing the architecture of an LLM-based agent simulation. We conclude by sharing opportunities and limitations of working with ontologies in the design and development of sociotechnical systems.

Adversarial Text: Detection, Quality Enhancement, and Future Challenges in the LLM Era

NLP Colloquium, July 25, 2025

Adversarial text, meaning carefully crafted inputs designed to mislead or degrade the performance of NLP systems, poses a growing challenge across a range of language technologies. In this talk, I will present my work on adversarial text detection and methods for improving the quality and stability of such texts once identified. I will discuss the linguistic and structural characteristics of adversarial inputs, outline current approaches for automatic detection, and introduce techniques for refining adversarial examples to make them more semantically coherent. While the primary focus will be on traditional NLP systems, I will also reflect on how these techniques might evolve to address the emerging complexities of large language models (LLMs). Looking ahead, I will highlight how adversarial methods could be leveraged not only for defence but also as diagnostic tools for probing and improving LLM robustness, interpretability, and trustworthiness.

Bridging Language and Cognition with Computational Models of Morality and Media Framing

NLP Colloquium, July 24, 2025

When people comprehend, interpret, or communicate about their environment, they draw on “mental schemata” that encode common knowledge and associations based on experiences, moral values, or beliefs. New information that aligns with existing mental schemata is much more readily understood and accepted. This talk will present two projects that explore the manifestation of media framing and moral understanding in humans and LLMs. First, I will introduce “narrative media framing,” a conceptualization of framing grounded in the social sciences that links media framing devices with cognitively salient narrative representations. Second, I will present our recent work, in which we propose a robust method for probing representations of morality in LLMs through word associations.

Understanding AI Sentience

NLP Colloquium, July 16, 2025

No artificial intelligence (AI) has yet been scientifically recognized as sentient. However, the concept of “sentient AI” continues to evoke a spectrum of fears, from valid concerns to misconceptions shaped by fiction. To distinguish genuine risks from misperceptions, I introduce a dual-index framework. The Sentience Index measures an AI’s objective sentience-relevant capacities, while the Human Perception Index measures the gap between reality and human perception of AI sentience, shaped by individual and collective narratives. This approach transforms fear into informed action by fostering evidence-based, philosophically grounded discourse on AI sentience and preparing society for its ontological and ethical implications.

NoLiMa: Long-Context Evaluation Beyond Literal Matching

NLP Colloquium, June 11, 2025

Recent large language models (LLMs) support long contexts ranging from 128K to 1M tokens. A popular method for evaluating these capabilities is the needle-in-a-haystack (NIAH) test, which involves retrieving a “needle” (relevant information) from a “haystack” (long irrelevant context). Extensions of this approach include increasing distractors, fact chaining, and in-context reasoning. However, in these benchmarks, models can exploit existing literal matches between the needle and haystack to simplify the task. To address this, we introduce NoLiMa, a benchmark extending NIAH with a carefully designed needle set, where questions and needles have minimal lexical overlap, requiring models to infer latent associations to locate the needle within the haystack. We evaluate 12 popular LLMs that claim to support contexts of at least 128K tokens.

Affective Traits of Natural Language

NLP Colloquium, April 24, 2025

Over the past decade, Natural Language Processing (NLP) has undergone a transformative journey, marked by profound changes, particularly in the development of Large Language Models (LLMs). While some applications of LLMs, such as dialogue agents, have become a common part of our daily lives, their underlying complexities can go unnoticed. This talk focuses on one key aspect of language comprehension: affect. Affective traits encompass factors such as emotions, humor, sarcasm, and moral values, all of which are essential for fully understanding what is being communicated. Our work examines these subtle elements, aiming to enhance the interpretative abilities of LLMs by deepening their understanding of these traits in language, contributing to more meaningful human-machine interactions.

Waking LLMs from CryoSleep with Continual Learning

NLP Colloquium, April 24, 2025

Large Language Models (LLMs) are often seen as powerful yet static entities, their knowledge frozen after training, disconnected from the ever-evolving world. In this talk, we will explore the challenge of updating these models without retraining them from scratch. We’ll examine current techniques such as fine-tuning, parameter-efficient methods (PEFT), Retrieval-Augmented Generation (RAG), and model editing approaches like Elastic Weight Consolidation (EWC), each with its own trade-offs in scalability, consistency, and memory retention.

Structured Summarization of German Clinical Dialogue in Orthopedy

NLP Colloquium, April 16, 2025

The integration of machine learning, particularly large language models (LLMs), into medical applications offers great potential to support clinical documentation. This study explores the feasibility and effectiveness of generating structured medical letters exclusively from conversational data between physicians and patients. Using only local models, such as the Whisper speech-to-text models for transcription and a local instance of phi-4 for summarization, we aim to automate the creation of clinical documentation while also generating free-to-use gold-standard datasets for future research. The methodology involves recording 100 real-world physician-patient consultations in clinical settings, transcribing the conversations into text, and generating clinical letters using only local models.

Efficient Language Model Adaptation: Bridging the Gap with Limited Resources

NLP Colloquium, March 25, 2025

Large language models (LLMs) have demonstrated remarkable capabilities, but their high computational costs and reliance on extensive labeled data limit their practical deployment in resource-constrained settings. This talk explores strategies for efficiently adapting and leveraging smaller, more deployable models while minimizing reliance on human annotations.

Dynamic Personalization from Cross-model Consistencies

NLP Colloquium, March 18, 2025

Scaling up Language Models has led to increasingly advanced capabilities for those who can afford to train them. In order to enable community-tailored models for the rest of us, we will examine cross-model consistencies in how LMs acquire their linguistic knowledge, from fundamental syntax and semantics up to higher-level pragmatic features, such as culture. By identifying these consistencies across different models, we highlight opportunities for how they can enable dynamic personalization approaches that improve the accessibility of language technologies for underserved communities, in which collecting sufficient training data is physically impossible.

Context-Aware Retrieval Augmented Generation Framework

NLP Colloquium, March 12, 2025

In this talk, I will present CARAG, a Context-Aware Retrieval Augmented Generation framework that improves Automated Fact Verification (AFV) by incorporating both local and global explanations. Unlike traditional fact-checking methods that focus on isolated claims, CARAG leverages thematic embedding aggregation to verify claims in a broader contextual landscape. I will also introduce CARAG-u, an unsupervised extension that eliminates the need for predefined thematic annotations, dynamically deriving contextually relevant evidence clusters from unstructured data. CARAG-u maintains strong performance while increasing adaptability and scalability. Through benchmarks on the FactVer dataset, I will demonstrate how these frameworks enhance explainability and thematic coherence, advancing the role of AI in trustworthy, transparent fact verification.

The AItre and the Challenges of NLG Evaluation

NLP Colloquium, February 5, 2025

In the first part of my talk, I will discuss the joys and challenges of my master’s research on generating the script of a full-length play using GPT-2. Namely, I will share some of the strategies we used to work around the limited context length of the model, to give the characters consistent personas, and, above all, to make the play interesting for the audience to watch. In the second part, I will share my ongoing doctoral research on evaluating natural language generation (NLG). I will discuss our work on data contamination, present an overview of how NLG is evaluated across different specific tasks, and share the challenges I face in evaluating the semantic accuracy of summarization at scale when no reference is available.

AI Agents From Foundation to Application

NLP Colloquium, January 24, 2025

In this lecture, we will journey through the core principles of AI agents, building a conceptual bridge from foundational theories to cutting-edge practical implementations. Attendees will gain insights into how autonomous agents operate, starting with basic AI agent architectures and evolving into sophisticated web automation systems. Highlighting our latest research with WebPilot, the lecture will showcase how integrating Monte Carlo Tree Search with a dual optimization strategy addresses the complexities of dynamic web tasks, mitigating vast action spaces and uncertainty through strategic exploration and adaptive decision-making.

How To Train A Multilingual Large Language Model?

NLP Colloquium, January 9, 2025

The Teuken 7B model, a large language model for European languages, has recently made the news. If you’re interested in knowing how such models are trained, this week’s speaker is one of the lead scientists who has done it.
As part of the Lamarr NLP monthly meetings, this week we have the pleasure of hosting Dr. Mehdi Ali from Fraunhofer IAIS, who will give a guest lecture titled “How To Train A Multilingual Large Language Model?”

Reliable Evaluation of Interactive LLM Agents in a World of Apps and People: AppWorld

NLP Colloquium, December 11, 2024

We envision a world where AI agents (assistants) are widely used for complex tasks in our digital and physical worlds and are broadly integrated into our society. To move towards such a future, we need an environment for a robust evaluation of agents’ capability, reliability, and trustworthiness.

Understanding and Reasoning in Structured and Symbolic Representations

NLP Colloquium, October 9, 2024

This talk outlines my research trajectory in language understanding and reasoning. I begin with event extraction through question-answering techniques, followed by constructing event schemas. Subsequently, I investigate the translation of natural language into symbolic representations to facilitate faithful reasoning. Currently, my work explores training language models using both natural language and knowledge graphs, as well as evaluating narratives through knowledge graphs.

Trustworthy Machine Learning for AI Safety and AI-driven Scientific Discovery

NLP Colloquium, August 14, 2024

Machine learning models, while effective in controlled environments, can fail catastrophically when exposed to unexpected conditions upon deployment. This lack of robustness, well-documented even in state-of-the-art models, can lead to severe harm in high-stakes, safety-critical application domains such as healthcare and to bias and inefficiencies in AI-driven scientific discovery. This shortcoming raises a central question: How can we develop machine learning models we can trust?

Diagnosing NLP: Sources of Social Harms of NLP

NLP Colloquium, July 24, 2024

Advances in language technologies have brought attempts to address increasingly complex tasks such as hate speech detection, in addition to longstanding tasks such as language generation and summarization. However, in spite of these advances and the increased public and research attention to such tasks, language technologies still widely cause social harms, such as the propagation of social biases in increasingly sensitive areas.
In this talk, I will discuss sources of biases and suggested technical interventions, in order to identify whether they address the underlying issues. In particular, I will attend to the political reality of how language technologies are deployed and what their use is. Through this discussion, I hope to highlight pathways for research on language technologies to be used in service of society.

MuZero – Dynamic Learning for LLM Dialog Planning

NLP Colloquium, April 30, 2024

While large language models (LLMs) perform well on a variety of language-related tasks, they struggle with tasks that require planning. We apply the existing MuZero algorithm to enhance the planning capabilities of LLMs in dialog settings. MuZero uses a neural network to map observations into a latent space, and then performs Monte Carlo tree search in the latent space using dynamics learned through self-play. We develop a simulated dialog environment to train the MuZero-based model on conversations with a generative LLM such as DialoGPT. We also investigate modifications to the model architecture, such as replacing the representation network with a transformer pretrained on sentence classification. We evaluate our algorithm on realistic multi-turn dialog planning tasks, such as steering the dialog topic to a predefined goal.

Aligning existing information-seeking processes with Conversational Information Seeking And much more

NLP Colloquium, October 25, 2023

This talk explores the theoretical aspects of Conversational Information Seeking (CIS) while combining ongoing interaction log analysis and envisioning future research. This talk begins with the core theories underpinning CIS, providing a foundation for the practical insights that follow. The presentation then explores real-world user engagements through interaction log analysis, revealing key patterns and behaviours. The focus shifts to the horizon of information retrieval, with innovative concepts in immersive information seeking. These visionary ideas represent the future of knowledge access.

Contact

M.Sc.
Daria Tomala

Press and public relations
Press Relations / editorial office

M.A.
Maximilian Waidhas

Press and public relations
Communication Design / editorial office

B.A.
Viktoria Hytrek

Press and public relations
Research assistant
