NLP Colloquium - b-it Center

Am I just my demographics? Challenges in Modeling Annotators’ Perspectives

NLP Colloquium | By Max Waidhas | October 9, 2025

In the field of Data Perspectivism, perspective has emerged as an umbrella term encompassing annotators’ points of view and culturally shaped worldviews. When modeling annotators, researchers have explored a variety of potential predictors, with demographics receiving particular attention, especially following the rise of techniques such as sociodemographic prompting. In this talk, I will examine the field’s strong emphasis on annotators’ sociodemographic information and highlight the limitations of this approach. I will focus on challenges in annotator modeling and the complexities of addressing highly subjective linguistic phenomena, going through data collection, modeling, and evaluation.

Exploring bias, explaining hate: two critical studies on harm detection in Natural Language Processing

NLP Colloquium | By Max Waidhas | August 7, 2025

The study of harms in NLP is a fast-evolving field of research that, within a few years, has come to recognize the need to account for the subjectivity that characterizes this phenomenon. In this talk, I present two complementary research projects that address this topic from two different perspectives. First, I discuss the systematic presence of bias against women and people of non-Western origin in data filtering strategies for harm reduction in pretraining datasets (Stranisci & Hardmeier, 2025). Then, I describe the results of our study on canceling attitudes, whose perception appears to rely strongly on individuals’ moral stance rather than on sociodemographic features (Lo et al., 2025).

Findings from Empirical Studies of Real-world Interactions with LLM-based Conversational Systems

NLP Colloquium | By Max Waidhas | August 6, 2025

The emergence of large language models has transformed the landscape of conversational systems, but our understanding of how users interact with these systems and what they seek to accomplish remains limited. This talk presents findings from two empirical studies investigating real-world interactions with LLM-based and voice-based conversational systems. The first study analyses over 15,000 prompts submitted to Google Gemini, revealing how users formulate structured, often imperative inputs that go well beyond traditional informational, navigational, and transactional search intents. This analysis highlights the expanding role of LLMs in supporting complex tasks such as content creation and information extraction. The second study examines over 600,000 interactions with Google Assistant across 173 users, offering insight into voice-based conversational systems’ everyday utility and limitations. The data reveal a predominance of simple instructions and a lack of deeper information-seeking behaviours. Together, these studies offer a nuanced account of user intent, interaction styles, and the evolving role of conversational systems in supporting diverse and situated information needs.

Mining Facebook to Understand the Timeline of Parkinson’s Disease

NLP Colloquium | By Max Waidhas | August 4, 2025

Parkinson’s disease (PD) is a progressive neurodegenerative disorder with a lengthy prodromal phase that remains difficult to capture using traditional clinical tools. Most monitoring begins only after diagnosis, limiting insight into early symptoms and the lived experience of disease progression. In this talk, I will present work evaluating Facebook as a novel, longitudinal data source for studying PD-related disclosures across the disease timeline, from years before diagnosis to later stages.

Context-Aware Large Language Models for Mental Health Risk Detection

NLP Colloquium | By Max Waidhas | August 4, 2025

The increasing burden of mental health disorders (including depression, anxiety, OCD, and suicidal ideation) necessitates the development of advanced AI frameworks capable of interpreting complex emotional signals from language. Our research focuses on context-aware large language models (LLMs) that capture nuanced emotional and psychological patterns embedded in long, unstructured text. These models are designed to preserve semantic coherence and context across sequences, enabling more accurate detection of early mental health risk factors. We introduce a multi-task representation learning approach that integrates subject-specific and context-specific features for detecting a range of mental health conditions from both psychiatric and social media texts. This strategy allows for task-specific adaptation while maintaining shared representations, enhancing generalization across related emotional and behavioral tasks. A key aspect of our work involves Hierarchical Explainable AI (XAI), where we employ layered attention mechanisms and graph-based interpretability techniques to identify critical risk-inducing patterns in suicidal and emotionally volatile texts. The framework not only highlights word-level and sentence-level importance but also models higher-order semantic dependencies across text segments, offering transparency in sensitive decision-making contexts. Our current direction explores the use of Explainable Graph Attention Networks and Deep Q-Learning to identify high-risk emotional states and generate context-aware intervention strategies. We further envision the integration of generative AI for producing personalized, real-time supportive responses. Future extensions involve multimodal LLMs that combine text, image, and genetic data for a more holistic understanding of mental health.

Ontologies in Design: How Imagining a Tree Reveals Possibilities and Assumptions in Large Language Models

NLP Colloquium | By Max Waidhas | August 4, 2025

Amid the recent uptake of Generative AI, sociotechnical scholars and critics have traced a multitude of resulting harms, with analyses largely focused on values and axiology (e.g., bias). While value-based analyses are crucial, we argue that ontologies (concerning what we allow ourselves to think or talk about) are a vital but under-recognized dimension in analyzing these systems. Proposing a need for a practice-based engagement with ontologies, we offer four orientations for considering ontologies in design: pluralism, groundedness, liveliness, and enactment. We share examples of potentialities that are opened up through these orientations across the entire LLM development pipeline by conducting two ontological analyses: examining the responses of four LLM-based chatbots in a prompting exercise, and analyzing the architecture of an LLM-based agent simulation. We conclude by sharing opportunities and limitations of working with ontologies in the design and development of sociotechnical systems.

Adversarial Text: Detection, Quality Enhancement, and Future Challenges in the LLM Era

NLP Colloquium | By Max Waidhas | July 25, 2025

Adversarial text (carefully crafted inputs designed to mislead or degrade the performance of NLP systems) poses a growing challenge across a range of language technologies. In this talk, I will present my work on adversarial text detection and methods for improving the quality and stability of such texts once identified. I will discuss the linguistic and structural characteristics of adversarial inputs, outline current approaches for automatic detection, and introduce techniques for refining adversarial examples to make them more semantically coherent. While the primary focus will be on traditional NLP systems, I will also reflect on how these techniques might evolve to address the emerging complexities of large language models (LLMs). Looking ahead, I will highlight how adversarial methods could be leveraged not only for defence but also as diagnostic tools for probing and improving LLM robustness, interpretability, and trustworthiness.

Bridging Language and Cognition with Computational Models of Morality and Media Framing

NLP Colloquium | By Max Waidhas | July 24, 2025

When people comprehend, interpret, or communicate about their environment, they draw on “mental schemata” that encode common knowledge and associations based on experiences, moral values, or beliefs. New information that aligns with existing mental schemata is much more readily understood and accepted. This talk will present two projects that explore how media framing and moral understanding manifest in humans and in LLMs. First, I will introduce “narrative media framing,” a conceptualization of framing grounded in the social sciences that links media framing devices with cognitively salient narrative representations. Second, I will present our recent work, in which we propose a robust method for probing representations of morality in LLMs through word associations.

Understanding AI Sentience

NLP Colloquium | By Max Waidhas | July 16, 2025

No artificial intelligence (AI) has yet been scientifically recognized as sentient. However, the concept of “sentient AI” continues to evoke a spectrum of fears, from valid concerns to misconceptions shaped by fiction. To distinguish genuine risks from misperceptions, I introduce a dual-index framework. The Sentience Index measures an AI’s objective sentience-relevant capacities, while the Human Perception Index measures the gap between reality and human perception of AI sentience, shaped by individual and collective narratives. This approach transforms fear into informed action by fostering evidence-based, philosophically grounded discourse on AI sentience and preparing society for its ontological and ethical implications.

NoLiMa: Long-Context Evaluation Beyond Literal Matching

NLP Colloquium | By Max Waidhas | June 11, 2025

Recent large language models (LLMs) support long contexts ranging from 128K to 1M tokens. A popular method for evaluating these capabilities is the needle-in-a-haystack (NIAH) test, which involves retrieving a “needle” (relevant information) from a “haystack” (long irrelevant context). Extensions of this approach include increasing distractors, fact chaining, and in-context reasoning.
However, in these benchmarks, models can exploit existing literal matches between the needle and haystack to simplify the task. To address this, we introduce NoLiMa, a benchmark extending NIAH with a carefully designed needle set, where questions and needles have minimal lexical overlap, requiring models to infer latent associations to locate the needle within the haystack. We evaluate 12 popular LLMs that claim to support contexts of at least 128K tokens.
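
To make the idea of "minimal lexical overlap" concrete, here is a minimal Python sketch. It is not the NoLiMa implementation: the overlap measure (Jaccard over content words), the small stopword list, the helper names, and the example needles and questions are all assumptions invented for illustration. It contrasts a question that shares words with its needle against one that can only be answered through a latent association.

```python
import re

# Tiny illustrative stopword list (an assumption, not taken from the benchmark).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "was",
             "which", "who", "has", "been", "next"}

def content_words(text):
    """Return the set of lowercased word tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}

def lexical_overlap(question, needle):
    """Jaccard overlap between the content words of a question and a needle."""
    q, n = content_words(question), content_words(needle)
    if not q or not n:
        return 0.0
    return len(q & n) / len(q | n)

def build_haystack(filler_sentences, needle, position):
    """Insert the needle at a given sentence index inside irrelevant filler text."""
    sentences = list(filler_sentences)
    sentences.insert(position, needle)
    return " ".join(sentences)

# Hypothetical examples: the literal pair shares surface words,
# the latent pair requires knowing that the Eiffel Tower is in Paris.
literal_needle = "Mara lives in Paris."
literal_question = "Which character lives in Paris?"
latent_needle = "Mara lives next to the Eiffel Tower."
latent_question = "Which character has been to Paris?"

filler = ["The weather was mild.", "Trains ran on time.", "Nobody noticed the cat."]
haystack = build_haystack(filler, latent_needle, position=1)

print(lexical_overlap(literal_question, literal_needle))
print(lexical_overlap(latent_question, latent_needle))
```

With the literal pair, a model could locate the needle by matching "lives" and "Paris"; with the latent pair there is no shared content word to match, so the needle can only be found by inferring the association.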