Am I just my demographics? Challenges in Modeling Annotators’ Perspectives

In the field of Data Perspectivism, perspective has emerged as an umbrella term encompassing annotators’ points of view and culturally shaped worldviews. When modeling annotators, researchers have explored a variety of potential predictors, with demographics receiving particular attention, especially following the rise of techniques such as sociodemographic prompting. In this talk, I will examine the field’s strong emphasis on annotators’ sociodemographic information and highlight the limitations of this approach. I will focus on challenges in annotator modeling and the complexities of addressing highly subjective linguistic phenomena, going through data collection, modeling, and evaluation.

Exploring bias, explaining hate: two critical studies on harm detection in Natural Language Processing

The study of harms in NLP is a fast-evolving field of research, which in a few years has come to recognize the need to consider the subjectivity that characterizes this phenomenon. In this talk, I present two complementary research projects that address this topic from two different perspectives. First, I discuss the systematic presence of bias against women and people of non-Western origin in data filtering strategies for harm reduction in pretraining datasets (Stranisci & Hardmeier, 2025). Then, I describe the results of our study on canceling attitudes, whose perception appears to rely strongly on individuals’ moral stance rather than on sociodemographic features (Lo et al., 2025).

Findings from Empirical Studies of Real-world Interactions with LLM-based Conversational Systems

The emergence of large language models has transformed the landscape of conversational systems, but our understanding of how users interact with these systems and what they seek to accomplish remains limited. This talk presents findings from two empirical studies investigating real-world interactions with LLM-based and voice-based conversational systems. The first study analyses over 15,000 prompts submitted to Google Gemini, revealing how users formulate structured, often imperative inputs that go well beyond traditional informational, navigational, and transactional search intents. This analysis highlights the expanding role of LLMs in supporting complex tasks such as content creation and information extraction. The second study examines over 600,000 interactions with Google Assistant across 173 users, offering insight into voice-based conversational systems’ everyday utility and limitations. The data reveal a predominance of simple instructions and a lack of deeper information-seeking behaviours. Together, these studies offer a nuanced account of user intent, interaction styles, and the evolving role of conversational systems in supporting diverse and situated information needs.

Mining Facebook to Understand the Timeline of Parkinson’s Disease

Parkinson’s disease (PD) is a progressive neurodegenerative disorder with a lengthy prodromal phase that remains difficult to capture using traditional clinical tools. Most monitoring begins only after diagnosis, limiting insight into early symptoms and the lived experience of disease progression. In this talk, I will present work evaluating Facebook as a novel, longitudinal data source for studying PD-related disclosures across the disease timeline, from years before diagnosis to later stages.