Research Group – Data Science & Language Technologies

Publications

Reinforcement Learning Amplifies Emergent Misalignment from Harmless Rewards

M Jørgenvåg, D Kaczér, L Ruttert, M Gülhan, L Flek, F Mai arXiv preprint arXiv:2605.31328, 2026

Emergent misalignment (EM) is the surprising tendency of language models to become broadly misaligned after fine-tuning on narrowly misaligned examples. While EM has been extensively studied in the supervised fine-tuning (SFT) setting, evidence that it also arises from reinforcement learning (RL) is limited to large, closed-source models, leaving the phenomenon expensive to study and difficult...

2026

Transfer Learning Across Fast- and Full-Simulation Domains in High-Energy Physics

M Schott, L Flek arXiv preprint arXiv:2605.07471, 2026

Machine-learning models in high-energy physics are often trained on simulated data, where fully simulated samples are computationally expensive while fast simulation provides large statistics at reduced realism. In this work, we systematically study transfer learning between fast-simulated and fully simulated datasets in a realistic LHC environment. We consider three representative tasks, signal-background classification, quark-gluon jet...

2026

Learning Minimal-Deviation Corrections for Multi-Dimensional Mismodelling in HEP Simulations

M Schott, L Flek arXiv preprint arXiv:2605.07460, 2026

Accurate Monte Carlo (MC) modelling in high-energy physics is challenging, particularly in complex scenarios where simulations fail to reproduce observed data. In practice, experimental information is often limited to one-dimensional (1D) distributions, while mismodelling arises in a multidimensional feature space. This restricts traditional correction methods, as one-dimensional reweighting ignores correlations and fully multidimensional approaches require...

2026

Uncovering Hidden Systematics in Neural Network Models for High Energy Physics

L Flek, PA Jung, A Karimi, T Saala, A Schmidt, M Schott, P Soldin, ... arXiv preprint arXiv:2605.07470, 2026

Neural networks (NNs) are inherently multidimensional classifiers that learn complex, non-linear relationships among input observables. While their flexibility enables unprecedented performance in high-energy physics (HEP) analyses, it also makes them sensitive to small variations in their inputs. Consequently, the propagation and estimation of systematic uncertainties in NN-based models remain an open challenge. There are indications...

2026

Plausible but Wrong: A case study on Agentic Failures in Astrophysical Workflows

S Rawat, L Flek arXiv preprint arXiv:2604.25345, 2026

Agentic AI systems are increasingly being integrated into scientific workflows, yet their behavior under realistic conditions remains insufficiently understood. We evaluate CMBAgent across two workflow paradigms and eighteen astrophysical tasks. In the One-Shot setting, access to domain-specific context yields an approximately 6x performance improvement (0.85 vs. ~0 without context), with the primary failure mode being...

2026

Reasoning Primitives in Hybrid and Non-Hybrid LLMs

S Rawat, L Flek, F Mai, NK Corrêa arXiv preprint arXiv:2604.21454, 2026

Reasoning in large language models is often treated as a monolithic capability, but its observed gains may arise from more basic operations. We study reasoning through two such primitives, recall and state-tracking, and ask whether hybrid architectures that combine attention-based retrieval with recurrent state updates are better suited than attention-only models for tasks that jointly...

2026

(Re-) Thinking Empathy's Materiality in HCI

S Ppali, M Yurrita, A Vitali, A Debnath, L Flek, A Cuadra, S Mayer, ... Proceedings of the Extended Abstracts of the 2026 CHI Conference on Human …, 2026

The EmpathiCH workshop series has, over three iterations, unpacked how empathy is conceptualized, measured, and used in HCI, identifying both its potential benefits and notable pitfalls. Despite these discussions, the diverse roles of empathy in research and practice remain fragmented and under-theorized. This fourth iteration seeks to consolidate perspectives by situating empathy within a sociomaterial...

2026

Can LLM Agents Identify Spoken Dialects like a Linguist?

T Bystrich, L Hamm, M Hassan, L Fischbach, L Flek, A Karimi arXiv preprint arXiv:2603.29541, 2026

Due to the scarcity of labeled dialectal speech, audio dialect classification is a challenging task for most languages, including Swiss German. In this work, we explore the ability of large language models (LLMs) as agents in understanding the dialects and whether they can show comparable performance to models such as HuBERT in dialect classification. In...

2026

Conspiracy Frame: a Semiotically-Driven Approach for Conspiracy Theories Detection

HC Piva, S Ashraf, MK Jouneghani, A Longo, R Damiano, L Flek, ... arXiv preprint arXiv:2603.21368, 2026

Conspiracy theories are anti-authoritarian narratives that lead to social conflict, impacting how people perceive political information. To help in understanding this issue, we introduce the Conspiracy Frame: a fine-grained semantic representation of conspiratorial narratives derived from frame-semantics and semiotics, which spawned the Conspiracy Frames (Con.Fra.) dataset: a corpus of Telegram messages annotated at span-level. The...

2026

CHARISMA: Character-Based Interaction Simulation with Multi-LLM Agents Toward Computational Social Psychology

V Sadiri Javadi, F Róg, A Aksa, J Trippas, S Vakulenko, L Flek Proceedings of the 2026 Conference on Human Information Interaction and …, 2026

How people seek, request, and exchange information in social interactions is shaped by personality and situational context, connecting the fields of interactive information science and attribution theory in social psychology. In everyday life, people seek information to achieve goals, collaborate, and manage social conflicts. Understanding how individual traits and contextual factors influence information-seeking behavior remains...

2026

Shapes are not enough: CONSERVAttack and its use for finding vulnerabilities and uncertainties in machine learning applications

P Bechtle, L Flek, PA Jung, A Karimi, T Saala, A Schmidt, M Schott, ... arXiv preprint arXiv:2603.13970, 2026

In High Energy Physics, as in many other fields of science, the application of machine learning techniques has been crucial in advancing our understanding of fundamental phenomena. Increasingly, deep learning models are applied to analyze both simulated and experimental data. In most experiments, a rigorous regime of testing for physically motivated systematic uncertainties is in...

2026

Tucano 2 Cool: Better Open Source LLMs for Portuguese

NK Corrêa, A Sen, S Fatimah, S Falk, L Landgraf, J Kastner, L Flek arXiv preprint arXiv:2603.03543, 2026

We present Tucano 2, a fully open suite of large language models (LLMs) with 0.5-3.7 billion parameters, designed to address certain gaps in open-source development for Portuguese LLMs. Following our previous works, we now extend our dataset, GigaVerbo-v2, to a new degree of quality and scale, while also introducing a new synthetic dataset, GigaVerbo-v2 Synth,...

2026

Raising Bars, Not Parameters: LilMoo Compact Language Model for Hindi

S Fatimah, A Sen, S Falk, F Mai, L Flek, NK Corrêa arXiv preprint arXiv:2603.03508, 2026

The dominance of large multilingual foundation models has widened linguistic inequalities in Natural Language Processing (NLP), often leaving low-resource languages underrepresented. This paper introduces LilMoo, a 0.6-billion-parameter Hindi language model trained entirely from scratch to address this gap. Unlike prior Hindi models that rely on continual pretraining from opaque multilingual foundations, LilMoo is developed through...

2026

Label-Consistent Data Generation for Aspect-Based Sentiment Analysis Using LLM Agents

MHA Monfared, L Flek, A Karimi The Proceedings for the 15th Workshop on Computational Approaches to …, 2026

We propose an agentic data augmentation method for Aspect-Based Sentiment Analysis (ABSA) that uses iterative generation and verification to produce high-quality synthetic training examples. To isolate the effect of agentic structure, we also develop a closely matched prompting-based baseline using the same model and instructions. Both methods are evaluated across three ABSA subtasks—Aspect Term Extraction...

2026

Understanding Artificial Theory of Mind: Perturbed Tasks and Reasoning in Large Language Models

C Nickel, L Schrewe, F Mai, L Flek arXiv preprint arXiv:2602.22072, 2026

Theory of Mind (ToM) refers to an agent's ability to model the internal states of others. Contributing to the debate whether large language models (LLMs) exhibit genuine ToM capabilities, our study investigates their ToM robustness using perturbations on false-belief tasks and examines the potential of Chain-of-Thought prompting (CoT) to enhance performance and explain the LLM's...

2026

PERSPECTRA: A Scalable and Configurable Pluralist Benchmark of Perspectives from Arguments

S Nie, K Omoomi, L Flek, Z Zhao, C Welch arXiv preprint arXiv:2602.08716, 2026

Pluralism, the capacity to engage with diverse perspectives without collapsing them into a single viewpoint, is critical for developing large language models that faithfully reflect human heterogeneity. Yet this characteristic has not been carefully examined in the LLM research community and remains absent from most alignment studies. Debate-oriented sources provide a natural entry point for...

2026

On the Limitations of Language-targeted Pruning: Investigating the Calibration Language Impact in Multilingual LLM Pruning

S Kurz, JJ Chen, L Flek, Z Zhao Transactions of the Association for Computational Linguistics 14, 167-192, 2026

Recent advances in large language model (LLM) pruning have shown state-of-the-art (SotA) compression results in post-training and retraining-free settings while maintaining high predictive performance. However, previous research mainly considered calibrating based on English text, despite the multilingual nature of modern LLMs and their frequent use in non-English languages. This analysis paper conducts an in-depth investigation...

2026

CHARISMA: Character-Based Interaction Simulation with Multi-LLM Agents Toward Computational Social Psychology

VS Javadi, F Róg, A Aksa, JR Trippas, S Vakulenko, L Flek Proceedings of the ACM Conference on Information Interaction and Retrieval …, 2026

How people seek, request, and exchange information in social interactions is shaped by personality and situational context, connecting the fields of interactive information science and attribution theory in social psychology. In everyday life, people seek information to achieve goals, collaborate, and manage social conflicts. Understanding how individual traits and contextual factors influence information-seeking behavior remains...

2026

Pluralistic AI Alignment: A Cross-Cultural Pilot Survey

K Alavi, L Flek, F Mai Second Workshop on Language Models for Underserved Communities (LM4UC), 2026

Large Language Models are used globally but are often aligned to primarily Western values. To better understand the need for pluralistic alignment methods, this paper presents a pilot survey that investigates how end users from diverse cultural contexts perceive the representation of their values in AIs, their demand for models better aligned to their own...

2026

Encoder Fine-tuning with Stochastic Sampling Outperforms Open-weight GPT in Astronomy Knowledge Extraction

S Rawat, L Flek, A Karimi Proceedings of the Third Workshop for Artificial Intelligence for Scientific …, 2025

Scientific literature in astronomy is rapidly expanding, making it increasingly important to automate the extraction of key entities and contextual information from research papers. In this paper, we present an encoder-based system for extracting knowledge from astronomy articles. Our objective is to develop models capable of classifying telescope references, detecting auxiliary semantic attributes, and recognizing...

2025

Enforcing Fundamental Relations via Adversarial Attacks on Input Parameter Correlations

L Flek, PA Jung, A Karimi, T Saala, A Schmidt, M Schott, P Soldin, ... Computing and Software for Big Science 9 (1), 1-23, 2025

Correlations between input parameters play a crucial role in many scientific classification tasks, since these are often related to fundamental laws of nature. For example, in high energy physics, one common deep learning use-case is the classification of signal and background processes in particle collisions. In many such cases, the fundamental principles of the correlations...

2025

TARGAMA: A Novel Benchmark Dataset and Framework for Translating Dialectal Arabic to English with Generative Language Models

B Abdou, H Elsafty, F Aldabbas, M Pielka, R Sifa, L Flek

Arabic, one of the world’s most widely spoken languages, is marked by extensive dialectal variation that often differs significantly from Modern Standard Arabic (MSA) and from other dialects. This linguistic diversity presents considerable challenges for machine translation systems, especially when translating dialectal Arabic into MSA or English. Addressing this gap, this work introduces TARGAMA, a...

2025

More Agents Helps but Adversarial Robustness Gap Persists

K Alavi, Z Yeltay, L Flek, A Karimi arXiv preprint arXiv:2511.07112, 2025

When LLM agents work together, they seem to be more powerful than a single LLM in mathematical question answering. However, are they also more robust to adversarial inputs? We investigate this question using adversarially perturbed math questions. These perturbations include punctuation noise with three intensities (10, 30, and 50 percent), plus real-world and human-like typos...

2025

MiniFool - Physics-Constraint-Aware Minimizer-Based Adversarial Attacks in Deep Neural Networks

L Flek, O Janik, PA Jung, A Karimi, T Saala, A Schmidt, M Schott, P Soldin, ... arXiv preprint arXiv:2511.01352, 2025

In this paper, we present a new algorithm, MiniFool, that implements physics-inspired adversarial attacks for testing neural network-based classification tasks in particle and astroparticle physics. While we initially developed the algorithm for the search for astrophysical tau neutrinos with the IceCube Neutrino Observatory, we apply it to further data from other science domains, thus demonstrating...

2025

The Practical Impacts of Theoretical Constructs on Empathy Modeling

A Lahnala, C Welch, D Jurgens, L Flek Proceedings of the 2025 Conference on Empirical Methods in Natural Language …, 2025

Conceptual operationalizations of empathy in NLP are varied, with some having specific behaviors and properties, while others are more abstract. How these variations relate to one another and capture properties of empathy observable in text remains unclear. To provide insight into this, we analyze the transfer performance of empathy models adapted to empathy tasks with...

2025

CINEMETRIC: A Framework for Multi-Perspective Evaluation of Conversational Agents using Human-AI Collaboration

VS Javadi, ZU Abedin, L Flek Proceedings of the 4th Workshop on Perspectivist Approaches to NLP, 15-26, 2025

Despite advances in conversational systems, the evaluation of such systems remains a challenging problem. Current evaluation paradigms often rely on costly homogeneous human annotators or oversimplified automated metrics, leading to a critical gap in socially aligned conversational agents, where pluralistic values (i.e., acknowledging diverse human experiences) are essential to reflect the inherently subjective and contextual...

2025

IKnow: Instruction-Knowledge-Aware Continual Pretraining for Effective Domain Adaptation

T Zhang, F Mai, L Flek arXiv preprint arXiv:2510.20377, 2025

Continual pretraining promises to adapt large language models (LLMs) to new domains using only unlabeled test-time data, but naively applying standard self-supervised objectives to instruction-tuned models is known to degrade their instruction-following capability and semantic representations. Existing fixes assume access to the original base model or rely on knowledge from an external domain-specific database -...

2025

Proceedings of the 18th International Natural Language Generation Conference: System Demonstrations

L Flek, S Narayan, J Pei Proceedings of the 18th International Natural Language Generation Conference …, 2025

INLG 2025: Proceedings of the 18th International Natural Language Generation Conference, System Demonstrations. October 29 – November 2, 2025. ©2025 The Association for Computational Linguistics.

2025

Proceedings of the 18th International Natural Language Generation Conference

L Flek, S Narayan, J Pei Proceedings of the 18th International Natural Language Generation Conference, 2025

We are excited to present the Proceedings of the 18th International Natural Language Generation Conference (INLG 2025). This year’s INLG takes place from October 29 to November 2, 2025 in Hanoi, Vietnam and is organized by the Vietnam Institute for Advanced Study in Mathematics. We would like to thank the local organizing team led by...

2025

Disparities in Multilingual LLM-Based Healthcare Q&A

I Baris Schlicht, B Sayin, Z Zhao, FM Labonté, C Barbera, M Viviani, ... arXiv e-prints, arXiv: 2510.17476, 2025

Equitable access to reliable health information is vital when integrating AI into healthcare. Yet, information quality varies across languages, raising concerns about the reliability and consistency of multilingual Large Language Models (LLMs). We systematically examine cross-lingual disparities in pre-training source and factuality alignment in LLM answers for multilingual healthcare Q&A across English, German, Turkish, Chinese...

2025

Colliding with Adversaries: A Challenge on Robust Learning in High Energy Physics at ECML PKDD 2025

T Saala, L Flek, A Karimi, PA Jung, A Schmidt, P Soldin, D Stefanopoulos, ... Joint European Conference on Machine Learning and Knowledge Discovery in …, 2025

We present an overview of the Colliding With Adversaries challenge on robust learning in High Energy Physics, held at ECML PKDD 2025. The challenge was split into two tasks: (1) generating adversarial examples to attack a deep learning model, and (2) developing a model robust to unseen adversarial attacks while maintaining strong performance on clean...

2025

EDAudio: Easy Data Augmentation for Dialectal Audio

L Fischbach, A Karimi, A Lameli, L Flek Proceedings of the 15th International Conference on Recent Advances in …, 2025

We investigate lightweight and easily applicable data augmentation techniques for dialectal audio classification. We evaluate four main methods, namely shifting pitch, interval removal, background noise insertion and interval swap as well as several subvariants on recordings from 20 German dialects. Each main method is tested across multiple hyperparameter combinations, including augmentation length, coverage ratio and...

2025

ISCA: A Framework for Interview-Style Conversational Agents

C Welch, A Lahnala, V Varadarajan, L Flek, R Mihalcea, JL Boyd, J Sedoc arXiv preprint arXiv:2508.14344, 2025

We present a low-compute non-generative system for implementing interview-style conversational agents which can be used to facilitate qualitative data collection through controlled interactions and quantitative analysis. Use cases include applications to tracking attitude formation or behavior change, where control or standardization over the conversational flow is desired. We show how our system can be easily...

2025

Survey-to-Behavior: Downstream Alignment of Human Values in LLMs via Survey Questions

S Nie, F Mai, D Kaczér, C Welch, Z Zhao, L Flek arXiv preprint arXiv:2508.11414, 2025

Large language models implicitly encode preferences over human values, yet steering them often requires large training data. In this work, we investigate a simple approach: Can we reliably modify a model's value system in downstream behavior by training it to answer value survey questions accordingly? We first construct value profiles of several open-source LLMs by...

2025

In-training defenses against emergent misalignment in language models

D Kaczér, M Jørgenvåg, C Vetter, E Afzal, R Haselhorst, L Flek, F Mai arXiv preprint arXiv:2508.06249, 2025

Fine-tuning lets practitioners repurpose aligned large language models (LLMs) for new domains, yet recent work reveals emergent misalignment (EMA): Even a small, domain-specific fine-tune can induce harmful behaviors far outside the target domain. Even in the case where model weights are hidden behind a fine-tuning API, this gives attackers inadvertent access to a broadly misaligned...

2025

ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving

ZU Abedin, S Qamar, L Flek, A Karimi Proceedings of the First Workshop on LLM Security (LLMSEC), 48-53, 2025

While Large Language Models (LLMs) have shown impressive capabilities in math problem-solving tasks, their robustness to noisy inputs is not well-studied. We propose ArithmAttack to examine how robust the LLMs are when they encounter noisy prompts that contain extra noise in the form of punctuation marks. While being easy to implement, ArithmAttack does not cause...

2025

Improving Low-Resource Dialect Classification Using Retrieval-based Voice Conversion

L Fischbach, A Karimi, C Kleen, A Lameli, L Flek Proc. Interspeech 2025, 2780-2784, 2025

Deep learning models for dialect identification are often limited by the scarcity of dialectal data. To address this challenge, we propose to use Retrieval-based Voice Conversion (RVC) as an effective data augmentation method for a low-resource German dialect classification task. By converting audio samples to a uniform target speaker, RVC minimizes speaker-related variability, enabling models...

2025

Multi-Hop Reasoning for Question Answering with Hyperbolic Representations

S Welz, L Flek, A Karimi Findings of the Association for Computational Linguistics: ACL 2025, 17667-17679, 2025

Hyperbolic representations are effective in modeling knowledge graph data which is prevalently used to facilitate multi-hop reasoning. However, a rigorous and detailed comparison of the two spaces for this task is lacking. In this paper, through a simple integration of hyperbolic representations with an encoder-decoder model, we perform a controlled and comprehensive set of experiments...

2025

CAISA at SemEval-2025 Task 7: Multilingual and Cross-lingual Fact-Checked Claim Retrieval

M Haroon, S Ashraf, I Baris, L Flek Proceedings of the 19th International Workshop on Semantic Evaluation …, 2025

We leveraged LLaMA, utilizing its ability to evaluate the relevance of retrieved claims within a retrieval-based fact-checking framework. This approach aimed to explore the impact of large language models (LLMs) on retrieval tasks and assess their effectiveness in enhancing fact-checking accuracy. Additionally, we integrated Jina embeddings v2 and the MPNet multilingual sentence transformer to filter...

2025

Explainable Hallucination through Natural Language Inference Mapping

WF Chen, Z Zhao, A Karimi, L Flek Findings of the Association for Computational Linguistics: ACL 2025, 1888-1896, 2025

Large language models (LLMs) often generate hallucinated content, making it crucial to identify and quantify inconsistencies in their outputs. We introduce HaluMap, a post-hoc framework that detects hallucinations by mapping entailment and contradiction relations between source inputs and generated outputs using a natural language inference (NLI) model. To improve reliability, we propose a calibration step...

2025

Unifying the Extremes: Developing a Unified Model for Detecting and Predicting Extremist Traits and Radicalization

A Lahnala, V Varadarajan, L Flek, HA Schwartz, RL Boyd Proceedings of the International AAAI Conference on Web and Social Media 19 …, 2025

The proliferation of ideological movements into extremist factions via social media has become a global concern. While radicalization has been studied extensively within the context of specific ideologies, our ability to accurately characterize extremism in more generalizable terms remains underdeveloped. In this paper, we propose a novel method for extracting and analyzing extremist discourse across...

2025

Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models

M Ali, M Brack, M Lübbering, E Wendt, AG Khan, R Rutmann, A Jude, ... arXiv preprint arXiv:2505.22232 (to appear at EMNLP 2025), 2025

High-quality multilingual training data is essential for effectively pretraining large language models (LLMs). Yet, the availability of suitable open-source multilingual datasets remains limited. Existing state-of-the-art datasets mostly rely on heuristic filtering methods, restricting both their cross-lingual transferability and scalability. Here, we introduce JQL, a systematic approach that efficiently curates diverse and high-quality multilingual data at...

2025

Detection of Medical Conspiracy Theories with Limited Resources: Using Data from Prior Epidemics and LLMs

IB Schlicht, D Korenčić, B Chulvi, L Flek, P Rosso

Online dissemination of conspiracy theories (CTs) during epidemics poses significant risks to public health. This paper addresses the problem of detecting CTs in social media posts with an emphasis on the resource-constrained scenarios characterized by the absence of labeled datasets and the high cost of expert annotation. To address these challenges, we investigate resource-efficient methods...

2025

Do LLMs provide consistent answers to health-related questions across languages?

IB Schlicht, Z Zhao, B Sayin, L Flek, P Rosso European Conference on Information Retrieval, 314-322, 2025

Equitable access to reliable health information is vital for public health, but the quality of online health resources varies by language, raising concerns about inconsistencies in Large Language Models (LLMs) for healthcare. In this study, we examine the consistency of responses provided by LLMs to health-related questions across English, German, Turkish, and Chinese. We largely...

2025

Superalignment with Dynamic Human Values

F Mai, D Kaczér, NK Corrêa, L Flek ICLR 2025 Workshop on Bidirectional Human-AI Alignment, 2025

Two core challenges of alignment are 1) scalable oversight and 2) accounting for the dynamic nature of human values. While solutions like recursive reward modeling address 1), they do not simultaneously account for 2). We sketch a roadmap for a novel algorithmic framework that trains a superhuman reasoning model to decompose complex tasks into subtasks...

2025

Does Preprocessing Matter? An Analysis of Acoustic Feature Importance in Deep Learning for Dialect Classification

L Fischbach, C Kleen, L Flek, A Lameli Proceedings of the Joint 25th Nordic Conference on Computational Linguistics …, 2025

This paper examines the effect of preprocessing techniques on spoken dialect classification using raw audio data. We focus on modifying Root Mean Square (RMS) amplitude, DC-offset, articulation rate (AR), pitch, and Harmonics-to-Noise Ratio (HNR) to assess their impact on model performance. Our analysis determines whether these features are important, irrelevant, or misleading for the classification...

2025

The Muddy Waters of Modeling Empathy in Language: The Practical Impacts of Theoretical Constructs

A Lahnala, C Welch, D Jurgens, L Flek arXiv preprint arXiv:2501.14981 (to appear at EMNLP 2025), 2025

Conceptual operationalizations of empathy in NLP are varied, with some having specific behaviors and properties, while others are more abstract. How these variations relate to one another and capture properties of empathy observable in text remains unclear. To provide insight into this, we analyze the transfer performance of empathy models adapted to empathy tasks with...

2025

Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing

P Arora, A Karimi, L Flek arXiv preprint arXiv:2501.08276, 2025

Large Language Models (LLMs) have shown impressive performance in various NLP tasks. However, there are concerns about their reliability in different domains of linguistic variations. Many works have proposed robustness evaluation measures for local adversarial attacks, but we need globally robust models unbiased to different language styles. We take a broader approach to explore a...

2025

Enforcing Fundamental Relations via Adversarial Attacks on Input Parameter Correlations

T Saala, L Flek, A Jung, A Karimi, A Schmidt, M Schott, P Soldin, ... arXiv preprint arXiv:2501.05588, 2025

Correlations between input parameters play a crucial role in many scientific classification tasks, since these are often related to fundamental laws of nature. For example, in high energy physics, one of the common deep learning use-cases is the classification of signal and background processes in particle collisions. In many such cases, the fundamental principles of...

2025

Constructing CCEE, an LLM evaluation dataset for Complex Context-aware Event Extraction for gene regulatory networks

F Labonté, L Flek

This paper presents a first look at CCEE (Complex Context-aware Event Extraction), a novel evaluation dataset, currently in development, for context-rich gene regulatory network extraction from scientific literature. We propose an annotation scheme for cancer research papers, capturing both core gene interactions and extensive contextual information across 10-14 categories per event, addressing limitations in...

2025

MultiProp Framework: Ensemble Models for Enhanced Cross-Lingual Propaganda Detection in Social Media and News using Data Augmentation, Text Segmentation, and Meta-Learning

F Aldabbas, S Ashraf, R Sifa, L Flek Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script, 7-22, 2025

Propaganda, a pervasive tool for influencing public opinion, demands robust automated detection systems, particularly for under-resourced languages. Current efforts largely focus on well-resourced languages like English, leaving significant gaps in languages such as Arabic. This research addresses these gaps by introducing MultiProp Framework, a cross-lingual meta-learning framework designed to enhance propaganda detection across multiple languages,...

2025

Probing the Robustness of Theory of Mind in Large Language Models

C Nickel, L Schrewe, L Flek arXiv preprint arXiv:2410.06271, 2024

With the success of ChatGPT and other similarly sized SotA LLMs, claims of emergent human-like social reasoning capabilities, especially Theory of Mind (ToM), in these models have appeared in the scientific literature. On the one hand, those ToM capabilities have been successfully tested using tasks styled similarly to those used in psychology (Kosinski, 2023). On...

2024

Can Stories Help LLMs Reason? Curating Information Space Through Narrative

V Sadiri Javadi, JR Trippas, YK Lal, L Flek arXiv e-prints, arXiv: 2410.19221, 2024

Narratives are widely recognized as a powerful tool for structuring information and facilitating comprehension of complex ideas in various domains such as science communication. This paper investigates whether incorporating narrative elements can assist Large Language Models (LLMs) in solving complex problems more effectively. We propose a novel approach, Story of Thought (SoT), integrating narrative structures...

2024

How large language models can reshape collective intelligence

JW Burton, E Lopez-Lopez, S Hechtlinger, Z Rahwan, S Aeschbach, ... Nature Human Behaviour, 1-13, 2024

Collective intelligence underpins the success of groups, organizations, markets and societies. Through distributed cognition and coordination, collectives can achieve outcomes that exceed the capabilities of individuals—even experts—resulting in improved accuracy and novel capabilities. Often, collective intelligence is supported by information technology, such as online prediction markets that elicit the ‘wisdom of crowds’, online forums that...

2024

Proceedings of the 2nd Workshop on Practical LLM-assisted Data-to-Text Generation

S Balloccu, Z Kasner, O Plátek, P Schmidtová, K Onderková, M Lango, ... Proceedings of the 2nd Workshop on Practical LLM-assisted Data-to-Text …, 2024

We present the Proceedings of The 2nd Workshop on Practical LLM-assisted Data-to-Text (Practical D2T). This year’s Practical D2T takes place at INLG 2024 on Sept 23 in Tokyo, Japan. We would like to thank the INLG organisers for their support. Natural Language Generation (NLG) has been an active area of research for decades, both academically and...

2024

Perspective Taking through Generating Responses to Conflict Situations

J Plepi, C Welch, L Flek Findings of the Association for Computational Linguistics ACL 2024, 6482-6497, 2024

Although language model performance across diverse tasks continues to improve, these models still struggle to understand and explain the beliefs of other people. This skill requires perspective-taking, the process of conceptualizing the point of view of another person. Perspective taking becomes challenging when the text reflects more personal and potentially more controversial beliefs. We explore...

2024

Proceedings of the 1st Human-Centered Large Language Modeling Workshop

N Soni, L Flek, A Sharma, D Yang, S Hooker, HA Schwartz Proceedings of the 1st Human-Centered Large Language Modeling Workshop, 2024

A word’s meaning resides in the heart and soul of its “generator”: people. How do we include human (personal, social, cultural, situational) context, ethically, into LLMs–the base models of our NLP systems? Language modeling in the context of its source [author] and target [audience] can enable NLP systems to better understand human language. Advances in human-centered NLP...

2024

Do Multilingual Large Language Models Mitigate Stereotype Bias?

S Nie, M Fromm, C Welch, R Görge, A Karimi, J Plepi, N Mowmita, ... Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP, 65-83, 2024

While preliminary findings indicate that multilingual LLMs exhibit reduced bias compared to monolingual ones, a comprehensive understanding of the effect of multilingual training on bias mitigation is lacking. This study addresses this gap by systematically training six LLMs of identical size (2.6 B parameters) and architecture: five monolingual models (English, German, French, Italian, and Spanish)...

2024

Unveiling Information Through Narrative In Conversational Information Seeking

V Sadiri Javadi, JR Trippas, L Flek Proceedings of the 6th ACM Conference on Conversational User Interfaces, 1-6, 2024

Searching through conversational interactions has been emphasized as the next frontier. Nowadays, conversational agents can generate natural language responses, transforming how we search for information. A key challenge in conversational information-seeking is how these agents present information: should they only reflect facts, cater to human cognitive preferences, or strike a balance between them? These challenges...

2024

EmPO: Emotion Grounding for Empathetic Response Generation through Preference Optimization

O Sotolar, V Formanek, A Debnath, A Lahnala, C Welch, L Flek arXiv preprint arXiv:2406.19071, 2024

Empathetic response generation is a desirable aspect of conversational agents, crucial for facilitating engaging and emotionally intelligent multi-turn conversations between humans and machines. Leveraging large language models for this task has shown promising results, yet challenges persist in ensuring both the empathetic quality of the responses and retention of the generalization performance of the models....

2024

Harnessing Personalization Methods to Identify and Predict Unreliable Information Spreader Behavior

S Ashraf, F Gruschka, L Flek, C Welch Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024), 146-158, 2024

Studies on detecting and understanding the spread of unreliable news on social media have identified key characteristic differences between reliable and unreliable posts. These differences in language use also vary in expression across individuals, making it important to consider personal factors in unreliable news detection. The application of personalization methods for this has been made...

2024

Corpus considerations for annotator modeling and scaling

OO Sarumi, B Neuendorf, J Plepi, L Flek, J Schlötterer, C Welch Proceedings of the 2024 Conference of the North American Chapter of the …, 2024

Recent trends in natural language processing research and annotation tasks affirm a paradigm shift from the traditional reliance on a single ground truth to a focus on individual perspectives, particularly in subjective tasks. In scenarios where annotation tasks are meant to encompass diversity, models that solely rely on the majority class labels may inadvertently disregard...

2024

A Perspectivist Corpus of Numbers in Social Judgements

M May, L Flek, C Welch Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP …, 2024

With growing interest in the use of large language models, it is becoming increasingly important to understand whose views they express. These models tend to generate output that conforms to majority opinion and are not representative of diverse views. As a step toward building models that can take differing views into consideration, we build a...

2024

LeadEmpathy: An Expert Annotated German Dataset of Empathy in Written Leadership Communication

D Sedefoglu, AC Lahnala, J Wagner, L Flek, S Ohly Proceedings of the 2024 Joint International Conference on Computational …, 2024

Empathetic leadership communication plays a pivotal role in modern workplaces as it is associated with a wide range of positive individual and organizational outcomes. This paper introduces LeadEmpathy, an innovative expert-annotated German dataset for modeling empathy in written leadership communication. It features a novel theory-based coding scheme to model cognitive and affective empathy in asynchronous...

2024

DeFaktS: A German Dataset for Fine-Grained Disinformation Detection through Social Media Framing

S Ashraf, I Bezzaoui, I Andone, A Markowetz, J Fegert, L Flek Proceedings of the 2024 Joint International Conference on Computational …, 2024

In today’s rapidly evolving digital age, disinformation poses a significant threat to public sentiment and socio-political dynamics. To address this, we introduce a new dataset “DeFaktS”, designed to understand and counter disinformation within German media. Distinctively curated across various news topics, DeFaktS offers an unparalleled insight into the diverse facets of disinformation. Our dataset, containing...

2024

Appraisal Framework for Clinical Empathy: A Novel Application to Breaking Bad News Conversations

AC Lahnala, B Neuendorf, A Thomin, C Welch, T Stibane, L Flek Proceedings of the 2024 Joint International Conference on Computational …, 2024

Empathy is essential in healthcare communication. We introduce an annotation approach that draws on well-established frameworks for clinical empathy and breaking bad news (BBN) conversations for considering the interactive dynamics of discourse relations. We construct Empathy in BBNs, a span-relation task dataset of simulated BBN conversations in German, using our annotation scheme, in collaboration with...

2024

Archetypes and Entropy: Theory-Driven Extraction of Evidence for Suicide Risk

V Varadarajan, A Lahnala, AV Ganesan, G Dey, S Mangalik, AM Bucur, ... Proceedings of the 9th Workshop on Computational Linguistics and Clinical …, 2024

Research on psychological risk factors for suicide has developed for decades. However, combining explainable theory with modern data-driven language model approaches is non-trivial. In this study, we propose and evaluate methods for identifying language patterns aligned with theories of suicide risk by combining theory-driven suicidal archetypes with language model-based and relative entropy-based approaches. Archetypes are...

2024

Vanishing Boundaries: A Unifying Account of Multidimensional Emotion Dynamics and Alterations in Depression

AM Bucur, TA Koosha, A Cosma, L Flek, SE Thanarajah, F Bernhard, ... OSF, 2024

Emotions are fundamentally integral to shaping the order and disorders in human lives. Yet, a principled, quantitative framework explaining emotional dynamics and their alteration in mental disorders has been elusive. This challenge arises from the complex and multidimensional nature of emotions but also, at least partially, due to a shortage of large longitudinal measurements and...

2024

USDC: A dataset of user stance and dogmatism in long conversations

M Marreddy, SR Oota, VC Chinni, M Gupta, L Flek CoRR, 2024

Analyzing user opinion changes in long conversation threads is extremely critical for applications like enhanced personalization, market research, political campaigns, customer service, targeted advertising, and content moderation. Unfortunately, previous studies on stance and dogmatism in user conversations have focused on training models using datasets annotated at the post level, treating each post as independent and...

2024

Personalized Intended and Perceived Sarcasm Detection on Twitter

J Plepi, M Buski, L Flek Proceedings of the 3rd Workshop on Computational Linguistics for the …, 2023

Sarcasm detection is a challenging task for various NLP applications. It often requires additional context related to the conversation or participants involved to interpret the intended meaning. In this work, we introduce an extended reactive supervision method to collect sarcastic data from Twitter and improve the quality of the data that is extracted. Our new...

2023

Style Locality for Controllable Generation with kNN Language Models

G Nawezi, L Flek, C Welch Proceedings of the 1st Workshop on Taming Large Language Models …, 2023

Recent language models have been improved by the addition of external memory. Nearest neighbor language models retrieve similar contexts to assist in word prediction. The addition of locality levels allows a model to learn how to weight neighbors based on their relative location to the current text in source documents, and have been shown to...

2023

Challenges of GPT-3-based Conversational Agents for Healthcare

F Lechner, A Lahnala, C Welch, L Flek In Proceedings of the 14th International Conference on Recent Advances in …, 2023

The potential of medical domain dialogue agents lies in their ability to provide patients with faster information access while enabling medical specialists to concentrate on critical tasks. However, the integration of large-language models (LLMs) into these agents presents certain limitations that may result in serious consequences. This paper investigates the challenges and risks of using...

2023

OpinionConv: Conversational Product Search with Grounded Opinions

V Sadiri Javadi, M Potthast, L Flek arXiv e-prints, arXiv: 2308.04226, 2023

When searching for products, the opinions of others play an important role in making informed decisions. Subjective experiences about a product can be a valuable source of information. This is also true in sales conversations, where a customer and a sales assistant exchange facts and opinions about products. However, training an AI for such conversations...

2023

Domain Transfer for Empathy, Distress, and Personality Prediction

F Gruschka, A Lahnala, C Welch, L Flek Proceedings of the 13th Workshop on Computational Approaches to Subjectivity …, 2023

This research contributes to the task of predicting empathy and personality traits within dialogue, an important aspect of natural language processing, as part of our experimental work for the WASSA 2023 Empathy and Emotion Shared Task. For predicting empathy, emotion polarity, and emotion intensity on turns within a dialogue, we employ adapters trained on social...

2023

CAISA at SemEval-2023 Task 8: Counterfactual Data Augmentation for Mitigating Class Imbalance in Causal Claim Identification

A Karimi, L Flek Proceedings of the 17th International Workshop on Semantic Evaluation …, 2023

The class imbalance problem can cause machine learning models to produce an undesirable performance on the minority class as well as the whole dataset. Using data augmentation techniques to increase the number of samples is one way to tackle this problem. We introduce a novel counterfactual data augmentation by verb replacement for the identification of medical...

2023

How Much User Context Do We Need? Privacy by Design in Mental Health NLP Applications

R Sawhney, A Neerkaje, I Habernal, L Flek Proceedings of the International AAAI Conference on Web and Social Media 17 …, 2023

Clinical NLP tasks, such as mental health assessment from text, must take social constraints into account: the performance maximization must be constrained by the utmost importance of guaranteeing privacy of user data. Consumer protection regulations, such as GDPR, generally handle privacy by restricting data availability, such as requiring to limit user data to 'what is necessary' for a...

2023

Multilingual Detection of Check-Worthy Claims Using World Languages and Adapter Fusion

IB Schlicht, L Flek, P Rosso European Conference on Information Retrieval, 118-133, 2023

Check-worthiness detection is the task of identifying claims worthy to be investigated by fact-checkers. Resource scarcity for non-world languages and model learning costs remain major challenges for the creation of models supporting multilingual check-worthiness detection. This paper proposes cross-training adapters on a subset of world languages, combined by adapter fusion, to detect claims emerging globally in...

2023

A critical reflection and forward perspective on empathy and natural language processing

A Lahnala, C Welch, D Jurgens, L Flek Findings of the Association for Computational Linguistics: EMNLP 2022, 2139-2158, 2022

We review the state of research on empathy in natural language processing and identify the following issues: (1) empathy definitions are absent or abstract, which (2) leads to low construct validity and reproducibility. Moreover, (3) emotional empathy is overemphasized, skewing our focus to a narrow subset of simplified tasks. We believe these issues hinder research progress and...

2022

Nearest neighbor language models for stylistic controllable generation

S Trotta, L Flek, C Welch Proceedings of the Second Workshop on Natural Language Generation …, 2022

Recent language modeling performance has been greatly improved by the use of external memory. This memory encodes the context so that similar contexts can be recalled during decoding. This similarity depends on how the model learns to encode context, which can be altered to include other attributes, such as style. We construct and evaluate an...

2022

Unifying data perspectivism and personalization: An application to social norms

J Plepi, B Neuendorf, L Flek, C Welch Proceedings of the 2022 Conference on Empirical Methods in Natural Language …, 2022

Instead of using a single ground truth for language processing tasks, several recent studies have examined how to represent and predict the labels of the set of annotators. However, often little or no information about annotators is known, or the set of annotators is small. In this work, we examine a corpus of social media...

2022

Framing in Communication: From Theories to Computation (Dagstuhl Seminar 22131)

K Budzynska, C Reed, M Stede, B Stein, H Zhang, K Al-Khatib, L Allein, ... Dagstuhl Reports 12 (3), 117-140, 2022

Framing has become recognised as a powerful communication strategy for winning debates and shaping opinions and decisions. Entman defines framing as an action of selecting “some aspects of a perceived reality and make them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation,...

2022

Understanding interpersonal conflict types and their impact on perception classification

C Welch, J Plepi, B Neuendorf, L Flek Proceedings of the Fifth Workshop on Natural Language Processing and …, 2022

Studies on interpersonal conflict have a long history and contain many suggestions for conflict typology. We use this as the basis of a novel annotation scheme and release a new dataset of situations and conflict aspect annotations. We then build a classifier to predict whether someone will perceive the actions of one individual as right...

2022

CAISA@SMM4H’22: Robust cross-lingual detection of disease mentions on social media with adversarial methods

A Karimi, L Flek Proceedings of The Seventh Workshop on Social Media Mining for Health …, 2022

We propose adversarial methods for increasing the robustness of disease mention detection on social media. Our method applies adversarial data augmentation on the input and the embedding spaces to the English BioBERT model. We evaluate our method in the SocialDisNER challenge at SMM4H’22 on an annotated dataset of disease mentions in Spanish tweets. We find...

2022

Investigating Paraphrasing-Based Data Augmentation for Task-Oriented Dialogue Systems

L Vogel, L Flek Text, Speech, and Dialogue: 25th International Conference, TSD 2022, Brno …, 2022

With synthetic data generation, the required amount of human-generated training data can be reduced significantly. In this work, we explore the usage of automatic paraphrasing models such as GPT-2 and CVAE to augment template phrases for task-oriented dialogue systems while preserving the slots. Additionally, we systematically analyze how far manually annotated training data can be...

2022

Towards Suicide Ideation Detection Through Online Conversational Context

R Sawhney, S Agarwal, AT Neerkaje, N Aletras, P Nakov, L Flek Proceedings of the 45th International ACM SIGIR Conference on Research and …, 2022

Social media enable users to share their feelings and emotional struggles. They also offer an opportunity to provide community support to suicidal users. Recent studies on suicide risk assessment have explored the user's historic timeline and information from their social network to analyze their emotional state. However, such methods often require a large amount of...

2022

OK Boomer: Probing the socio-demographic Divide in Echo Chambers

HJ Geiss, F Sakketou, L Flek Proceedings of the Tenth International Workshop on Natural Language …, 2022

Social media platforms such as Twitter or Reddit have become an integral part in political opinion formation and discussions, accompanied by potential echo chamber forming. In this paper, we examine the relationships between the interaction patterns, the opinion polarity, and the socio-demographic characteristics in discussion communities on Reddit. On a dataset of over 2 million...

2022

Mitigating toxic degeneration with empathetic data: Exploring the relationship between toxicity and empathy

A Lahnala, C Welch, B Neuendorf, L Flek Proceedings of the 2022 Conference of the North American Chapter of the …, 2022

Large pre-trained neural language models have supported the effectiveness of many NLP tasks, yet are still prone to generating toxic language hindering the safety of their use. Using empathetic data, we improve over recent work on controllable text generation that aims to reduce the toxicity of generated text. We find we are able to dramatically...

2022

Factoid: A new dataset for identifying misinformation spreaders and political bias

F Sakketou, J Plepi, R Cervero, HJ Geiss, P Rosso, L Flek Proceedings of the thirteenth language resources and evaluation conference …, 2022

Proactively identifying misinformation spreaders is an important step towards mitigating the impact of fake news on our society. In this paper, we introduce a new contemporary Reddit dataset for fake news spreader analysis, called FACTOID, monitoring political discussions on Reddit since the beginning of 2020. The dataset contains over 4K users with 3.4 M Reddit...

2022

Investigating user radicalization: A novel dataset for identifying fine-grained temporal shifts in opinion

F Sakketou, A Lahnala, L Vogel, L Flek Proceedings of the Thirteenth Language Resources and Evaluation Conference …, 2022

There is an increasing need for the ability to model fine-grained opinion shifts of social media users, as concerns about the potential polarizing social effects increase. However, the lack of publicly available datasets that are suitable for the task presents a major challenge. In this paper, we introduce an innovative annotated dataset for modeling subtle...

2022

DMIX: Adaptive Distance-aware Interpolative Mixup

R Sawhney, M Thakkar, S Pandit, R Soun, D Jin, D Yang, L Flek Proceedings of the 60th Annual Meeting of the Association for Computational …, 2022

Interpolation-based regularisation methods such as Mixup, which generate virtual training samples, have proven to be effective for various tasks and modalities. We extend Mixup and propose DMix, an adaptive distance-aware interpolative Mixup that selects samples based on their diversity in the embedding space. DMix leverages the hyperbolic space as a similarity measure among input samples...

2022

CAISA at WASSA 2022: Adapter-Tuning for Empathy Prediction

A Lahnala, C Welch, L Flek Proceedings of the 12th Workshop on Computational Approaches to Subjectivity …, 2022

We build a system that leverages adapters, a lightweight and efficient method for leveraging large language models, to perform the Empathy and Distress prediction tasks for WASSA 2022. In our experiments, we find that stacking our empathy and distress adapters on a pre-trained emotion classification adapter performs best compared to full fine-tuning approaches...

2022

Refining Diagnosis Paths for Medical Diagnosis based on an Augmented Knowledge Graph

N Heilig, J Kirchhoff, F Stumpe, J Plepi, L Flek, H Paulheim arXiv preprint arXiv:2204.13329, 2022

Medical diagnosis is the process of making a prediction of the disease a patient is likely to have, given a set of symptoms and observations. This requires extensive expert knowledge, in particular when covering a large variety of diseases. Such knowledge can be coded in a knowledge graph -- encompassing diseases, symptoms, and diagnosis paths....

2022

UserNLP’22: 2022 International Workshop on User-centered Natural Language Processing

X Huang, L Flek, F Dernoncourt, C Welch, S Amir, R Sawhney, D Yang Companion Proceedings of the Web Conference 2022, 1176-1177, 2022

We report goals, paper submissions, keynotes, and organizations of this UserNLP workshop. User-centered NLP can fill these gaps by explicitly considering stylistic variations across individuals or groups of individuals and focusing on user-level modeling tasks. While traditional NLP tasks tend to focus on single documents (e.g., sentiment analysis), user-centered NLP aims to make inferences for...

2022

The Impact of Differential Privacy on Group Disparity Mitigation

V Petrén Bach Hansen, A Tejaswi Neerkaje, R Sawhney, L Flek, ... arXiv e-prints, arXiv: 2203.02745, 2022

The performance cost of differential privacy has, for some applications, been shown to be higher for minority groups; fairness, conversely, has been shown to disproportionally compromise the privacy of members of such groups. Most work in this area has been restricted to computer vision and risk assessment. In this paper, we evaluate the impact of...

2022

5.3 Developing Benchmark Datasets for Frame Identification

K Al-Khatib, K Budzynska, A Bondarenko, L Flek, A Frank, I Gurevych, ... Report from Dagstuhl Seminar 22131: Framing in Communication: From Theories …, 2022

Framing has become recognised as a powerful communication strategy for winning debates and shaping opinions and decisions. Entman defines framing as an action of selecting “some aspects of a perceived reality and make them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation,...

2022

DMix: Distance Constrained Interpolative Mixup

R Sawhney, M Thakkar, S Pandit, D Mukherjee, L Flek Proceedings of the 1st Workshop on Multilingual Representation Learning, 242-244, 2021

Interpolation-based regularisation methods have proven to be effective for various tasks and modalities. Mixup is a data augmentation method that generates virtual training samples from convex combinations of individual inputs and labels. We extend Mixup and propose DMix, distance-constrained interpolative Mixup for sentence classification leveraging the hyperbolic space. DMix achieves state-of-the-art results on sentence classification...

2021

HYPMIX: Hyperbolic Interpolative Data Augmentation

R Sawhney, M Thakkar, S Agarwal, D Jin, D Yang, L Flek Proceedings of the 2021 Conference on Empirical Methods in Natural Language …, 2021

Interpolation-based regularisation methods for data augmentation have proven to be effective for various tasks and modalities. These methods involve performing mathematical operations over the raw input samples or their latent state representations, vectors that often possess complex hierarchical geometries. However, these operations are performed in the Euclidean space, simplifying these representations, which may lead to distorted...

2021

Modeling Proficiency with Implicit User Representations

K Breitwieser, A Lahnala, C Welch, L Flek, M Potthast arXiv preprint arXiv:2110.08011, 2021

We introduce the problem of proficiency modeling: Given a user's posts on a social media platform, the task is to identify the subset of posts or topics for which the user has some level of proficiency. This enables the filtering and ranking of social media posts on a given topic as per user proficiency. Unlike...

2021

Perceived and Intended Sarcasm Detection with Graph Attention Networks

J Plepi, L Flek Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Existing sarcasm detection systems focus on exploiting linguistic markers, context, or user-level priors. However, social studies suggest that the relationship between the author and the audience can be equally relevant for the sarcasm usage and interpretation. In this work, we propose a framework jointly leveraging (1) a user context from their historical tweets together with...

2021

Towards User-Centric Text-to-Text Generation: A Survey

D Yang, L Flek International Conference on Text, Speech, and Dialogue, 3-22, 2021

Natural Language Generation (NLG) has received much attention with rapidly developing models and ever-more available data. As a result, a growing amount of work attempts to personalize these systems for better human interaction experience. Still, diverse sets of research across multiple dimensions and numerous levels of depth exist and are scattered across various communities. In...

2021

Suicide Ideation Detection via Social and Temporal User Representations using Hyperbolic Learning

R Sawhney, H Joshi, R Shah, L Flek Proceedings of the 2021 Conference of the North American Chapter of the …, 2021

Recent psychological studies indicate that individuals exhibiting suicidal ideation increasingly turn to social media rather than mental health practitioners. Personally contextualizing the buildup of such ideation is critical for accurate identification of users at risk. In this work, we propose a framework jointly leveraging a user’s emotional history and social information from a user’s neighborhood...

2021

PHASE: Learning Emotional Phase-aware Representations for Suicide Ideation Detection on Social Media

R Sawhney, H Joshi, L Flek, R Shah Proceedings of the 16th Conference of the European Chapter of the …, 2021

Recent psychological studies indicate that individuals exhibiting suicidal ideation increasingly turn to social media rather than mental health practitioners. Contextualizing the build-up of such ideation is critical for the identification of users at risk. In this work, we focus on identifying suicidal intent in tweets by augmenting linguistic models with emotional phases modeled from users’...

2021

Returning the N to NLP: Towards Contextually Personalized Classification Models

L Flek Proceedings of the 58th Annual Meeting of the Association for Computational …, 2020

Most NLP models today treat language as universal, even though socio- and psycholinguistic research shows that the communicated message is influenced by the characteristics of the speaker as well as the target audience. This paper surveys the landscape of personalization in natural language processing and related fields, and offers a path forward to mitigate the decades...

2020

Common Conversational Community Prototype: Scholarly Conversational Assistant

K Balog, L Flekova, M Hagen, R Jones, M Potthast, F Radlinski, ... arXiv preprint arXiv:2001.06910, 2020

This paper discusses the potential for creating academic resources (tools, data, and evaluation approaches) to support research in conversational search, by focusing on realistic information needs and conversational interactions. Specifically, we propose to develop and operate a prototype conversational search system for scholarly activities. This Scholarly Conversational Assistant would serve as a useful tool, a...

2020

The impact of actively open-minded thinking on social media communication

J Carpenter, D Preotiuc-Pietro, J Clark, L Flekova, L Smith, ML Kern, ... Judgment and Decision Making 13 (6), 562, 2018

Online, social media communication is often ambiguous, and it can encourage speed and inattentiveness. We investigated whether Actively Open Minded Thinking (AOT), a dispositional willingness to seek out new or potentially threatening information, may help users avoid these pitfalls. In Study 1, we determined that correctly assessing social media authors’ traits was positively predicted by...

2018

Changes in psycholinguistic attributes of social media users before, during, and after self-reported influenza symptoms

L Flekova, V Lampos, IJ Cox Proceedings of the 3rd Social Media Mining for Health Applications (SMM4H …, 2018

Previous research has linked psychological and social variables to physical health. At the same time, psychological and social variables have been successfully predicted from the language used by individuals in social media. In this paper, we conduct an initial exploratory study linking these two areas. Using the social media platform of Twitter, we identify users...

2018

Proceedings of the Second Workshop on Stylistic Variation

J Brooke, L Flekova, M Koppel, T Solorio Proceedings of the Second Workshop on Stylistic Variation, 2018

The 2nd Workshop on Stylistic Variation (StyVa) at NAACL 2018 follows up the successful first iteration of the workshop at EMNLP 2017. The goal of the workshop is to offer a venue for bringing together a large but previously underserved and splintered community within computational linguistics, attracting a variety of perspectives on style from traditional...

2018

Content-based Analysis and Visualization of Story Complexity

L Flekova, F Stoffel, I Gurevych, D Keim

Obtaining insights into the style and content characteristics of a novel can provide a benefit to a large number of users. Parents and teachers may be interested in finding appropriate books for children. Booksellers may want to assess the fit of a candidate’s artwork into their portfolio or determine the target audience for their promotion...

2018

Lexical-semantic resources: yet powerful resources for automatic personality classification

XS Vu, L Flekova, L Jiang, I Gurevych Proceedings of the 9th global WORDNET conference, 172-181, 2018

In this paper, we aim to reveal the impact of lexical-semantic resources, used in particular for word sense disambiguation and sense-level semantic categorization, on the automatic personality classification task. While stylistic features (e.g., part-of-speech counts) have shown their power in this task, the impact of semantics beyond targeted word lists is relatively unexplored. We propose...

2018

Reconstruction of Micropattern Detector Signals using Convolutional Neural Networks

L Flekova, M Schott Journal of Physics: Conference Series 898 (3), 032054, 2017

Micropattern gaseous detector (MPGD) technologies, such as GEMs or MicroMegas, are particularly suitable for precision tracking and triggering in high rate environments. Given their relatively low production costs, MPGDs are an exemplary candidate for the next generation of particle detectors. Having acknowledged these advantages, both the ATLAS and CMS collaborations at the LHC are exploiting...

2017

Real Men Don’t Say “Cute”: Using Automatic Language Analysis to Isolate Inaccurate Aspects of Stereotypes

J Carpenter, D Preotiuc-Pietro, L Flekova, S Giorgi, C Hagan, ML Kern, ... Social Psychological and Personality Science 8 (3), 310-322, 2017

People associate certain behaviors with certain social groups. These stereotypical beliefs consist of both accurate and inaccurate associations. Using large-scale, data-driven methods with social media as a context, we isolate stereotypes by using verbal expression. Across four social categories—gender, age, education level, and political orientation—we identify words and phrases that lead people to incorrectly guess...

2017

Leveraging Lexical-Semantic Knowledge for Text Classification Tasks

L Flekova Technische Universität Darmstadt, 2017

This dissertation is concerned with the applicability of knowledge, contained in lexical-semantic resources, to text classification tasks. Lexical-semantic resources aim at systematically encoding various types of information about the meaning of words and their relations. Text classification is the task of sorting a set of documents into categories from a predefined set, for example, “spam” and...

2017

A User Interface for the Exploration of Manually and Automatically Coded Scientific Reasoning and Argumentation

P Lerner, J Daxenberger, L Flekova, I Gurevych, A Csanadi, C Ghanem, ... 12th International Conference of the Learning Sciences, Singapore, 2016

Scientific reasoning and argumentation (SRA) is a complex process. Thus, analyzing the quality of learners’ SRA and presenting SRA outcomes are important research problems. This study attempts to account for these problems by developing a user interface that facilitates learning scientists to analyze SRA, enabling them to evaluate the performance of an automated coding algorithm...

2016

Supersense Embeddings: A Unified Model for Supersense Interpretation, Prediction, and Utilization

L Flekova, I Gurevych Proceedings of the 54th Annual Meeting of the Association for Computational …, 2016

Coarse-grained semantic categories such as supersenses have proven useful for a range of downstream tasks such as question answering or machine translation. To date, no effort has been put into integrating the supersenses into distributional word representations. We present a novel joint embedding model of words and supersenses, providing insights into the relationship between words...

2016

Exploring Stylistic Variation with Age and Income on Twitter

L Flekova, L Ungar, D Preotiuc-Pietro Proceedings of the 54th Annual Meeting of the Association for Computational …, 2016

Writing style allows NLP tools to adjust to the traits of an author. In this paper, we explore the relation between stylistic and syntactic features and authors’ age and income. We confirm our hypothesis that for numerous feature types writing style is predictive of income even beyond age. We analyze the predictive power of writing...

2016

Analyzing Biases in Human Perception of User Age and Gender from Text

L Flekova, J Carpenter, S Giorgi, L Ungar, D Preotiuc-Pietro Proceedings of the 54th Annual Meeting of the Association for Computational …, 2016

User traits disclosed through written text, such as age and gender, can be used to personalize applications such as recommender systems or conversational agents. However, human perception of these traits is not perfectly aligned with reality. In this paper, we conduct a large-scale crowdsourcing experiment on guessing age and gender from tweets. We systematically analyze...

2016

New exclusion limits on scalar and pseudoscalar axionlike particles from light shining through a wall

R Ballou, G Deferne, M Finger Jr, M Finger, L Flekova, J Hosek, S Kunc, ... Physical Review D 92 (9), 092002, 2015

Physics beyond the Standard Model predicts the possible existence of new particles that can be searched at the low-energy frontier in the sub-eV range. The OSQAR photon regeneration experiment looks for “light shining through a wall” from the quantum oscillation of optical photons into “weakly interacting sub-eV particles,” such as axion or axionlike particles (ALPs)...

2015

Analysing domain suitability of a sentiment lexicon by identifying distributionally bipolar words.

L Flekova, D Preotiuc-Pietro, E Ruppert WASSA@ EMNLP, 77-84, 2015

Contemporary sentiment analysis approaches rely heavily on lexicon based methods. This is mainly due to their simplicity, although the best empirical results can be achieved by more complex techniques. We introduce a method to assess suitability of generic sentiment lexicons for a given domain, namely to identify frequent bigrams where a polar word switches polarity....

2015

Personality Profiling of Fictional Characters using Sense-Level Links between Lexical Resources

L Flekova, I Gurevych Proceedings of the 2015 Conference on Empirical Methods in Natural Language …, 2015

This study focuses on personality prediction of protagonists in novels based on the Five-Factor Model of personality. We present and publish a novel collaboratively built dataset of fictional character personality and design our task as a text classification problem. We incorporate a range of semantic features, including WordNet and VerbNet sense-level information and word vector...

2015

Constructive Feedback, Thinking Process and Cooperation: Assessing the Quality of Classroom Interaction

T Sousa, L Flekova, M Mieskes, I Gurevych Interspeech 2015 – Towards a better understanding of the most important …, 2015

Analyzing and assessing the quality of classroom lessons on a range of quality dimensions is a number one educational research topic, as this allows developing teacher trainings and interventions to improve lesson quality. We model this assessment as a text classification task, exploiting linguistic features to predict the scores in several lesson quality dimensions relevant...

2015

CHASE Proposal

P Pugnat, R Ballou, G Deferne, L Duvillaret, M Finger Jr, M Finger, ...

For 2015, the OSQAR collaboration will focus on a new proposal for the search for the chameleon, a hypothetical scalar particle postulated as a dark energy candidate with an environment-dependent mass. The required experimental set-up has been successfully tested and validated in 2014 at the SM-18 experimental hall. This proposal will focus on the sensitivity that...

2015

Inverted Polarity Bigram Lexicons

L Flekova, E Ruppert, D Preotiuc-Pietro

Sentiment prediction from Twitter is of the utmost interest for research and commercial organizations. Systems are usually using lexicons, where each word is positive or negative. However, word lexicons suffer from ambiguities at a contextual level: the word dark is positive in dark chocolate and negative in dark soul, the word lost is positive with...

2015

Document-level school lesson quality classification based on German transcripts

L Flekova, T Sousa, M Mieskes, I Gurevych Document-level school lesson quality classification based on German …, 2015

Analyzing large bodies of audiovisual information with respect to discourse-pragmatic categories is a time-consuming, manual activity, yet of growing importance in a wide variety of domains. Given the transcription of the audiovisual recordings, we propose to model the task of assigning discourse-pragmatic categories as a supervised machine learning task. By analyzing the effects of a wide variety...

2015

Feature-Based Visual Exploration of Text Classification

F Stoffel, L Flekova, D Oelke, I Gurevych, DA Keim Symposium on Visualization in Data Science at IEEE VIS, 2015

There are many applications of text classification such as gender attribution in market research or the identification of forged product reviews on e-commerce sites. Although several automatic methods provide satisfying performance in most application cases, we see a gap in supporting the analyst to understand the results and derive knowledge for future application scenarios. In...

2015

Analyzing crowdsourced assessment of user traits through Twitter posts

L Flekova, S Giorgi, J Carpenter, L Ungar, D Preotiuc-Pietro Third AAAI Conference on Human Computation and Crowdsourcing, HCOMP, 2015

Social media allows any user to express themselves to the public through posting content. Using a crowdsourcing experiment, we aim to quantify and analyze which human attributes lead to better perceptions of the true identity of others. Using tweet content from a set of users with known age and gender information, we ask workers to...

2015

Latest Results of the OSQAR Photon Regeneration Experiment for Axion-Like Particle Search

R Ballou, G Deferne, L Duvillaret, M Finger Jr, M Finger, L Flekova, ... arXiv preprint arXiv:1410.2566, 2014

The OSQAR photon regeneration experiment searches for pseudoscalar and scalar axion-like particles by the method of "Light Shining Through a Wall", based on the assumption that these weakly interacting sub-eV particles couple to two photons to give rise to quantum oscillations with optical photons in strong magnetic field. No excess of events has been observed,...

2014

Search for weakly interacting sub-eV particles with the OSQAR laser-based experiment: results and perspectives

P Pugnat, R Ballou, M Schott, T Husek, M Sulc, G Deferne, L Duvillaret, ... The European Physical Journal C 74 (8), 3027, 2014

Recent theoretical and experimental studies highlight the possibility of new fundamental particle physics beyond the Standard Model that can be probed by sub-eV energy experiments. The OSQAR photon regeneration experiment looks for “Light Shining through a Wall” from the quantum oscillation of optical photons into “Weakly Interacting Sub-eV Particles”, like axion or axion-like particles (ALPs), in...

2014

UKPDIPF: A Lexical Semantic Approach to Sentiment Polarity Prediction in Twitter Data

L Flekova, O Ferschke, I Gurevych Preslav Nakov and Torsten Zesch: Semeval-2014 Task 9: Sentiment Analysis in …, 2014

We present a sentiment classification system that participated in the SemEval 2014 shared task on sentiment analysis in Twitter. Our system expands tokens in a tweet with semantically similar expressions using a large novel distributional thesaurus and calculates the semantic relatedness of the expanded tweets to word lists representing positive and negative sentiment. This approach...

2014

What makes a good biography?: multidimensional quality analysis based on wikipedia article feedback data

L Flekova, O Ferschke, I Gurevych Proceedings of the 23rd international conference on World wide web, 855-866, 2014

With more than 22 million articles, the largest collaborative knowledge resource never sleeps, experiencing several article edits every second. Over one fifth of these articles describes individual people, the majority of which are still alive. Such articles are, by their nature, prone to corruption and vandalism. Manual quality assurance by experts can barely cope with...

2014

Wikipedia Article Feedback

L Flekova, O Ferschke, I Gurevych

The corpus lists article IDs of biographies of living and dead people, rated as above average or below average along four categories (trustworthy, objective, well written, complete) based on the ratings from Wikipedia Article Feedback v4, http://en.wikipedia.org/wiki/Wikipedia:Article_Feedback_Tool (each of the listed articles was rated at least 10 times).

2014

Axion search by laser-based experiment OSQAR

M Sulc, P Pugnat, R Ballou, G Deferne, L Duvillaret, L Flekova, ... Nuclear Instruments and Methods in Physics Research Section A: Accelerators …, 2013

The laser-based experiment OSQAR at CERN searches for axions by two methods. The photon regeneration experiment uses two LHC dipole magnets of length 14.3 m and magnetic field 9.5 T, equipped with an optical barrier at the end of the first magnet. It looks for light shining through the wall. No...

2013

Results and Perspectives for Laboratory Search of Weakly Interacting Sub-eV Particles with the OSQAR Experiment

P Pugnat, R Ballou, M Schott, T Husek, M Sulc, G Deferne, L Duvillaret, ... arXiv preprint arXiv:1306.0443, 2013

Recent intensive theoretical and experimental studies highlight the possibility of new fundamental particle physics beyond the standard model that can be probed by sub-eV energy experiments. The OSQAR photon regeneration experiment looks for Light Shining through a Wall (LSW) from the quantum oscillation of optical photons into Weakly Interacting Sub-eV Particles (WISPs), like axion or...

2013

Can We Hide in the Web? Large Scale Simultaneous Age and Gender Author Profiling in Social Media

L Flekova, I Gurevych CLEF 2013 Labs and Workshops, 2013

Would you target your audience differently, knowing the real age and gender of the text authors on your website forum? This paper examines hundreds of thousands of online documents, e.g., chat lines or blog posts, showing that computers are capable of addressing this task better than humans, without relying on content stereotypes. Pointing out that...

2013

Results of the 2nd run of OSQAR Photon Regeneration Experiment

M Schott, M Finger, KA Meissner, R Ballou, T Husek, K Macuchova, ...

Recent intensive theoretical and experimental studies shed light on possible new physics beyond the standard model of particle physics, which can be probed with sub-eV energy experiments. In the second run of the OSQAR photon regeneration experiment, which looks for the conversion of photon to axion (or Axion-Like Particle), two spare superconducting dipole magnets of...

2011

First Results of the Full-Scale OSQAR Photon Regeneration Experiment

M Schott, P Pugnat, R Ballou, L Duvillaret, T Husek, R Jost, L Flekova, ... arXiv preprint arXiv:1110.0774, 2011

Recent intensive theoretical and experimental studies shed light on possible new physics beyond the standard model of particle physics, which can be probed with sub-eV energy experiments. In the second run of the OSQAR photon regeneration experiment, which looks for the conversion of photon to axion (or Axion-Like Particle), two spare superconducting dipole magnets of...

2011

Effective Domain Adaptation of Instruction-Tuned LLMs for Knowledge-Intensive Tasks

T Zhang, F Mai, L Flek

Continual pretraining promises to adapt large language models (LLMs) to new test domains using only unlabeled data, but naively applying common self-supervised objectives is known to degrade instruction-following performance. Existing fixes assume access to the original base model, a realistic barrier in settings where the base model weights are withheld for safety reasons. In this work,...

Probing the Robustness of Theory of Mind in Large Language Models

L Schrewe, C Nickel, L Flek Eighth Widening NLP Workshop (WiNLP 2024) Phase II

Theory of Mind (ToM) is considered essential in understanding the intentions and beliefs of others. Recent advancements in large language models (LLMs) like ChatGPT have sparked claims that these models exhibit ToM capabilities. However, follow-up studies reveal that these capabilities vanish with slight task variations. This paper introduces a novel dataset comprising 68 tasks across...

Understanding Implicit Hate Speech Detection

L Flek, J Plepi

Implicit hate speech is defined by coded or indirect language that disparages a person or group on the basis of protected characteristics like race, gender, and cultural identity. Compared to explicit hate speech detection, implicit hate speech detection poses several challenges for NLP models. One key challenge is that implicit hate speech detection does not...

Automated Template Paraphrasing for Conversational Assistants

L Vogel, L Flek

With synthetic data generation, the required amount of human-generated training data can be reduced significantly. In this work, we explore the usage of automatic paraphrasing models such as GPT-2 and CVAE to augment template phrases for task-oriented dialogue systems while preserving the slots. Additionally, we systematically analyze how far manually annotated training data can be...

Personalized Models for Fake News Detection

L Flek, C Welch

Detecting fake news online is an important and timely issue. We are interested in investigating personal differences in how fake news is spread and developing models to detect fake news. Our group has collected data for this task and for thousands of Reddit users. This can be used to test models for personalization from embeddings,...

Controllable Generation Using kNN Language Models

L Flek, C Welch

Language modeling forms the foundation of many language processing problems and involves predicting which words come next in a sequence. Recent work on kNN language models has shown improved perplexity by storing encodings of sentence contexts and retrieving similar contexts to alter the probability distribution when predicting the next token (Khandelwal et al. 2020). If...

Investigating the level of stubbornness regarding sociopolitical views in social media

L Flek, F Sakketou

While social media platforms help to connect people worldwide and give access to enormous amounts of diverse information, they also foster an environment that promotes polarization. This occurs due to the fact that users show a tendency to consume content that aligns with their political leaning and join groups adhering to their beliefs. This phenomenon...

Proposal and results of COMPASS database upgrade

L Fleková, V Jarý, T Liška, M Virius

Experiments in the field of particle physics produce vast amounts of data which need to be stored and processed. This means that a reliable and efficient data acquisition system (DAQ) is an integral part of these experiments. COMPASS is a fixed target experiment operating on the Super Proton Synchrotron particle accelerator at CERN (European Organization for...

2026

Reinforcement Learning Amplifies Emergent Misalignment from Harmless Rewards. M Jørgenvåg, D Kaczér, L Ruttert, M Gülhan, L Flek, F Mai. arXiv preprint arXiv:2605.31328.

Transfer Learning Across Fast-and Full-Simulation Domains in High-Energy Physics. M Schott, L Flek. arXiv preprint arXiv:2605.07471.

Learning Minimal-Deviation Corrections for Multi-Dimensional Mismodelling in HEP Simulations. M Schott, L Flek. arXiv preprint arXiv:2605.07460.

Uncovering Hidden Systematics in Neural Network Models for High Energy Physics. L Flek, PA Jung, A Karimi, T Saala, A Schmidt, M Schott, P Soldin, C Wiebusch, U Willemsen. arXiv preprint arXiv:2605.07470.

Plausible but Wrong: A case study on Agentic Failures in Astrophysical Workflows. S Rawat, L Flek. arXiv preprint arXiv:2604.25345.

Reasoning Primitives in Hybrid and Non-Hybrid LLMs. S Rawat, L Flek, F Mai, NK Corrêa. arXiv preprint arXiv:2604.21454.

(Re-) Thinking Empathy’s Materiality in HCI. S Ppali, M Yurrita, A Vitali, A Debnath, L Flek, A Cuadra, S Mayer, M Lahav, T Horne, A Singh, G Barbareschi, A Mauri, H Verma.

Can LLM Agents Identify Spoken Dialects like a Linguist? T Bystrich, L Hamm, M Hassan, L Fischbach, L Flek, A Karimi. arXiv preprint arXiv:2603.29541.

Conspiracy Frame: a Semiotically-Driven Approach for Conspiracy Theories Detection. HC Piva, S Ashraf, MK Jouneghani, A Longo, R Damiano, L Flek, MA Stranisci. arXiv preprint arXiv:2603.21368.

CHARISMA: Character-Based Interaction Simulation with Multi-LLM Agents Toward Computational Social Psychology. V Sadiri Javadi, F Róg, A Aksa, J Trippas, S Vakulenko, L Flek.

Shapes are not enough: CONSERVAttack and its use for finding vulnerabilities and uncertainties in machine learning applications. P Bechtle, L Flek, PA Jung, A Karimi, T Saala, A Schmidt, M Schott, P Soldin, C Wiebusch, U Willemsen. arXiv preprint arXiv:2603.13970.

Tucano 2 Cool: Better Open Source LLMs for Portuguese. NK Corrêa, A Sen, S Fatimah, S Falk, L Landgraf, J Kastner, L Flek. arXiv preprint arXiv:2603.03543.

Raising Bars, Not Parameters: LilMoo Compact Language Model for Hindi. S Fatimah, A Sen, S Falk, F Mai, L Flek, NK Corrêa. arXiv preprint arXiv:2603.03508.

Label-Consistent Data Generation for Aspect-Based Sentiment Analysis Using LLM Agents. MHA Monfared, L Flek, A Karimi.

Understanding Artificial Theory of Mind: Perturbed Tasks and Reasoning in Large Language Models. C Nickel, L Schrewe, F Mai, L Flek.

PERSPECTRA: A Scalable and Configurable Pluralist Benchmark of Perspectives from Arguments. S Nie, K Omoomi, L Flek, Z Zhao, C Welch. arXiv preprint arXiv:2602.08716.

On the Limitations of Language-targeted Pruning: Investigating the Calibration Language Impact in Multilingual LLM Pruning. S Kurz, JJ Chen, L Flek, Z Zhao. Transactions of the Association for Computational Linguistics 14, 167-192.

Pluralistic AI Alignment: A Cross-Cultural Pilot Survey. K Alavi, L Flek, F Mai. Second Workshop on Language Models for Underserved Communities (LM4UC).

2025

Encoder Fine-tuning with Stochastic Sampling Outperforms Open-weight GPT in Astronomy Knowledge Extraction. S Rawat, L Flek, A Karimi. 

TARGAMA: A Novel Benchmark Dataset and Framework for Translating Dialectal Arabic to English with Generative Language Models. B Abdou, H Elsafty, F Aldabbas, M Pielka, R Sifa, L Flek. 

More Agents Helps but Adversarial Robustness Gap Persists. K Alavi, Z Yeltay, L Flek, A Karimi. arXiv preprint arXiv:2511.07112.

MiniFool - Physics-Constraint-Aware Minimizer-Based Adversarial Attacks in Deep Neural Networks. L Flek, O Janik, PA Jung, A Karimi, T Saala, A Schmidt, M Schott, P Soldin, M Thiesmeyer, C Wiebusch, U Willemsen. arXiv preprint arXiv:2511.01352.

The Practical Impacts of Theoretical Constructs on Empathy Modeling. A Lahnala, C Welch, D Jurgens, L Flek.

CINEMETRIC: A Framework for Multi-Perspective Evaluation of Conversational Agents using Human-AI Collaboration. VS Javadi, ZU Abedin, L Flek.

IKnow: Instruction-Knowledge-Aware Continual Pretraining for Effective Domain Adaptation. T Zhang, F Mai, L Flek. arXiv preprint arXiv:2510.20377.

Proceedings of the 18th International Natural Language Generation Conference: System Demonstrations. L Flek, S Narayan, J Pei.

Disparities in Multilingual LLM-Based Healthcare Q&A. IB Schlicht, B Sayin, Z Zhao, FM Labonté, C Barbera, M Viviani, P Rosso, L Flek. arXiv e-prints, arXiv:2510.17476.

Colliding with Adversaries: A Challenge on Robust Learning in High Energy Physics at ECML PKDD 2025. T Saala, L Flek, A Karimi, PA Jung, A Schmidt, P Soldin, D Stefanopoulos, A Voskou, U Willemsen, C Wiebusch, M Schott.

Funzac at CoMeDi Shared Task: Modeling annotator disagreement from word-in-context perspectives. Olufunke O Sarumi, Charles Welch, Lucie Flek, Jörg Schlötterer. arXiv preprint arXiv:2501.14617.

ISCA: A Framework for Interview-Style Conversational Agents. C Welch, A Lahnala, V Varadarajan, L Flek, R Mihalcea, JL Boyd, J Sedoc. arXiv preprint arXiv:2508.14344.

Survey-to-Behavior: Downstream Alignment of Human Values in LLMs via Survey Questions. S Nie, F Mai, D Kaczér, C Welch, Z Zhao, L Flek. arXiv preprint arXiv:2508.11414.

In-Training Defenses against Emergent Misalignment in Language Models. D Kaczér, M Jørgenvåg, C Vetter, L Flek, F Mai. arXiv preprint arXiv:2508.06249.

Improving Low-Resource Dialect Classification Using Retrieval-based Voice Conversion. L Fischbach, A Karimi, C Kleen, A Lameli, L Flek. arXiv preprint arXiv:2507.03641.

Multi-Hop Reasoning for Question Answering with Hyperbolic Representations. S Welz, L Flek, A Karimi. arXiv preprint arXiv:2507.03612.

CAISA at SemEval-2025 Task 7: Multilingual and Cross-lingual Fact-Checked Claim Retrieval. M Haroon, S Ashraf, I Baris, L Flek. Proceedings of the 19th International Workshop on Semantic Evaluation.

Explainable Hallucination through Natural Language Inference Mapping. WF Chen, Z Zhao, A Karimi, L Flek. Findings of the Association for Computational Linguistics: ACL 2025, 1888-1896.

Detection of Medical Conspiracy Theories with Limited Resources: Using Data from Prior Epidemics and LLMs. IB Schlicht, D Korenčić, B Chulvi, L Flek, P Rosso.

Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models. Mehdi Ali, Manuel Brack, Max Lübbering, Elias Wendt, Abbas Goher Khan, Richard Rutmann, Alex Jude, Maurice Kraus, Alexander Arno Weber, David Kaczér, Florian Mai, Lucie Flek, Rafet Sifa, Nicolas Flores-Herr, Joachim Köhler, Patrick Schramowski, Michael Fromm, Kristian Kersting.

Does Preprocessing Matter? An Analysis of Acoustic Feature Importance in Deep Learning for Dialect Classification. L Fischbach, C Kleen, L Flek, A Lameli. Proceedings of the Joint 25th Nordic Conference on Computational Linguistics…

Superalignment with Dynamic Human Values. F Mai, D Kaczér, NK Corrêa, L Flek. arXiv preprint arXiv:2503.13621.

The Muddy Waters of Modeling Empathy in Language: The Practical Impacts of Theoretical Constructs. A Lahnala, C Welch, D Jurgens, L Flek. arXiv preprint arXiv:2501.14981 (2025).

Do LLMs Provide Consistent Answers to Health-Related Questions across Languages? IB Schlicht, Z Zhao, B Sayin, L Flek, P Rosso. arXiv preprint arXiv:2501.14719 (2025).

ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving. Abedin, Zain Ul, et al. arXiv preprint arXiv:2501.08203 (2025).

Enforcing Fundamental Relations via Adversarial Attacks on Input Parameter Correlations. T Saala, L Flek, A Jung, A Karimi, A Schmidt, M Schott, P Soldin, … arXiv preprint arXiv:2501.05588 (2025).

Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing. Arora, Pulkit, Akbar Karimi, and Lucie Flek. arXiv preprint arXiv:2501.08276 (2025).

Unifying the Extremes: Developing a Unified Model for Detecting and Predicting Extremist Traits and Radicalization. A Lahnala, V Varadarajan, L Flek, HA Schwartz, RL Boyd. arXiv preprint arXiv:2501.04820 (2025).

Exploring Robustness of Multilingual LLMs on Real-World Noisy Data. Aliakbarzadeh, Amirhossein, Lucie Flek, and Akbar Karimi. arXiv preprint arXiv:2501.08322 (2025).

MultiProp Framework: Ensemble Models for Enhanced Cross-Lingual Propaganda Detection in Social Media and News using Data Augmentation, Text Segmentation, and Meta-Learning. F Aldabbas, S Ashraf, R Sifa, L Flek. Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script, 7-22.

A Comparison of Data Augmentation Techniques for Text Classification. Peyman Hassani Jalilian, Akbar Karimi.

2024

Explaining GPT-4’s Schema of Depression Using Machine Behavior Analysis. AV Ganesan, V Varadarajan, YK Lal, VC Eijsbroek, K Kjell, ONE Kjell, … arXiv preprint arXiv:2411.13800 (2024).

How large language models can reshape collective intelligence. Burton, Jason W and Lopez-Lopez, Ezequiel and Hechtlinger, Shahar and Rahwan, Zoe and Aeschbach, Samuel and Bakker, Michiel A and Becker, Joshua A and Berditchevskaia, Aleks and Berger, Julian and Brinkmann, Levin and others. Nature Human Behaviour 2024.

Probing the Robustness of Theory of Mind in Large Language Models. C Nickel, L Schrewe, L Flek. arXiv preprint arXiv:2410.06271 (2024).

Do Multilingual Large Language Models Mitigate Stereotype Bias? Nie, Shangrui and Fromm, Michael and Welch, Charles and Görge, Rebekka and Karimi, Akbar and Plepi, Joan and Mowmita, Nazia Afsan and Flores-Herr, Nicolas and Ali, Mehdi and Flek, Lucie. C3NLP 2024.

Perspective Taking through Generating Responses to Conflict Situations. Plepi, Joan and Welch, Charles and Flek, Lucie. ACL Findings 2024.

Unveiling Information Through Narrative In Conversational Information Seeking. Sadiri Javadi, Vahid and Trippas, Johanne R and Flek, Lucie. CUI 2024.

Pitfalls of Conversational LLMs on News Debiasing. Schlicht, Ipek Baris and Altiok, Defne and Taouk, Maryanne and Flek, Lucie. DELITE 2024.

A Perspectivist Corpus of Numbers in Social Judgements. May, Marlon and Flek, Lucie and Welch, Charles. NLPerspectives LREC-COLING 2024.

Corpus Considerations for Annotator Modeling and Scaling. Olufunke O. Sarumi and Béla Neuendorf and Joan Plepi and Lucie Flek and Jörg Schlötterer and Charles Welch. NAACL 2024.

Can Stories Help LLMs Reason? Curating Information Space Through Narrative. V Sadiri Javadi, JR Trippas, YK Lal, L Flek. arXiv e-prints, arXiv:2410.19221 (2024).

Proceedings of the 2nd Workshop on Practical LLM-assisted Data-to-Text Generation. S Balloccu, Z Kasner, O Plátek, P Schmidtová, K Onderková, M Lango, … Proceedings of the 2nd Workshop on Practical LLM-assisted Data-to-Text …

Language-specific Calibration for Pruning Multilingual Language Models. S Kurz, Z Zhao, JJ Chen, L Flek. arXiv preprint arXiv:2408.14398.

Large Language Models are Human-like Annotators. Marreddy, Mounika and Oota, Subba Reddy and Gupta, Manish and Flek, Lucie. KR 2024.

Harnessing Personalization Methods to Identify and Predict Unreliable Information Spreader Behavior. Ashraf, Shaina and Gruschka, Fabio and Flek, Lucie and Welch, Charles. WOAH 2024.

EmPO: Emotion Grounding for Empathetic Response Generation through Preference Optimization. O Sotolar, V Formanek, A Debnath, A Lahnala, C Welch, L Flek. arXiv preprint arXiv:2406.19071 (2024).

Archetypes and Entropy: Theory-Driven Extraction of Evidence for Suicide Risk. Varadarajan, Vasudha and Lahnala, Allison and Ganesan, Adithya V. and Dey, Gourab and Mangalik, Siddharth and Bucur, Ana-Maria and Soni, Nikita and Rao, Rajath and Lanning, Kevin and Vallejo, Isabella and Flek, Lucie and Schwartz, H. Andrew and Welch, Charles and Boyd, Ryan L. CLPsych 2024.

Appraisal Framework for Clinical Empathy: A Novel Application to Breaking Bad News Conversations. Lahnala, Allison and Neuendorf, Béla and Thomin, Alexander and Welch, Charles and Stibane, Tina and Flek, Lucie. LREC 2024.

DeFaktS: A Fine-Grained Dataset for Analyzing Disinformation in German Media. Ashraf, Shaina and Bezzaoui, Isabel and Andone, Ionut and Markowetz, Alexander and Fegert, Jonas and Flek, Lucie. LREC 2024.

Reference-guided Style-Consistent Content Transfer. Chen, Wei-Fan and Alshomary, Milad and Stahl, Maja and Al Khatib, Khalid and Stein, Benno and Wachsmuth, Henning. LREC 2024.

Vanishing Boundaries: A Unifying Account of Multidimensional Emotion Dynamics and Alterations in Depression. AM Bucur, TA Koosha, A Cosma, L Flek, SE Thanarajah, F Bernhard, … OSF.

Proceedings of the 1st Human-Centered Large Language Modeling Workshop. N Soni, L Flek, A Sharma, D Yang, S Hooker, HA Schwartz.

LeadEmpathy: An Expert Annotated German Dataset of Empathy in Written Leadership Communication. Sedefoglu, Didem and Lahnala, Allison and Wagner, Jasmin and Flek, Lucie and Ohly, Sandra. LREC 2024.

2023

  • CAISA at SemEval-2023 Task 8: Counterfactual Data Augmentation for Mitigating Class Imbalance in Causal Claim Identification
    Karimi, Akbar and Flek, Lucie
    SemEval 2023
  • Vanishing Boundaries: A Unifying Account of Multidimensional Emotion Dynamics and Alterations in Depression
    Bucur, Ana-Maria and Koosha, Tahmineh A. and Cosma, Adrian and Flek, Lucie and Thanarajah, Sharmili Edwin and Bernhard, Felix and Rosso, Paolo and Jamalabadi, Hamidreza
  • Suicide Ideation Detection via Social and Temporal User Representations using Hyperbolic Learning
    Sawhney, Ramit and Joshi, Harshit and Shah, Rajiv Ratn and Flek, Lucie
    NAACL-HLT 2021
  • Towards User-Centric Text-to-Text Generation: A Survey
    Yang, Diyi and Flek, Lucie
  • Perceived and Intended Sarcasm Detection with Graph Attention Networks
    Plepi, Joan and Flek, Lucie
    Findings of EMNLP 2021
  • HypMix: Hyperbolic Interpolative Data Augmentation
    Sawhney, Ramit and Thakkar, Megh and Agarwal, Shivam and Jin, Di and Yang, Diyi and Flek, Lucie
    EMNLP 2021
  • The Impact of Differential Privacy on Group Disparity Mitigation
    Petren Bach Hansen, Victor and Tejaswi Neerkaje, Atula and Sawhney, Ramit and Flek, Lucie and Sogaard, Anders
    PrivateNLP 2022
  • Investigating User Radicalization: A Novel Dataset for Identifying Fine-Grained Temporal Shifts in Opinion
    Sakketou, Flora and Lahnala, Allison and Vogel, Liane and Flek, Lucie
  • UserNLP’22: 2022 International Workshop on User-centered Natural Language Processing
    Huang, Xiaolei and Flek, Lucie and Dernoncourt, Franck and Welch, Charles and Amir, Silvio and Sawhney, Ramit and Yang, Diyi
    WWW ’22: The ACM Web Conference 2022
  • Refining Diagnosis Paths for Medical Diagnosis based on an Augmented Knowledge Graph
    Heilig, Niclas and Kirchhoff, Jan and Stumpe, Florian and Plepi, Joan and Flek, Lucie and Paulheim, Heiko
  • CAISA at WASSA 2022: Adapter-Tuning for Empathy Prediction
    Lahnala, Allison and Welch, Charles and Flek, Lucie
    WASSA 2022
  • DMix: Adaptive Distance-aware Interpolative Mixup
    Sawhney, Ramit and Thakkar, Megh and Pandit, Shrey and Soun, Ritesh and Jin, Di and Yang, Diyi and Flek, Lucie
    ACL 2022
  • FACTOID: A New Dataset for Identifying Misinformation Spreaders and Political Bias
    Sakketou, Flora and Plepi, Joan and Cervero, Riccardo and Geiss, Henri Jacques and Rosso, Paolo and Flek, Lucie
  • Mitigating Toxic Degeneration with Empathetic Data: Exploring the Relationship Between Toxicity and Empathy
    Lahnala, Allison and Welch, Charles and Neuendorf, Béla and Flek, Lucie
    NAACL-HLT 2022
  • OK Boomer: Probing the socio-demographic Divide in Echo Chambers
    Geiss, Henri-Jacques and Sakketou, Flora and Flek, Lucie
    SocialNLP 2022
  • Towards Suicide Ideation Detection Through Online Conversational Context
    Sawhney, Ramit and Agarwal, Shivam and Neerkaje, Atula Tejaswi and Aletras, Nikolaos and Nakov, Preslav and Flek, Lucie
    SIGIR ’22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Understanding Interpersonal Conflict Types and their Impact on Perception Classification
    Welch, Charles and Plepi, Joan and Neuendorf, Béla and Flek, Lucie
    NLP+CSS 2022
  • Investigating Paraphrasing-Based Data Augmentation for Task-Oriented Dialogue Systems
    Vogel, Liane and Flek, Lucie
  • CAISA@SMM4H’22: Robust Cross-Lingual Detection of Disease Mentions on Social Media with Adversarial Methods
    Karimi, Akbar and Flek, Lucie
    SMM4H 2022
  • Temporal Graph Analysis of Misinformation Spreaders in Social Media
    Plepi, Joan and Sakketou, Flora and Geiss, Henri-Jacques and Flek, Lucie
    TextGraphs 2022
  • Unifying Data Perspectivism and Personalization: An Application to Social Norms
    Plepi, Joan and Neuendorf, Béla and Flek, Lucie and Welch, Charles
    EMNLP 2022
  • Nearest Neighbor Language Models for Stylistic Controllable Generation
    Trotta, Severino and Flek, Lucie and Welch, Charles
    GEM 2022
  • A Critical Reflection and Forward Perspective on Empathy and Natural Language Processing
    Lahnala, Allison and Welch, Charles and Jurgens, David and Flek, Lucie
    Findings of EMNLP 2022
  • Multilingual Detection of Check-Worthy Claims Using World Languages and Adapter Fusion
    Schlicht, Ipek Baris and Flek, Lucie and Rosso, Paolo
  • How Much User Context Do We Need? Privacy by Design in Mental Health NLP Applications
    Sawhney, Ramit and Neerkaje, Atula and Habernal, Ivan and Flek, Lucie
  • Domain Transfer for Empathy, Distress, and Personality Prediction
    Gruschka, Fabio and Lahnala, Allison and Welch, Charles and Flek, Lucie
    WASSA 2023
  • OpinionConv: Conversational Product Search with Grounded Opinions
    Sadiri Javadi, Vahid and Potthast, Martin and Flek, Lucie
    SIGDIAL 2023
  • Challenges of GPT-3-Based Conversational Agents for Healthcare
    Lechner, Fabian and Lahnala, Allison and Welch, Charles and Flek, Lucie
    RANLP 2023
  • Personalized Intended and Perceived Sarcasm Detection on Twitter
    Plepi, Joan and Buski, Magdalena and Flek, Lucie
    CPSS 2023
  • Style Locality for Controllable Generation with kNN Language Models
    Nawezi, Gilles and Flek, Lucie and Welch, Charles
  • Leveraging Similar Users for Personalized Language Modeling with Limited Data
    Welch, Charles and Gu, Chenxi and Kummerfeld, Jonathan K. and Perez-Rosas, Veronica and Mihalcea, Rada
    ACL 2022
  • Knowledge Enhanced Reflection Generation for Counseling Dialogues
    Shen, Siqi and Perez-Rosas, Veronica and Welch, Charles and Poria, Soujanya and Mihalcea, Rada
    ACL 2022