Research Group – Data Science & Language Technologies
Dynamically Social Natural Language Processing for Online Discourse Analysis (DynSoDA)
About the project:
The DynSoDA project will model the discourse aspects of language together with the deep representations of user characteristics and latent social network profiles derived from online dialogues. In contrast to current approaches, user representations will be treated as dynamically contextual. The project further envisions the use of transfer learning techniques at multiple levels of abstraction to work robustly across a range of NLP tasks related to social discourse (such as opinion detection, hate speech identification, or argument persuasiveness prediction).
Timing: 01.10.2020 – 31.12.2024
Funding: Federal Ministry of Education and Research (BMBF), funding programme for female junior researchers in AI (Förderung von KI-Nachwuchswissenschaftlerinnen)
Principal Investigators: Prof. Dr. Lucie Flek
Publications
– Unifying the Extremes: Developing a Unified Model for Detecting and Predicting Extremist Traits and Radicalization, A Lahnala, V Varadarajan, L Flek, HA Schwartz, RL Boyd, arXiv preprint arXiv:2501.04820 (2025)
– The Muddy Waters of Modeling Empathy in Language: The Practical Impacts of Theoretical Constructs, A Lahnala, C Welch, D Jurgens, L Flek, arXiv preprint arXiv:2501.14981 (2025)
– Perspective Taking through Generating Responses to Conflict Situations, J Plepi, C Welch, L Flek, Findings of the Association for Computational Linguistics ACL 2024, 6482-6497
– Proceedings of the 1st Human-Centered Large Language Modeling Workshop, N Soni, L Flek, A Sharma, D Yang, S Hooker, HA Schwartz, Proceedings of the 1st Human-Centered Large Language Modeling Workshop
– Do Multilingual Large Language Models Mitigate Stereotype Bias? S Nie, M Fromm, C Welch, R Görge, A Karimi, J Plepi, NA Mowmita, …, arXiv preprint arXiv:2407.05740
– EmPO: Emotion Grounding for Empathetic Response Generation through Preference Optimization, O Sotolar, V Formanek, A Debnath, A Lahnala, C Welch, L Flek, arXiv preprint arXiv:2406.19071
– Explaining GPT-4’s Schema of Depression Using Machine Behavior Analysis, AV Ganesan, V Varadarajan, YK Lal, VC Eijsbroek, K Kjell, ONE Kjell, …, arXiv preprint arXiv:2411.13800 (2024)
– Harnessing Personalization Methods to Identify and Predict Unreliable Information Spreader Behavior, Ashraf, Shaina and Gruschka, Fabio and Flek, Lucie and Welch, Charles, WOAH 2024
– A Perspectivist Corpus of Numbers in Social Judgements, M May, L Flek, C Welch, Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP …
– Archetypes and Entropy: Theory-Driven Extraction of Evidence for Suicide Risk, V Varadarajan, A Lahnala, AV Ganesan, G Dey, S Mangalik, AM Bucur, …, Proceedings of the 9th Workshop on Computational Linguistics and Clinical …
– Corpus Considerations for Annotator Modeling and Scaling, OO Sarumi, B Neuendorf, J Plepi, L Flek, J Schlötterer, C Welch, arXiv preprint arXiv:2404.02340
– LeadEmpathy: An Expert Annotated German Dataset of Empathy in Written Leadership Communication, Sedefoglu, Didem and Lahnala, Allison and Wagner, Jasmin and Flek, Lucie and Ohly, Sandra, LREC 2024
– Appraisal Framework for Clinical Empathy: A Novel Application to Breaking Bad News Conversations, Lahnala, Allison and Neuendorf, Béla and Thomin, Alexander and Welch, Charles and Stibane, Tina and Flek, Lucie, LREC 2024
– Vanishing Boundaries: A Unifying Account of Multidimensional Emotion Dynamics and Alterations in Depression, AM Bucur, TA Koosha, A Cosma, L Flek, SE Thanarajah, F Bernhard, …, OSF
– Personalized Intended and Perceived Sarcasm Detection on Twitter, Plepi, J., Buski, M., Flek, L., NLPCSS 2023
– Domain Transfer for Empathy, Distress, and Personality Prediction, Gruschka, F., Lahnala, A., Welch, C., Flek, L., WASSA 2023
– Style Locality for Controllable Generation with kNN Language Models, Nawezi, G., Flek, L., Welch, C., GEM 2023
– OpinionConv: Conversational Product Search with Grounded Opinions, Sadiri Javadi, V., Potthast, M., Flek, L., SIGDIAL 2023
– Challenges of GPT-3-Based Conversational Agents for Healthcare, Lechner, F., Lahnala, A., Welch, C., Flek, L., RANLP 2023
– OK Boomer: Probing the Socio-Demographic Divide in Echo Chambers, Geiss, H. J., Sakketou, F., Flek, L., NLP for Social Media 2022
– CAISA at SocialDisNER: Robust Cross-Lingual Detection of Disease Mentions on Social Media, Karimi, A., Flek, L., SMM4H @ CoLing 2022
– The Impact of Differential Privacy on Group Disparity Mitigation, Hansen, V. P. B., Neerkaje, A. T., Sawhney, R., Flek, L., Sogaard, A., PrivacyNLP 2022
– Refining Diagnosis Paths for Medical Diagnosis Based on an Augmented Knowledge Graph, Heilig, N. et al., SeWeBMeDA @ ESWC 2022
– UserNLP’22: Workshop on User-centered NLP, Huang, X. et al., WWW Conference 2022
– Paraphrasing-Based Data Augmentation for Dialogue Systems, Vogel, L., Flek, L., TSD 2022
– Investigating User Radicalization: A Novel Dataset for Identifying Fine-Grained Temporal Shifts in Opinion, Sakketou, F., Lahnala, A., Vogel, L., Flek, L., LREC 2022
– A New Dataset for Identifying Misinformation Spreaders and Political Bias, Sakketou, F., Plepi, J., Cervero, R., Rosso, P., Flek, L., LREC 2022
– Unifying Data Perspectivism and Personalization: An Application to Social Norms, Plepi, J., Neuendorf, B., Flek, L., Welch, C., EMNLP 2022
– Mitigating Toxic Degeneration with Empathetic Data, Lahnala, A., Welch, C., Neuendorf, B., Flek, L., NAACL-HLT 2022
– A Critical Reflection and Forward Perspective on Empathy and NLP, Lahnala, A., Welch, C., Jurgens, D., Flek, L., EMNLP 2022
– CAISA at WASSA 2022: Adapter-Tuning for Empathy Prediction, Lahnala, A., Welch, C., Flek, L., WASSA 2022
– Temporal Graph Analysis of Misinformation Spreaders, Sakketou, F., Plepi, J., Geiss, J.-H., Flek, L., TextGraphs @ CoLing 2022
– Understanding Interpersonal Conflict Types and Their Impact on Perception Classification, Welch, C., Plepi, J., Neuendorf, B., Flek, L., NLPCSS @ EMNLP
– Perceived and Intended Sarcasm Detection with Graph Attention Networks, Plepi, J., Flek, L., EMNLP 2021
– PHASE: Learning Emotional Phase-aware Representations for Suicide Ideation Detection on Social Media, Sawhney, R., Joshi, H., Shah, R. R., Flek, L., EACL 2021
– Suicide Ideation Detection via Social and Temporal User Representations using Hyperbolic Learning, Sawhney, R., Joshi, H., Shah, R. R., Flek, L., NAACL 2021
– Automated Template Paraphrasing for Conversational Assistants, Vogel, L., Flek, L., Widening NLP @ EMNLP 2021
– HYPMIX: Hyperbolic Interpolative Data Augmentation, Sawhney, R., Thakkar, M., Agarwal, S., Jin, D., Yang, D., Flek, L., EMNLP 2021
– Towards User-Centric Text-to-Text Generation: A Survey, Yang, D., Flek, L., TSD 2021
Development of methods for assessing the safety of Predictive Neural Networks and improving their robustness (AISafety)
About the project:
Development of a generic approach for building robust NN-based classifiers that are trained on insufficient training data. Development of a generic and statistically well-defined approach to estimating systematic uncertainties that arise from epistemic network uncertainties. Conversion of CMS Open Data from ROOT files to pandas DataFrames. Transfer of the developed methods between different fields of science and to industry.
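As a rough illustration of the uncertainty goal, one common way to expose epistemic uncertainty is to compare the predictions of several independently trained networks: where they disagree, the training data constrained the model poorly. The sketch below is an assumption made for illustration, not the project's method; the function name and the example probabilities are hypothetical.

```python
from statistics import mean, pstdev


def ensemble_uncertainty(member_probs):
    """Summarise an ensemble's predictions for a single input.

    member_probs: class-1 probabilities, one per independently trained network.
    Returns (mean prediction, spread across members). The spread is used here
    as a simple proxy for epistemic uncertainty (an illustrative assumption).
    """
    if not member_probs:
        raise ValueError("need at least one ensemble member")
    return mean(member_probs), pstdev(member_probs)


# Hypothetical outputs of five networks for two different inputs.
well_covered = [0.91, 0.89, 0.92, 0.90, 0.88]   # members agree
sparse_region = [0.15, 0.80, 0.45, 0.95, 0.30]  # members disagree

for name, probs in [("well covered", well_covered), ("sparse region", sparse_region)]:
    prediction, spread = ensemble_uncertainty(probs)
    print(f"{name}: prediction={prediction:.2f}, spread={spread:.3f}")
```

A larger spread flags inputs for which the classifier's output should be treated with more caution.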
Timing: 01.04.2023 – 31.03.2026
Principal Investigators: Prof. Dr. Lucie Flek, Prof. Dr. Alexander Schmidt, Prof. Dr. Matthias Schott, Prof. Dr. Christopher Wiebusch
Publications
– Exploring Robustness of LLMs to Sociodemographically-Conditioned Paraphrasing, Pulkit Arora, Akbar Karimi, Lucie Flek
– Exploring Robustness of Multilingual LLMs on Real-World Noisy Data, Amirhossein Aliakbarzadeh, Lucie Flek, Akbar Karimi
– A Comparison of Data Augmentation Techniques for Text Classification, Peyman Hassani Jalilian, Akbar Karimi
– ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving, Abedin, Zain Ul, et al.
– Do LLMs Provide Consistent Answers to Health-Related Questions across Languages?, IB Schlicht, Z Zhao, B Sayin, L Flek, P Rosso, arXiv preprint arXiv:2501.14719 (2025)
– Enforcing Fundamental Relations via Adversarial Attacks on Input Parameter Correlations, T Saala, L Flek, A Jung, A Karimi, A Schmidt, M Schott, P Soldin, …, arXiv preprint arXiv:2501.05588 (2025)
Automatic Analysis of the Dynamics of Dialectal Speech Using Artificial Intelligence Methods (AnDy)
About the project:
The project employs artificial intelligence approaches to analyze the variation between standard language and dialect in authentic recorded conversations. To this end, existing data sets will be used, documenting subjects of different generations throughout Germany. For the first time, such an analysis will be performed directly on speech signals. Furthermore, the project explores the question of how the variation of the identified types of dialect speakers relates to specific social pre-conditions of these individuals, their demographic traits and social context.
Timing: 01.10.2023 – 31.12.2025
Funding: Funded by the European Union – NextGenerationEU; Federal Ministry of Education and Research (BMBF) programme for strengthening the data literacy of early-career researchers
Principal Investigators: Prof. Dr. Lucie Flek
Publications
– Does Preprocessing Matter? An Analysis of Acoustic Feature Importance in Deep Learning for Dialect Classification, L Fischbach, C Kleen, L Flek, A Lameli, Proceedings of the Joint 25th Nordic Conference on Computational Linguistics…
Automatic Construction of Gene Regulatory Networks from Scientific Literature with LLMs
About the project:
The project will be conducted in consultation with Dr. Christiane Hellweg (DLR, German Aerospace Centre) and Prof. Dr. Holger Fröhlich (b-it). Prof. Fröhlich conducts research in statistical data mining and machine learning with specific focus on applications in biomedicine. Dr. Hellweg’s research focuses on the effects of radiation on organisms, its possible uses in cancer therapy and the disruption of gene regulation it causes.
Gene regulatory networks describe the interactions of genes and proteins in living organisms. Disruption of those networks can cause a plethora of health problems, the most prevalent being cancer, which is always based on a disruption of gene regulation.
The biomedical community conducts a tremendous amount of research on cancer and its underlying genetic causes, which can differ strongly between cancer types. So many papers are published that it becomes impossible to keep an overview. Furthermore, many papers explore connections between only a few genes under specific experimental conditions rather than full networks. Leveraging the text comprehension ability of LLMs, the goal is to extract the partial networks described in papers and then combine them based on matching experimental conditions.
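The combination step can be pictured as grouping extracted edges by experimental condition. The following is a minimal sketch under stated assumptions, not the project's pipeline: the tuple layout, the naive string matching of conditions, and the placeholder gene and condition names are all hypothetical, and the LLM extraction step itself is not shown.

```python
from collections import defaultdict


def merge_partial_networks(extractions):
    """Group regulator -> target edges by experimental condition.

    extractions: iterable of (condition, regulator, target, effect) tuples,
    such as might be extracted from individual papers (hypothetical layout).
    Returns {condition: {(regulator, target): set of reported effects}}.
    """
    networks = defaultdict(lambda: defaultdict(set))
    for condition, regulator, target, effect in extractions:
        # Naive condition matching (assumption): case-fold and collapse spaces.
        key = " ".join(condition.lower().split())
        networks[key][(regulator, target)].add(effect)
    return {cond: dict(edges) for cond, edges in networks.items()}


# Placeholder extractions from three hypothetical papers.
papers = [
    ("Condition X", "GENE_A", "GENE_B", "activates"),
    ("condition  x", "GENE_B", "GENE_C", "represses"),
    ("Condition Y", "GENE_A", "GENE_C", "activates"),
]
merged = merge_partial_networks(papers)
for condition, edges in merged.items():
    print(condition, edges)
```

Keeping the reported effects as a set per edge means that conflicting findings from different papers stay visible instead of being silently overwritten.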
Timing: 2024/2025
Funding: TRA Modelling (University of Bonn) as part of the Excellence Strategy of the federal and state governments
Principal Investigators: Prof. Dr. Lucie Flek
Searching for High-Frequency Gravitational Waves with Large Language Models
About the project:
The transdisciplinary research project “Searching for High-Frequency Gravitational Waves with Large Language Models”, led by Prof. Schott, Dr. Röken and Prof. Flek, is supported by TRA Matter, an exploratory network at the University of Bonn for researchers from various disciplines (e.g., Chemistry, Physics & Astronomy, Molecular Biomedicine, Geodesy, and Pharmacy) that offers a unique academic platform for transdisciplinary exchange and collaboration.
Timing: 12/2024 – 11/2025
Funding: Federal Ministry of Education and Research (BMBF) and the Ministry of Culture and Science of the State of North Rhine-Westphalia (MWK) as part of TRA Matter and the Excellence Strategy of the federal and state governments.
Principal Investigators: Prof. Dr. Lucie Flek
Polyglot – Developing Open Source Foundation Models for Low Resource Languages
About the project:
The central research question of our project is:
How can we develop efficient large language models (LLMs) for low-resource languages in a manner that promotes equitable and sustainable access to foundation models?
To address this question, our project focuses on:
- Developing larger and higher-quality datasets for the selected low-resource languages.
- Creating open-source monolingual LLMs that are efficient and tailored to these languages, addressing the specific linguistic and cultural nuances often missed by multilingual models.
- Designing benchmarks to assess the performance of these models in a manner that focuses on the specificities and cultural nuances of the selected languages.
Our project embodies a distinctly transdisciplinary approach by integrating STEM-related fields and the Humanities. At its core, the project seeks to address technical as well as societal challenges surrounding the development of large language models for low-resource languages. This fusion of perspectives ensures that technological innovation is aligned with humanistic considerations, creating a balanced and sustainable foundation for this collaborative project.
At the heart of our project is the collaboration between two distinct academic faculties: The Institute of Philosophy (Faculty of Arts) and the Institute of Computer Science (Faculty of Mathematics and Natural Sciences).
The Polyglot project directly aligns with the goals of TRA 6 and the overarching aims of the Excellence Universities Initiative (EXU) by addressing critical sustainability challenges, promoting interdisciplinary research, and fostering global collaboration.
Timing: 2025
Funding: Technology and Innovation for Sustainable Futures (TRA Sustainable Futures) at the University of Bonn
Principal Investigators: Prof. Dr. Lucie Flek (Faculty of Mathematics and Natural Sciences), Dr. Nicholas Kluge Corrêa (Faculty of Arts) (Principal Investigator), Dr. Aniket Sen (Faculty of Mathematics and Natural Sciences) (Principal Investigator)
Framing History: Identifying and Aligning Historical Narratives from Diverse Text Books
About the project:
A much-quoted saying states that “History is written by the victors”. This implies that records of historical events are not (merely) a collection of facts, but that these facts are subject to the interpretation of the author. This project will tease apart this “interpretation” from the facts on the basis of a large collection of history books. To do so, it will draw on natural language processing and computational social science to identify parallel historical narratives in an existing digitized data set of 46 high school history textbooks published between 1948 and 1989 in the German Democratic Republic (GDR) and the Federal Republic of Germany (FRG). We will develop novel methods to align the descriptions of the same event across textbooks, and to expose and analyse differences in the framing of these events across the two countries. We will relate our findings to established measures of political indoctrination in the GDR. We will also develop an exploratory tool accessible to the general public. Our methods will be general and will support the analysis of the framing of parallel narratives more broadly, for instance across news articles about the same event from different ideological sources.
Timing: 2025
Funding: 2025 Bonn-Melbourne Research Excellence Fund
Principal Investigator: Prof. Dr. Lucie Flek (University of Bonn)
Team: Dr. Yulia Otmakhova (University of Melbourne), Vahid Sadiri Javadi (University of Bonn), Dr. Dani Sandu (University of Fribourg)
InVirtuo 4.0
About the project:
Development of a generic approach for building robust NN-based classifiers that are trained on insufficient training data. Development of a generic and statistically well-defined approach to estimating systematic uncertainties that arise from epistemic network uncertainties. Conversion of CMS Open Data from ROOT files to pandas DataFrames. Transfer of the developed methods between different fields of science and to industry.
Timing: since 2024
Funding: State of North Rhine-Westphalia (Ministry of Culture and Science of the State of North Rhine-Westphalia)
Principal Investigators: Prof. Dr. Lucie Flek (Faculty of Mathematics and Natural Sciences), Dr. Dr. Ahmad Aziz, Prof. Dr. Dr. Dominik R. Bach, Prof. Dr. Mario Botsch, Dr. Niclas Braun, Prof. Dr. Kathrin Friedrich, Prof. Dr. Bert Heinrichs, Prof. Dr. Matthias B. Hullin, Prof. Dr. Reinhard Klein, Prof. Dr. Björn Krüger, Prof. Dr. Alexandra Philipsen, Prof. Dr. Martin Reuter
Publications
– Probing the Robustness of Theory of Mind in Large Language Models, C Nickel, L Schrewe, L Flek, arXiv preprint arXiv:2410.06271 (2024)
– Proceedings of the 2nd Workshop on Practical LLM-assisted Data-to-Text Generation, S Balloccu, Z Kasner, O Plátek, P Schmidtová, K Onderková, M Lango, …, Proceedings of the 2nd Workshop on Practical LLM-assisted Data-to-Text …
– Explaining GPT-4’s Schema of Depression Using Machine Behavior Analysis, AV Ganesan, V Varadarajan, YK Lal, VC Eijsbroek, K Kjell, ONE Kjell, …, arXiv preprint arXiv:2411.13800 (2024)
Addressing Disinformation Campaigns by Disclosing Factors and Techniques (Desinformationskampagnen beheben durch Offenlegung der Faktoren und Stilmittel, DeFaktS)
About the project:
Disinformation campaigns, in which misleading information is deliberately spread on a large scale, have become a central threat to the political process and social cohesion. They can influence elections and incite people to engage in self-destructive or even terrorist behavior. In addition to political polarization and opinion division, they also promote other harmful societal phenomena, such as conspiracy theories.
The DeFaktS project, funded by the Federal Ministry of Education and Research (BMBF), follows a comprehensive approach to researching and combating disinformation. For this purpose, an artificial intelligence (AI) model is trained on messages extracted from suspicious social media and messenger groups to recognize factors and techniques characteristic of disinformation. The trained model then forms a component of an explainable AI (XAI) system, which is intended to inform and warn users of online platforms about the potential occurrence of disinformation in a transparent manner.
Another goal of DeFaktS is to make the XAI component accessible to third parties through the development of an application programming interface (API), thereby contributing to a solution that allows online platforms to be moderated as automatically as possible.
During the project period from January 2022 to December 2024, the FZI Research Center for Information Technology leads the consortium with project partners Murmuras UG, Liquid Democracy e.V., and Philipps University of Marburg.
Timing: 01/2022 – 12/2024
Funding: Federal Ministry of Education and Research (BMBF)
Publications
– DeFaktS: A German Dataset for Fine-Grained Disinformation Detection through Social Media Framing, S Ashraf, I Bezzaoui, I Andone, A Markowetz, J Fegert, L Flek, Proceedings of the 2024 Joint International Conference on Computational …
– Pitfalls of Conversational LLMs on News Debiasing, I Baris Schlicht, D Altiok, M Taouk, L Flek, arXiv preprint arXiv:2404.06488
– MultiProp Framework: Ensemble Models for Enhanced Cross-Lingual Propaganda Detection in Social Media and News using Data Augmentation, Text Segmentation, and Meta-Learning, F Aldabbas, S Ashraf, R Sifa, L Flek, Proceedings of the 1st Workshop on NLP for Languages Using Arabic Script, 7-22
– Harnessing Personalization Methods to Identify and Predict Unreliable Information Spreader Behavior, Ashraf, Shaina and Gruschka, Fabio and Flek, Lucie and Welch, Charles, WOAH 2024
– Do LLMs Provide Consistent Answers to Health-Related Questions across Languages?, IB Schlicht, Z Zhao, B Sayin, L Flek, P Rosso, arXiv preprint arXiv:2501.14719 (2025)
Responsible Algorithmic Decision-Making in the Workplace
About the project:
Algorithmic decision-making (or “ADM”) already has a significant impact on how our modern workplace is organized, whether it be through the selection of new hires, managing employees in their daily business, or assisting human decision-makers in the context of complex problems.
Our exploration of responsible ADM in the workplace of the future focuses on three aspects:
- Analysing how ADM affects work procedures and work contents as well as workers’ self-determination and quality of life
- Investigating human-machine work configurations and the responsible design of these configurations for future workplaces
- Assessing organizational, legal and regulatory framework conditions for the responsible design of the future workplace
Timing: 01/2022 – 12/2024
Funding: Centre Responsible Digitality (ZEVEDI)
Principal Investigators: Prof. Dr. Alexander Benlian, Prof. Dr. Matthias Söllner, Dr. Ulrich Bretschneider, Prof. Dr. Lucie Flek, Prof. Dr. Iryna Gurevych, Prof. Dr. Sandra Ohly, Prof. Dr. Lena Rudkowski, Prof. Dr. Bernd Skiera, Prof. Dr. Gerhard Schreiber, Prof. Dr. Domenik H. Wendt
Publications
– Appraisal Framework for Clinical Empathy: A Novel Application to Breaking Bad News Conversations, AC Lahnala, B Neuendorf, A Thomin, C Welch, T Stibane, L Flek, Proceedings of the 2024 Joint International Conference on Computational …
– LeadEmpathy: An Expert Annotated German Dataset of Empathy in Written Leadership Communication, Sedefoglu, Didem and Lahnala, Allison and Wagner, Jasmin and Flek, Lucie and Ohly, Sandra, LREC 2024
Framing History: Identifying and Aligning Historical Narratives from Diverse Text Books
About the project:
Communication between the management of listed stock corporations and their free-float shareholders has been the subject of intensive corporate governance research for decades. The premise in research has been that small shareholders have to overcome high transaction costs and coordination problems in order to communicate with their company effectively. With the increasing digitization of communication and everyday life, we are now observing a paradigm shift towards fast, adaptive and widely available communication media, from which we also expect radical changes for corporate communication.
Funding: Centre Responsible Digitality (ZEVEDI)
Principal Investigators: Prof. Dr. Oliver Hinz, Prof. Dr. Florian Möslein, Prof. Dr. Lucie Flek, Prof. Dr. Katja Langenbucher, Prof. Dr. Sophie Loidolt
Workshop: Interdisciplinary workshop, 04 March 2024, 13:00 – 17:00
