NEWS

Dr. Florian Mai in Deutsches Museum Bonn
20.03.2025, 19:00 - 21:00
One can't get enough of AI safety and trustworthiness debates - after all, we humans have to align on them too! If you are in Bonn, check out the discussion at the Deutsches Museum. Our group is represented by Dr. Florian Mai, Junior Group Leader on AI Reasoning and Alignment.

BERLIN - International Women's Day 2025:
Prof. Dr. Lucie Flek joined the panel discussion hosted by the Federal Ministry of Education and Research (BMBF)
Technology of tomorrow, problems of yesterday: What can be done about the gender gap in AI?
How can we inspire more young women to study AI? Why is equal gender participation in the development of Artificial Intelligence so important? And how can we address the gender gap in AI?
Discussing together with:
- Dr. Tina Klüwer, Head of the Department for Research on Technological Sovereignty and Innovation at the BMBF
- Kenza Ait Si Abbou Lyadini, CTO and Board Member at Fiege
- Daniel Krupka, Managing Director of the Gesellschaft für Informatik e.V.
- Dr. Kinga Schumacher, Senior Researcher and Head of the Diversity & Gender Equality working group at the German Research Center for Artificial Intelligence (DFKI) in Berlin

Celebrating Our Achievements in 2024: A Year of Innovation, Collaboration, and Growth
As 2024 comes to an end, we’re taking a moment to reflect on the remarkable journey our research team has undertaken over the past twelve months. The excitement, curiosity, and commitment to pushing boundaries...READ MORE

Prof. Dr. Lucie Flek at InVirtuo 4.0
In the final session of this semester's InVirtuo 4.0 seminar, Prof. Dr. Lucie Flek (Lamarr's Area Chair for NLP), alongside Mahan Akbari Moghanjoughi, presented her team's latest research on enhancing social intelligence in large language models.
Her work contributes to a growing field focused on ensuring AI systems engage in more natural, meaningful, and socially intelligent conversations—an essential step for the future of human-computer interaction.


OUR PAPERS AT ACL 2024:
Perspective Taking through Generating Responses to Conflict Situations
Plepi, Joan and Welch, Charles and Flek, Lucie
Despite the steadily increasing performance language models achieve on a wide variety of tasks, they continue to struggle with theory of mind, the ability to understand the mental state of others. Language assists in the development of theory of mind, as it facilitates the exploration of mental states. This ability is central to much of human interaction and could provide many benefits for language models as well: being able to foresee the reactions of others allows us to better decide which action to take next. This could help language models generate responses that are safer, in particular for healthcare applications, or more personalised, e.g. sounding more empathetic or providing targeted explanations. Such systems can generate responses that are not only relevant to users' queries but also reflect users' personal style, thereby creating a more engaging and customised interaction experience.
In fact, there is a growing interest in a perspectivist approach to many natural language processing (NLP) tasks, which emphasizes that there is no single ground truth. We construct a corpus to study perspective taking through generating responses to conflict situations.
An example from our corpus can be found in Figure 1. We see a user asking if they did something wrong in a conversation with their girlfriend about whether or not to terminate a pregnancy. On the right, there are two responses from other users with different judgments of the situation (reasoning and verdict NTA/YTA). On the left, we see self-descriptive statements of each user. Author Y appears to be more family-oriented than Author X, which may impact their judgment of the situation.

We focus on three research questions:
RQ1: How should we evaluate perspective taking through the lens of NLG?
We develop a novel evaluation by asking humans to rank the human response, the model output, and a distractor human response, combining approaches from persona consistency. We find that previous consistency evaluation metrics are inadequate and propose a human ranking evaluation that includes similar human responses. Additionally, we find that our generation model performs competitively with previous work on perspective classification.
RQ2: Do tailored, user-contextualized architectures outperform large language models (LLMs) on this task?
We design two novel transformer architectures that embed personal context and compare them to LLMs, finding that our twin encoder architecture outperforms recent work as well as FlanT5 and Llama2 models (see the sketch after this list).
RQ3: What user information is most useful to model perspective taking?
Experiments with varied user context showed that self-disclosure statements semantically similar to the conflict situation were most useful.
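For readers curious what a twin-encoder setup can look like in practice, here is a minimal PyTorch sketch. It is an illustrative simplification, not the exact architecture from the paper: the dimensions, mean-pooling, and fusion layer are assumptions, and in a full system the fused representation would condition a separate response generator.

```python
# Minimal twin-encoder sketch for user-contextualized modeling.
# Hypothetical simplification; sizes and fusion are illustrative only.
import torch
import torch.nn as nn

class TwinEncoder(nn.Module):
    def __init__(self, vocab_size=32000, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # One encoder for the conflict situation, one for the user's
        # self-disclosure statements (personal context).
        self.situation_encoder = nn.TransformerEncoder(layer, num_layers)
        self.user_encoder = nn.TransformerEncoder(layer, num_layers)
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, situation_ids, user_ids):
        sit = self.situation_encoder(self.embed(situation_ids)).mean(dim=1)
        usr = self.user_encoder(self.embed(user_ids)).mean(dim=1)
        # Fused representation would condition a downstream response generator.
        return self.fuse(torch.cat([sit, usr], dim=-1))

model = TwinEncoder()
situation = torch.randint(0, 32000, (1, 64))   # tokenized conflict post
user_ctx = torch.randint(0, 32000, (1, 64))    # tokenized self-disclosures
print(model(situation, user_ctx).shape)        # torch.Size([1, 256])
```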
Talk to us during ACL 2024
Prof. Lucie Flek, Dr. Mounika Marreddy and Shangrui Nie look forward to being in Bangkok in August and are happy to get in touch!
The Data Science and Language Technologies Group @CaisaLab proudly presents two papers, "Do Multilingual Large Language Models Mitigate Stereotype Bias?" and "Perspective Taking through Generating Responses to Conflict Situations", at ACL 2024 in Bangkok and co-organises the @HuCLLM Workshop on Thursday, August 15th.

Congratulations, Vahid Sadiri Javadi!
WissenschaftsAward of the Bundesverband Direktvertrieb Deutschland (BDD) for Vahid Sadiri Javadi and his work "OpinionConv: Conversational Product Search with Grounded Opinions".
This work, started around the GPT-2 release, was ahead of its time in asking: "Should dialog systems be able to provide opinionated answers? If so, how can we ensure these answers are authentic, well-founded, and useful for the user?" The proposed pipeline generalises to today's RAG-based LLM approaches and remains reusable in modern conversational product search.

AI24 in Dortmund
💡 Spotlight on #GenerativeAI at #AI24! Don't miss our essential sessions for professionals at the intersection of AI and industry, highlighting their economic potential on Day 2 of #LamarrCon.
🔎 In the Session “Generative AI – Foundation Models” at 14:45, Prof. Dr. David Jurgens from the University of Michigan will explore "Do Foundation Models have #Personalities?", showcasing the risks and opportunities in aligning these models with specific user groups, critical for personalized AI applications in industries.
Next, at 15:15, Dr. Mehdi Ali from #Lamarr at Fraunhofer IAIS will present "Training Large Language Models at Scale using #Modalities." Learn about their open-source framework for training large-scale models efficiently, a game-changer for businesses looking to leverage AI at scale. 🖥️
In the Session “Generative AI – Large Language Models” at 16:15, Dr. Mounika Marreddy, Vahid Sadiri Javadi, and Dr. Akbar Karimi from the University of Bonn will lead a hands-on workshop on "#LLMs in Action". Gain practical skills in utilizing state-of-the-art models like Flan-T5 and #Llama, essential for integrating AI into real-world applications. 🔧
These sessions are crucial for understanding how Generative AI can optimize processes, solve complex problems, and create new economic opportunities across various sectors, including logistics and supply chain management. 📈
🎫 Register now to be part of #AI24 – The Lamarr Conference and explore the future of Generative AI: https://lnkd.in/ezG4tsXN
#TechConference | Lucie Flek | OpenGPT-X


ACL24 Bangkok: The First Workshop on Human-Centered LLMs
With Nikita Soni , Ashish Sharma, Yang Diyi, Sara Hooker and H. Andrew Schwartz we organised the first workshop on Human-Centered LLMs.
We received 35 paper submissions and were excited that the workshop room was packed the whole day, with fantastic keynotes by Cristian D. ("Assisting Human-Human Communication With Artificial Conversational Intuition"), Barbara Plank ("We are (still) not paying enough attention to human label variation"), Vered Shwartz ("Navigating Cultural Adaptation of LLMs: Knowledge, Context, and Consistency") and Daniel Hershcovich ("Reversing the Alignment Paradigm: LLMs Shaping Human Cultural Norms, Behaviors, and Attitudes"), as well as an exciting poster session and paper talks.




Lamarr NLP Researchers Train Multilingual Large Language Models to Mitigate Stereotype Bias
Bias in large language models is a well-known and unsolved problem. In our new paper "Do Multilingual Large Language Models Mitigate Stereotype Bias?" we address this challenge by investigating the influence of multilingual training data on model bias reduction.
In the Lamarr NLP research collaboration between Fraunhofer IAIS and the Language Technologies group at the University of Bonn, we have trained six large language models on public data (one for each of Spanish, German, French, Italian and English, and a combined multilingual one), and compared these LLMs to their state-of-the-art counterparts on multilingual bias benchmarks. Our results show that all multilingual models trained on the same number of tokens as monolingual models are less biased for all languages and benchmarks. In addition, our models are generally less biased than selected open-source LLMs of similar size.
We are very happy to announce that our work, conducted by Shangrui Nie, Michael Fromm, Charles Welch, Rebekka Görge, Akbar Karimi, Joan Plepi, Nazia Afsan Mowmita, Nicolas Flores-Herr, Mehdi Ali and Lucie Flek, has been accepted at the C3NLP (Cross-Cultural Considerations in NLP), collocated with the ACL 2024 in Bangkok.
The challenge of bias in large language models
Large language models allow us to quickly and easily derive and implement real-world applications in fields as diverse as healthcare, finance and law. Huge amounts of data are used to train these models, resulting in incredible model performance. However, numerous studies have shown that large language models learn biases during training that can lead to discrimination against certain groups of people in their downstream application. Often these biases arise from the training data itself, and they vary between different languages and models.
Multilinguality as a solution approach
To avoid the harm of discrimination, research aims to mitigate bias in large language models. Among various approaches, previous research indicates that using multilingual training data to train large language models reduces model bias. In doing so, the models can benefit from the use of different languages, which differ in semantics and syntax and cover a wider cultural diversity. In our work, we build on these findings by investigating in particular the impact of monolingual versus multilingual data on larger, decoder-based language models.
Our experiments and findings
In our experiments, we train six novel large language models, one each for Spanish, German, French, Italian and English, as well as a multilingual model trained on all five languages but using the same number of tokens. To compare the bias of the monolingual and multilingual models, we evaluate them on two well-known bias benchmarks, CrowS-Pairs and BBQ. The former measures the degree to which a model prefers a stereotyping over a less stereotyping sentence, while the latter is a question-answering dataset that requires a model to answer stereotype-related questions regarding two social groups (compare Figure 1). As both benchmarks are originally available only in English, we perform a human-validated automated translation.
Our results support the initial hypothesis that multilingual large language models reduce model bias. Within the experiments, we find that all multilingual models trained on the same number of tokens as monolingual models are less biased than the monolingual models for all languages and both benchmarks. We also find that our models are generally less biased than selected open-source large language models of similar size, although they fall short of zero-shot prompt-based approaches with GPT-3.
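To make the CrowS-Pairs-style comparison concrete, below is a small illustrative script that scores a stereotypical and a less stereotypical sentence with a causal language model and checks which one receives the higher likelihood. This is a sketch only: the model name and the sentence pair are placeholders, and the official benchmark scoring differs in its details.

```python
# Illustrative sketch of a CrowS-Pairs-style preference check with a causal LM.
# Model name and sentences are placeholders, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_log_likelihood(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood per predicted token;
    # multiply by the number of predicted tokens to get the sentence total.
    return -out.loss.item() * (ids.shape[1] - 1)

stereo = "Women are bad at math."
anti = "Men are bad at math."
# A bias-free model would not systematically prefer the stereotypical variant.
print(sentence_log_likelihood(stereo) > sentence_log_likelihood(anti))
```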

“Relate – Talk with Me”
The interactive exhibition by Andrea Familari is open until February 2, 2025.
This installation examines the intersection of Artificial Intelligence and human relationships, allowing visitors to converse with a custom voice bot represented by a lifelike avatar.
Prof. Lucie Flek, Area Chair for Natural Language Processing at the Lamarr Institute, contributed her expertise to enhance the bot's conversational abilities, tackling issues of bias and the dynamics of human-AI interaction. At the Lamarr Institute, we are dedicated to advancing AI research with a focus on its societal impact, and projects like “Relate – Talk with Me” reflect our mission to explore AI's role in communication and connection.

Prof. Lucie Flek joins the NRW delegation to the US West Coast from 12.04. to 19.04.2024, representing the AI community in NRW and the USA.
@lucie_nlp Prompt: "An image of three people - a director of supercomputing research center, an astronaut, and a professor of machine learning - standing in front of a Voyager spacecraft." Which one do you prefer?
Four papers accepted
at LREC-COLING 2024
What a great start to the year! We are very excited to announce that we got four papers accepted at LREC-COLING 2024.
“Appraisal Framework for Clinical Empathy: A Novel Application to Breaking Bad News Conversations” Authors: Allison Lahnala, Béla Neuendorf, Alexander Thomin, Charles Welch, Tina Stibane, Lucie Flek
Camera-Ready: Paper
In this paper, we introduce an annotation approach...Read more.

Professor Dr Lucie Flek accompanies NRW Minister President Hendrik Wüst on a delegation trip to the USA
Prof. Lucie Flek, representing the University of Bonn and the Lamarr Institute, spoke at the "Transatlantic AI Symposium: Europe's Heartbeat of Innovation", hosted by NRW.Global Business, the trade and investment agency of the German state of North Rhine-Westphalia (NRW).
Cooperation and coordination of leading researchers in Europe and the USA is the way to cope with complex AI challenges and utilize new AI opportunities. Read more.
Caisa Highlights 2023
INNOVATIONS, INSIGHTS and INSPIRATION: OUR JOURNEY THROUGH 2023
Hi everyone! As we say goodbye to a fantastic year full of new discoveries and fun research, we at the Conversational AI and Social Analytics (CAISA) Lab want to share our 2023 story with you.
Think of it like a digital scrapbook, full of our best moments and exciting projects, all ready for you to check out.
The Responsible and Timely Uptake of AI in the EU
The Scientific Advice Mechanism to the European Commission published its latest advice, "The responsible and timely uptake of AI in the EU", on 23.04.2024. Together with many senior researchers from multiple disciplines across the EU, Prof. Lucie Flek participated in the scientific process and contributed to the evidence review and collection of perspectives.
You will find the SAPEA Evidence Review Report, all workshop reports, and the relevant Scientific Opinion by the Group of Chief Scientific Advisors to the European Commission here.


Caisa Lab at LREC-COLING 2024
We're proud to have our Caisa Lab team members Dr. Wei-Fan Chen, Shaina Ashraf, Allison Lahnala and Ipek Baris Schlicht presenting their work at LREC-COLING 2024 in Torino, Italy. Join them in discovering their innovative contributions to #NLP and #AI. Read more: Four papers accepted at LREC-COLING 2024!