Potential Topics 2025
Here is a list of possible topics for a Master's thesis under the supervision of researchers from the Data Science and Language Technologies Group.
Indoctrination Detection: Exploring Children's and Educational Books and Narrative Influence in Geo-Political Contexts
This research direction offers a comprehensive exploration of the influence of educational and children's books on shaping ideologies and attitudes, with potential implications for geo-political contexts. The findings may shed light on the power of narratives in molding perceptions and beliefs, which is crucial for understanding historical and contemporary socio-political dynamics. The work may involve the following: 1) Data augmentation and model evaluation; 2) Topic modeling, narrative comparisons, and framing analysis to explore similarities and differences in framing, themes, and ideologies; 3) Sentiment analysis and aspect-based sentiment analysis; 4) Indoctrination detection.
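As a rough starting point for the topic-modeling step above, here is a minimal sketch using scikit-learn's LDA on a toy corpus; the example texts are invented placeholders, not real book excerpts.

```python
# Minimal topic-modeling sketch for step 2), assuming a plain-text corpus of book excerpts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

texts = [
    "the brave young pioneers defended their homeland",   # toy placeholders, replace with real excerpts
    "children learn kindness and sharing at school",
    "the glorious leader guides the nation to victory",
]

vectorizer = CountVectorizer(stop_words="english", max_features=5000)
X = vectorizer.fit_transform(texts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Print the top words per topic as a first look at recurring themes/frames.
vocab = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [vocab[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```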
Value-alignment for AI Agents
In contrast to chatbots, agents are AI models that can interact with and influence the world by taking actions and by interacting with humans or other agents. Given their increased impact, it becomes even more important that these AI models are properly aligned with human values [3]. In this master thesis, we will explore whether techniques for instilling values and morality in LLM agents in a controlled environment [2, 4] cause them to behave more safely in realistic social scenarios [1].
[1] Zhou, Xuhui, Hyunwoo Kim, Faeze Brahman, Liwei Jiang, Hao Zhu, Ximing Lu, Frank Xu et al. "Haicosystem: An ecosystem for sandboxing safety risks in human-AI interactions." arXiv preprint arXiv:2409.16427 (2024).
[2] Pan, Alexander, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Hanlin Zhang, Scott Emmons, and Dan Hendrycks. "Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the Machiavelli benchmark." In International Conference on Machine Learning, pp. 26837-26867. PMLR, 2023.
[3] Wang, Yufei, Wanjun Zhong, Liangyou Li, Fei Mi, Xingshan Zeng, Wenyong Huang, Lifeng Shang, Xin Jiang, and Qun Liu. "Aligning large language models with human: A survey." arXiv preprint arXiv:2307.12966 (2023).
[4] Tennant, Elizaveta, Stephen Hailes, and Mirco Musolesi. "Moral Alignment for LLM Agents." arXiv preprint arXiv:2410.01639 (2024).
Empathetic LLMs for Therapeutic VR Agents
invirtuo.org, https://www.youtube.com/watch?v=Npb1GXr-ROk
AstroLlama
Searching for galaxy clusters with LLMs, using eROSITA data: https://www.mpe.mpg.de/eROSITA
Hyperbolic Language Representation
Expanding the experiments from https://aclanthology.org/2023.vardial-1.7.pdf to hyperbolic space, to see whether it is better suited for modeling language prefixes in multilingual generation.
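As a concrete starting point, a minimal sketch of the hyperbolic side (assuming curvature c = 1): project Euclidean prefix embeddings onto the Poincaré ball via the exponential map at the origin and compare them with the Poincaré distance. The embedding dimensionality and scaling are arbitrary choices for illustration.

```python
# Sketch: Euclidean vectors -> Poincare ball (c = 1) and hyperbolic distance between them.
import numpy as np

def expmap0(v, eps=1e-9):
    """Exponential map at the origin: sends a Euclidean (tangent) vector into the unit Poincare ball."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return np.tanh(norm) * v / np.maximum(norm, eps)

def poincare_distance(x, y, eps=1e-9):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq = np.sum((x - y) ** 2, axis=-1)
    den = (1.0 - np.sum(x ** 2, axis=-1)) * (1.0 - np.sum(y ** 2, axis=-1))
    return np.arccosh(1.0 + 2.0 * sq / np.maximum(den, eps))

# Toy usage with two hypothetical prefix embeddings.
u = expmap0(np.random.randn(64) * 0.1)
w = expmap0(np.random.randn(64) * 0.1)
print(poincare_distance(u, w))
```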
Best Practices for LLM-Based Knowledge Extraction on Scientific Texts
LLM-based knowledge extraction and knowledge graph construction are highly relevant in the sciences; they can be used for simulations in drug target discovery [1] and for the automatic curation of vast databases [2]. In this thesis, you will systematically evaluate different approaches and methods, from fine-tuning and guided generation to prompt engineering, to test the performance of LLMs as scientific annotators. A dataset for evaluation already exists. You will survey the current landscape of methods for knowledge extraction and pick the ones you find most interesting to evaluate; a good overview can be found in [3]. This line of work will most likely become increasingly relevant over the next few years, and the skills you acquire should transfer well to both scientific and more industry-focused work. In short, you will construct a pipeline for evaluating LLMs as annotators, based on an existing dataset, that provides a clear way to compare multiple methods (a minimal scoring sketch follows the sources below). The goal would be to publish the findings in the end. You would work closely with your supervisor on this topic.
Sources:
1. Raschka T, Sood M, Schultz B, Altay A, Ebeling C, et al. (2023) AI reveals insights into link between CD33 and cognitive impairment in Alzheimer’s Disease. PLOS Computational Biology 19(2): e1009894. https://doi.org/10.1371/journal.pcbi.1009894
2. Dagdelen, J., Dunn, A., Lee, S. et al. Structured information extraction from scientific text with large language models. Nat Commun 15, 1418 (2024). https://doi.org/10.1038/s41467-024-45563-x
3. https://github.com/quqxui/Awesome-LLM4IE-Papers
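A minimal sketch of how such a pipeline could score one extraction method against gold annotations, assuming the dataset provides gold (subject, relation, object) triples; the triple format and the toy examples below are assumptions, not the actual dataset schema.

```python
# Sketch of triple-level scoring for one LLM-based extraction method.

def score_extraction(gold_triples: set[tuple], predicted_triples: set[tuple]) -> dict:
    """Exact-match precision/recall/F1 over (subject, relation, object) triples."""
    tp = len(gold_triples & predicted_triples)
    precision = tp / len(predicted_triples) if predicted_triples else 0.0
    recall = tp / len(gold_triples) if gold_triples else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Toy usage: one document's gold annotations vs. one method's predicted triples.
gold = {("CD33", "associated_with", "cognitive impairment")}
pred = {("CD33", "associated_with", "cognitive impairment"),
        ("CD33", "expressed_in", "microglia")}
print(score_extraction(gold, pred))
```

Running the same scoring loop over every method (fine-tuned, guided generation, prompting) on the same documents gives the directly comparable numbers the pipeline is meant to provide.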
SimPPP (Simulation Personification and Preference Prediction)
There has been growing interest in how well LLMs can take on personas and stay true to the persona they are playing [1]. Another interesting avenue is the prediction of user preferences with LLMs, with some promising early findings [2]. The question is whether we can combine the two to model the preferences of different social groups on more specific topics (a minimal sketch of persona-conditioned preference prediction appears at the end of this topic).
You will work on finding or designing a way to evaluate this, and test whether personas help preference prediction. Depending on your interests, the focus can also be more on collecting a new dataset for this task; this may be especially interesting if you have a background or interest in social studies or marketing research.
Of note: there may be a project cooperation with a company on a closely related application, which would provide industry contact and research funding. However, this is not yet final.
Additional reading material to get an idea: personalLLM, Alignment via interaction
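A minimal sketch of the persona-conditioned preference prediction mentioned above; the `query_llm` callable and the prompt format are assumptions standing in for whichever model or API is eventually used.

```python
# Sketch: ask an LLM, conditioned on a persona, which of two options that persona would prefer.
# `query_llm` is a hypothetical wrapper around the chosen model/API.

def predict_preference(persona: str, option_a: str, option_b: str, query_llm) -> str:
    prompt = (
        f"You are role-playing the following person:\n{persona}\n\n"
        f"Which of these two options would this person prefer?\n"
        f"A) {option_a}\nB) {option_b}\n"
        f"Answer with a single letter, A or B."
    )
    answer = query_llm(prompt).strip().upper()
    return "A" if answer.startswith("A") else "B"

if __name__ == "__main__":
    dummy_llm = lambda prompt: "B"   # stand-in; replace with a real model call
    persona = "A 35-year-old teacher who values environmental sustainability."
    print(predict_preference(persona, "a diesel SUV", "an electric compact car", dummy_llm))
```

One natural evaluation is to compare accuracy against surveyed preferences of the target social group, with and without the persona in the prompt.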
A Computational Approach to Discourse Analysis of Clinical Empathy in Breaking Bad News Conversations
Model clinical empathy with LLMs based on this dataset: https://aclanthology.org/2024.lrec-main.124/; understand missed opportunities to react empathetically; evaluate clinicians' responses.
This thesis will explore how computational models can detect and analyze empathic communication between medical professionals and patients using the BBN Empathy Dataset, annotated with linguistic structures from appraisal theory.
Research Goals and Techniques:
● Natural Language Processing (NLP): Develop computational models to identify empathic opportunities, responses, and discourse structures using NLP techniques (a simple baseline sketch follows this list).
● Linguistic Analysis: Apply theories like the appraisal framework and discourse analysis to interpret the role of empathy in BBN doctor-patient interactions.
● Dataset Augmentation: Create synthetic datasets simulating “good” and “bad” doctors for training and evaluation, enabling scalable NLP systems for communications training.
● Bridging with Medical Knowledge: Explore how the findings can be integrated into medical training by providing actionable feedback for healthcare professionals. This includes designing systems that give personalized advice on communication skills, helping doctors improve empathy-driven conversations with patients.
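As a simple baseline for the NLP goal above, a sketch of an utterance-level classifier; the label names and example utterances are invented placeholders, not the actual BBN Empathy Dataset schema.

```python
# Baseline sketch: classify clinician utterances as empathic responses vs. missed opportunities.
# Labels and example utterances are invented placeholders, not the real BBN annotations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

utterances = [
    "I can see this news is very hard for you.",
    "Let's move on to the treatment schedule.",
    "It is completely understandable to feel overwhelmed right now.",
    "The next scan is in three weeks.",
]
labels = ["empathic_response", "missed_opportunity", "empathic_response", "missed_opportunity"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(utterances, labels)
print(clf.predict(["I'm sorry, I know this is not what you hoped to hear."]))
```

Such a lexical baseline mainly serves as a point of comparison for LLM-based or appraisal-theory-informed models.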
Students will gain hands-on experience in applied NLP research, discourse analysis, and AI-driven healthcare applications. The project’s findings could support developing digital tools to train medical professionals, enhancing empathy in real-world healthcare settings. Ideal candidates should have a background in computational linguistics, machine learning, or healthcare informatics, with a strong interest in interdisciplinary problem-solving.
Article-level Media Bias Mitigation
Following the results from the NLP Lab, based on the dataset crawled from allsides.com, continue working on locating the biased parts of an article and study how to mitigate bias at the article level. This topic may be co-supervised with Mr. Shangrui Nie from the Data Science and Language Technologies Group, as he has a similar research direction.
Wikipedia "did you know" Question Answering
Crawl the "did you know" page from Wikipedia (https://en.wikipedia.org/wiki/Wikipedia:Recent_additions) into <question, answer, page containing the answer> tuples, then develop a QA model on top of them (a minimal crawling sketch is shown below). An initial idea is to consider the method that Dr. Wei-Fan Chen from our Data Science and Language Technologies Group is developing (halumap).
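A minimal crawling sketch; the assumed page structure (hooks as list items whose bolded link points to the featured article) should be verified against the live HTML, and here the bolded article is used as both the candidate answer and the page containing it.

```python
# Sketch: crawl the "Did you know" archive and build raw material for <question, answer, answer page> tuples.
import requests
from bs4 import BeautifulSoup

URL = "https://en.wikipedia.org/wiki/Wikipedia:Recent_additions"

html = requests.get(URL, headers={"User-Agent": "DYK-QA-crawler (research)"}).text
soup = BeautifulSoup(html, "html.parser")

tuples = []
for li in soup.find_all("li"):
    text = li.get_text(" ", strip=True)
    if not text.startswith("... that"):
        continue  # keep only actual DYK hooks (assumption about the page layout)
    bold = li.find("b")
    link = bold.find("a") if bold else None
    if link is None or not link.get("title"):
        continue
    answer_page = link["title"]   # featured article, treated as answer + answer page (approximation)
    question = text               # the hook, to be rephrased as a question later
    tuples.append((question, answer_page, answer_page))

print(len(tuples), tuples[:3])
```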
Correlation between Hallucination and Acceptability in Computational Argumentation
Local acceptability is defined as follows: a premise of an argument is acceptable if it is rationally worthy of being believed to be true. Global acceptability is defined as follows: an argumentation is acceptable if the target audience accepts both the consideration of the stated arguments for the issue and the way they are stated. There appears to be a strong correspondence between faithfulness hallucination and local acceptability, and between factualness hallucination and global acceptability. Known methods for hallucination detection could therefore be applied to detecting acceptability, or to increasing acceptability in argument generation.
Analyzing Noisy Gravitational Wave Time Series with Transformers
The GravNet collaboration [1] is building an array of gravitational wave (GW) detectors for primordial black hole (PBH) mergers. PBHs are small black holes that were formed early after the Big Bang. They are thought to be a significant component of dark matter, but so far have never been directly observed. Observing a PBH merger would have far-reaching implications for our understanding of the universe.
PBH mergers are expected to occur very rarely, while the detector data is affected by many sources of noise, so distinguishing the GW signals from the noise is challenging. Another challenge is correlating data from multiple detectors in different locations around the globe. Machine learning techniques for time series processing were critical in analyzing the LIGO data that led to the 2017 Nobel Prize in Physics [2] [3] (first direct observation of GWs).
In this thesis project, you will work with a team of computer scientists and physicists to develop a Transformer model for detecting gravitational wave signals in noisy time series data. Your tasks will include:
- Investigating Transformer architectures for learning from noisy time series with simulated GW signals
- Developing a tokenization strategy for converting sequences of >10^7 numerical data points into tokens for Transformer training (see the sketch after this list)
- Comparing time series classification and time series denoising approaches
- Investigating transfer learning on time series data from different sources
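One possible starting point for the tokenization task is patch-based tokenization in the style of PatchTST: split the long strain series into fixed-length patches and linearly project each patch into a token embedding. A minimal PyTorch sketch follows; the patch length and model width are illustrative assumptions, not tuned values.

```python
# Sketch: patch-based tokenization of a long 1D time series for a Transformer encoder.
import torch
import torch.nn as nn

class PatchTokenizer(nn.Module):
    def __init__(self, patch_len: int = 256, d_model: int = 128):
        super().__init__()
        self.patch_len = patch_len
        self.proj = nn.Linear(patch_len, d_model)  # one "token" embedding per patch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, series_length); truncate so the length divides evenly into patches
        batch, length = x.shape
        n_patches = length // self.patch_len
        x = x[:, : n_patches * self.patch_len].reshape(batch, n_patches, self.patch_len)
        return self.proj(x)  # (batch, n_patches, d_model): a sequence of patch tokens

# Toy usage on simulated noisy strain data (much shorter than the real >10^7-sample series).
tokenizer = PatchTokenizer()
strain = torch.randn(2, 4096)   # stand-in for detector noise with an injected signal
tokens = tokenizer(strain)
print(tokens.shape)             # torch.Size([2, 16, 128])
```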
Prior background in physics or signal processing is helpful but certainly not required.
References:
[1] T. Schneemann, K. Schmieden, and M. Schott, Search for gravitational waves using a network of RF cavities, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 1068, 169721 (2024).
[2] S. A. Usman et al., The PyCBC search for gravitational waves from compact binary coalescence, Class. Quantum Grav. 33, 215004 (2016).
[3] H. Wang, Y. Zhou, Z. Cao, Z. Guo, and Z. Ren, WaveFormer: transformer-based denoising method for gravitational-wave data, Mach. Learn.: Sci. Technol. 5, 015046 (2024).
Privacy & Fairness in User Modeling under Domain Shift
Extending the experiments and analysis from https://aclanthology.org/2022.privatenlp-1.2/ to confirm the findings in a more robust manner.
Planning Architectures for Sequence Models
While LLMs are often considered strong heuristic reasoners, their ability to plan and reason deliberatively is limited. In this master thesis, we design neural architectures and training objectives to improve the planning and reasoning ability of Transformer models and evaluate them on a range of synthetic reasoning tasks that are linguistically plausible [1].
[1] Nolte, Niklas, Ouail Kitouni, Adina Williams, Mike Rabbat, and Mark Ibrahim. "Transformers Can Navigate Mazes With Multi-Step Prediction." arXiv preprint arXiv:2412.05117 (2024).
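A minimal sketch of one possible training objective in this direction, loosely inspired by the multi-step prediction idea in [1]: an auxiliary head that predicts the token two steps ahead, alongside the standard next-token head. All sizes and the loss weight are illustrative assumptions.

```python
# Sketch: auxiliary two-steps-ahead prediction head on top of a (hypothetical) Transformer trunk.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiStepHead(nn.Module):
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.next_token = nn.Linear(d_model, vocab_size)   # predicts x[t+1]
        self.skip_token = nn.Linear(d_model, vocab_size)   # predicts x[t+2]

    def forward(self, hidden: torch.Tensor):
        return self.next_token(hidden), self.skip_token(hidden)

def multi_step_loss(head, hidden, targets):
    # hidden: (batch, seq, d_model); targets: (batch, seq) token ids
    logits1, logits2 = head(hidden)
    loss1 = F.cross_entropy(logits1[:, :-1].flatten(0, 1), targets[:, 1:].flatten())
    loss2 = F.cross_entropy(logits2[:, :-2].flatten(0, 1), targets[:, 2:].flatten())
    return loss1 + 0.5 * loss2  # auxiliary term encourages looking further ahead

# Toy check with random hidden states standing in for the Transformer's output.
head = MultiStepHead(d_model=64, vocab_size=100)
hidden = torch.randn(2, 16, 64)
targets = torch.randint(0, 100, (2, 16))
print(multi_step_loss(head, hidden, targets).item())
```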
Capability / Safety Tradeoffs in Inference-time Compute Methods
By spending increasing amounts of computation at inference time, new LLMs such as OpenAI o1 [1] are opening new avenues for advanced reasoning capabilities. However, this new paradigm also raises important questions with regard to alignment and safety. In this thesis, we investigate the relationship between the safety and the capability of inference-time compute methods through the lens of open-source implementations, by measuring the compute tradeoff between producing a solution and verifying a solution.
[1] Zhong, Tianyang, Zhengliang Liu, Yi Pan, Yutong Zhang, Yifan Zhou, Shizhe Liang, Zihao Wu et al. "Evaluation of OpenAI o1: Opportunities and challenges of AGI." arXiv preprint arXiv:2409.18486 (2024).
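A minimal sketch of the intended measurement: best-of-n sampling with a verifier, while logging how much compute (approximated here by token counts) goes into generating versus verifying. `generate` and `verify` are hypothetical placeholders for an open-source LLM and a verifier model.

```python
# Sketch: best-of-n generation with a verifier, tracking a rough compute budget.
import random

def generate(problem: str) -> tuple[str, int]:
    """Return (candidate solution, tokens used). Placeholder implementation."""
    tokens = random.randint(100, 400)
    return f"candidate solution ({tokens} tokens)", tokens

def verify(problem: str, solution: str) -> tuple[float, int]:
    """Return (verifier score in [0, 1], tokens used). Placeholder implementation."""
    return random.random(), 50

def best_of_n(problem: str, n: int = 8):
    gen_tokens = ver_tokens = 0
    best_score, best_solution = -1.0, None
    for _ in range(n):
        solution, g = generate(problem)
        score, v = verify(problem, solution)
        gen_tokens += g
        ver_tokens += v
        if score > best_score:
            best_score, best_solution = score, solution
    return best_solution, {"generation_tokens": gen_tokens, "verification_tokens": ver_tokens}

solution, budget = best_of_n("toy problem")
print(solution, budget)
```

Sweeping n (and the relative sizes of generator and verifier) then gives capability and safety metrics as a function of where the compute is spent.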
Hypothesis Generation with LLMs / LLMs for Scientific Discovery
https://arxiv.org/abs/2311.07361, https://arxiv.org/abs/2406.10833, https://arxiv.org/abs/2410.11163v1
Evaluating Reasoning Capabilities of LLMs
https://arxiv.org/abs/2404.01869, https://arxiv.org/abs/2212.07919, https://arxiv.org/abs/2304.10703
Transformers for Time Series Modeling in Astroparticle Physics
AGN light curves: https://en.wikipedia.org/wiki/Active_galactic_nucleus