MuZero - Dynamic Learning for LLM Dialog Planning

David Kaczer (KU Leuven)

30 April 2024
13:30 – 14:15

Abstract:

While large language models (LLMs) perform well on a variety of language-related tasks, they struggle with tasks that require planning. We apply the existing MuZero algorithm to enhance the planning capabilities of LLMs in dialog settings. MuZero uses a neural network to encode observations into a latent space, and then performs Monte Carlo tree search in that latent space using dynamics learned through self-play. We develop a simulated dialog environment to train the MuZero-based model on conversations with a generative LLM such as DialoGPT. We also investigate modifications to the model architecture, such as replacing the representation network with a transformer pretrained on sentence classification. We evaluate our algorithm on realistic multi-turn dialog planning tasks, such as steering the dialog topic to a predefined goal.
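
To make the planning loop in the abstract concrete, here is a minimal sketch of tree search in a latent space. It is not the speaker's implementation. MuZero's three learned networks (representation, dynamics, prediction) are replaced by hand-written toy functions, and the names `represent`, `dynamics`, `predict`, `plan`, `NUM_ACTIONS` and `GOAL` are all hypothetical. The latent state is a single number, and value normalization, exploration noise and training are omitted.

```python
import math

NUM_ACTIONS = 3  # assumption: three candidate dialog moves
GOAL = 5.0       # assumption: latent value standing for "goal topic reached"

def represent(observation):
    """Toy stand-in for the representation network: text -> latent state."""
    return float(len(observation) % 7)

def dynamics(state, action):
    """Toy stand-in for the learned dynamics: (state, action) -> (next state, reward)."""
    next_state = state + (action - 1)   # actions shift the state by -1, 0 or +1
    reward = -abs(next_state - GOAL)    # the closer to the goal, the better
    return next_state, reward

def predict(state):
    """Toy stand-in for the prediction network: state -> (policy prior, value)."""
    return [1.0 / NUM_ACTIONS] * NUM_ACTIONS, -abs(state - GOAL)

class Node:
    def __init__(self, state, prior, reward=0.0):
        self.state, self.prior, self.reward = state, prior, reward
        self.children = {}
        self.visits = 0
        self.value_sum = 0.0

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def expand(node):
    """Create all children of a leaf and return the leaf's value estimate.
    For brevity the dynamics function is applied to every action up front."""
    prior, value = predict(node.state)
    for action in range(NUM_ACTIONS):
        next_state, reward = dynamics(node.state, action)
        node.children[action] = Node(next_state, prior[action], reward)
    return value

def select_child(node, discount, c_puct=1.5):
    """Pick the child with the best pUCT-style score (no value normalization)."""
    def score(child):
        explore = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.reward + discount * child.value() + explore
    return max(node.children.values(), key=score)

def plan(observation, simulations=50, discount=0.9):
    """Search entirely in latent space and return the most-visited root action."""
    root = Node(represent(observation), prior=1.0)
    expand(root)
    for _ in range(simulations):
        node, path = root, [root]
        while node.children:                 # descend to a leaf
            node = select_child(node, discount)
            path.append(node)
        value = expand(node)                 # evaluate and expand the leaf
        for visited in reversed(path):       # back up the discounted return
            visited.visits += 1
            visited.value_sum += value
            value = visited.reward + discount * value
    return max(root.children, key=lambda a: root.children[a].visits)

if __name__ == "__main__":
    print(plan("hi"))
```

The environment is consulted only once, through `represent`. Every later step of the search uses `dynamics` and `predict`, which is what lets the planner look ahead without generating real dialog turns during search.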
