Cordial invitation to a Guest Talk by Dr. Yuval Pinter: "When Language Models meet words"
Over the last few years, deep neural models have taken over the field of natural language processing (NLP), delivering great improvements on many of its sequence-level tasks. But the end-to-end nature of these models makes it hard to figure out whether the way they represent individual words aligns with how language builds itself from the bottom up, how lexical changes in register and domain can affect the untested aspects of such representations, or which phenomena can be modeled by units smaller than the word.
In a Guest Talk on April 17, Dr. Yuval Pinter, Senior Lecturer in the Department of Computer Science at Ben-Gurion University of the Negev, will present NYTWIT, a dataset created to challenge large language models (LLMs) at the lexical level. It tasks them with identifying the processes that lead to the formation of novel English words, and with segmenting and recovering the specific subclass of lexical blends, demonstrating the ways in which subword-tokenized LLMs fail to analyze them. Dr. Pinter will then present XRayEmb, a method that eases the processing of these novel words by fitting a character-level encoder to existing models' subword tokenizers, and SaGe, a subword tokenizer that incorporates context into the vocabulary creation objective.
- When? Monday, April 17, 2023, 2 p.m.
- Where? Institute for Computer Science of the University of Bonn, b-it lecture hall, room 0.109
- Language? The Guest Talk will be held in English
Learn more about Dr. Yuval Pinter
Dr. Yuval Pinter is a Senior Lecturer in the Department of Computer Science at Ben-Gurion University of the Negev, where he focuses on natural language processing as PI of the MeLeL lab. He received his PhD from the Georgia Institute of Technology School of Interactive Computing as a Bloomberg Data Science PhD Fellow. Before that, he worked as a Research Engineer at Yahoo Labs and as a Computational Linguist at Ginger Software, and obtained an MA in Linguistics and a BSc in Computer Science and Mathematics, both from Tel Aviv University. Yuval blogs (in Hebrew) about language matters on Dagesh Kal.