Eneko Agirre, Euskal Herriko Unibertsitatea / Universidad del País Vasco / University of the Basque Country, Spain. e.agirre@ehu.eus
Abstract: Deep Learning has made tremendous progress in Natural Language Processing (NLP), where large pre-trained language models (PLMs) fine-tuned on the target task have become the predominant tool. More recently, in a process called prompting, NLP tasks are rephrased as natural language text, allowing us to better exploit the linguistic knowledge learned by PLMs and resulting in significant improvements. Still, PLMs have limited inference ability. In the Textual Entailment task, systems need to decide whether the truth of a certain textual hypothesis follows from the given premise text. Manually annotated entailment datasets covering multiple inference phenomena have been used to infuse inference capabilities into PLMs.
This talk will review these recent developments, and will present an approach that combines prompts with PLMs fine-tuned for textual entailment, yielding state-of-the-art results on Information Extraction (IE) using only a small fraction of the annotations. The approach has additional benefits, such as the ability to learn from different schemas and inference datasets. These developments enable a new paradigm for IE in which the expert defines the domain-specific schema in natural language and directly runs those specifications, annotating a handful of examples in the process. A user interface based on this new paradigm will also be presented. Beyond IE, inference capabilities could be extended, acquired and applied from other tasks, opening a new research avenue where entailment and downstream task performance improve in tandem.
Eneko Agirre's Biography
Eneko Agirre, BSc (University of the Basque Country, UPV/EHU), MSc (University of Edinburgh), PhD (UPV/EHU), is a full professor in the Computer Science department of the UPV/EHU and director of the HiTZ research center. He has published over 200 international peer-reviewed articles and conference papers in NLP. He has served as secretary and president of the ACL SIGLEX, and as a member of the editorial boards of Computational Linguistics, Transactions of the ACL and the Journal of Artificial Intelligence Research. He is a co-founder of the Joint Conference on Lexical and Computational Semantics (*SEM), now in its ninth edition. He is a regular reviewer for top international journals, and a regular area chair and program committee member for top international conferences. He has been PI of several national and European projects. He has received three Google Research Awards, in 2016, 2018 and 2019, and five best paper awards and nominations at top conferences such as EMNLP, CoNLL and Coling. Several dissertations under his supervision have received awards: SEPLN 2018 and 2021, EurAI 2021. He received the Spanish Computer Science Research Award in 2021 and was elected a 2021 Fellow of the Association for Computational Linguistics (ACL) for his outstanding work in natural language processing and machine learning.
Anna Rogers, Center for Social Data Science, University of Copenhagen, Denmark. arogers@sodas.ku.dk
Abstract: NLP leaderboards create the impression that current systems surpass human performance on natural language understanding tasks, but this is far from reality. In particular, progress on question answering leaderboards far outpaces progress on benchmarks that cannot be gamed by learning spurious patterns, and the very term "machine reading comprehension" is misleading (at least with respect to current systems). This talk discusses how we could test whether the models do what we think they should be doing, what kinds of questions we have been asking our models, and what the main challenges are in creating better data.