
GEMv2: multilingual NLG benchmarking in a single line of code.

Gehrmann, Sebastian; Bhattacharjee, Abhik; Mahendiran, Abinaya; Upadhyay, Ashish


Abstract

Evaluations in machine learning rarely use the latest metrics, datasets, or human evaluation, in favor of remaining compatible with prior work. This compatibility, often facilitated through leaderboards, thus leads to outdated but standardized evaluation practices. We posit that this standardization is taking place in the wrong spot. Evaluation infrastructure should enable researchers to use the latest methods; what should be standardized instead is how these new evaluation advances are incorporated. We introduce GEMv2, the new version of the Generation, Evaluation, and Metrics Benchmark, which uses a modular infrastructure that allows dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages and ongoing online evaluation for all datasets, and our interactive tools make it easier to add new datasets to the living benchmark.

Citation

GEHRMANN, S., BHATTACHARJEE, A., MAHENDIRAN, A., WANG, A., PAPANGELIS, A., MADAAN, A., MCMILLAN-MAJOR, A., SHVETS, A., UPADHYAY, A. and BOHNET, B. 2022. GEMv2: multilingual NLG benchmarking in a single line of code. In Proceedings of the 2022 Conference on empirical methods in natural language processing: system demonstrations, 7-11 December 2022, Abu Dhabi, UAE. Stroudsburg: Association for Computational Linguistics [online], pages 266-281. Available from: https://aclanthology.org/2022.emnlp-demos.27/

Conference Name: 2022 Conference on empirical methods in natural language processing: system demonstrations
Conference Location: Abu Dhabi, UAE
Start Date: Dec 7, 2022
End Date: Dec 11, 2022
Acceptance Date: Jun 17, 2022
Online Publication Date: Dec 11, 2022
Publication Date: Dec 31, 2022
Deposit Date: Mar 27, 2023
Publicly Available Date: Mar 27, 2023
Publisher: Association for Computational Linguistics (ACL)
Pages: 266-281
ISBN: 9781959429418
Keywords: Machine learning; Generation evaluation and metrics benchmark
Public URL: https://rgu-repository.worktribe.com/output/1920683
Publisher URL: https://aclanthology.org/2022.emnlp-demos.27
