Sebastian Gehrmann
GEMv2: multilingual NLG benchmarking in a single line of code.
Authors
Gehrmann, Sebastian; Bhattacharjee, Abhik; Mahendiran, Abinaya; Upadhyay, Ashish
Abstract
Evaluations in machine learning rarely use the latest metrics, datasets, or human evaluation in favor of remaining compatible with prior work. The compatibility, often facilitated through leaderboards, thus leads to outdated but standardized evaluation practices. We pose that the standardization is taking place in the wrong spot. Evaluation infrastructure should enable researchers to use the latest methods and what should be standardized instead is how to incorporate these new evaluation advances. We introduce GEMv2, the new version of the Generation, Evaluation, and Metrics Benchmark which uses a modular infrastructure for dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages, ongoing online evaluation for all datasets, and our interactive tools make it easier to add new datasets to the living benchmark.
Citation
GEHRMANN, S., BHATTACHARJEE, A., MAHENDIRAN, A., WANG, A., PAPANGELIS, A., MADAAN, A., MCMILLAN-MAJOR, A., SHVETS, A., UPADHYAY, A. and BOHNET, B. 2022. GEMv2: multilingual NLG benchmarking in a single line of code. In Proceedings of the 2022 Conference on empirical methods in natural language processing: system demonstrations, 7-11 December 2022, Abu Dhabi, UAE. Stroudsburg: Association for Computational Linguistics [online], pages 266-281. Available from: https://aclanthology.org/2022.emnlp-demos.27/
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | 2022 Conference on empirical methods in natural language processing: system demonstrations |
Start Date | Dec 7, 2022 |
End Date | Dec 11, 2022 |
Acceptance Date | Jun 17, 2022 |
Online Publication Date | Dec 11, 2022 |
Publication Date | Dec 31, 2022 |
Deposit Date | Mar 27, 2023 |
Publicly Available Date | Mar 27, 2023 |
Publisher | ACL Association for Computational Linguistics |
Peer Reviewed | Peer Reviewed |
Pages | 266-281 |
ISBN | 9781959429418 |
Keywords | Machine learning; Generation evaluation and metrics benchmark |
Public URL | https://rgu-repository.worktribe.com/output/1920683 |
Publisher URL | https://aclanthology.org/2022.emnlp-demos.27 |
Files
GEHRMANN 2022 GEMv2 (PDF, 1.5 MB)
Publisher Licence URL
https://creativecommons.org/licenses/by/4.0/
Copyright Statement
Copyright © 1963–2023 ACL.