Content type profiling of data-to-text generation datasets.

Upadhyay, Ashish; Massie, Stewart

Contributors

Nicoletta Calzolari
Editor

Chu-Ren Huang
Editor

Hansaem Kim
Editor

James Pustejovsky
Editor

Leo Wanner
Editor

Key-Sun Choi
Editor

Pum-Mo Ryu
Editor

Hsin-Hsi Chen
Editor

Lucia Donatelli
Editor

Heng Ji
Editor

Sadao Kurohashi
Editor

Patrizia Paggio
Editor

Nianwen Xue
Editor

Seokhwan Kim
Editor

Younggyun Hahm
Editor

Zhong He
Editor

Tony Kyungil Lee
Editor

Enrico Santus
Editor

Francis Bond
Editor

Seung-Hoon Na
Editor

Abstract

Data-to-Text Generation (D2T) problems can be viewed as a stream of time-stamped events, with a text summary produced for each event. The problem becomes more challenging when event summaries contain complex insights derived from multiple records, either within an event or across several events in the stream. Understanding the different types of content present in a summary helps to define system requirements more precisely and, in turn, to build better systems. In this paper, we propose a novel typology of content types, which we use to classify the contents of event summaries. Using the typology, we generate a profile of a dataset as the distribution of its aggregated content types; this profile captures the specific characteristics of the dataset and gives a measure of the complexity of the problem. Through extensive experiments on different D2T datasets, we demonstrate that neural generative systems struggle in particular to generate content of complex types, highlighting the need for improved D2T techniques.
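
The idea of a dataset profile as a distribution over content types can be illustrated with a minimal sketch. It assumes each summary has already been annotated with one content-type label per content unit. The function name `content_type_profile` and the labels used here are hypothetical placeholders for illustration; they are not the typology or the implementation from the paper.

```python
from collections import Counter


def content_type_profile(labelled_summaries):
    """Return the relative frequency of each content type across all summaries.

    `labelled_summaries` is a list of summaries, each given as a list of
    content-type labels (one label per annotated content unit).
    """
    # Aggregate label counts over every summary in the dataset.
    counts = Counter(
        label for summary in labelled_summaries for label in summary
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    # Normalise counts so the profile sums to 1.
    return {label: count / total for label, count in counts.items()}


# Hypothetical labels, for illustration only (not the paper's typology).
summaries = [
    ["basic", "basic", "within_event"],
    ["basic", "across_events"],
]
profile = content_type_profile(summaries)
print(profile)
```

Under these assumptions, a dataset whose profile puts more weight on labels derived from several records or events would be regarded as posing a more complex generation problem.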

Citation

UPADHYAY, A. and MASSIE, S. 2022. Content type profiling of data-to-text generation datasets. In N. Calzolari, C.-R. Huang, H. Kim et al. (eds.) Proceedings of the 29th International conference on computational linguistics (COLING 2022), 12-17 October 2022, Gyeongju, Republic of Korea. Stroudsburg, PA: International Committee on Computational Linguistics [online], 29(1), pages 5770–5782. Available from: https://aclanthology.org/2022.coling-1.pdf

Presentation Conference Type: Conference Paper (published)
Conference Name: 29th International conference on computational linguistics (COLING 2022)
Start Date: Oct 12, 2022
End Date: Oct 17, 2022
Acceptance Date: Aug 15, 2022
Online Publication Date: Oct 12, 2022
Publication Date: Oct 31, 2022
Deposit Date: Aug 4, 2023
Publicly Available Date: Aug 4, 2023
Publisher: International Committee on Computational Linguistics
Peer Reviewed: Peer Reviewed
Volume: 29
Pages: 5770–5782
Series Title: International conference on computational linguistics
Series Number: 29
Series ISSN: 2951-2093
Book Title: Proceedings of the 29th International conference on computational linguistics (COLING 2022)
Keywords: Data-to-text generation (D2T); Problems; Typology; Content type typology; Datasets; Generation systems
Public URL: https://rgu-repository.worktribe.com/output/1764080
Publisher URL: https://aclanthology.org/volumes/2022.coling-1/
