Content type profiling of data-to-text generation datasets.
Upadhyay, Ashish; Massie, Stewart
Authors
Ashish Upadhyay a.upadhyay@rgu.ac.uk
Completed Research Student
Dr Stewart Massie s.massie@rgu.ac.uk
Associate Professor
Contributors
Nicoletta Calzolari
Editor
Chu-Ren Huang
Editor
Hansaem Kim
Editor
James Pustejovsky
Editor
Leo Wanner
Editor
Key-Sun Choi
Editor
Pum-Mo Ryu
Editor
Hsin-Hsi Chen
Editor
Lucia Donatelli
Editor
Heng Ji
Editor
Sadao Kurohashi
Editor
Patrizia Paggio
Editor
Nianwen Xue
Editor
Seokhwan Kim
Editor
Younggyun Hahm
Editor
Zhong He
Editor
Tony Kyungil Lee
Editor
Enrico Santus
Editor
Francis Bond
Editor
Seung-Hoon Na
Editor
Abstract
Data-to-Text Generation (D2T) problems can be considered as a stream of time-stamped events, with a text summary being produced for each event. The problem becomes more challenging when event summaries contain complex insights derived from multiple records, either within an event or across several events from the event stream. Understanding the different types of content present in a summary helps us to better define the system requirements and so build better systems. In this paper, we propose a novel typology of content types that we use to classify the contents of event summaries. Using the typology, a profile of a dataset is generated as the distribution of the aggregated content types, which captures the specific characteristics of the dataset and gives a measure of the complexity present in the problem. Through extensive experiments on different D2T datasets, we demonstrate that neural generative systems specifically struggle to generate contents of complex types, highlighting the need for improved D2T techniques.
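As an illustration only, the idea of a dataset profile as a distribution over aggregated content types can be sketched as below. This is a minimal sketch and not the authors' implementation: the function name `content_type_profile` and the labels `basic`, `within_event` and `across_events` are hypothetical placeholders, and the sketch assumes each sentence of each event summary has already been assigned one content type.

```python
from collections import Counter


def content_type_profile(summaries):
    """Return the share of each content type across all labelled sentences.

    `summaries` is a list of event summaries, each given as a list of
    content-type labels (one label per sentence). The labels themselves
    are assumptions made for this sketch.
    """
    # Aggregate the labels over every summary in the dataset.
    counts = Counter(label for summary in summaries for label in summary)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Normalise counts so the profile is a distribution that sums to 1.
    return {label: count / total for label, count in counts.items()}


# Hypothetical labelled dataset: two event summaries, five sentences in total.
dataset = [
    ["basic", "basic", "within_event"],
    ["basic", "across_events"],
]
profile = content_type_profile(dataset)
```

Under these assumptions, a dataset whose profile puts more weight on the types derived from multiple records would count as more complex than one dominated by the simplest type.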
Citation
UPADHYAY, A. and MASSIE, S. 2022. Content type profiling of data-to-text generation datasets. In N. Calzolari, C.-R. Huang, H. Kim et al. (eds.) Proceedings of the 29th International conference on computational linguistics (COLING 2022), 12-17 October 2022, Gyeongju, Republic of Korea. Stroudsburg, PA: International Committee on Computational Linguistics [online], 29(1), pages 5770–5782. Available from: https://aclanthology.org/2022.coling-1.pdf
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | 29th International conference on computational linguistics (COLING 2022) |
Start Date | Oct 12, 2022 |
End Date | Oct 17, 2022 |
Acceptance Date | Aug 15, 2022 |
Online Publication Date | Oct 12, 2022 |
Publication Date | Oct 31, 2022 |
Deposit Date | Aug 4, 2023 |
Publicly Available Date | Aug 4, 2023 |
Publisher | International Committee on Computational Linguistics |
Peer Reviewed | Peer Reviewed |
Volume | 29 |
Pages | 5770–5782 |
Series Title | International conference on computational linguistics |
Series Number | 29 |
Series ISSN | 2951-2093 |
Book Title | Proceedings of the 29th International conference on computational linguistics (COLING 2022) |
Keywords | Data-to-text generation (D2T); Problems; Typology; Content type typology; Datasets; Generation systems |
Public URL | https://rgu-repository.worktribe.com/output/1764080 |
Publisher URL | https://aclanthology.org/volumes/2022.coling-1/ |
Files
UPADHYAY 2022 Content type profiling (VOR)
(512 Kb)
PDF
Publisher Licence URL
https://creativecommons.org/licenses/by/4.0/