MicroConceptBERT: concept-relation based document information extraction framework.

Silva, Kanishka; Silva, Thushari; Nanayakkara, Gayani

doi:10.1109/SLAAI-ICAI59257.2023.10365022

Authors

Kanishka Silva

Thushari Silva

Gayani Nanayakkara (g.nanayakkara@rgu.ac.uk)
Research Student

Abstract

Extracting information from documents is a crucial task in natural language processing research. Existing information extraction methodologies often focus on specific domains, such as medicine, education or finance, and are limited by language constraints. However, more comprehensive approaches of the kind proposed in recent research, which transcend document types, languages, contexts, and structures, would significantly advance the field. This study addresses this challenge by introducing MicroConceptBERT: a concept-relations-based framework for document information extraction, which offers flexibility for various document processing tasks while accounting for hierarchical, semantic, and heuristic features. The proposed framework has been applied to a question-answering task on two benchmark datasets: SQuAD 2.0 and DocVQA. Notably, it attains an F1 score of 87.01 on the SQuAD 2.0 dataset, outperforming the baseline BERT-base and BERT-large models.

Citation

SILVA, K., SILVA, T. and NANAYAKKARA, G. 2023. MicroConceptBERT: concept-relation based document information extraction framework. In Proceedings of the 7th SLAAI (Sri Lanka Association for Artificial Intelligence) International conference on artificial intelligence 2023 (SLAAI-ICAI 2023), 23-24 November 2023, Kelaniya, Sri Lanka. Piscataway: IEEE [online], article number 10365022. Available from: https://doi.org/10.1109...ICAI59257.2023.10365022

Presentation Conference Type	Conference Paper (published)
Conference Name	7th SLAAI (Sri Lanka Association for Artificial Intelligence) International conference on artificial intelligence 2023 (SLAAI-ICAI 2023)
Start Date	Nov 23, 2023
End Date	Nov 24, 2023
Acceptance Date	Sep 24, 2023
Online Publication Date	Dec 31, 2023
Publication Date	Dec 31, 2023
Deposit Date	Feb 1, 2024
Publicly Available Date	Feb 1, 2024
Publisher	Institute of Electrical and Electronics Engineers (IEEE)
Peer Reviewed	Peer Reviewed
DOI	https://doi.org/10.1109/SLAAI-ICAI59257.2023.10365022
Keywords	Concept-relations; Entity extraction; Layout analysis; Ontology; Transformers; Question answering
Public URL	https://rgu-repository.worktribe.com/output/2225955

Files

SILVA 2023 MicroConceptBERT (AAM) (518 Kb)
PDF

Copyright Statement
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
