MicroConceptBERT: concept-relation based document information extraction framework.
Silva, Kanishka; Silva, Thushari; Nanayakkara, Gayani
Abstract
Extracting information from documents is a crucial task in natural language processing research. Existing information extraction methodologies often focus on specific domains, such as medicine, education, or finance, and are limited by language constraints. However, as recent research has argued, more comprehensive approaches that transcend document types, languages, contexts, and structures would significantly advance the field. This study addresses that challenge by introducing microConceptBERT: a concept-relations-based framework for document information extraction that offers flexibility across document processing tasks while accounting for hierarchical, semantic, and heuristic features. The proposed framework is applied to a question-answering task on two benchmark datasets: SQuAD 2.0 and DocVQA. Notably, it attains an F1 score of 87.01 on SQuAD 2.0, outperforming the BERT-base and BERT-large baseline models.
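The headline result above is reported as an F1 score on SQuAD 2.0. As a point of reference only, the sketch below shows the standard token-overlap F1 used for SQuAD-style answer evaluation; the normalization steps are a simplified assumption in the spirit of the official evaluation script, not the authors' own code.

```python
from collections import Counter
import re
import string


def normalize(text: str) -> str:
    """Lower-case, strip punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between a predicted answer span and a gold answer span."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(ground_truth).split()
    if not pred_tokens or not gold_tokens:
        # For unanswerable questions (empty gold answer), F1 is 1 only if both are empty.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Partial overlap example: precision 1.0, recall 2/3 -> F1 = 0.8
print(f1_score("the Eiffel Tower", "Eiffel Tower, Paris"))
```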
Citation
SILVA, K., SILVA, T. and NANAYAKKARA, G. 2023. MicroConceptBERT: concept-relation based document information extraction framework. In Proceedings of the 7th SLAAI (Sri Lanka Association for Artificial Intelligence) International conference on artificial intelligence 2023 (SLAAI-ICAI 2023), 23-24 November 2023, Kelaniya, Sri Lanka. Piscataway: IEEE [online], article number 10365022. Available from: https://doi.org/10.1109/SLAAI-ICAI59257.2023.10365022
| Presentation Conference Type | Conference Paper (published) |
| --- | --- |
| Conference Name | 7th SLAAI (Sri Lanka Association for Artificial Intelligence) International conference on artificial intelligence 2023 (SLAAI-ICAI 2023) |
| Start Date | Nov 23, 2023 |
| End Date | Nov 24, 2023 |
| Acceptance Date | Sep 24, 2023 |
| Online Publication Date | Dec 31, 2023 |
| Publication Date | Dec 31, 2023 |
| Deposit Date | Feb 1, 2024 |
| Publicly Available Date | Feb 1, 2024 |
| Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
| Peer Reviewed | Peer Reviewed |
| DOI | https://doi.org/10.1109/SLAAI-ICAI59257.2023.10365022 |
| Keywords | Concept-relations; Entity extraction; Layout analysis; Ontology; Transformers; Question answering |
| Public URL | https://rgu-repository.worktribe.com/output/2225955 |
Files
SILVA 2023 MicroConceptBERT (AAM) (PDF, 518 KB)
Copyright Statement
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.