MicroConceptBERT: concept-relation based document information extraction framework.

Silva, Kanishka; Silva, Thushari; Nanayakkara, Gayani

Authors

Kanishka Silva

Thushari Silva

Gayani Nanayakkara

Abstract

Extracting information from documents is a crucial task in natural language processing research. Existing information extraction methodologies often focus on specific domains, such as medicine, education or finance, and are limited by language constraints. However, as recent research has proposed, more comprehensive approaches that transcend document types, languages, contexts, and structures would significantly advance the field. This study addresses this challenge by introducing microConceptBERT: a concept-relations-based framework for document information extraction, which offers flexibility for various document processing tasks while accounting for hierarchical, semantic, and heuristic features. The proposed framework has been applied to a question-answering task on two benchmark datasets: SQuAD 2.0 and DocVQA. Notably, it attains an F1 score of 87.01 on the SQuAD 2.0 dataset, outperforming the baseline BERT-base and BERT-large models.

Citation

SILVA, K., SILVA, T. and NANAYAKKARA, G. 2023. MicroConceptBERT: concept-relation based document information extraction framework. In Proceedings of the 7th SLAAI (Sri Lanka Association for Artificial Intelligence) International conference on artificial intelligence 2023 (SLAAI-ICAI 2023), 23-24 November 2023, Kelaniya, Sri Lanka. Piscataway: IEEE [online], article number 10365022. Available from: https://doi.org/10.1109...ICAI59257.2023.10365022

Conference Name: 7th SLAAI (Sri Lanka Association for Artificial Intelligence) International conference on artificial intelligence 2023 (SLAAI-ICAI 2023)
Conference Location: Kelaniya, Sri Lanka
Start Date: Nov 23, 2023
End Date: Nov 24, 2023
Acceptance Date: Sep 24, 2023
Online Publication Date: Dec 31, 2023
Publication Date: Dec 31, 2023
Deposit Date: Feb 1, 2024
Publicly Available Date: Feb 1, 2024
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
DOI: https://doi.org/10.1109/SLAAI-ICAI59257.2023.10365022
Keywords: Concept-relations; Entity extraction; Layout analysis; Ontology; Transformers; Question answering
Public URL: https://rgu-repository.worktribe.com/output/2225955

Files

SILVA 2023 MicroConceptBERT (AAM) (518 Kb)
PDF

Copyright Statement
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.




