Thierry Mamer
A sequence-length sensitive approach to learning biological grammars using inductive logic programming.
Mamer, Thierry
Abstract
This thesis investigates whether the ideas behind compression principles, such as the Minimum Description Length, can improve the process of learning biological grammars from protein sequences using Inductive Logic Programming (ILP). Unlike the examples in most traditional ILP learning problems, biological sequences often vary widely in length. This variation is an important feature of biological sequences which should not be ignored by ILP systems. However, we have identified that some ILP systems do not take into account the length of examples when evaluating their proposed hypotheses. During the learning process, many ILP systems use clause evaluation functions to assign a score to induced hypotheses, estimating their quality and effectively influencing the search. Traditionally, clause evaluation functions do not take into account the length of the examples covered by the clause. We propose L-modification, a way of modifying existing clause evaluation functions so that they take into account the length of the examples they learn from. An empirical study was undertaken to investigate whether significant improvements can be achieved by applying L-modification to a standard clause evaluation function. We also investigated, more generally, how ILP systems cope with the length of examples in training data. We show that our L-modified clause evaluation function outperforms our benchmark function in every experiment we conducted, and we therefore conclude that L-modification is a useful concept. We also show that the length of the examples in the training data used by ILP systems has a clear impact on the results.
Citation
MAMER, T. 2011. A sequence-length sensitive approach to learning biological grammars using inductive logic programming. Robert Gordon University, PhD thesis.
Thesis Type | Thesis |
---|---|
Deposit Date | Aug 19, 2011 |
Publicly Available Date | Aug 19, 2011 |
Public URL | http://hdl.handle.net/10059/662 |
Contract Date | Aug 19, 2011 |
Award Date | Jan 31, 2011 |
Files
MAMER 2011 Sequence-length sensitive approach (PDF, 1.1 Mb)
Publisher Licence URL
https://creativecommons.org/licenses/by-nc-nd/4.0/
Copyright Statement
© The Author.