Ranking microbial metabolomic and genomic links in the NPLinker framework using complementary scoring functions. [Dataset]
Contributors
Andrew Ramsay
Data Collector
Justin J.J. van der Hooft
Data Collector
Katherine R. Duncan
Data Collector
Sylvia Soldatou
Data Collector
Juho Rousu
Data Collector
J. Daly
Data Collector
Joe Wandy
Data Collector
Simon Rogers
Data Collector
Abstract
In this article, we introduce NPLinker, a software framework that links genomic and metabolomic data in order to connect microbial secondary metabolites to their producing genomic regions. Two of the major approaches for such linking are analysis of the correlation between sets of strains, and analysis of predicted features of the molecules. While these methods are usually used separately, we demonstrate that they are in fact complementary, and show a way to combine them to improve their performance. We begin by demonstrating a weakness in the most common method of strain correlation analysis, and suggest an improvement. We then introduce a new feature-based analysis method which, unlike most such methods, does not directly depend on the natural product compound class. Finally, we demonstrate that the two are complementary and proceed to combine them into a single scoring function for genomic and metabolomic links, which shows improved performance over either of the individual approaches. Verification is done using curated databases of genomic and metabolomic data, as well as public data sets of microbial data including validated links.
To further validate the IOKR (Input Output Kernel Regression) approach, we investigated whether it was possible, for high-scoring pairs of MS2 spectra and metabolites, to manually match relevant peaks in the MS2 spectra to possible fragments of the metabolites. Full validation would require additional wet lab analysis, which is not possible with these publicly available datasets. If a link is genuine, it ought to be possible to match MS2 peaks in the spectra to substructures of the relevant chemical structures. Where such matches can be made, the matched fragment peaks should also be particularly important in the IOKR model. We provide some examples to show that this is indeed the case.
To illustrate this process, we took validated links in the Crusemann data set (see Section 2.8.2 and Table 4 in the published article https://doi.org/10.1371/journal.pcbi.1008920), as well as two high-scoring potential links chosen because their ranking had a strong contribution from the IOKR score.
Citation
ELDJÁRN, G.H., RAMSAY, A., VAN DER HOOFT, J.J.J., DUNCAN, K.R., SOLDATOU, S., ROUSU, J., DALY, J., WANDY, J. and ROGERS, S. 2021. Ranking microbial metabolomic and genomic links in the NPLinker framework using complementary scoring functions. [Dataset]. PLOS computational biology [online], 17(5), e1008920. Available from: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008920#sec021
Acceptance Date | Mar 26, 2021 |
---|---|
Online Publication Date | May 4, 2021 |
Publication Date | May 31, 2021 |
Deposit Date | May 31, 2021 |
Publicly Available Date | May 31, 2021 |
Publisher | Public Library of Science |
DOI | https://doi.org/10.1371/journal.pcbi.1008920 |
Keywords | Ecology; Modelling and simulation; Computational theory and mathematics; Genetics; Ecology, evolution, behavior and systematics; Molecular biology; Cellular and molecular neuroscience |
Public URL | https://rgu-repository.worktribe.com/output/1347302 |
Publisher URL | https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008920#sec021 |
Related Public URLs | https://rgu-repository.worktribe.com/output/1347280 |
Type of Data | 5 PDF files, 4 XLSX files and a supporting text (.txt) file. |
Collection Date | Apr 28, 2021 |
Collection Method | To match MS2 peaks to chemical substructures, we used the MetFrag web interface [2]. For a given metabolite and spectrum, we found the accurate mass of the metabolite using the compound name search function of the NPAtlas database [3]. This mass was used as the neutral-mass search criterion in the NPAtlas_Aug2019 database in MetFrag [2], to ensure that the relevant metabolite was in the candidate set. Because we wanted to match measured peaks in an actual MS2 spectrum to the predicted peaks for a particular metabolite, the MetFrag candidate set should ideally have a single member. Where more than one result was returned, only the result whose candidate name matched the given metabolite was used, except in the case of griseochelin, which was considered equivalent to zincophorin, as it has been by others in the literature [4]. The relevant spectral data were extracted from the Metabolomics Spectrum Resolver [5], and the MetFrag in-silico fragmentation algorithm was run with default settings. Peaks that matched were then checked to see how their exclusion from the MS2 spectrum influenced the ranking of the metabolite, among the set of all metabolites, for that spectrum. The images of the spectra were generated by the Metabolomics Spectrum Resolver [5], while the images of the metabolites were generated by MetFrag [2], with the identified substructure highlighted in green. |
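
The peak-matching and exclusion check described under Collection Method can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used to produce the dataset: the function names, the 10 ppm tolerance, and the choice to exclude matched peaks one at a time are all hypothetical, and the `score` argument is a placeholder for whatever model ranks metabolites against a spectrum (the IOKR model in the article, which is not reimplemented here).

```python
from typing import Callable, Dict, List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def match_peaks(measured: Sequence[Peak], predicted_mz: Sequence[float],
                tol_ppm: float = 10.0) -> List[int]:
    """Indices of measured peaks within tol_ppm of any predicted fragment m/z.

    The ppm tolerance is an assumed value, not a setting taken from the source.
    """
    matched = []
    for i, (mz, _intensity) in enumerate(measured):
        if any(abs(mz - p) <= p * tol_ppm * 1e-6 for p in predicted_mz):
            matched.append(i)
    return matched


def rank_of(target: str, spectrum: Sequence[Peak],
            score: Callable[[Sequence[Peak], str], float],
            candidates: Sequence[str]) -> int:
    """1-based rank of target among candidates, highest score first."""
    ordered = sorted(candidates, key=lambda c: score(spectrum, c), reverse=True)
    return ordered.index(target) + 1


def rank_shift_on_exclusion(target: str, spectrum: Sequence[Peak],
                            matched: Sequence[int],
                            score: Callable[[Sequence[Peak], str], float],
                            candidates: Sequence[str]) -> Dict[int, int]:
    """Rank change of target after dropping each matched peak in turn.

    A positive value means the metabolite ranked worse without that peak,
    i.e. the peak mattered to the (placeholder) scoring model.
    """
    base = rank_of(target, spectrum, score, candidates)
    shifts = {}
    for i in matched:
        reduced = [p for j, p in enumerate(spectrum) if j != i]
        shifts[i] = rank_of(target, reduced, score, candidates) - base
    return shifts
```

To use the sketch, supply a spectrum as a list of `(m/z, intensity)` pairs, the predicted fragment masses for the metabolite of interest, and any scoring callable; peaks whose removal pushes the metabolite down the ranking are the ones the model relies on most.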
Files
ELDJARN 2021 Ranking microbial (Data)
(3.8 Mb)
Archive
Publisher Licence URL
https://creativecommons.org/licenses/by/4.0/