Characterization of Microbial Primary and Secondary Metabolism in the Marine Realm
David Edward Geller-McGrath, Ph.D., 2024
Virginia Edgcomb, Advisor
This thesis applies meta-omics data analysis to elucidate the ecological roles of marine microorganisms in diverse habitats and includes the development of new bioinformatics tools to enhance these analyses. In my second chapter, I applied genome mining tools to analyze the gene content and expression of biosynthetic gene clusters (BGCs). The analysis of BGCs through large-scale genome mining efforts has identified diverse natural products with potential applications in medicine and biotechnology. However, many marine environments, particularly oxygen-depleted water columns and sediments, remain under-represented in these studies. Analysis of BGCs in free-living and particle-associated microbial communities along the water-column oxycline of the Cariaco Basin, Venezuela, revealed that differences in water column redox potential were associated with microbial lifestyle and with the predicted composition and production of secondary metabolites. This experience set the stage for my third chapter, in which I developed MetaPathPredict, a machine learning-based tool for predicting the metabolic potential of bacterial genomes. This tool addresses the lack of computational pipelines for pathway reconstruction that predict the presence of KEGG modules in highly incomplete prokaryotic genomes. MetaPathPredict made robust predictions in highly incomplete bacterial genomes, enabling more accurate reconstruction of their metabolic potential. In my fourth chapter, I performed metagenomic analysis of microbial communities in the hydrothermally influenced sediments of Guaymas Basin (Gulf of California, Mexico). Previous studies indicated a decline in microbial abundance and diversity with increasing sediment depth. My analysis revealed a distribution of MAGs dominated by Chloroflexota and Thermoproteota, with diversity decreasing as temperature increased, consistent with a downcore reduction in subsurface biosphere diversity.
Specific archaeal MAGs within the Thermoproteota and Hadarchaeota increased in abundance and in recruitment of metatranscriptome reads toward deeper, hotter sediments, marking a transition to a specialized deep biosphere. In my fifth chapter, I developed MetaPathPredict-E, a deep learning-powered extension of MetaPathPredict for eukaryotic metabolism predictions. Eukaryotic metabolism is diverse, reflecting varied lifestyles across eukaryotic kingdoms, but the complexity of eukaryotic genomes presents challenges for assembly and annotation. MetaPathPredict-E was trained on diverse eukaryotic genomes and transcriptomes and performed robustly on test datasets, advancing the study of eukaryotic metabolic potential from environmental samples.