Machine Learning and Medicine Lab

Research by Our Students

Bacterial Genes with Non-AUG Start Codons

In 2018, Anne Gvozdjak, a student from Bellevue High School, and Dr. Samanta started to look into the start codon patterns of bacterial genes. Their results are summarized in this article.

Abstract: Here we investigate translational regulation in bacteria by analyzing the distribution of start codons in fully assembled genomes. We report 36 genes (infC, rpoC, rnpA, etc.) showing a preference for non-AUG start codons in evolutionarily diverse phyla ("non-AUG genes"). Most of the non-AUG genes are functionally associated with translation, transcription or replication. In E. coli, the percentage of essential genes among these 36 is significantly higher than among all genes. Furthermore, the functional distribution of these genes suggests that non-AUG start codons may be used to reduce gene expression during starvation conditions, possibly through translational autoregulation or IF3-mediated regulation.
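
As a rough illustration of what tallying start codons involves, the sketch below counts the first three bases of a set of coding sequences and reports the share that do not begin with ATG (the DNA form of AUG). The function names and the toy sequences are hypothetical and are not taken from the article; it assumes each sequence is given 5' to 3' on the coding strand.

```python
from collections import Counter


def start_codon_counts(cds_sequences):
    """Tally the first three bases of each coding sequence (DNA alphabet)."""
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper()
        if len(seq) >= 3:  # skip fragments too short to hold a codon
            counts[seq[:3]] += 1
    return counts


def non_atg_fraction(counts):
    """Fraction of tallied sequences whose start codon is not ATG."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1 - counts.get("ATG", 0) / total


# Toy, made-up sequences standing in for one gene across four genomes.
toy_cds = ["ATGAAATAA", "GTGCCCTGA", "TTGGGGTAA", "ATGTTTTAG"]
toy_counts = start_codon_counts(toy_cds)
print(dict(toy_counts), non_atg_fraction(toy_counts))
```

Running the same tally per gene across many genomes is one simple way to see which genes lean toward GTG or TTG starts; the article's actual pipeline may differ.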

The untapped potential of the endophytic actinomyces, Streptomyces scabiei, in producing antibiotic and antitumor secondary metabolites

Defne Dingiloglu of Tesla STEM HS looked into antibiotic-producing gene clusters of Streptomyces scabiei under the supervision of Dr. Samanta. She earned a First Place Trophy for grades 9-12 at the 67th Annual Washington State Science & Engineering Fair (WSSEF) competition (2024).

Her results are summarized in this report.

In an increasingly antibiotic-resistant world, the discovery of new antibiotics is essential for public health. This project used computational methods to investigate the hypothesis that strain-level investigation of the endophytic actinomyces S. scabiei would reveal a greater diversity of Biosynthetic Gene Clusters (BGCs) in its genome and the presence of antitumor products. Fourteen strains of this bacterium were analyzed for strain-specific BGCs, average BGC count (sum of total BGCs/number of strains) and Ribosomally synthesized and Post-translationally modified Peptide (RiPP) BGC count. It was found that the studied S. scabiei strains contained up to 15 strain-specific gene clusters, an average BGC count of 41.6, and an average of 2.4 RiPP BGCs. Polyketide gene clusters were discovered to be the most prominent BGC class within S. scabiei's genome, on average making up 31.5% of a strain's BGC repertoire. Additionally, nine out of 14 strains contained the BGC capable of synthesizing the antitumor molecule Concanamycin A. However, it was found that a high strain-specific BGC count did not necessarily imply either a greater BGC diversity or the synthesis of a correspondingly large number of unique bioactive compounds. Neither the BGC count nor the BGC dissimilarity on the evolutionary tree, taken alone or together, was a sufficient indicator of a strain's potential to synthesize secondary metabolites.
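
The two summary statistics named above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the strain names and cluster identifiers are invented, each strain is represented as a set of cluster identifiers, and "strain-specific" is taken to mean a cluster found in exactly one strain, which may not match the definition used in the report.

```python
from collections import Counter


def average_bgc_count(bgcs_by_strain):
    """Sum of total BGCs divided by the number of strains."""
    total = sum(len(clusters) for clusters in bgcs_by_strain.values())
    return total / len(bgcs_by_strain)


def strain_specific_bgcs(bgcs_by_strain):
    """For each strain, list the clusters seen in no other strain (assumed definition)."""
    occurrences = Counter(
        cluster for clusters in bgcs_by_strain.values() for cluster in clusters
    )
    return {
        strain: sorted(c for c in clusters if occurrences[c] == 1)
        for strain, clusters in bgcs_by_strain.items()
    }


# Hypothetical data: three strains, each with a set of cluster identifiers.
toy = {
    "strain_A": {"pks1", "nrps1", "ripp1"},
    "strain_B": {"pks1", "nrps1", "terp1", "ripp2"},
    "strain_C": {"pks1", "ripp1"},
}
print(average_bgc_count(toy))
print(strain_specific_bgcs(toy))
```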

Pan-genome Analysis of Angiosperm Plastomes using PGR-TK

Our students have been looking into plant plastid genomes using the Pan-genome Research Toolkit (PGR-TK), a recently published tool for comparative analysis. Various reports on this analysis are shown below.

1. Pan-genome Analysis of Angiosperm Plastomes using PGR-TK

Manoj P. Samanta

Link
Abstract: We present a novel approach for taxonomic analysis of chloroplast genomes in angiosperms using the Pan-genome Research Toolkit (PGR-TK). Comparative plots generated by PGR-TK across diverse angiosperm genera reveal a wide range of structural complexity, from straightforward to highly intricate patterns. Notably, the characteristic quadripartite plastome structure, comprising the large single copy (LSC), small single copy (SSC), and inverted repeat (IR) regions, is clearly identifiable in over 75% of the genera analyzed. Our findings also underscore several occurrences of species mis-annotations in public genomic databases, which are readily detected through visual anomalies in the PGR-TK plots. While more complex plot patterns remain difficult to interpret, they likely reflect underlying biological variation or technical inconsistencies in genome assembly. Overall, this approach effectively integrates classical botanical visualization with modern molecular taxonomy, providing a powerful tool for genome-based classification in plant systematics.

2. Comparative Analysis of Plastid Genomes Using Pangenome Research ToolKit (PGR-TK)

Richa Jayanti, Andrew Kim, Sean Pham, Athreya Raghavan, Anish Sharma, Manoj P. Samanta

Link

Abstract: Plastid genomes (plastomes) of angiosperms are of great interest among biologists. High-throughput sequencing is making many such genomes accessible, increasing the need for tools to perform rapid comparative analysis. This exploratory analysis investigates whether the Pangenome Research Toolkit (PGR-TK) is suitable for analyzing plastomes. After determining the optimal parameters for this tool on plastomes, we use it to compare sequences within each of four genera (Magnolia, Solanum, Fragaria and Cotoneaster), as well as a combined set from 20 rosid genera. PGR-TK recognizes large-scale plastome structures, such as the inverted repeats, among combined sequences from distant rosid families. If the plastid genomes are rotated to the same starting point, it also correctly groups different species from the same genus together in a generated cladogram. The visual approach of PGR-TK provides insights into genome evolution without requiring gene annotations.
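
Because plastid genomes are circular, two assemblies of the same molecule can be written starting at different positions. The sketch below shows one simple way to rotate a sequence so that it begins at a chosen anchor subsequence. The function name and the anchor-based strategy are assumptions for illustration, not the procedure used in the report, and the sketch does not handle an anchor that lies on the reverse strand.

```python
def rotate_to_anchor(sequence, anchor):
    """Rotate a circular sequence so that it starts at the first match of `anchor`.

    The search runs over the sequence extended by its own prefix, so an
    anchor that spans the original start/end junction is still found.
    """
    if not anchor or len(anchor) > len(sequence):
        raise ValueError("anchor must be non-empty and no longer than the sequence")
    extended = sequence + sequence[: len(anchor) - 1]
    start = extended.find(anchor)
    if start == -1:
        raise ValueError("anchor not found in sequence")
    return sequence[start:] + sequence[:start]


# Two toy linearizations of the same made-up circular sequence.
a = rotate_to_anchor("GGGATCAAA", "ATC")
b = rotate_to_anchor("TCAAAGGGA", "ATC")
print(a, b, a == b)
```

After rotation, sequences from different assemblies share a common origin, which makes downstream comparisons less sensitive to where each file happened to start.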

3. Pan-genome Analysis of Plastomes from Lamiales using PGR-TK

Aadhavan Veerendra, Manoj Samanta

Link
Abstract: Chloroplast sequences from the Lamiales order were analyzed using the Pangenome Research Toolkit (PGR-TK). Overall, most genera and families exhibited a high degree of sequence uniformity. However, at the genus level, Utricularia, Incarvillea, and Orobanche stood out as particularly divergent. At the family level, Orobanchaceae, Bignoniaceae and Lentibulariaceae displayed notably complex patterns in the generated plots. The PGR-TK algorithm successfully distinguished most genera within their respective families and often recognized misclassified plants.

4. Pan-genome Analysis of Plastids from Ericales

Aditya Vinukonda and Manoj Samanta

Link
Abstract: The Pangenome Research Toolkit (PGR-TK) was used to analyze the plastid sequences from the Ericales order. At the genus level, all members other than Vaccinium and Rhododendron showed standard quadripartite structures with uniform plastid lengths. Cyclamen persicum diverged significantly from the other members of its genus or family, and additional analysis showed it to be close to Pseudostellaria (Caryophyllales). At the family level, phylogeny generated by PGR-TK correctly separated the genera. Other than Ericaceae, all family level plots displayed uniform quadripartite structures.

5. Pan-genome Analysis of Plastids from Caryophyllales

Anish Sharma and Manoj Samanta

Link
Abstract: A comparative analysis of plastid sequences from the Caryophyllales order is performed using the Pangenome Research Toolkit (PGR-TK). Overall, most genera and families showed high degrees of uniformity. At the genus level, Arenaria and Drosera diverged most from the general conserved pattern. Additionally, Dysphania, Caroxylon, Portulaca, Salsola, Stellaria and Suaeda had higher levels of divergence than most genera. At the family level, phylogeny generated by PGR-TK correctly separated the genera. Other than Cactaceae and Droseraceae, all family level plots were relatively uniform.