PreMiEr Publications


Controlling taxa abundance improves metatranscriptomics differential analysis

Ji, Z., Ma, L. Controlling taxa abundance improves metatranscriptomics differential analysis. BMC Microbiol 23, 60 (2023).


A common task in analyzing metatranscriptomics data is to identify microbial metabolic pathways with differential RNA abundances across multiple sample groups. With information from paired metagenomics data, some differential methods control for either DNA or taxa abundances to address their strong correlation with RNA abundance. However, it remains unknown if both factors need to be controlled for simultaneously.


We discovered that when either DNA or taxa abundance is controlled for, RNA abundance still has a strong partial correlation with the other factor. In both simulation studies and a real data analysis, we demonstrated that controlling for both DNA and taxa abundances leads to superior performance compared to only controlling for one factor.
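The partial correlation mentioned above can be illustrated with a small sketch. This is not the authors' code: the function names `pearson` and `partial_corr` and the toy numbers are assumptions made for illustration. It uses the standard first-order partial correlation formula to ask how strongly RNA abundance still tracks taxa abundance once DNA abundance has been accounted for.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy abundances (log scale) for one pathway across six samples.
dna = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
taxa = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
rna = [d + t + e for d, t, e in zip(dna, taxa, noise)]

# RNA remains strongly associated with taxa abundance after controlling for DNA.
rna_taxa_given_dna = partial_corr(rna, taxa, dna)
```

In this toy setup RNA depends on both factors, so controlling for DNA alone leaves a large residual association with taxa abundance, which is the situation the abstract describes.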


To fully address the confounding effects in analyzing metatranscriptomics data, both DNA and taxa abundances need to be controlled for in the differential analysis.

Leveraging scheme for cross-study microbiome machine learning prediction and feature evaluations

Song, K.; Zhou, Y.-H. Leveraging Scheme for Cross-Study Microbiome Machine Learning Prediction and Feature Evaluations. Bioengineering 2023, 10, 231.

The microbiota has proved to be one of the critical factors in many diseases, and researchers have been using microbiome data for disease prediction. However, models trained on one microbiome study may not transfer easily to other independent studies because of the high variability in microbiome data. In this study, we developed a method for improving the generalizability and interpretability of machine learning models for predicting three different diseases (colorectal cancer, Crohn’s disease, and immunotherapy response) using nine independent microbiome datasets. Our method combines a smaller dataset with a larger dataset, and we found that including at least 25% of the target samples in the source data improved model performance. We selected random forest as our top model and employed feature selection to identify common and important taxa for disease prediction across the different studies. Our results suggest that this leveraging scheme is a promising approach for improving the accuracy and interpretability of machine learning models that predict disease from microbiome data.
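The data-splitting idea behind the leveraging scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `leverage_split`, its parameters, and the sample identifiers are hypothetical. A fraction of the target study's samples is moved into the source training pool, and the remaining target samples are held out for evaluation.

```python
import math
import random


def leverage_split(source, target, fraction=0.25, seed=0):
    """Move a fraction of target samples into the training pool.

    Returns (train, test): train holds all source samples plus the
    leveraged target samples; test holds the remaining target samples.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    rng = random.Random(seed)
    shuffled = list(target)
    rng.shuffle(shuffled)
    # Round up so that at least the requested fraction is leveraged.
    n_leveraged = math.ceil(fraction * len(shuffled))
    train = list(source) + shuffled[:n_leveraged]
    test = shuffled[n_leveraged:]
    return train, test


# Hypothetical sample identifiers from a larger source study and a smaller target study.
source_samples = [f"src_{i}" for i in range(100)]
target_samples = [f"tgt_{i}" for i in range(20)]

train, test = leverage_split(source_samples, target_samples, fraction=0.25)
```

A classifier such as a random forest would then be fitted on `train` and scored on `test`, so that performance is always measured on target samples the model has not seen.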


C3NA: correlation and consensus-based cross-taxonomy network analysis for compositional microbial data

Song, K., Zhou, YH. C3NA: correlation and consensus-based cross-taxonomy network analysis for compositional microbial data. BMC Bioinformatics 23, 468 (2022).


Studying the co-occurrence network structure of microbial samples is one of the critical approaches to understanding the perplexing and delicate relationship between microbes, hosts, and diseases. A tool that combines co-occurrence network investigation with differential abundance analysis is needed to reveal disease-related taxa–taxa relationships. It is also necessary to condense the co-occurrence network into smaller modules to make these taxa–taxa relationships easier to annotate functionally and to interpret. Finally, retaining the phylogenetic relationships among the taxa is critical for identifying differential abundance patterns, which can be used to resolve contradictory functions reported by different studies.


In this article, we present Correlation and Consensus-based Cross-taxonomy Network Analysis (C3NA), a user-friendly R package for investigating compositional microbial sequencing data to identify and compare co-occurrence patterns across different taxonomic levels. C3NA contains two interactive graphical user interfaces (Shiny applications), one of which is dedicated to the comparison between two diagnoses, e.g., disease versus control. We used C3NA to analyze two well-studied diseases, colorectal cancer and Crohn’s disease. We discovered clusters of study- and disease-dependent taxa that overlap with known functional taxa reported by other discovery studies and differential abundance analyses.
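Because the abstract stresses that the data are compositional, a brief sketch of one common way to handle compositionality before computing correlations may help. The abstract does not state which transformation or correlation method C3NA uses, so the centered log-ratio (CLR) transform below is a swapped-in, generic technique, and the function name `clr`, the pseudocount value, and the counts are assumptions for illustration only.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxa counts.

    A pseudocount is added so that zero counts have a finite log.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical read counts for four taxa in a single sample.
sample_counts = [10, 0, 5, 85]
sample_clr = clr(sample_counts)
```

CLR values sum to zero within each sample and, without a pseudocount, do not change when all counts are multiplied by the same sequencing-depth factor, which is why transforms of this kind are often applied before correlating taxa.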


C3NA offers a new microbial data analysis pipeline for refined and enriched taxa–taxa co-occurrence network analyses, and its usability is further expanded by the built-in Shiny applications for interactive investigation.