Skip to main content

Interpretable machine learning decodes soil microbiome’s response to drought stress



Extreme weather events induced by climate change, particularly droughts, have detrimental consequences for crop yields and food security. Concurrently, these conditions provoke substantial changes in the soil bacterial microbiota and affect plant health. Early recognition of soil affected by drought enables farmers to implement appropriate agricultural management practices. In this context, interpretable machine learning holds immense potential for drought stress classification of soil based on marker taxa.


This study demonstrates that the 16S rRNA-based metagenomic approach of Differential Abundance Analysis methods and machine learning-based Shapley Additive Explanation values provide similar information. They exhibit their potential as complementary approaches for identifying marker taxa and investigating their enrichment or depletion under drought stress in grass lineages. Additionally, the Random Forest Classifier trained on a diverse range of relative abundance data from the soil bacterial micobiome of various plant species achieves a high accuracy of 92.3 % at the genus rank for drought stress prediction. It demonstrates its generalization capacity for the lineages tested.


In the detection of drought stress in soil bacterial microbiota, this study emphasizes the potential of an optimized and generalized location-based ML classifier. By identifying marker taxa, this approach holds promising implications for microbe-assisted plant breeding programs and contributes to the development of sustainable agriculture practices. These findings are crucial for preserving global food security in the face of climate change.


Global food security is significantly threatened by climate change, especially in regions with limited access to food resources [1,2,3]. Anticipated occurrences of extreme weather events, such as droughts, are likely to increase in frequency and intensity, causing significant crop damage and threatening food availability [4, 5]. These repercussions are attributed to shortened growing seasons and substantial reductions in crop yields due to various biotic and abiotic stresses, prominently drought stress [6,7,8].

External perturbations, like drought stress, significantly impact the dynamics of the soil microbial community, leading to compositional shifts that are facilitated by the recruitment of beneficial microbes from the surrounding soil to the roots [9, 10]. This interaction between plants and soil microorganisms is a vital aspect of ecosystem health and stability [11,12,13,14]. Hence, this results in the opportunity to identify and interpret specific metagenomic patterns, as they have the potential to provide valuable insights into the state of both soil and plant health.

The use of machine learning (ML) algorithms enables the analysis of complex microbiome data by fully capturing the depth of data and identifying patterns that can discriminate between different states or conditions [15]. This can help to identify specific marker taxa that are key to understanding the intricate relationships between environmental stressors, soil health, and plant viability. Thereby, they aid in the development of more effective soil management strategies, including irrigation practices and the selection of drought-tolerant crops. Marker taxa hold significant potential for application in Synthetic Communities (SynComs), providing an innovative approach for early intervention under challenging environmental conditions, like droughts, to enhance plant resilience and growth [10, 16].

Still, the path from ML predictions to actionable insights can be challenging. ML models often resemble black boxes, with their internal decision-making obscured from users. The interpretation of the reasons for certain predictions is essential, especially for complex biological data [17,18,19]. This is where interpretable ML methods such as SHapley Additive ExPlanation (SHAP) values are applied [20].

The concept behind SHAP values is to distribute the credit for the model’s prediction among the feature inputs based on their individual contribution, using game theory. Notably, SHAP values are model-specific yet globally constant, comprehensively taking into account interactions between features [21].

While SHAP values have been applied in clinical studies [22, 23], their use in metagenomic data is limited. For the identification of significant taxa between comparison groups, Differential Abundance Analysis (DAA) is the commonly used method in metagenomic analyses [24]. Typically used DAA tools are DESeq2 [25], ALDEx2 [26], edgeR [27], ANCOM-BC2 [28], and the non-parametric Wilcoxon rank-sum test.

DAA tools and SHAP values offer distinct approaches for detecting marker taxa. DAA tools rely on statistical tests and assume specific data distributions [29], while SHAP values are model-agnostic and applicable to any ML model [21]. Both methods aim to identify crucial features or taxa, providing insights into underlying biological mechanisms, but serve different purposes. DAA methods focus on identifying differentially abundant taxa, whereas interpretable ML using SHAP values offers importance measures based on model performance. The main objective of SHAP values is to interpret complex ML models by quantifying the contribution of each feature and explaining predictions. Their application in this study demonstrates the potential to identify key taxa in soil microbiomes as well as their role in the microbial response to drought stress.

The selection of an appropriate soil dataset was essential for this study. ML analyses thrive on datasets with many samples and informative metadata [30]. Finding the minimum number of samples needed for reliable predictions is a challenge with high-dimensional data, such as 16S rRNA-based metagenomic datasets with more features than samples [31, 32]. A dataset from the work of Naylor et al. [33] was selected as the largest available drought stress dataset. This dataset includes 623 samples from three soil isolation sources and investigates the effect of drought stress on 19 different crop species, including C3 and C4 plants. The number of features ranged from 26 to 330 depending on the taxonomic rank.

Despite the remarkable achievements in applying ML to human microbiome research [34,35,36,37,38], its application in the context of soil metagenomics is not yet as advanced. However, the agricultural industry is increasingly recognizing the potential of ML to improve soil health and promote sustainable farming practices [39, 40]. This includes predicting plant phenotypes based on the plant and surrounding soil microbiome to detect taxa associated with plant diseases and environmental stresses [41].

Prior research has highlighted the potential of ML in agriculture, with studies identifying marker taxa for crop productivity [42] and beneficial root microbes [43]. Interestingly, the potential of ML for drought stress identification in soil microbiomes remains largely unstudied, representing a promising area for investigation.

This research aims to determine the efficacy of ML in predicting drought stress within microbial data of drought-stressed soils. The study comprises three key objectives: a) investigating the predictive capability of ML for drought stress, b) comparing the performance of interpretable ML with conventional 16S rRNA-based metagenomic analyses, and c) assessing the generalization capabilities of the trained classifier. By identifying marker taxa and deciphering microbial patterns associated with drought stress, this research addresses sustainable agriculture, improved crop productivity, and increased food security.



A dataset originally curated by Naylor et al. [33] for their study on the impact of drought stress on the grass root microbiome was analyzed. This dataset, referred to as the ’Grass-Drought’ dataset, comprises 623 samples from three isolation sources, including ’Soil,’ ’Root,’ and ’Rhizosphere’, as well as two watering regimes, including ’Drought’ and ’Control’. Samples from 18 distinct grass species within the Poaceae clade are included in the dataset. Tomato was used as an outgroup. The experimental site was located in Albany, California, characterized by silty loam soil with a pH of 5.2. Both watering regimes, ’Drought’ and ’Control’, were balanced, with 320 samples in the ’Control’ group, receiving regular watering, and 303 samples in the ’Drought’ group, experiencing conditions without water supply. All samples were sequenced using 16S rRNA amplicon sequencing of the V3-V4 region and are available under the BioProjectID PRJNA369551.

To evaluate the ML model’s generalizability, its performance was assessed on a separate test dataset from Xu et al. [44] studying pre- and post-flowering drought stress effects on the Sorghum bicolor root microbiome (BioProjectID PRJNA435634), therefore referred to as the ’Sorghum-Drought’ dataset. The sampling site was located in Kearney, California. To ensure the comparability of drought conditions between the Sorghum-Drought dataset and the original Grass-Drought dataset, two subsets were created: The ’Progressive Drought’ subset comprised samples from the ’Control’ group, along with specific time points (weeks 2 to 7 and weeks 10 to 17) from the ’Pre-Flowering Drought’ and ’Post-Flowering Drought’ groups, respectively. This subset comprised 278 ’Control’ and 210 ’Drought’ samples. The ’Late Drought’ subset included samples from weeks 6, 7, 16, and 17 of the ’Control’ group, weeks 6 and 7 of the ’Pre-Flowering Drought’ group, and weeks 16 and 17 of the ’Post-Flowering Drought’ group, totaling 72 ’Control’ and 69 ’Drought’ samples. A detailed representation of the subsetting scheme can be found in the Additional file 1 in Fig. S1.

Data processing

The DADA2 workflow [45] for Illumina sequenced paired-end fastq files was employed for sequence data processing, implemented in R version 4.3.3. Taxonomy was assigned using the SILVA database [46] and the Ribosomal Database Project (RDP) classifier [47] from phylum to genus rank. To enhance data quality, prevalence filtering was conducted, retaining Amplicon Sequence Variants (ASVs) present in at least 95 % of all samples, reducing the total number of ASVs from 25,415 to 3,276. Samples with low read counts were excluded, yielding a dataset of 560 samples. Rarefaction was performed, normalizing sequencing depth to the dataset’s 10 % decile of 17,291 reads. Feature tables for ML for each taxonomic rank were constructed with relative abundance values per taxon across all samples and a ’Control’ or ’Drought’ target variable.

16S rRNA-based metagenomic analysis

A diversity analysis was conducted between the two watering regimes ’Control’ and ’Drought’. Alpha diversity was assessed using the Shannon index with the estimate_richness function from the phyloseq package (version 1.44.0) [48]. Beta diversity was explored via Principal Coordinate Analysis (PCoA) based on Bray-Curtis dissimilarities with the ordinate and plot_ordination functions from phyloseq.

To identify taxonomic differences between the ’Control’ and ’Drought’ groups, a DAA was employed with several tools using the microbiomeMarker R package (version 1.4.0) [49], including DESeq2, ALDEx2, and edgeR, as well as ANCOM-BC2 in the R package ANCOMBC (version 2.0.2) [50], and the non-parametric Wilcoxon rank-sum test on the ASV level. The UpSetR package (version 1.4.0) [51] was used to compare tool outcomes, and the three most suitable methods were applied to all taxonomic ranks to compare enrichment groups and Benjamini–Hochberg (BH) adjusted p-values [52].

Machine learning

A Random Forest Classifier (RFC) and a Logistic Regression Classifier (LRC) were both applied using the ScikitLearn Python package (version 1.1.3) [53] to all ranks to predict the samples’ watering treatment using relative taxon abundances. Hyperparameter optimization was carried out through five-fold nested Cross-Validation (CV), splitting the dataset into five equally sized parts. During each fold, four parts of the dataset were used for training, while the remaining part acted as a dataset for testing the best model of each fold. The mean model performance was evaluated in terms of accuracy, F1 score, precision, recall, and Area Under the Curve (AUC) between all folds. Due to the lower performance of the LRC, all further analyses were performed using the RFC.

In order to interpret the RFC predictions, SHAP values were utilized using the SHAP Python package (version 0.41.0) [54] with the shap.TreeExplainer function [20]. During each fold of the nested CV, feature contributions related to detecting drought stress from SHAP values were extracted. A consensus was sought across four or five of the folds, requiring alignment in the majority, to consider the enrichment information suitable for subsequent analysis. The feature contributions towards drought stress from the SHAP values were compared with taxon enrichment patterns from differential abundance testing. This was followed by a comparison of significant taxa identified by DAA methods and important taxa identified by ML.

The model performance and generalizability were tested on two independent subsets of the Sorghum-Drought dataset of Xu et al. [44] that has been described in detail in the ’Datasets’ section. This test dataset was processed similarly to the Grass-Drought dataset. To create feature tables, the tables were pruned to only include taxa present in the Grass-Drought dataset. Taxa that were not present in the Sorghum-Drought dataset but in the Grass-Drought dataset were added with zero counts as demonstrated in the Additional file 1 in Table S1. Model performance was evaluated, including mean accuracy, F1 score, precision, recall, and AUC, which were computed across all taxonomic ranks for both subsets.

Additionally, the robustness of the ML model was tested on five randomly subsampled independent hold-out datasets of the Grass-Drought dataset and the average of the mean model’s performance was evaluated.


Alpha and beta diversity

Alpha diversity analysis, utilizing the Shannon index as displayed in Fig. 1 A, found no significant differences between the ’Control’ and ’Drought’ groups. This demonstrates that microbial diversity within individual samples was not significantly impacted by watering regimes. On all taxonomic ranks, no highly abundant taxon was found to be uniquely abundant to drought stress, as only differences between relative abundances between ’Control’ and ’Drought’ groups could be observed as shown in Additional file 1 in Fig. S2. Beta diversity, assessed via PCoA based on Bray-Curtis dissimilarities as shown in Fig. 1 B, yielded insights into the variation between the samples. The watering regime accounted for 6.8 % of the variance and could be clustered into the corresponding irrigation groups. In order to train the ML model to detect drought stress from a variety of soil samples deriving from different isolation sources and crops, the whole dataset was used without subsetting it to specific sample types. For further 16S rRNA-based metagenomic analyses with this dataset and its metadata, Naylor et al.’s paper itself [33] is referred to, offering interesting insights into the influence of the soil isolation source and the impact of the different crops on the root microbiome.

Fig. 1
figure 1

Diversity Plots for the Grass-Drought Dataset. Alpha and beta diversity plots comparing the ’Control’ (blue) and ’Drought’ (red) watering regimes. A Boxplots of Shannon’s Diversity Index for all samples comparing watering regimes. Significance was determined using a non-parametric Wilcoxon rank sum test (* p <0.05, ** p <0.01, *** p <0.001, **** p <0.0001). B Principal Coordinate plot using Bray-Curtis dissimilarities colored by the watering regimes

Comparative analysis of DAA tools

This study’s comprehensive approach to DAA encompassed five distinct methods: DESeq2, ANCOM-BC2, ALDEx2, edgeR, and the non-parametric Wilcoxon rank-sum test (Fig. 2). All methods used False Discovery Rate (FDR)-corrected p-values with BH correction and an alpha threshold <0.05. A total of 2,356 ASVs were identified as significantly differentially abundant. Strikingly, 441 ASVs were identified by all five methods, highlighting a core set of differentially abundant taxa. EdgeR and the non-parametric Wilcoxon rank-sum test identified 485 and 318 unique ASVs, respectively. On the other hand, ANCOM-BC2, ALDEx2, and DESeq2 showed consistent results with no or only a small number of uniquely identified ASVs and were therefore used for DAA on all taxonomic ranks. An increased level of consistency between the three tools was visible as displayed in Additional file 1 in Fig. S3.

Fig. 2
figure 2

ASV Intersections between different DAA Tools on ASV level of the Grass-Drought Dataset. Upset plots displaying the overlap and uniqueness of significant taxa on ASV level identified by DAA methods (ALDEx2, DESeq2, ANCOM-BC2, non-parametric Wilcoxon rank-sum test, and edgeR). The horizontal bars show the total number of taxa for each tool, while the vertical bars show the number of shared taxa between corresponding sets, sorted by the total number of shared taxa. All tools use an alpha threshold of 0.05 for significance

The RFC shows remarkable classifying performance across all ranks

Machine learning using the trained RFC demonstrated remarkable performance scores in predicting drought stress in the soil metagenome. Table 1 shows, that across all taxonomic ranks, the RFC consistently delivered exceptional results, with a mean accuracy surpassing 90 %.

The genus level proved to be the most effective input, achieving an accuracy of 0.923 ± 0.029, an F1 score of 0.921 ± 0.030, and a recall of 0.954 ± 0.029. Family-level analysis excelled in precision, with a score of 0.902 ± 0.038. Furthermore, the AUC underscored the robust performance of the RFC, with a mean AUC of 0.980 ± 0.010 at the genus level. The corresponding Reciever Operating Characteristic (ROC) curves can be found in Additional file 1 in Fig. S4. The results of the LRC exceeded slightly lower performance on all taxonomic ranks, as displayed in the Additional file 1 in Table S2.

Table 1 Random forest classifier performance of the Grass-Drought dataset

Interpretable ML and DAA as complementary approaches in marker taxa identification

DAA tools (ANCOM-BC2, ALDEx2, DESeq2), and SHAP values emerged as powerful methodologies for investigating taxon enrichment and feature importance, respectively. Both approaches, as displayed in Fig. 3, consistently identified taxa responsible for driving differences between the ’Control’ and ’Drought’ groups, which can be considered marker taxa for drought stress. Overall, the proportion of matches in the enrichment assignments of DAA tools and SHAP values for all identified taxa ranged from 79.59 % on the order level to 82.65 % on the genus level.

The significance and importance of microbial taxa in the dataset were explored using both DAA significances and SHAP value feature importances of the RFC. Comparing the sorted top 25 genera with adjusted significance levels from ANCOM-BC2 and their corresponding mean absolute SHAP values, it was found that the most significant and also important taxon with more than a two-fold difference to the next taxon was the genus Kribbella (Fig. 3).

Fig. 3
figure 3

Genera Enrichment, Significance and Importance by DAA Tools and SHAP Values of the Grass-Drought Dataset. Binary heatmap showing the enrichment of the top 25 significant genera from ANCOM-BC2 between ’Control’ (blue) and ’Drought’ (red) groups for the three methods used for DAA (DESeq2, ANCOM-BC2, ALDEx2) with an alpha <0.05, and SHAP values obtained from the RFC. Corresponding bar plots comparing -\(\log _{10}\)(p_adjust) values (orange) and \(\text {mean}(|\text {SHAP value}|)\) (green)

In a direct comparison with the feature importances, it was shown that the order of the taxa identified as important differed greatly from that of ANCOM-BC2 in some genera. This effect was also visible at higher taxonomic ranks (Additional file 1: Figs. S5 and S6). Especially the genera Streptomyces and Occallatibacter did not stand out from the results of ANCOM-BC2 but were prioritized considerably higher by their SHAP values.

The trained RFC generalizes to unseen samples from a different dataset

To assess the generalizability of the trained RFC model, it was applied to a test dataset from Xu et al. (2018), focusing on Sorghum bicolor root microbiomes subjected to drought stress. First, the trained RFC underwent testing using samples exhibiting advanced drought stress conditions, referred to as the ’Late Drought’ subset. These samples were expected to demonstrate the most noticeable changes in the relative abundances of the taxa. Stable accuracies across all taxonomic ranks could be detected, as displayed in Table 2, with a notable improvement towards the family level. The family level achieved the highest accuracy (0.854 ± 0.017), while also excelling in F1 score (0.855 ± 0.021), precision (0.830 ± 0.029), and AUC (0.912 ± 0.012). The order level exhibited the best recall (0.925 ± 0.031).

Table 2 Late drought classifier performance of the Sorghum-Drought dataset

Due to the classifier’s outstanding performance with the ’Late Drought’ subset, testing extended to another subset containing various drought stress levels, referred to as the ’Progressive Drought’ subset. This subset contained samples of the complete course of the drought period with associated controls. Here, the order level displayed the best F1 score (0.754 ± 0.024) and recall (0.887 ± 0.018), while the family level yielded the highest accuracy (0.768 ± 0.018), precision (0.692 ± 0.014), and AUC (0.814 ± 0.011) as shown in Table 3. For both subsets, it was noticeable that the best performance was not observed at the genus level, but at the family or order level.

Table 3 Progressive drought classifier performance of the Sorghum-Drought dataset

The performance of the ML models on five independent hold-out datasets of the Grass-Drought dataset is listed in Additional file 1 in Tables S3 and S4, illustrating the robustness of the classifier. The RFC, trained on the Grass-Drought dataset using nested CV with hyperparameter tuning, from which 20 % of the samples were randomly set aside before all preprocessing steps, exhibits very similar performance metrics to the model trained on all samples (Table S3). The performance of the independent hold-out datasets imply a high degree of similarity and demonstrate a high prediction accuracy across all ranks (Table S4).


This study employed ML to predict the irrigation state of soil samples based on their microbial community composition and aimed to discover specific marker taxa for drought stress. In terms of alpha and beta diversity analysis, it is crucial to note that drought stress primarily influenced the relative abundance of taxa rather than causing a complete abolishment or appearance of certain taxa [55], which aligns with the study’s focus on the ’Control’ and ’Drought’ labels. This suggests that the classifier was trained by emphasizing variations in taxon abundance rather than focusing on the presence or absence of specific taxa. Although the watering regime explains only 6.8 % of the total variance, the PCoA indicated distinct patterns in the microbial community composition, suggesting the data’s suitability for subsequent ML analysis.

Regarding the comparison of interpretable ML with conventional DAA tools for marker taxa identification, the focus was on finding the most suitable DAA tools for the Grass-Drought dataset from a variety of popular methods, namely ALDEx2, ANCOM-BC2, DESeq2, edgeR, and the non-parametric Wilcoxon rank-sum test. This approach is generally recommended for DAA, as it is not possible to find the true number of significant taxa in real-world data sets like it is the case with mock data [56, 57]. The used DAA methods made different assumptions about the data distribution [56]. For instance, DESeq2 and edgeR assume a negative binomial distribution, while ALDEx2 and ANCOM-BC2 assume a Gaussian distribution. The non-parametric Wilcoxon rank-sum test, on the other hand, does not make any distribution assumptions.

On the ASV level, out of 3,276 total assigned ASVs, 71.9 % were identified as significant by at least one of the five DAA methods. However, only 13.46 % of these significant ASVs were detected by all five methods, suggesting a substantial proportion of ASVs being uniquely identified by specific tools, possibly indicating false discoveries [56]. Specifically, edgeR and the Wilcoxon rank-sum test uniquely identified a high number of ASVs not detected by other tools, which can be an indicator for many false discoveries and unreliable results [58].

In contrast, ANCOM-BC2, DESeq2, and ALDEx2 exhibited more reliable results, with fewer uniquely identified ASVs. These findings are in line with previous studies that have highlighted the reliability of these three methods in controlling the FDR [56, 57, 59, 60]. Out of these three tools, ANCOM-BC2 was selected as the DAA method to compare its results directly with those of interpretable ML, as it showed the most overlap in detected ASVs with the other two selected DAA tools.

For interpretable ML, a RFC was chosen as it is recognized as a top-performing classifier for handling high-dimensional and sparse data, such as metagenomic datasets with hundreds to thousands of features and non-linear relationships between features and the target variable [61,62,63,64,65]. In comparison with Logistic Regression, the RFC yielded a slightly better performance with the binary classification problem of drought stress prediction (Table S2).

The RFC, trained on a dataset containing soil samples from a variety of soil isolation sources, crops, and drought stress levels, exhibited exceptional performance across all taxonomic ranks, directly addressing objective a) of this study. As the taxonomic rank descended from higher (e.g., phylum) to lower levels (e.g., genus), the granularity and resolution of the features increased. At the genus level, which represented the lowest taxonomic rank, the classifier exhibited the highest mean accuracy of 92.3 %, as well as the highest F1 score, recall, and AUC, demonstrating its effectiveness in capturing true positive instances. While there was a slight increase in overall accuracy from the phylum to the genus level, the genus level provided more specific insights into microbial diversity and potential marker taxa for drought stress.

In the exploration of marker taxa for drought stress in the soil metagenome, interpretable ML was employed using SHAP values in the nested CV of each taxonomic rank. Enrichment and feature importance results were compared with the output of the three most suitable DAA tools for this dataset. At higher taxonomic ranks, the agreement between DAA enrichment and SHAP value contribution was less definitive. Among the DAA tools, taxa were often not classified as significant by all three tools, as seen with Bacteroidota at the phylum level. Similarly, SHAP values did not always provide clear results between the loops of the nested CV, making it challenging to assign enrichment to either ’Control’ or ’Drought’, as observed with Firmicutes at the phylum level. In some cases, the results from DAA and interpretable ML differed, such as with Verrucomicrobiae and Armanimonadota at the phylum level, which were classified as enriched in ’Control’ by DAA tools and enriched under ’Drought’ by SHAP. According to the literature, both phyla were found to be more enriched under irrigation, but the same study concluded that both taxa have the potential to assist plants under drought conditions [66]. However, at lower, more specific ranks such as family and genus levels, all enrichment information among the top 25 taxa was consistent. This consistency highlights that SHAP values can be equally useful for the discovery of specific marker taxa under stress conditions, effectively fulfilling objective b).

Furthermore, the rankings of taxa between DAA and ML approaches were compared. While the order of significant taxa differed, the genus Kribbella consistently emerged as most significant and important, displaying a two-fold increase compared to the next relevant genus. Although being a poorly studied genus, Kribbella has shown potential in promoting plant growth [67, 68], making it a promising marker taxon for drought stress.

Additionally, in the direct comparison of significances and feature importances, certain taxa were detected with greater prominence, as evidenced by a peak in their mean absolute SHAP value compared to the significance assigned by the DAA analysis, like the genera Streptomyces and Occallatibacter. Streptomyces, a dominant genus in soil microbiomes, has been associated with drought stress and plant health in dry environments [69,70,71]. The genus Occallatibacter showed depletion under drought conditions and was considered an important feature for the prediction of drought stress, although further research is needed to understand its specific impact on soil metagenomes under drought, as it has only been observed under heat stress and no-stress conditions [72, 73].

SHAP values and DAA tools use different underlying approaches for the identification of important or significant taxa. In the context of this study, it is not possible to determine which approach is more suitable, but the overall results suggest that both methods provide important information for the identification of marker taxa. Therefore, these approaches should be seen as complementary rather than interchangeable, with each providing valuable insights into metagenomic data analysis.

To evaluate the generalization capabilities of this study’s classifier, its performance was tested on another drought stress dataset. The classifier’s performance was assessed with samples undergoing several weeks of drought stress as the most impactful differences were expected between the two watering regimes. The Late Drought subset exhibited an accuracy score of 0.854 ± 0.017 at the family level. Therefore, the classifier’s effectiveness and robustness across the entire spectrum of drought stress levels of the Sorghum-Drought dataset was explored by predicting drought stress in the Progressive Drought subset. Remarkably, the results consistently demonstrated the classifier’s outstanding performance in both scenarios. The Progressive Drought subset achieved an accuracy score of 0.768 ± 0.018 at the family level, indicating the model’s reliability in classifying drought stress regardless of the drought stress level involved. In contrast to the Grass-Drought dataset, where the best performance was achieved at the genus level as the lowest taxonomic rank with the highest granularity, the subsets of the Sorghum-Drought test dataset did not yield the best classification results on this rank. The best performance was observed on the order and family levels. This can be attributed to the inherent sparsity on the genus level due to the addition of taxa with zero counts to create feature tables for the prediction with equal feature inputs, as displayed in the Additional file 1 in Table S1.

These results emphasize the classifier’s adaptability across diverse drought stress conditions, reinforcing its utility as a valuable tool for drought stress classification, in line with the objectives outlined in objective c) of this study. Even though the classifier was trained with a dataset containing 16S rRNA metagenomic data of different drought stress levels, soil isolation sources, and a variety of plants, the approach might vary based on input data from other sequencing regions or plants that the classifier was not trained on. Such differences may emerge due to differences in the estimation of microbial diversity [74, 75]. The effect of the soil isolation source on the prediction also offers scope for further investigations to improve the classifier’s predictive capabilities and potentially develop a classifier tailored to specific soil isolation sources [76]. Expanding the sample size could further enhance the classifier’s generalizability, as a more extensive representation of taxa would increase the likelihood of encountering taxa abundances present in unknown samples during new predictions. However, when dealing with large datasets, selecting a comprehensive range of core taxa is recommended. By selecting the most important taxa as features [77] and training the classifier accordingly, the introduction of sparsity in the feature tables when predicting new data can be prevented. Furthermore, including samples from different locations introduces another dimension, as variations in microbial composition across diverse geographic locations and climates [78] can impact the classifier’s performance. With the upcoming possibilities of 16S rRNA long-read sequencing, it is recommended to create a location- and long-read sequencing-based classifier to generate an individual classifier with a reduced bias. By using an ensemble learning approach through the integration of multiple ML models the overall ML performance might enhance even more. This strategy not only boosts accuracy but also mitigates overfitting, ensuring models to generalize better on unseen data [79, 80]. Given the complex nature of RFCs, employing an ensemble learning approach comprising multiple, less complex learners could present an intriguing approach for exploration. Further evaluation with more data and subsequent feature selection seem interesting applications for future research.


In conclusion, with the ongoing threat of extreme weather events, notably droughts [81, 82], it is indispensable to explore innovative methods for understanding the impact of the soil microbiome on agriculture and ecosystems. The primary accomplishment of this study is the creation of a location-based classifier for drought stress in the soil metagenome. Demonstrating remarkable generalization capabilities, the classifier assesses drought stress across various drought stress levels and is applicable to different grasses.

The application of this study’s generalized ML model extends beyond the classification of drought stress, facilitating precision agriculture, including the optimization of irrigation strategies. Further, in microbe-assisted plant breeding programs, the discovery of marker taxa for drought stress using interpretable ML with SHAP values provides farmers and breeders with valuable insights for the definition of microbial strains for targeted bioinoculation approaches. Here, a deeper understanding of the plant growth-promoting functions associated with drought stress-related taxa holds promise for future advancements. This knowledge could play a pivotal role in enhancing plant adaptation to drought stress, strengthening the plant immune system against yield losses, and reducing susceptibility to pathogens [83].

Data Availibility

The datasets supporting the conclusions of this article are available in the NCBI Short Read Archive under the BioProjectID PRJNA369551 (’Grass-Drought’ dataset), and PRJNA435634 (’Sorghum-Drought’ dataset). All analysis scripts are available on GitHub (



Amplicon sequence variants


Area under the curve






Differential abundance analysis


False discovery rate


Logistic regression classifier


Machine learning


Principal coordinate analysis


Ribosomal database project


Random forest classifier


Reciever operating characteristic


SHapley additive explanation


Synthetic communities


  1. Wheeler T, von Braun J. Climate change impacts on global food security. Science. 2013;341(6145):508–13.

    Article  CAS  PubMed  Google Scholar 

  2. Schmidhuber J, Tubiello FN. Global food security under climate change. Proc Natl Acad Sci. 2007;104(50):19703–8.

    Article  PubMed  PubMed Central  Google Scholar 

  3. Myers S, Fanzo J, Wiebe K, Huybers P, Smith M. Current guidance underestimates risk of global environmental change to food security. The BMJ. 2022;378: e071533.

    Article  PubMed  PubMed Central  Google Scholar 

  4. Trenberth KE, Dai A, van der Schrier G, Jones PD, Barichivich J, Briffa KR, et al. Global warming and changes in drought. Nat Clim Chang. 2014;4(1):17–22.

    Article  Google Scholar 

  5. Kempf M. Enhanced trends in spectral greening and climate anomalies across Europe. Environ Monit Assess. 2023;195(2):260.

    Article  PubMed  PubMed Central  Google Scholar 

  6. Raza A, Razzaq A, Mehmood SS, Zou X, Zhang X, Lv Y, et al. Impact of climate change on crops adaptation and strategies to tackle its outcome: a review. Plants. 2019;8(2):34.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Moriondo M, Giannakopoulos C, Bindi M. Climate change impact assessment: the role of climate extremes in crop yield simulation. Clim Change. 2011;104(3):679–701.

    Article  Google Scholar 

  8. Battisti DS, Naylor RL. Historical warnings of future food insecurity with unprecedented seasonal heat. Science. 2009;323(5911):240–4.

    Article  CAS  PubMed  Google Scholar 

  9. Ossowicki A, Raaijmakers JM, Garbeva P. Disentangling soil microbiome functions by perturbation. Environ Microbiol Rep. 2021;13(5):582–90.

    Article  PubMed  Google Scholar 

  10. Ali S, Tyagi A, Park S, Mir RA, Mushtaq M, Bhat B, et al. Deciphering the plant microbiome to improve drought tolerance: mechanisms and perspectives. Environ Exp Bot. 2022;201: 104933.

    Article  CAS  Google Scholar 

  11. Berendsen RL, Pieterse CMJ, Bakker PAHM. The rhizosphere microbiome and plant health. Trends Plant Sci. 2012;17(8):478–86.

    Article  CAS  PubMed  Google Scholar 

  12. Xiong W, Song Y, Yang K, Gu Y, Wei Z, Kowalchuk GA, et al. Rhizosphere protists are key determinants of plant health. Microbiome. 2020;8(1):27.

    Article  PubMed  PubMed Central  Google Scholar 

  13. Gao M, Xiong C, Gao C, Tsui CKM, Wang MM, Zhou X, et al. Disease-induced changes in plant microbiome assembly and functional adaptation. Microbiome. 2021;9(1):187.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Xie J, Dawwam GE, Sehim AE, Li X, Wu J, Chen S, et al. Drought stress triggers shifts in the root microbial community and alters functional categories in the microbial gene pool. Front Microbiol. 2021;12: 744897.

    Article  PubMed  PubMed Central  Google Scholar 

  15. Kumar R, Yadav G, Kuddus M, Ashraf GM, Singh R. Unlocking the microbial studies through computational approaches: how far have we reached? Environ Sci Pollut Res. 2023;30(17):48929–47.

    Article  Google Scholar 

  16. Miller T, Mikiciuk G, Kisiel A, Mikiciuk M, Paliwoda D, Sas-Paszt L, et al. Machine learning approaches for forecasting the best microbial strains to alleviate drought impact in agriculture. Agriculture. 2023;13(8):1622.

    Article  Google Scholar 

  17. Watson DS. Interpretable machine learning for genomics. Hum Genet. 2022;141(9):1499–513.

    Article  CAS  PubMed  Google Scholar 

  18. Bifarin OO. Interpretable machine learning with tree-based shapley additive explanations: Application to metabolomics datasets for binary classification. PLoS ONE. 2023;18(5): e0284315.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Conard AM, DenAdel A, Crawford L. A spectrum of explainable and interpretable machine learning approaches for genomic studies. WIREs Comput Stat. 2023;15(5): e1617.

    Article  Google Scholar 

  20. Lundberg SM, Erion GG, Lee SI. Consistent Individualized Feature Attribution for Tree Ensembles. arXiv preprint arXiv:1802.03888. 2019;

  21. Lundberg S, Lee SI. A Unified Approach to Interpreting Model Predictions. Advances in neural information processing systems. 2017;30.

  22. Tan CCS, Acman M, van Dorp L, Balloux F. Metagenomic evidence for a polymicrobial signature of sepsis. Microbial Genomics. 2021;7(9): 000642.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Rynazal R, Fujisawa K, Shiroma H, Salim F, Mizutani S, Shiba S, et al. Leveraging explainable AI for gut microbiome-based colorectal cancer classification. Genome Biol. 2023;24(1):21.

    Article  PubMed  PubMed Central  Google Scholar 

  24. Cappellato M, Baruzzo G, Camillo BD. Investigating differential abundance methods in microbiome data: a benchmark study. PLoS Comput Biol. 2022;18(9): e1010467.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Fernandes AD, Macklaim JM, Linn TG, Reid G, Gloor GB. ANOVA-Like Differential Expression (ALDEx) Analysis for Mixed Population RNA-Seq. PLoS ONE. 2013;8(7): e67019.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40.

    Article  CAS  PubMed  Google Scholar 

  28. Kaul A, Mandal S, Davidov O, Peddada SD. Analysis of microbiome data in the presence of excess zeros. Front Microbiol. 2017;8: 283205.

    Article  Google Scholar 

  29. Yang L, Chen J. A comprehensive evaluation of microbial differential abundance analysis methods: current status and potential solutions. Microbiome. 2022;10(1):130.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Rajput D, Wang WJ, Chen CC. Evaluation of a decided sample size in machine learning applications. BMC Bioinformatics. 2023;24(1):48.

    Article  PubMed  PubMed Central  Google Scholar 

  31. Papoutsoglou G, Tarazona S, Lopes MB, Klammsteiner T, Ibrahimi E, Eckenberger J, et al. Machine learning approaches in microbiome research: challenges and best practices. Front Microbiol. 2023;14:1261889.

    Article  PubMed  PubMed Central  Google Scholar 

  32. Hua J, Xiong Z, Lowey J, Suh E, Dougherty ER. Optimal number of features as a function of sample size for various classification rules. Bioinformatics. 2005;21(8):1509–15.

    Article  CAS  PubMed  Google Scholar 

  33. Naylor D, DeGraaf S, Purdom E, Coleman-Derr D. Drought and host selection influence bacterial community dynamics in the grass root microbiome. ISME J. 2017;11(12):2691–704.

    Article  PubMed  PubMed Central  Google Scholar 

  34. Oh TG, Kim SM, Caussy C, Fu T, Guo J, Bassirian S, et al. A universal gut-microbiome-derived signature predicts cirrhosis. Cell Metab. 2020;32(5):878-888.e6.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Zhou YH, Gallins P. A review and tutorial of machine learning methods for microbiome host trait prediction. Front Genet. 2019;10:579.

    Article  PubMed  PubMed Central  Google Scholar 

  36. Barnard E, Shi B, Kang D, Craft N, Li H. The balance of metagenomic elements shapes the skin microbiome in acne and health. Sci Rep. 2016;6(1):39491.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Lee SJ, Rho M. Multimodal deep learning applied to classify healthy and disease states of human microbiome. Sci Rep. 2022;12(1):824.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Robertson R, Church J, Edens T, Mutasa K, Geum HM, Baharmand I, et al. The fecal microbiome and rotavirus vaccine immunogenicity in rural Zimbabwean infants. Vaccine. 2021.

    Article  PubMed  PubMed Central  Google Scholar 

  39. Meshram V, Patil K, Meshram V, Hanchate D, Ramkteke SD. Machine learning in agriculture domain: a state-of-art survey. Artif Intell Life Sci. 2021;1: 100010.

    Article  CAS  Google Scholar 

  40. Dhaliwal DS, Williams MM. Sweet corn yield prediction using machine learning models and field-level data. Precision Agric. 2023.

    Article  Google Scholar 

  41. Deng Z, Zhang J, Li J, Zhang X. Application of deep learning in plant–microbiota association analysis. Front Genet. 2021;12: 697090.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Chang HX, Haudenshield JS, Bowen CR, Hartman GL. Metagenome-wide association study and machine learning prediction of bulk soil microbiome and crop productivity. Front Microbiol. 2017.

    Article  PubMed  PubMed Central  Google Scholar 

  43. Jin T, Wang Y, Huang Y, Xu J, Zhang P, Wang N, et al. Taxonomic structure and functional association of foxtail millet root microbiome. GigaScience. 2017;6(10):1–12.

    Article  PubMed  PubMed Central  Google Scholar 

  44. Xu L, Naylor D, Dong Z, Simmons T, Pierroz G, Hixson KK, et al. Drought delays development of the sorghum root microbiome and enriches for monoderm bacteria. Proc Natl Acad Sci. 2018;115(18):E4284–93.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. DADA2: high-resolution sample inference from Illumina amplicon data. Nat Methods. 2016;13(7):581–3.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2012;41(D1):D590–6.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  47. Wang Q, Garrity GM, Tiedje JM, Cole JR. Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Appl Environ Microbiol. 2007;73(16):5261–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. McMurdie PJ, Holmes S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE. 2013;8(4): e61217.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Cao Y, Dong Q, Wang D, Zhang P, Liu Y, Niu C. microbiomeMarker: an R/Bioconductor package for microbiome marker identification and visualization. Bioinformatics. 2022;38(16):4027–9.

    Article  CAS  PubMed  Google Scholar 

  50. Lin H, Peddada SD. Multigroup analysis of compositions of microbiomes with covariate adjustments and repeated measures. Nat Methods. 2024;21(1):83–91.

    Article  CAS  PubMed  Google Scholar 

  51. Lex A, Gehlenborg N, Strobelt H, Vuillemot R, Pfister H. UpSet: visualization of intersecting sets. IEEE Trans Visual Comput Graphics. 2014;20(12):1983–92.

    Article  Google Scholar 

  52. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc: Ser B (Methodol). 1995;57(1):289–300.

    Article  Google Scholar 

  53. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.

    Google Scholar 

  54. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. 2020;2(1):56–67.

    Article  PubMed  PubMed Central  Google Scholar 

  55. Naylor D, Coleman-Derr D. Drought stress and root-associated bacterial communities. Front Plant Sci. 2018;8: 303756.

    Article  Google Scholar 

  56. Nearing JT, Douglas GM, Hayes MG, MacDonald J, Desai DK, Allward N, et al. Microbiome differential abundance methods produce different results across 38 datasets. Nat Commun. 2022;13(1):342.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  57. Wallen ZD. Comparison study of differential abundance testing methods using two large Parkinson disease gut microbiome datasets derived from 16S amplicon sequencing. BMC Bioinformatics. 2021;22(1):265.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. Thorsen J, Brejnrod A, Mortensen M, Rasmussen MA, Stokholm J, Al-Soud WA, et al. Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies. Microbiome. 2016;4(1):62.

    Article  PubMed  PubMed Central  Google Scholar 

  59. Quinn TP, Crowley TM, Richardson MF. Benchmarking differential expression analysis tools for RNA-Seq: normalization-based vs. log-ratio transformation-based methods. BMC Bioinf. 2018;19:274.

    Article  CAS  Google Scholar 

  60. Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nat Commun. 2020;11(1):3514.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  61. Cawley GC, Talbot NLC. On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res. 2010;11(70):2079–107.

    Google Scholar 

  62. Capitaine L, Genuer R, Thiébaut R. Random forests for high-dimensional longitudinal data. Stat Methods Med Res. 2021;30(1):166–84.

    Article  PubMed  Google Scholar 

  63. Smith MR, Martinez T, Giraud-Carrier C. The potential benefits of data set filtering and learning algorithm hyperparameter optimization. In: Proceedings of the 2015 International Conference on Meta-Learning and Algorithm Selection - Volume 1455. MetaSel’15. Aachen, DEU:; 2015. p. 3–14.

  64. Gao Y, Zhu Z, Sun F. Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data. Synthetic Syst Biotechnol. 2022;7(1):574–85.

    Article  CAS  Google Scholar 

  65. Chen X, Ishwaran H. Random forests for genomic data analysis. Genomics. 2012;99(6):323–9.

    Article  CAS  PubMed  Google Scholar 

  66. Jang SW, Yoou MH, Hong WJ, Kim YJ, Lee EJ, Jung KH. Re-Analysis of 16S Amplicon Sequencing Data Reveals Soil Microbial Population Shifts in Rice Fields under Drought Condition. Rice. 2020;13(1):44.

    Article  PubMed  PubMed Central  Google Scholar 

  67. Mehmood MA, Fu Y, Zhao H, Cheng J, Xie J, Jiang D. Enrichment of bacteria involved in the nitrogen cycle and plant growth promotion in soil by sclerotia of rice sheath blight fungus. Stress Biol. 2022;2(1):32.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  68. Siebielec S, Siebielec G, Klimkowicz-Pawlas A, Galazka A, Grzadziel J, Stuczyński T. Impact of water stress on microbial community and activity in sandy and loamy soils. Agronomy. 2020;10(9):1429.

    Article  CAS  Google Scholar 

  69. Barka EA, Vatsa P, Sanchez L, Gaveau-Vaillant N, Jacquard C, Klenk HP, et al. Taxonomy, physiology, and natural products of actinobacteria. Microbiol Mol Biol Rev. 2015;80(1):1–43.

    Article  PubMed  PubMed Central  Google Scholar 

  70. Tóth Z, Táncsics A, Kriszt B, Kröel-Dulay G, Ónodi G, Hornung E. Extreme effects of drought on composition of the soil bacterial community and decomposition of plant tissue. Eur J Soil Sci. 2017;68(4):504–13.

    Article  CAS  Google Scholar 

  71. Abbasi S, Sadeghi A, Safaie N. Streptomyces alleviate drought stress in tomato plants and modulate the expression of transcription factors ERF1 and WRKY70 genes. Sci Hortic. 2020;265: 109206.

    Article  CAS  Google Scholar 

  72. Liu L, Lin W, Zhang L, Tang X, Liu Y, Lan S, et al. Changes and correlation between physiological characteristics of rhododendron simsii and soil microbial communities under heat stress. Front Plant Sci. 2022;13: 950947.

    Article  PubMed  PubMed Central  Google Scholar 

  73. Faist H, Trognitz F, Antonielli L, Symanczik S, White PJ, Sessitsch A. Potato root-associated microbiomes adapt to combined water and nutrient limitation and have a plant genotype-specific role for plant stress mitigation. Environmental Microbiome. 2023;18(1):18.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  74. Zhou J, Wu L, Deng Y, Zhi X, Jiang YH, Tu Q, et al. Reproducibility and quantitation of amplicon sequencing-based detection. ISME J. 2011;5(8):1303–13.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  75. Abellan-Schneyder I, Matchado MS, Reitmeier S, Sommer A, Sewald Z, Baumbach J, et al. Primer, Pipelines, Parameters: Issues in 16S rRNA Gene Sequencing. mSphere. 2021;6(1):e01202-20.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  76. Fierer N. Embracing the unknown: disentangling the complexities of the soil microbiome. Nat Rev Microbiol. 2017;15(10):579–90.

    Article  CAS  PubMed  Google Scholar 

  77. Chen RC, Dewi C, Huang SW, Caraka RE. Selecting critical features for data classification based on machine learning methods. J Big Data. 2020;7(1):52.

    Article  CAS  Google Scholar 

  78. Chen H, Ma K, Lu C, Fu Q, Qiu Y, Zhao J, et al. Functional redundancy in soil microbial community based on metagenomics across the globe. Front Microbiol. 2022.

    Article  PubMed  PubMed Central  Google Scholar 

  79. Mahajan P, Uddin S, Hajati F, Moni MA. Ensemble learning for disease prediction: a review. Healthcare. 2023.

    Article  PubMed  PubMed Central  Google Scholar 

  80. Shen Y, Zhu J, Deng Z, Lu W, Wang H. EnsDeepDP: An Ensemble Deep Learning Approach for Disease Prediction Through Metagenomics. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2023 Mar;20(2):986–998. Conference Name: IEEE/ACM Transactions on Computational Biology and Bioinformatics.

  81. Spinoni J, Vogt JV, Naumann G, Barbosa P, Dosio A. Will drought events become more frequent and severe in Europe? Int J Climatol. 2018;38(4):1718–36.

    Article  Google Scholar 

  82. Ault TR. On the essentials of drought in a changing climate. Science. 2020;368(6488):256–60.

    Article  CAS  PubMed  Google Scholar 

  83. No JH, Nishu SD, Hong JK, Lyou ES, Kim MS, Wee GN, et al. Raman-deuterium isotope probing and metagenomics reveal the drought tolerance of the soil microbiome and its promotion of plant growth. mSystems. 2022;7(1):1249.

    Article  Google Scholar 

Download references


Not applicable


RD, CW, and SJS are/were supported by funds of the Federal Ministry of Education and Research (BMBF), Germany [01\(|\)21038]. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article, or the decision to submit it for publication.

Author information

Authors and Affiliations



MH conceived, designed, and performed the analysis with inputs from SP, RD, CW, SJS, and JB. MH drafted the manuscript and it was reviewed and edited by all authors. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sascha Patz.

Ethics declarations

Ethics approval and consent to participate

The study incorporated 16S rRNA metagenomic data from plants and soil that is publicly available and was conducted in accordance with the present ethical guidelines as of the date of publication.

Consent for publication

Not applicable.

Competing interests

The authors MH, RD, CW, SJS, and SP, currently or formerly employed by Computomics GmbH, and JB of the Justus Liebig University Giessen declare that they have no Conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Figure S1. Weekly watering scheme of the Sorghum-Drought test dataset. Table S1. Feature table pruning between the datasets. Figure S2. Relative abundances per rank of the Grass-Drought dataset. Figure S3. Significant taxa intersections between DAA tools per rank of the Grass-Drought dataset. Figure S4. ROC curves per rank of the Grass-Drought dataset. Table S2. Logistic regression performance of the Grass-Drought dataset. Figures S5 and S6. Taxon enrichment, significance and importance by DAA tools and SHAP values of the Grass-Drought dataset. Table S3. Random forest classifier performance on the Grass-Drought dataset excluding the Hold-Out dataset. Table S4. Random forest classifier performance on the Hold-Out dataset of the Grass-Drought dataset.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Hagen, M., Dass, R., Westhues, C. et al. Interpretable machine learning decodes soil microbiome’s response to drought stress. Environmental Microbiome 19, 35 (2024).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: