Correlation between spatial phylogeny and environmental factors and plant diversity of North American butterflies

author:Biodiversity and phylogeny

Large-scale quantitative assessments of insect biodiversity and its drivers remain insufficiently explored. Here, we conduct a spatial phylogenetic analysis of North American butterflies to test whether climate stability and temperature gradients shape their diversity and endemism. We also perform the first quantitative comparison of spatial phylogenetic patterns between butterflies and flowering plants. Butterfly biodiversity patterns differed significantly from those of flowering plants, especially in warm deserts, and butterflies exhibited different phylogenetic clustering patterns, suggesting differences in habitat conservatism between the two groups. These results suggest that shared biogeographic history and trophic associations do not necessarily guarantee similar diversity outcomes. This work has applications in conservation planning.

One way to rapidly expand the knowledge base of insect biodiversity, and to protect it in the face of accelerating declines in terrestrial biodiversity (van Klink et al., 2020), is to focus on clades for which existing data are already dense but not yet fully integrated. Such efforts also provide a unique basis for direct, empirical comparisons with other lineages. Butterflies (superfamily Papilionoidea) are an ideal group: their vibrancy and bright colors attract researchers and naturalists alike, and they are the most collected and photographed insects (Scoble, 1995). Many butterfly species and clades have also become models for studying ecological and evolutionary processes, such as Batesian and Müllerian mimicry (e.g., butterflies of the genus Heliconius (Brower, 1996; Kronforst and Papa, 2015; Lewis et al., 2019)), genetics and migration (e.g., butterflies of the genus Danaus), and adaptation to agricultural systems (e.g., the common and widespread cabbage white, Pieris rapae (Shen et al., 2016)). Butterflies are also one of the few insect taxa for which conservation agencies such as the IUCN have made initial assessments of species' endangered status (Bonelli et al., 2018).

There are approximately 1900 species of butterflies in North America (Lotts and Naberhaus, 2017), and natural history and genetic data are available for nearly 1500 of them. These powerful data sources allow us to move beyond simple taxonomic summaries of diversity, such as species richness, towards a more holistic, process-oriented understanding of how diversity evolves and is structured at the continental scale. Despite this potential, a comprehensive, broad-scale analysis of the phylogenetic diversity of butterflies (or of any other insect taxon) and of the drivers of this diversity has yet to be undertaken. Even summaries of the taxonomic diversity of North American butterflies are limited (Ricketts et al., 1999; Kocher and Williams, 2000; Luis Martinez et al., 2002).

Butterflies are sensitive to climate change (Dennis, 1993), and a fundamental question is how current climate and historical changes in temperature and landscape in North America have shaped the phylogenetic diversity and endemism of butterflies. North America has extensive ecosystems, a dynamic geological history and significant insect diversity (Danks, 1994; Godfray et al., 2000). Butterflies are distributed across 14 broad ecoregions, from temperate forests in the east to tundra and taiga in northern Canada, tropical moist forests in southern Mexico, and warm and cold deserts in the southwest (Lotts and Naberhaus, 2017). The landscape of the entire continent changed dramatically during the Quaternary. In the west, this was driven by prolonged aridification and by the orogeny that formed the Sierra Nevada mountains; across the northern part of the continent, it was driven by cyclic glaciation (Bintanja and van de Wal, 2008).

Butterflies also rely heavily on flowering plants as a source of nectar for adults and food for larvae (Bronstein et al., 2006). A key question is whether, given these strong ecological associations and shared historical landscape and climate drivers, butterflies and angiosperms exhibit concordant broad-scale biogeographic patterns. Recent efforts to document plant phylogenetic diversity in North America (Mishler et al., 2020) provide a data basis for direct, quantitative comparisons of butterflies and angiosperms. That analysis is the most comprehensive attempt to date, covering more than 19,500 of the more than 44,000 plant species found across the continent. This study is the first to directly compare spatial patterns and drivers of phylogenetic diversity between any group of insects and flowering plants at the continental scale.

Here, we assemble and analyze the spatial phylogenetic diversity of North American butterflies and study its relationship to historical climate and to patterns of flowering plant phylogenetic diversity. The phylogenetic approach has two key advantages over traditional taxonomic methods. First, phylogenetic metrics reduce reliance on species definitions; instead, branch lengths are used to calculate diversity metrics. Second, the spatial phylogenetic approach incorporates evolutionary history and allows for hypothesis testing, making it possible to assess, for example, whether the members of a community are more distantly or more closely related than expected by chance.

We applied a set of spatial phylogenetic methods and metrics, including phylogenetic diversity (PD) (Faith, 1992), phylogenetic endemism (PE) (Rosauer et al., 2009), and relative phylogenetic diversity and endemism (RPD and RPE) (Mishler et al., 2014). We also used CANAPE (categorical analysis of neo- and paleo-endemism), which distinguishes the type of endemism found in an area (Mishler et al., 2014): neoendemism, which results from recent radiations, versus paleoendemism, in which range-restricted lineages are relicts of groups that were once more widespread. Although these metrics are now commonly used, we briefly summarize those used here in Table 1. We use these phylogenetic diversity and endemism metrics to test hypotheses about a range of underlying drivers and associations, including unique, direct empirical comparisons between butterflies and flowering plants. These same techniques also document centres of diversity and endemism that may differ from those of plants or vertebrates and can inform conservation priorities. Based on a recent analysis of plant phylogenetic diversity in North America, we made the following predictions:

1. Regions that are warmer and more stable over time will have higher phylogenetic diversity (PD) (Rohde, 1992; Mittelbach et al., 2007). Stable regions (whether warm or cold) should have significantly higher PD values than expected by chance because they have had the most time to accumulate lineages (Cowling and Lombard, 2002), as well as specialization that may structure communities to avoid competition (Fine, 2015).

2. In the most stable regions, relative phylogenetic diversity (RPD) will be higher than expected, accumulating more long-lived, older lineages (Fine, 2015). In North America, this includes the eastern and southernmost parts of the continent, as seen in flowering plants. Areas with high topographic heterogeneity and the most unstable climate, such as recent glacier retreats in parts of the north and west, will have significantly lower RPDs than expected.

3. The phylogenetic endemism of butterflies in North America will coincide with hotspots of high endemism in angiosperms. Neoendemic hotspots are more likely to occur in younger areas with higher topographic relief, while paleoendemic areas are more likely to occur in areas with more stable climates and landscapes.

4. Patterns of phylogenetic diversity of flowering plants and butterflies at the continental scale will be concordant because of the co-evolutionary dynamics between the two taxa and the similarity of the underlying landscape and climate drivers. On the other hand, the relative breadth of butterfly host preference and the number of host plants may dilute and mask this concordance when butterflies are compared with overall plant diversity.

Results

Phylogeny of North American butterflies

Figure 1 A time-calibrated tree of 1437 North American butterflies, with the 39 deepest nodes (before the K-Pg boundary) showing bootstrap support

A total of 1437 (74.6%) known butterfly species either had existing COI sequence data or were newly sequenced for this study (Figure 1). De novo sequencing added 140 species, of which 96 (68.6%) are found only in Mexico.

Observed patterns of diversity and endemism

Both the observed map of species richness (Figures 2A and S2A) and the map of phylogenetic diversity (Figures 2B and S2B) indicate that diversity is highest in tropical wet and dry forests, mainly in Mexico, and lowest in the Canadian Arctic. In contrast, observed RPD was low in much of the western temperate mountains, the Mediterranean region of California, and boreal ecosystems such as boreal forests and taiga. Phylogenetic endemism (PE; Figures 2D and S2D) shows the same overall latitudinal gradient as PD and RPD, but also includes regions of higher phylogenetic endemism along Mexico's temperate Sierra Nevada and Pacific Coast mountains.

Spatial randomization test

We found highly regionalized patterns of PD-based overdispersion and clustering (Figure 3B). All northern, taiga, and tundra regions had lower PD than expected, indicating phylogenetic clustering. Much of the temperate West, including cold deserts, West Coast forests, and the various ecoregions of Mediterranean California, also showed clustering. In contrast, the tropical moist and dry forests, which have the highest phylogenetic diversity in North America, had higher PD than expected under the null model, indicating phylogenetic overdispersion. Parts of the south-central semi-arid steppe also exhibited higher-than-expected phylogenetic diversity. The PD values of the southern warm deserts and eastern temperate forests were neither significantly high nor significantly low.
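
The significance labels used in this section can be illustrated with a short sketch. It assumes that each grid cell has one observed value and a list of values from null-model replicates, and that significance is read from the rank of the observed value in that null distribution with a two-tailed threshold of 0.05. The function names and the toy null values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def null_rank(observed: float, null_values: Sequence[float]) -> float:
    """Fraction of null replicates that the observed value strictly exceeds."""
    if not null_values:
        raise ValueError("need at least one null replicate")
    return sum(observed > v for v in null_values) / len(null_values)


def classify_two_tailed(observed: float, null_values: Sequence[float],
                        alpha: float = 0.05) -> str:
    """Label a cell by where its observed value falls in the null distribution."""
    rank = null_rank(observed, null_values)
    if rank > 1 - alpha / 2:
        return "significantly high"  # for PD: phylogenetic overdispersion
    if rank < alpha / 2:
        return "significantly low"   # for PD: phylogenetic clustering
    return "not significant"


# Hypothetical null PD values for one cell: 100 replicates, 0.00 to 0.99.
null_pd = [i / 100 for i in range(100)]

print(classify_two_tailed(0.995, null_pd))  # exceeds every replicate
print(classify_two_tailed(0.50, null_pd))   # mid-distribution
print(classify_two_tailed(0.005, null_pd))  # exceeds only one replicate
```

The same ranking can be applied to RPD, where a significantly high value points to longer branches than expected and a significantly low value to shorter ones.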

RPD randomization showed that branch lengths in communities in southern North America were longer than expected under the null model (Figure 4B). This includes not only the tropics, but also the semi-arid highlands and the southern deserts that grade into semi-arid plains and savannahs. In contrast, branches shorter than expected were found in the Sierra Nevada, the Rocky Mountains, and much of the Intermountain West. We found no significant RPD in the eastern temperate forests, the northern Great Plains, or the northernmost regions of North America.

Paleoendemism is rare: it is largely absent from the predominantly warm deserts, the southeastern coastal plains, and the southern subtropical part of Florida, and is found only in some parts of tropical Mexico.

Figure 2 Patterns of diversity and endemism

Observed values for North American butterflies: (A) taxonomic richness, (B) phylogenetic diversity (PD), (C) relative phylogenetic diversity (RPD), and (D) phylogenetic endemism (PE). Maps without a log-scaled color palette can be viewed in Figure S2.

Drivers of phylogenetic diversity

Mean annual temperature is the most important environmental variable for predicting PD, followed by mean annual precipitation and precipitation seasonality (Table 2). Warm and humid areas usually have the highest PD. In addition, high elevations and areas with more seasonal precipitation have higher PD overall, but these effects are weaker. Temperature is the only covariate for which the coefficient estimate is greater than its standard error (Table 2). Warmer regions are more likely to have significantly high PD. A lower RPD in a region indicates relatively short branches, possibly reflecting more recent radiations. Analysis of climate and topographic drivers showed higher RPD in areas with higher temperature stability and precipitation. In addition, RPD is higher at lower elevations (Table 2).
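
To make the interpretation of RPD concrete, the sketch below follows the definition as it is usually given for Mishler et al. (2014): PD measured on the actual tree, divided by PD measured on a comparison tree with the same topology but all branches of equal length, with both expressed as a fraction of total tree length. The toy tree, its branch lengths, and the function names are hypothetical illustrations, not data or code from the study.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Each branch is (length, set of tips that descend from it).
Edge = Tuple[float, FrozenSet[str]]

# Hypothetical toy tree ((A:1,B:1):8,(C:1,D:1):1); lengths are arbitrary.
TREE: List[Edge] = [
    (1.0, frozenset({"A"})),
    (1.0, frozenset({"B"})),
    (8.0, frozenset({"A", "B"})),
    (1.0, frozenset({"C"})),
    (1.0, frozenset({"D"})),
    (1.0, frozenset({"C", "D"})),
]


def pd_fraction(tree: List[Edge], community: Iterable[str],
                equal_lengths: bool = False) -> float:
    """PD of a community as a fraction of total tree length."""
    present = set(community)

    def length(edge: Edge) -> float:
        return 1.0 if equal_lengths else edge[0]

    total = sum(length(e) for e in tree)
    spanned = sum(length(e) for e in tree if e[1] & present)
    return spanned / total


def rpd(tree: List[Edge], community: Iterable[str]) -> float:
    """Ratio of PD on the actual tree to PD on the equal-branch-length tree."""
    denominator = pd_fraction(tree, community, equal_lengths=True)
    if denominator == 0:
        raise ValueError("community shares no tips with the tree")
    return pd_fraction(tree, community) / denominator


print(rpd(TREE, {"A", "B"}))  # above 1: long branches are over-represented
print(rpd(TREE, {"C", "D"}))  # below 1: short branches are over-represented
```

In this toy case the community {C, D} sits on short branches and has an RPD below 1, the pattern the text associates with recent radiations.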

Similarities and differences in butterfly and plant phylogenetic diversity

In a simple linear model with butterfly PD as the response variable and plant PD as the predictor, the PD of North American butterflies and plants showed a similar pattern (r² = 0.34), and the regions of disagreement showed moderate spatial structure in the residuals. The PD of butterflies is higher than predicted from angiosperms in the tropics and lower than predicted on the west coast of North America (Figure 6). Surprisingly, the RPD patterns of butterflies and plants were not similar. The residuals of the RPD linear model exhibited strong spatial structure, and the RPD of butterflies was much higher than predicted from angiosperm RPD (Figure 6). Both butterflies and plants showed the highest PE values in southern Mexico, but across North America as a whole the similarity was relatively weak (r² = 0.10). The spatial residuals of the PE linear model resemble those of PD in the southern part of the continent, but show no spatial structure in the temperate regions (Figure 6).
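
The comparison described here is an ordinary least-squares regression of a butterfly metric on the matching plant metric, cell by cell, followed by inspection of the residuals. The following sketch shows the arithmetic using only the standard library. The four data points are hypothetical, and any spatial treatment of the residuals used in the study is not reproduced.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float],
             y: Sequence[float]) -> Tuple[float, float, float, List[float]]:
    """Least-squares fit of y on x: slope, intercept, r-squared, residuals."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual = observed butterfly value minus the value predicted from plants.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot, residuals


# Hypothetical PD values for four grid cells (not data from the study).
plant_pd = [1.0, 2.0, 3.0, 4.0]
butterfly_pd = [2.0, 1.0, 4.0, 3.0]

slope, intercept, r2, residuals = fit_line(plant_pd, butterfly_pd)
print(round(slope, 3), round(intercept, 3), round(r2, 3))
print([round(r, 3) for r in residuals])
```

A positive residual marks a cell where the butterfly value is higher than predicted from plants, which corresponds to the blue areas of the residual maps.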

Figure 3 Statistical significance of PD

Statistical significance of phylogenetic diversity (PD) in (A) angiosperms and (B) North American butterflies. Areas with significantly high values have taxa that are less closely related than expected by chance (blue), while areas with significantly low values have taxa that are more closely related than expected by chance (red).

Figure 4 Statistical significance of RPD

Statistical significance of relative phylogenetic diversity (RPD) in (A) angiosperms and (B) butterflies. The branches in the blue areas are significantly longer than expected; the branches in the red areas are significantly shorter than expected.

Compared to angiosperms, butterflies showed distinctly different patterns of PD significance, with higher than expected values in the south and lower than expected values in the north (Figure 3). Angiosperms showed strong RPD patterns, with significantly lower than expected RPD values in western North America and significantly higher RPD values in southern Mexico and eastern North America (Figure 4). Butterflies also have areas of significantly high RPD in the tropical moist and dry forests of southern Mexico, but unlike flowering plants, they also show high RPD in Baja California and the southwestern United States. In addition, unlike angiosperms, butterflies did not show significantly high RPD in eastern temperate forests. In most of western North America, RPD was significantly low in both groups (Figure 4).

Figure 5 CANAPE results

CANAPE results showing statistically significant centers of phylogenetic endemism for (A) angiosperms and (B) butterflies. All colored cells have significantly high PE. Red cells have concentrations of rare short branches (neoendemism), blue cells have concentrations of rare long branches (paleoendemism), and purple cells have a mixture of neo- and paleoendemism.

Flowering plants and butterflies exhibit generally discordant patterns of endemism. While the CANAPE results for butterflies also show mixed phylogenetic endemism in Florida, the results for the two groups are otherwise very different. Butterflies show strong mixed endemism in parts of northern Mexico, the warm deserts, and the cooler deserts of the southwest, predominantly neoendemism in the western coastal areas, and limited paleoendemism in southern Mexico (Figure 5).
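
The CANAPE categories can be summarized as a two-step decision rule. The sketch below follows the procedure as it is commonly described for Mishler et al. (2014): a cell is first tested for significantly high PE on either the actual tree or the equal-branch-length comparison tree, and only then is relative phylogenetic endemism (RPE) used to separate neo- from paleoendemism. The thresholds, argument names, and the decision to fold any finer subdivisions into "mixed" are assumptions made for illustration, not settings taken from the study.

```python
def canape_class(p_pe_actual: float, p_pe_comparison: float,
                 p_rpe: float) -> str:
    """Classify one grid cell from null-distribution ranks (each in 0..1).

    p_pe_actual:     rank of observed PE on the actual tree
    p_pe_comparison: rank of observed PE on the equal-branch-length tree
    p_rpe:           rank of observed RPE (ratio of the two PE values)
    """
    # Step 1: one-tailed test; at least one PE value must be significantly high.
    if p_pe_actual <= 0.95 and p_pe_comparison <= 0.95:
        return "not significant"
    # Step 2: two-tailed test on RPE.
    if p_rpe > 0.975:
        return "paleoendemism"  # rare long branches (blue cells)
    if p_rpe < 0.025:
        return "neoendemism"    # rare short branches (red cells)
    return "mixed"              # both kinds present (purple cells)


print(canape_class(0.50, 0.50, 0.99))  # fails step 1
print(canape_class(0.97, 0.50, 0.99))
print(canape_class(0.50, 0.97, 0.01))
print(canape_class(0.97, 0.97, 0.50))
```

Because step 1 gates step 2, a cell with an extreme RPE rank is still left uncolored unless its PE is itself significantly high.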

Discussion

We present the first continental-scale analysis of butterfly phylogenetic diversity. The analysis was relatively complete, with coarse-scale distribution data for all species and ~75% sampling of North American species. For the first time, this level of completeness provides a well-resolved, continental-scale view of phylogenetic diversity across an entire insect superfamily. However, even when the range estimates of terminal taxa are expanded, the "edge effect" problem cannot be completely solved, because the ranges of related taxa outside the study area are still ignored. This means that the range estimates for deeper branches may be poor, which affects PE.
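
The edge effect on PE can be illustrated with a small calculation. The sketch assumes the usual definition of phylogenetic endemism (Rosauer et al., 2009): each branch present in a cell contributes its length divided by the number of cells in which any of its descendants occur. The toy tree, the cell names, and the occurrence data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Each branch is (length, set of tips that descend from it).
Edge = Tuple[float, FrozenSet[str]]

# Hypothetical toy tree (A:2,B:2):3, where the 3.0 branch is the shared stem.
TREE: List[Edge] = [
    (2.0, frozenset({"A"})),
    (2.0, frozenset({"B"})),
    (3.0, frozenset({"A", "B"})),
]


def phylogenetic_endemism(tree: List[Edge],
                          occurrences: Dict[str, Set[str]],
                          cell: str) -> float:
    """Range-weighted branch length for one cell."""
    total = 0.0
    for length, tips in tree:
        if not (occurrences[cell] & tips):
            continue  # branch is not represented in this cell
        # Range of the branch: cells where any descendant occurs.
        branch_range = sum(1 for spp in occurrences.values() if spp & tips)
        total += length / branch_range
    return total


# Full ranges, including two cells outside a hypothetical study boundary.
full = {"c1": {"A"}, "c2": {"A", "B"}, "c3": {"B"}, "c4": {"B"}}
# The same data truncated to the study area (c3 and c4 are dropped).
truncated = {"c1": {"A"}, "c2": {"A", "B"}}

print(phylogenetic_endemism(TREE, full, "c1"))       # 1.75
print(phylogenetic_endemism(TREE, truncated, "c1"))  # 2.5
```

Truncating the study area shrinks the apparent range of the deeper branch from four cells to two, so the PE of cell c1 rises even though nothing about the cell itself has changed.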

Although we predicted that temperature and precipitation stability would be key predictors of PD and RPD significance, they are not top predictors in any model. Instead, current annual temperature is the main driver of PD significance, and after spatial random effects are included, our predictors are not particularly useful in identifying the drivers of RPD significance. Below, we summarize these results more thoroughly for the major regions of North America, with an emphasis on synthesis across geographic distances and environmental gradients.

Northern North America: We define northern North America as the area that was mostly covered by ice during the last glacial period, about 21,000 years ago. The northern part of the continent has significantly low PD but not significantly low RPD. The former indicates the importance of environmental filtering under cold and seasonal conditions, while the latter is consistent with orbitally forced range expansions and contractions during glacial cycles, with limited fragmentation.

Western United States: Regions of the West south of the former ice sheet and north of the warm deserts showed very strong patterns of significantly low PD and RPD. The former indicates potential environmental filtering, given steep elevational and climatic gradients that were themselves in flux during the Pleistocene glacial-interglacial cycles, while the latter may indicate that butterflies have undergone recent radiations in these regions.

Southern North America: The southern region of North America showed particularly striking results, especially in the warm deserts. Both PD and RPD are significantly high in the tropical regions of North America, which is consistent with the view of the tropics as a museum of butterfly diversity. In the warm deserts, we found significantly high RPD, but no indication of clustered or overdispersed PD. This new result points to the presence of ancient butterfly communities in a desert climate that formed in the mid-Pliocene. It has long been known that the flowering plants found in this area derive from lineages associated with thornscrub and arid highlands, which are phylogenetically much older.

East of the Rocky Mountains: PD and RPD east of the Rocky Mountains, a region that includes the Great Plains and the eastern temperate forests, are not significant, in contrast to the spatial phylogenetic findings for flowering plants, which we discuss in detail below. We were particularly surprised to find no accumulation of younger-than-expected lineages (i.e., significantly low RPD) in the mesoligo-dominated Great Plains region during the Miocene and Pliocene cooling, as recorded in other taxa. In addition, neither PD nor RPD was significantly high in tropical Florida. However, tropical Florida and the nearby coastal plains do show significantly high levels of mixed PE, which is consistent with known hotspots of plant biodiversity.

Given that strong ecological associations and shared abiotic drivers act over long evolutionary time frames, we predicted that butterflies and plants would have similar patterns of phylogenetic diversity, while also acknowledging the alternative hypothesis. This prediction is broadly supported in the western and northern parts of the continent, both of which have been affected by recent disturbances, including glaciation and aridification. However, our analysis also revealed significant differences. For example, butterflies and flowering plants did not exhibit similar RPD patterns, suggesting that the timing and rate of diversification in butterflies and angiosperms may not be linked at the scale and extent of this analysis. These differences may be partly methodological. However, these results also suggest that shared historical forces and strong ecological associations can still lead to different historical and current biogeographic outcomes. We discuss the methodological and biological reasons for the similarities and differences below.

Although we used consistent PD metrics in both studies, allowing direct comparison of outputs, sampling completeness differed widely between the flowering plant and butterfly analyses. The flowering plant sample includes an order of magnitude more named species than the butterfly sample, but is incomplete in terms of both phylogeny (including about 44% of taxa) and spatial distribution information.

These differences in sampling completeness and spatial bias make robust evaluation of patterns more challenging. Significantly low PD (phylogenetic clustering) is generally interpreted as habitat filtering, in which phylogenetically conserved habitat preferences cause close relatives to co-occur in communities. It is likely that habitat preference is more strongly conserved in seed plants than in butterflies, a possibility that requires future research.

While it has long been known that western North America has been greatly reshaped by regional tectonics, orogeny, and climate change, the full extent to which these forces have affected flora and fauna beyond vertebrates is just beginning to be understood. Plant spatial phylogenetic studies confirm this in a striking way: eastern temperate forests harbor much older plant lineages than the Great Plains and western North America, which were both strongly influenced by cooling and aridification and show more recent diversification. We expected to find consistent results for RPD in butterflies. However, the correlation between the two groups was not strong. According to the spatial residual map (Figure 6), butterfly communities in the West are relatively young. In addition, although parts of the West showed lower than expected RPD for both plants and butterflies, suggesting more recent radiations there than in other areas, butterfly RPD in the East was not as high as expected from plants. In Mexico, butterflies have significantly higher RPD than plants.

Figure 6 Spatial residuals of univariate linear regressions for observed PD (A), RPD (B) and PE (C), in which angiosperm metrics were used to predict butterfly metrics. High residual values (blue) indicate areas where butterfly values are higher than predicted.

These results suggest that more unstable regions, such as northern and western North America, may show moderate but consistent mismatches between groups at different trophic levels. In the West, more extensive disturbance caused by the loss of continuous forests and by continued cooling and drying may have contributed to a stronger mismatch, with butterflies diversifying on the heels of an explosion of new plant lineages in the region. Spatial residual plots of relative phylogenetic diversity support this (Figure 6), but further examination in other herbivores is needed to determine whether this lag effect is more general. We hypothesize that regions with more active geological histories will show evidence of a lag in community age between hosts and their consumers and pollinators.

The patterns of phylogenetic endemism identified by CANAPE for plants and butterflies are surprisingly discordant, with the endemism patterns of plants in southern and central Mexico being much stronger than those of butterflies. Mishler et al. truncated the southern edge of their area of interest, which could lead to more artificially restricted ranges and higher apparent endemism. Nevertheless, the discordance between plant and butterfly endemism is noteworthy and may point to fundamental differences between butterflies and plants in ecological, evolutionary and biogeographic processes that warrant further study.

The importance of protecting butterflies

Butterflies are under threat; the decline of the monarch butterfly is the most notable example. The results here indicate regions where conservation could be prioritized. We documented areas with high PE, PD and RPD, which are likely to be priority areas for protection. While some hotspots have long been recognized, identifying where diversity is highest, rarest and most threatened remains a priority; for example, the coastal plain of North America has only recently been recognized as a hotspot, and further work is still needed. Our findings provide strong evidence for habitat conservation in warm deserts, which include biodiversity hotspots that have not previously been documented.

Spatial phylogeny of vascular plants in Florida: the effects of calibration and uncertainty on diversity estimates

The recent availability of biodiversity data resources has allowed us to estimate phylogeny-based biodiversity metrics on a broad scale. In this paper, we explore how differences between source phylogenetic trees and the level of phylogenetic uncertainty affect these metrics, and we test existing hypotheses about geographic patterns of biodiversity in the vascular flora of Florida, USA. We combined ecological niche models for 1,490 Florida species with a "purpose-built" phylogenetic tree (as a phylogram and as a chronogram) and with trees derived from community resources (Phylomatic and the Open Tree of Life).

The recent explosion of biodiversity data (spatial and genetic) and environmental data (on climate, topography and vegetation), together with new analytical methods and tools, has facilitated species distribution modelling and the combination of these results into broad diversity assessments. Under phylogenetic diversity (PD), a region whose species are connected through deeper nodes is more diverse than a region that connects the same number of species through shallower nodes. The phylogenetic approach thus extends diversity measures from simple species counts to measures that can also inform evolutionary patterns and processes.

PD is calculated as the sum of the lengths of the branches in the phylogenetic tree that connect the terminal taxa from a specific location, usually to the root of the tree. PD can be interpreted as the amount of "trait diversity" contained in a region of interest when a phylogram is used (the amount of morphological variation present in the region), or as the amount of "evolutionary history" when a time-calibrated chronogram is used (Davies & Buckley, 2012; Rosso Wind, 2010). Regions with higher PD can be prioritized for conservation because they contain higher genetic diversity or a greater amount of evolutionary history, although there are clearly other potential criteria, such as threat status, in conservation assessments (Jetz and Freckleton, 2015).
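
As a minimal sketch of this definition, PD for one location can be computed by walking from each tip to the root and adding each branch once. The tree is stored here as a mapping from each node to its parent and the length of the branch leading to it. The toy tree, its branch lengths, and the function name are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# node -> (parent, length of the branch leading to the node); root has no parent.
Tree = Dict[str, Tuple[Optional[str], float]]

# Hypothetical toy tree ((A:1,B:1):2,C:3); not data from the study.
TREE: Tree = {
    "root": (None, 0.0),
    "n1": ("root", 2.0),
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("root", 3.0),
}


def phylogenetic_diversity(tree: Tree, taxa: Iterable[str]) -> float:
    """Sum of branch lengths connecting the given tips to the root."""
    visited = set()
    total = 0.0
    for tip in taxa:
        node: Optional[str] = tip
        # Climb toward the root, stopping where a path was already counted.
        while node is not None and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


print(phylogenetic_diversity(TREE, ["A", "B"]))  # 4.0, joined at a shallow node
print(phylogenetic_diversity(TREE, ["A", "C"]))  # 6.0, joined only at the root
```

Both communities contain two species, but the pair joined only at the root has the higher PD, which is the sense in which PD carries more information than a species count.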

Although such trees are relatively easy to obtain, their quality and inherent uncertainty are rarely examined, and the effects of these factors on PD assessments are not well understood (Molina-Venegas and Roquet, 2013; Rangel et al., 2015; Thornhill et al., 2017). Both the topology and the branch lengths of a tree are determined by the taxon sampling and the genetic sequences used, which must be considered when calculating and interpreting PD. For example, limited taxon sampling yields single branches that are longer than they would be in a fully sampled tree, while limited sampling of genetic data can yield unrepresentative branch lengths. Similarly, a phylogram and a chronogram have different branch lengths, so the resulting PD measurements will also differ.

Taking Florida vascular plants as an example, this paper comprehensively analyzes how the selection of input phylogenetic trees and the inclusion of phylogenetic uncertainty affect the evaluation of PD indexes. To test the importance of input trees, we developed phylogenetic trees with the specific aim of estimating biodiversity through integration with distribution models.

We chose Florida as the focus of our study because it is home to about 4300 species of native or naturalized vascular plants and has extensive terrestrial and aquatic habitats (Wunderlin et al., 2017). In addition, Florida is part of the North American Coastal Plain biodiversity hotspot (Noss et al., 2015) and has the third-highest density of federally listed sensitive, threatened, and endangered species in the United States (Ihlo et al., 2014), after California and Hawaii (Dobson et al., 1997). Furthermore, one-third of Florida's flora now consists of non-native species (naturalized or invasive), and habitat loss due to human development is increasing (Gordon, 1998). However, despite the serious ecological and conservation problems in the area, little is known about the overall geography of Florida's plant diversity.

Our empirical goal is to test hypotheses about Florida's biodiversity patterns derived from previous studies of forest types, vertebrates, and butterflies. In particular, previous work documented an overall decline in diversity from north to south in Florida, although this pattern was only qualitatively assessed based on richness maps for various vertebrates and butterflies.

The methodological objectives of this study are to explore the influence of phylogenetic tree choice on spatial phylogenetic measures (PD and RPD) and to provide a more effective way to account for sources of uncertainty in phylogenetic trees. We generate PD and RPD using multiple input phylogenetic trees and compare the results using multiple methods to understand how to interpret the differences and uncertainties in these assessments. We also study how spatial phylogenetic metrics differ between trees pruned from existing supertrees and trees inferred from curated analyses, in which a rigorous effort is made to close gaps in taxon sampling using a strategic approach to genetic sampling and branch length estimation. Finally, we used a Bayesian framework to generate distributions of trees representing uncertainty in phylogenetic estimation and to assess its impact on PD.

Results

The validation metrics of all models were high: training area under the curve (AUC) scores were > 0.8, and test AUC scores were in almost all cases within 0.15 of the training scores. A small percentage of models performed markedly worse, with a difference of > 0.5 between training and testing. We removed from the final analysis species with outlier AUC scores, defined as differing by > 3 standard deviations from the mean. In the end, 1490 models (i.e., one per species) were retained. Figure 1 shows species richness based on the stacked models at a resolution of 4 km across Florida, ranging from a low of 57 species to a high of 856 species. Figure 2 summarizes the PD metrics observed statewide for the phylogram and the chronogram.
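
The outlier rule described above can be sketched as a simple filter. The text does not make clear which score the rule was applied to, so this example assumes it is the gap between training and test AUC, and it uses the sample standard deviation. The species names and gap values are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict


def drop_outlier_models(scores: Dict[str, float],
                        n_sd: float = 3.0) -> Dict[str, float]:
    """Keep species whose score lies within n_sd standard deviations of the mean."""
    values = list(scores.values())
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation; needs at least two values
    return {sp: s for sp, s in scores.items() if abs(s - mu) <= n_sd * sd}


# Hypothetical training-minus-test AUC gaps: 20 typical models and one poor one.
auc_gap = {f"sp_{i:02d}": 0.05 for i in range(20)}
auc_gap["sp_outlier"] = 0.60

kept = drop_outlier_models(auc_gap)
print(len(kept), "models kept;", sorted(set(auc_gap) - set(kept)), "removed")
```

With very few models a single extreme value cannot exceed three sample standard deviations, so a rule of this kind only becomes informative once many species are modelled.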

The relationships among vascular plants inferred from these two plastid genes are largely consistent with the results of phylogenetic analyses based on more genes and taxa. Known rapidly radiating lineages (such as Asteraceae) also exhibit shorter branches, as expected. Similarly, because of their short branches, some clades have very low phylogenetic resolution. The overall phylogenetic framework of Florida vascular plants is highly similar to the accepted framework based on broader geographic analyses, and the relationships among vascular plant subclades also reflect the results of other studies.

Figure 1. Phylogram, chronogram, and species richness of Florida vascular plants. (A) Phylogeny of 1490 Florida vascular plants, shown as a phylogram (left) and a chronogram (right). Black dots on the chronogram mark the 17 calibration points. (B) Map of species richness, with the PD of one grid cell highlighted in red on the phylogram and chronogram.

The few places where the topology differs from published analyses are minor departures from expectation and probably result from the limited data set. Given that Florida's vascular plants are only a small subset of global diversity, these discrepancies may also reflect sampling issues.

Figure 2. Phylogenetic diversity of Florida vascular plants: observed (upper panels) and significant (lower panels) phylogenetic diversity measured on the phylogram (left) and chronogram (right). In the upper panels, Environmental Protection Agency Level III ecoregions are outlined with dark lines and Level IV ecoregions with light lines, and areas of interest such as the Lake Wales Ridge and Miami Ridge are labeled.

These poorly resolved short branches are unlikely to affect the PD analysis much, because the better-supported long branches contribute the vast majority of PD (González-Orozco et al., 2016). Calibrating the tree with fossil constraints (Table S4) yields a chronogram that is compared with the phylogram in downstream analyses (Figure 1); the chronogram smooths branch lengths relative to the phylogram (see Figure 1 for a comparison).

Mapping Florida's PD against the Environmental Protection Agency's Level III and IV ecoregions (Figure 2) and against latitude (Figure 3) reveals ecological and geographic patterns. PD is highest in the northern part of the Florida peninsula, south to Orlando and near St. Petersburg. On both the phylogram and the chronogram, PD in central Florida is higher than in the north and south of the state (Figure 2). Latitudinally, South Florida shows a mixed pattern: the Everglades and Big Cypress have relatively low PD, while the Miami Ridge at the same latitude has relatively high PD (see the long tail toward positive PD values at 26.5° N in Figure 3). Mean PD values by ecoregion also show higher PD in the southern coastal plain, which runs through the Florida peninsula, than in other areas (Table 1).

Regions where PD is significantly high or significantly low differ between the phylogram and the chronogram (Figure 2). Values derived from the chronogram show strong patterns of overdispersion in northern and central Florida, particularly in the southeastern coastal plain ecoregion. In contrast, PD significance based on the phylogram, which more directly represents feature diversity, shows clustering of lineages in several areas of the state, particularly along the northwest coast of Florida, and almost no overdispersion anywhere.

Figure 3. Phylogenetic diversity by latitude. The study area is divided into 13 latitudinal bands of 0.5°, shown as lines on the map at right. The bean plots at left show the phylogenetic diversity values of the pixels within each band for the phylogram (top) and chronogram (bottom).

RPD patterns also differ markedly between the phylogram and the chronogram. Geographic areas with major differences between chronogram- and phylogram-derived values include (Figure 4): (1) northern Florida, where the chronogram yields high observed RPD and a broader concentration of significantly high RPD (i.e., branches longer than expected) than the phylogram; (2) central Florida, where the phylogram generally does not show significantly low RPD (i.e., branches shorter than expected), whereas the chronogram does; and (3) the far south of Florida, including the Miami Ridge area.

Table 1. Mean PD and standard deviation (SD) calculated for cells within the three ecoregions, for the phylogram and the chronogram

Figure 4. Relative phylogenetic diversity: observed (upper panels) and significant (lower panels) relative phylogenetic diversity measured on the (A and C) phylogram and (B and D) chronogram of vascular plants.

Because branch lengths were estimated from the same alignments, similarities are expected between the purpose-built tree and the Open Tree of Life (OTL) tree. PD values from each source phylogram showed significant clustering along the Florida Panhandle and peninsular Florida, while PD values from each source chronogram showed overdispersion at the northern edge of the Panhandle. The OTL tree was most similar to the purpose-built tree, with fewer than 5% of cells showing different significance for phylogram-derived values and fewer than 20% for chronogram-derived values. Measurements using the Phylomatic tree, despite fewer taxa being dropped, differed more from the purpose-built tree, with more than 25% of cells showing different significance outcomes (Figure 5; Table 2). Differences in taxon sampling between methods moderately affect this spatial phylogenetic measure (e.g., with fewer taxa, only the area east of the Lake Wales Ridge (LWR) stands out as an area of significant clustering on the phylogram; Figure 5B).

Overall, the standard deviation of PD calculated across the 100 chronograms was greater than that calculated across the 100 phylograms, and the similarity between the two maps was most pronounced in the Panhandle and Florida (Figures 6A-6C).

Figure 5. Hypothesis tests of diversity compared between phylograms and chronograms of vascular plants built from our purpose-built tree, the Phylomatic tree, or the Open Tree.

Top panels: phylogram and chronogram of the purpose-built tree, pruned to the (A) Phylomatic and (B) Open Tree taxon sets. Middle panels: the (A) Phylomatic tree and (B) Open Tree tree. Bottom panels: the difference between the two maps above. Gray pixels are those whose significance level changes between the two trees being compared.

Discussion

We found peaks in plant diversity in the northern part of the Florida peninsula rather than in the Panhandle. Although PD may be highest in the northern peninsula, many parts of the Panhandle have significantly more PD than expected, especially on the chronogram rather than the phylogram. Southeastern forests consist of communities containing deep evolutionary branches, particularly in the time-calibrated phylogeny. In South Florida, which was completely submerged during the last interglacial period (Germain-Aubrey et al., 2014), we found an unusual pattern of lineage diversity in which only the phylogram shows a strong signal of significantly high RPD, that is, branches significantly longer than expected under the null hypothesis. We believe that phylograms, whose relatively long branches are generally toward the tips, may in some cases show stronger patterns than chronograms, which tend to redistribute branch length from the tips to deeper branches (as shown in Figure 1). In southern Florida, taxa of Caribbean or Central/South American origin may be dominated by longer terminal branches. Further examination of community composition, and of possible artifacts of method choice, is needed.

Table 2. Number of cells showing different results between tree sources

Figure 6. Top: standard deviation of observed phylogenetic diversity across (A) the 100 phylograms and (B) the 100 chronograms from our Bayesian analysis; (C) the difference between the two, shown in blue at right. Bottom: agreement in significance level across (D) the 100 phylograms and (E) the 100 chronograms. Light blue areas are pixels where 91 or more of the 100 trees gave the same significance level. Yellow areas were mostly consistent, with 71-90 of the 100 trees giving the same significance level. Red areas are relatively inconsistent pixels, with only 50-70 trees giving the same significance level.

In the central Florida peninsula, we found stronger phylogenetic clustering and shorter branches than expected on both the phylogram and the chronogram. Central Florida is rich in plant diversity and includes areas that were inundated during Pleistocene interglacial sea-level rise as well as more persistent dry scrub. Co-occurring taxa there may have been filtered by the evolutionarily conserved tolerance of certain lineages for the harsh conditions of these areas, such as excessively drained soils and extreme heat. Alternatively, part of this pattern may be due to in situ diversification, while some taxa may have arrived more recently.

Our results are also consistent with findings for other, distantly related lineages. The Florida Gap Analysis recorded high species richness of vertebrates and butterflies, particularly in the Panhandle and extending into the core of central Florida.

Table 3. Number of cells in each uncertainty class. Classes represent the number of trees that agree on the significance level. For example, the 50-70 class means that 50-70 trees gave the same significance level for a pixel and the remaining 30-50 did not. In the 71-90 class more trees agreed, and in the 91-100 class nearly all trees gave the same significance level, indicating that these pixels are consistent once uncertainty in phylogenetic estimation is taken into account.
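The uncertainty classes in this table can be illustrated with a short sketch. It assumes each pixel has one significance label per tree (the label names below are hypothetical) and bins the pixel by how many trees share the most common label. The "<50" fallback is an assumption added for completeness; the table itself lists only three classes.

```python
from collections import Counter


def agreement_class(calls):
    """Bin one pixel by the share of trees that agree on its significance.

    calls holds one label per tree, e.g. "high", "low" or "ns"
    (hypothetical label names).
    """
    top_count = Counter(calls).most_common(1)[0][1]
    share = 100 * top_count / len(calls)  # percent of trees in agreement
    if share > 90:
        return "91-100"
    if share > 70:
        return "71-90"
    if share >= 50:
        return "50-70"
    return "<50"  # assumed fallback, not a class named in the table


# Example: 80 of 100 trees call the pixel non-significant.
print(agreement_class(["ns"] * 80 + ["high"] * 20))
```

Tallying these classes over all pixels would give cell counts of the kind reported in Table 3.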

Figure 7. Uncertainty in phylogenetic diversity by latitude. The study area is divided into 13 latitudinal bands of 0.5°, shown as lines on the map at right. The bean plots show the phylogenetic diversity values from all 100 trees for each pixel within each band, for the phylograms (left) and chronograms (right).

Finally, given their known endemism and unusual floras, two areas of particular interest are the pine rocklands of the Miami Ridge and the LWR (see Figure 2, upper left, and Figure S3 for locations). As expected, in the Miami Ridge area we found elevated PD and significant PD clustering for some pixels on the chronogram, and significantly high RPD for other pixels on the phylogram. The LWR in particular is known for its high diversity of endemic species (Myers and Ewel, 1990; Germain-Aubrey et al., 2014), and we hypothesize that a high degree of neo-endemism may be revealed when the area is examined with phylogenetic endemism measures in the future. Strong clustering appears immediately east of the LWR, in a habitat mosaic of pine flatwoods, dry prairie, and swamps (Myers and Ewel, 1990), and the significant clustering there may indicate strong environmental filtering in these habitats. Areas such as the Miami Ridge pine rocklands, which are directly threatened by rapid and sustained human development, have also been identified as conservation priorities, providing further strong justification for conservation action.

The relationships among the major clades of ferns (monilophytes), gymnosperms, and angiosperms agree closely with broader phylogenetic analyses focused on these subclades (e.g., Angiosperm Phylogeny Group IV, 2016; Schuettpelz and Pryer, 2007; Smith et al., 2011; Soltis et al., 2011; Stevens, 2001). Some areas that were problematic in the purpose-built Florida tree were resolved appropriately in the OTL-generated topology. This may be due to the continually updated nature of OTL, whose topology integrates previously estimated trees into a single framework to produce a "synthetic" tree. The results suggest that OTL may be an important resource for future spatial phylogenetic analyses. If the terminal taxa of a region are sampled well enough in OTL, researchers can save a great deal of time otherwise spent building their own region-specific trees.

Although Phylomatic and OTL can provide relatively accurate topologies, obtaining branch lengths for these trees remains problematic, especially for phylograms. To address this, we used DNA sequence alignments to estimate branch lengths on the OTL topology, which may explain why results from the OTL tree differed less from our purpose-built tree than results from the Phylomatic tree did (Figure 5). However, this approach requires assembling alignments for the OTL tree, which may defeat the purpose of using such resources. Work is under way to add branch length estimation to OTL, which would further enhance its utility as a source for spatial phylogenetic analysis.

Taxon sampling can be another problem when using trees from repositories. The Phylomatic tree contained almost all of our terminal taxa (99%), more than the OTL tree (80%). If we tried to analyze all vascular plant species in Florida, it is unclear how many taxa would be available. When 250 taxa were removed, the number of cells with significant clustering in central Florida decreased markedly (Figure 5). This result suggests that more limited taxon sampling may reduce statistical power. Nevertheless, our current analysis probably captures the overall trend in PD across Florida, providing a much-needed initial snapshot of diversity.

Figure 6 shows that about 20% of pixels varied moderately to strongly in significance across the 100 trees, flipping between not significant and significantly high or significantly low (but never from significantly high to significantly low). This pattern holds for both phylograms and chronograms, although the per-pixel standard deviation is much higher for chronograms.

Two key messages emerge from our analysis. First, although uncertainty can affect whether PD and RPD are judged significant in particular grid cells, these cells do not appear to be geographically structured, so the small amount of uncertainty observed here does not broadly affect conclusions at the landscape scale.

Second, phylograms and chronograms are informative in different ways. By definition, regions with high PD measured on a phylogram have high genetic diversity, which may be the better measure if the goal is to conserve genetic diversity, whereas regions with high PD measured on a chronogram contain an unusually large amount of evolutionary time, which may be the better measure if the goal is to conserve evolutionary diversity.
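To make the distinction concrete, the sketch below computes PD for the same set of taxa on a toy phylogram and a toy chronogram, and also a relative PD. Everything here is illustrative: the tree encoding, the four-node example trees, and the function names are hypothetical, PD is taken as the summed branch length connecting the taxa to the root, and RPD follows one commonly used definition (the share of tree length captured on the original tree divided by the same share on a copy of the tree with all branches set equal), which the text does not spell out.

```python
def faith_pd(tree, tips):
    """Sum the branch lengths on the paths from the given tips to the root.

    tree maps node -> (parent, branch_length); the root has parent None.
    """
    visited = set()
    total = 0.0
    for tip in tips:
        node = tip
        # Walk rootward, counting each branch only once.
        while node is not None and node not in visited:
            parent, length = tree[node]
            visited.add(node)
            total += length
            node = parent
    return total


def relative_pd(tree, tips):
    """PD share on the real tree divided by PD share on an equal-branch tree."""
    equal = {n: (p, 0.0 if p is None else 1.0) for n, (p, _l) in tree.items()}
    share_real = faith_pd(tree, tips) / sum(l for _p, l in tree.values())
    share_equal = faith_pd(equal, tips) / sum(l for _p, l in equal.values())
    return share_real / share_equal


# Same topology ((A,B),C), two kinds of branch length (made-up values).
phylogram = {"root": (None, 0.0), "n1": ("root", 0.01),
             "A": ("n1", 0.30), "B": ("n1", 0.02), "C": ("root", 0.05)}
chronogram = {"root": (None, 0.0), "n1": ("root", 40.0),
              "A": ("n1", 10.0), "B": ("n1", 10.0), "C": ("root", 50.0)}

print(faith_pd(phylogram, ["A", "B"]))   # substitutions per site
print(faith_pd(chronogram, ["A", "B"]))  # millions of years
print(relative_pd(phylogram, ["A", "B"]))
```

In the toy phylogram the long terminal branch of taxon A dominates PD, while in the chronogram the same taxa mostly share a deep internal branch, which mirrors the redistribution of branch length toward deeper branches described in the Discussion.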

Limitations of the study

Although the study included 1490 Florida plant taxa, more than 4000 vascular plant species are known from the state, and including all of them could change the results reported here. Efforts to collect more complete distribution data, especially for range-restricted species, are ongoing, and these records should support further species distribution modeling. Although the phylogeny recovered from a small set of markers is consistent with known relationships, further work is planned to develop more robust phylogenetic hypotheses. Finally, further work linking these patterns to areas of high population growth and sea-level rise will inform conservation efforts.

Global plant diversity models and predictions based on advanced machine learning techniques

Although plant diversity plays a vital role in ecosystem functioning, biogeochemical cycles, and human well-being, knowledge of its global distribution remains incomplete, hindering basic research and biodiversity conservation. Here, we use machine learning (random forests, extreme gradient boosting, and neural networks) alongside conventional statistical methods (generalized linear models and generalized additive models) to test environment-related hypotheses for large-scale gradients of vascular plant diversity and to model and predict species richness and phylogenetic richness at the global scale. To do this, we used 830 regional plant inventories covering about 300,000 species, together with predictors of past and present environmental conditions. Machine learning performed very well, explaining up to 80.9% of species richness and 83.3% of phylogenetic richness, which illustrates its great potential for capturing the complex relationship between the environment and plant diversity. Current climate and environmental heterogeneity were the main drivers, while past environmental conditions had a small but detectable effect on plant diversity. Finally, we combine predictions from multiple modeling techniques (ensemble predictions) to reveal global patterns and centers of plant diversity at multiple resolutions down to 7,774 km². Our prediction maps provide accurate estimates of global plant diversity relevant to macroecology.

Vascular plants comprise more than 340,000 species (Govaerts et al., 2021) and form the basis of terrestrial ecosystems, sustaining ecosystem functioning (Tilman et al., 2014) and providing ecosystem services (Isbell et al., 2011; Cardinale et al., 2012). To conserve and manage this important component of global biodiversity, it is essential to understand its spatial distribution and the location of centers of diversity. Mapping plant distribution and diversity has a long and rich tradition, beginning in the 19th century with compilations of regional plant species counts and expert-drawn species richness isolines (Wulff, 1935; Barthlott et al., 2005; see Mutke & Barthlott, 2005). Since then, these maps have been refined by modeling diversity patterns in response to environmental and spatial variables (Keil & Chase, 2019; Sabatini et al., 2022) and scaled to different resolutions (e.g., 12,100 km² in Kreft & Jetz, 2007), allowing continuous predictions at the global scale. The accuracy of such prediction maps depends on the quality and representativeness of the available plant diversity data, the environmental predictors, and the models applied. Recent advances in data availability and modeling techniques allow plant diversity to be modeled at unprecedented resolution and accuracy.

In recent years, knowledge of global plant distributions has increased thanks to international efforts to compile species occurrence records (Enquist et al., 2016; GBIF, 2020), vegetation plots (Sabatini et al., 2021), and regional checklists and floras. However, these data differ in accuracy, completeness, and extent (König et al., 2019). Specifically, fine-grained data such as occurrence records and vegetation plots tend to be geographically biased and to cover only part of a regional flora (Meyer et al., 2016; Qian et al., 2022). Although they are coarse-grained and often delimited by artificial administrative boundaries, checklists and floras represent the most complete and authoritative accounts of regional floristic composition to date and have almost complete global coverage (Weigelt et al., 2020; Govaerts et al., 2021). Checklists and floras are therefore useful resources for modeling plant diversity-environment relationships at the global scale and for predicting plant diversity at different grain sizes (Kreft & Jetz, 2007; Keil & Chase, 2019). Including species identities further facilitates the integration of phylogenetic and trait information at the species level, providing opportunities to study multiple facets of biodiversity.

Although it is widely accepted that plant diversity reflects the complex interplay of evolutionary, geological, and ecological processes, disentangling the drivers of global plant diversity remains an important theme in modern macroecology (Kreft & Jetz, 2007; Tietje et al., 2022). Several hypotheses related to a region's geography, past and present climate, and environmental heterogeneity (Currie et al., 2004; Mittelbach et al., 2007) have been proposed to explain patterns of plant diversity. For example, highly heterogeneous areas are hypothesized to promote species coexistence by providing more diverse resources and habitats (Connor & McCoy, 1979) and to offer refuge during periods of environmental fluctuation (Stein et al., 2014). In addition, areas with warm, humid, and relatively stable climates, such as humid tropical forests, should support more species because of high speciation rates (Rohde, 1992; Mittelbach et al., 2007; Brown, 2014) and low extinction rates (Gillooly & Allen, 2007; Eiserhardt et al., 2015). Geographic isolation may promote both species extinction (Brown & Kodric-Brown, 1977; Ouborg, 1993) and speciation (Kisel & Barraclough, 2010). Finally, historical processes such as past plate tectonics and climate change influence diversity patterns by altering biotic isolation and exchange or by shifting species ranges (Dynesius & Jansson, 2000; Svenning et al., 2015; Couvreur et al., 2021).

Diversity-environment relationships are often complex, nonlinear, and scale-dependent (Francis & Currie, 2003; Keil & Chase, 2019). Many environmental predictors interact and are highly collinear, which poses significant challenges for conventional statistical models such as generalized linear models (GLMs) and generalized additive models (GAMs). Machine learning methods are powerful modeling tools that handle correlated data efficiently and can reveal nonlinear relationships and interactions among predictors without prior specification (Olden et al., 2008; Crisci et al., 2012). As a result, machine learning has emerged as a promising alternative to conventional techniques in ecology (Hengl et al., 2017; Park et al., 2020; Sabatini et al., 2022).

Here, we aim to improve models and predictions of two key facets of vascular plant diversity, species richness and phylogenetic richness, at the global scale using advanced statistical modeling techniques. In addition to nonspatial and spatial GLMs and GAMs, we systematically evaluate the predictive performance of machine learning methods, including random forests, extreme gradient boosting (XGBoost), and neural networks. Specifically, we aim to: (1) compare the performance of different modeling techniques in capturing complex diversity-environment relationships and improve global geostatistical models of plant diversity; (2) test hypotheses about plant diversity gradients related to geography, environmental heterogeneity, current climate, and past environmental conditions, and quantify their relative importance for plant species richness and phylogenetic richness; and (3) predict both facets of plant diversity at multiple grain sizes across the globe. Our study is based on checklists and floras for 830 regions worldwide and about 300,000 species (Figure S1), compiled in the Global Inventory of Floras and Traits (Weigelt et al., 2020).

Results and discussion

Performance of the plant diversity model

Table 1: Performance of global models of vascular plant diversity based on cross-validation

Random 10-fold and spatial 68-fold cross-validation were used to evaluate the predictive performance of each model. Except for the minimum adequate generalized linear model (GLM) and the GLM with interaction terms, the nonspatial models were fitted with 15 predictors representing geography, current climate, environmental heterogeneity, and past environmental conditions (Table S1). Spatial models additionally include spatial terms (i.e., simultaneous autoregressive (SAR) models, generalized additive models (GAMs) with splines of geographic coordinates, and machine learning methods with cubic polynomial trend surfaces). The minimum adequate GLM was obtained by simplifying the full GLM based on the Akaike Information Criterion (AIC). The GLM with interaction terms was fitted with all predictors of the full GLM plus interactions of energy-water, energy-heterogeneity, and area-environment variables, and then simplified based on AIC. Because the response variables (i.e., species richness and phylogenetic richness) were log-transformed in the models, accuracy statistics are given on the log scale. Values shown are the root mean square error (RMSE) based on all out-of-bag samples and the amount of variation explained by the model (R2), calculated as 1 minus the ratio of the sum of squared errors between observations and predictions to the total sum of squares.
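The two accuracy statistics defined in this note can be written out as a short sketch. The function name and inputs are hypothetical, and the natural logarithm is an assumption, since the text says only that the responses were log-transformed.

```python
import math


def rmse_and_r2(observed, predicted):
    """RMSE and R2 on the log scale for held-out observations.

    observed and predicted are richness values on the original scale;
    the natural log is assumed here.
    """
    log_obs = [math.log(v) for v in observed]
    log_pred = [math.log(v) for v in predicted]
    n = len(log_obs)

    # Sum of squared errors between observations and predictions.
    sse = sum((o - p) ** 2 for o, p in zip(log_obs, log_pred))
    # Total sum of squares around the mean of the observations.
    mean_obs = sum(log_obs) / n
    sst = sum((o - mean_obs) ** 2 for o in log_obs)

    rmse = math.sqrt(sse / n)
    r2 = 1 - sse / sst
    return rmse, r2
```

Computed this way on held-out data, R2 can be negative when a model predicts worse than the mean of the observations, which is why it is a stricter measure than the R2 of a fitted model.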

Our findings reveal the great potential of machine learning, especially decision tree methods, for modeling the relationship between plant diversity and the environment and for accurately predicting plant diversity across scales. Overall, the predictive power of the models was high (Table 1). Machine learning models and GAMs outperformed GLMs, and spatial models (i.e., models containing spatial terms to account for the spatial non-independence of regions; Dormann et al., 2007) generally outperformed nonspatial models (except the GLMs for species richness). Extreme gradient boosting, an ensemble of sequentially trained decision trees, produced the most accurate predictions for species richness (70.3% of variation explained under spatial cross-validation and 80.9% under random cross-validation) and phylogenetic richness (73.7% and 83.3%, respectively), a result that was consistent across spatial and nonspatial models.

We found strong interactions between spatial terms and environmental variables. This indicates regional differences in plant diversity and in diversity-environment relationships, and suggests that different combinations of environmental variables matter when predicting diversity in different geographic regions (Keil & Chase, 2019). In addition, the machine learning models reveal strong interactions between energy and water availability, between energy and environmental heterogeneity, and between area and environmental variables. By implicitly accounting for grain dependence and for complex interactions among spatial and environmental variables, our machine learning models outperform previous models of plant diversity (Kreft & Jetz, 2007; Keil & Chase, 2019), improve our understanding of diversity-environment relationships, and improve predictions of plant diversity across scales.

Drivers of global patterns of vascular plant diversity

Figure 1: Relative importance of categories of environmental variables in explaining global patterns of vascular plant diversity in five nonspatial models, for (a) species richness and (b) phylogenetic richness. The relative importance of each variable category (scaled to sum to 1) is calculated as 1 minus the Spearman rank correlation coefficient between model predictions made with a randomly shuffled dataset and predictions made with the original dataset. Table S1 lists the environmental variables in each category. The importance of individual environmental variables is shown in Figure S19. GAM, generalized additive model; GLM, generalized linear model; XGBoost, extreme gradient boosting.
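The importance measure in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the row-of-dictionaries data layout, and the decision to shuffle all variables of a category together with one permutation are assumptions. Spearman's coefficient is computed as the Pearson correlation of average ranks, which requires predictions that are not all identical.

```python
import random


def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def category_importance(predict, rows, categories, seed=0):
    """Importance per variable category, scaled to sum to 1.

    predict: function taking one row (dict) and returning a prediction.
    categories: category name -> list of column names (hypothetical layout).
    """
    rng = random.Random(seed)
    baseline = [predict(r) for r in rows]
    raw = {}
    for name, columns in categories.items():
        perm = list(range(len(rows)))
        rng.shuffle(perm)
        shuffled = [dict(r) for r in rows]
        # Assumption: all columns of a category are shuffled together.
        for col in columns:
            for i, j in enumerate(perm):
                shuffled[i][col] = rows[j][col]
        raw[name] = 1 - spearman(baseline, [predict(r) for r in shuffled])
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()} if total else raw
```

A category the model ignores leaves the predictions unchanged after shuffling, so its correlation stays at 1 and its importance at 0; the more the predictions are reordered, the larger the score.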

Current climate variables were the most important drivers of plant diversity, accounting for 34.4-48.1% of the variation in species richness and 39.7-58.2% of the variation in phylogenetic richness (Figure 1; Table S1). High energy and water availability and low seasonality promote species richness and phylogenetic richness (Figures S5, S6), in line with other large-scale studies reporting a strong effect of current climate on plant diversity. Environmental heterogeneity explained 21.0-40.9% of the variation in species richness and 16.3-27.2% of the variation in phylogenetic richness, with greater heterogeneity leading to higher plant diversity, as expected. Although species richness and phylogenetic richness were highly correlated (Pearson's r = 0.98), their relationships with the environment differed in some respects. For example, environmental heterogeneity explained less of phylogenetic richness than of species richness. This may reflect a signal of in situ speciation facilitated by high environmental heterogeneity, which produces clusters of closely related species with relatively low phylogenetic richness compared to species richness. This view is also supported by the negative effect of the number of soil types on the residual variation in phylogenetic richness after accounting for species richness (Table S2).

Geographic variables (area and geographic isolation) explained 9.8-23.1% of species richness and 18.0-24.6% of phylogenetic richness. Larger areas tend to have higher rates of in situ speciation, because there are more opportunities for geographic isolation within the region, and lower extinction rates, because populations are larger. These effects should be most pronounced in distinct, isolated areas and less pronounced in areas that resemble their surroundings. In addition, larger areas often contain a greater variety of habitats, providing more environmental niches for species. Geographic isolation, measured as the proportion of surrounding land, did not explain much variation, probably because our dataset consists mostly of continental regions. While geographic isolation is a major driver of island plant diversity, isolation and peninsular effects appear to play only a small role on continents, where geographic isolation may matter more for the composition of regional and local floras than for their richness.

Because of low extinction and high speciation rates, we hypothesized that higher plant diversity accumulates in areas with long-term climate stability. We therefore assessed temperature stability and biome change since two paleo-time periods, the Last Glacial Maximum (LGM) and the mid-Pliocene warm period, as measures of past climate change. Contrary to the expected legacy effect of historical variables on modern plant diversity, past environmental conditions explained only 0.8-5.5% of species richness in most of our models, but as much as 23.8% in the neural network. Similarly, past environmental conditions explained more of phylogenetic richness in the neural network (15.0%) than in the other models (4.0-8.5%). Models that included spatial trend surfaces or discrete biogeographic regions (i.e., floristic kingdoms), after statistically controlling for current and past environment, further improved model fit (Table 1, S4). This suggests that, in addition to climate stability since the LGM or the mid-Pliocene warm period, pre-Pliocene biogeographic history or regional features other than climate change have influenced modern plant diversity. These historical regional effects may be due to dispersal barriers and to idiosyncratic colonization and diversification histories (Qian & Ricklefs, 2004; Ricklefs & He, 2016).

Improved global map of plant diversity

Figure 2: Global patterns of vascular plant diversity predicted on an equal-area hexagonal grid with a resolution of 7,774 km². Species richness (a) and phylogenetic richness (Faith's PD, d) are based on ensemble predictions from five different models (i.e., three spatial models using machine learning methods, one spatial generalized additive model, and one nonspatial generalized linear model with interactions), weighted by model accuracy. Centers of species richness (b) and phylogenetic richness (e) are defined as regions with predicted richness above the 90th percentile of predictions (i.e., at least 1,765 species and 41,866 Ma of phylogenetic richness per 7,774 km²). Variation in predictions among the models used for the ensemble is calculated as the coefficient of variation of the predicted values of species richness (c) and phylogenetic richness (f). Horizontal lines depict the equator and the boundaries of the tropics. All maps use the Eckert IV projection.

We produced global maps of vascular plant species richness and phylogenetic richness, based on the best individual models and on model ensembles. Because of its excellent predictive performance and its ability to handle missing data, we consider XGBoost (including geographic coordinates) to be the most powerful single model for predicting plant diversity (Figures S20d, S21d). In addition, we propose ensemble prediction, which reduces the uncertainty associated with the choice of a particular modeling technique and thereby improves prediction accuracy (Marmion et al., 2009). Because the models include area and its interactions with other predictors, we can predict onto equal-area, equidistant hexagonal global grids of different grain sizes.
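The accuracy-weighted ensemble described here can be illustrated with a minimal sketch. It assumes each model yields one predicted richness value per grid cell and one non-negative accuracy score; the function name, the model names, and the numbers are hypothetical and are not taken from the study.

```python
def weighted_ensemble(predictions, accuracies):
    """Average per-cell predictions across models, weighting each model by its accuracy.

    predictions: dict mapping model name -> list of predicted richness per grid cell
    accuracies:  dict mapping model name -> non-negative accuracy score
                 (hypothetical, e.g. a cross-validated score)
    """
    total_weight = sum(accuracies[name] for name in predictions)
    if total_weight <= 0:
        raise ValueError("at least one model needs a positive accuracy score")
    n_cells = len(next(iter(predictions.values())))
    ensemble = []
    for cell in range(n_cells):
        weighted_sum = sum(accuracies[name] * predictions[name][cell]
                           for name in predictions)
        ensemble.append(weighted_sum / total_weight)
    return ensemble


# Hypothetical example: two models, two grid cells.
preds = {"model_a": [100, 200], "model_b": [300, 400]}
scores = {"model_a": 1.0, "model_b": 3.0}
combined = weighted_ensemble(preds, scores)
```

Because every ensemble value is a weighted mean of the individual predictions, it can never exceed the largest or fall below the smallest single-model prediction for a cell, which is consistent with the tendency of ensembles to attenuate extremes.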

Our ensemble predictions (Figure 2a, d) describe global patterns of species and phylogenetic richness with unprecedented detail and accuracy. These maps capture how diversity changes along environmental gradients and identify global centers of plant diversity (Figure 2b, e). While the pattern of phylogenetic richness is very similar to that of species richness (Pearson's r = 0.97), there are differences around the Mediterranean, Central America, the Caucasus, and the Himalayas (Figure S24). High-resolution environmental data and modeling techniques allow regions with steep elevation gradients to show finer variation in predicted diversity than in earlier maps (Barthlott et al., 2005; Kreft & Jetz, 2007).
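As an illustration of how centers of diversity can be delimited, the sketch below flags grid cells whose predicted richness lies above the 90th quantile, the threshold named in the Figure 2 caption. The richness values are invented, and the quantile method (Python's "inclusive" interpolation) is an assumption; the study may have used a different convention.

```python
from statistics import quantiles


def diversity_centers(richness):
    """Return the 90th-quantile threshold and the indices of cells above it."""
    # quantiles(..., n=10) returns the nine cut points between deciles;
    # the last cut point is the 90th quantile.
    threshold = quantiles(richness, n=10, method="inclusive")[-1]
    centers = [i for i, value in enumerate(richness) if value > threshold]
    return threshold, centers


# Hypothetical predicted species richness for 11 grid cells.
predicted = [120, 340, 560, 780, 900, 1100, 1300, 1500, 1700, 1900, 2400]
threshold, centers = diversity_centers(predicted)
```

With these invented values, only the richest cell lies strictly above the threshold.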

At the same time, our ensemble predictions show relatively high values in species-poor areas, such as the Sahara and the ice-free areas of Greenland. Here, as in other regions with extreme plant diversity, a single model outperforms the ensemble (Figures S20, S21), which tends to attenuate extremes. Areas with high species and phylogenetic richness are mainly located in mountainous regions (Figure S26). Specifically, tropical mountains, including the tropical Andes, the highlands of East Africa, and various Asian mountain regions such as those of southern China and the Malay Archipelago, are global centers of plant diversity. As previously found (Testolin et al., 2021), the high diversity of tropical mountains is associated with warm, humid climates and heterogeneous environments (Antonelli et al., 2018). Multiple biogeographic and evolutionary processes, including speciation, dispersal, and persistence driven by long-term orogeny and climate dynamics, have led to outstanding regional plant diversity. Orogenic processes constantly alter soil composition, nutrient levels, and local climate in mountainous areas, creating novel and heterogeneous habitats in which plant lineages diversify and into which they migrate from neighboring areas. Furthermore, climate fluctuations stimulate diversification by driving dynamic changes in habitat connectivity within mountains (Rahbek et al., 2019). Because of their steep environmental gradients and heterogeneity, mountains provide refugia when the climate is unfavorable (Bennett et al., 1991; Rahbek et al., 2019).

In areas with extreme environments, such as deserts and the Arctic, the differences between models (measured by the coefficient of variation) are greatest (Figure 2c, f). The Arctic also consistently shows the highest prediction uncertainty across models (Figures S27, S28). Uncertainty in areas with extreme environments can stem from two reasons. First, extremely species-poor areas may be under-represented in published diversity data. Such areas are often covered as part of a larger, politically delineated region rather than sampled individually (e.g., data are reported for Chad and Libya rather than for the Sahara). These politically delineated regions are environmentally more heterogeneous, which dampens the extremes of both environmental factors and plant diversity. It is well known that, even for areas with relatively homogeneous environments, inventories and floras include information not only on the dominant vegetation but also on azonal vegetation, making these areas richer than would be expected under the prevailing conditions and than is observed at more local scales (cf. the alpha-diversity predictions of Sabatini et al., 2022).
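The between-model disagreement measure mentioned above, the coefficient of variation (CV), is the standard deviation of the predictions for a grid cell divided by their mean. A minimal sketch follows; the per-cell predictions are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean; NaN when the mean is zero."""
    m = mean(values)
    if m == 0:
        return float("nan")
    return pstdev(values) / m


# Hypothetical predictions from two models for two grid cells.
# The absolute disagreement is the same (2 species) in both cells.
species_poor_cell = [1, 3]
species_rich_cell = [1001, 1003]

cv_poor = coefficient_of_variation(species_poor_cell)
cv_rich = coefficient_of_variation(species_rich_cell)
```

Because the CV divides by the mean, the same absolute disagreement between models yields a much larger value where predicted richness is low than where it is high.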

Conclusion

We provide the most accurate and comprehensive global maps to date of predicted regional vascular plant species richness and phylogenetic richness. The maps are based on significantly improved global models that use comprehensive plant distribution data, high-resolution past and current environmental information, and advanced machine learning methods. Our findings illustrate that machine learning methods are applicable to large distribution and environmental datasets and help to clarify the potentially complex, interacting relationships between the environment and plant diversity. Machine learning methods therefore help to improve basic understanding and quantitative knowledge in biogeography and macroecology. The updated multi-grain maps of vascular plant diversity provide a solid foundation for large-scale biodiversity monitoring and for research on the origins of plant diversity, and they support future global biodiversity assessments and environmental policy.

Earl, C., Belitz, M. W., Laffan, S. W., Barve, V., Barve, N., Soltis, D. E., ... & Guralnick, R. (2021). Spatial phylogenetics of butterflies in relation to environmental drivers and angiosperm diversity across North America. iScience, 24(4), 102239.

Allen, J. M., Germain-Aubrey, C. C., Barve, N., Neubig, K. M., Majure, L. C., Laffan, S. W., ... & Soltis, P. S. (2019). Spatial phylogenetics of Florida vascular plants: The effects of calibration and uncertainty on diversity estimates. iScience, 11, 57-70.

Cai, L., Kreft, H., Taylor, A., Denelle, P., Schrader, J., Essl, F., ... & Weigelt, P. (2022). Global models and predictions of plant diversity based on advanced machine learning techniques. New Phytologist.
