Niche Overlap Analysis Using nicheROVER and Stable Isotope Data
Part of this note is taken from the course Community Ecology offered by University of Hamburg for M.Sc. Biology program and led by Dr. Paula Cabral Eterovick.
The cover image is credited to Museums Victoria on Unsplash.
Niche Overlap
In Ecological Niche, we talked about the basic ideas of ecological niches and niche overlap. Understanding the niche overlap patterns, both intraspecific and interspecific, can help us better understand many topics, such as:
Evolutionary:
- Sexual Dimorphism by studying differences in resource utilization between males and females of the same species.
- Speciation Process by studying niche partitioning between sister taxa or subspecies.
Conservation:
- Impact of Invasive Species by examining niche overlap between native and invasive species.
- Ecological Guild by understanding niche overlap within a group sharing the same or similar functional traits (e.g. pollinator communities).
However, comparing niches among multiple species is challenging, as I mentioned earlier, since ecological data are almost always multidimensional.
nicheROVER
The package nicheROVER
provides a new probabilistic method for quantifying n-dimensional ecological niche and niche overlap (Swanson et al. 2015). The script demonstrated here is written by Martin Lysy, Heidi Swanson (see here), and is also used in their publication.
fish
is a fish stable isotope dataset included in the package, containing values for three stable isotopes (N-15, C-13, S-34) measured in the muscle tissue of four Arctic fish species (ARCS, BDWF, LKWF, LSCS). More details can be found from the documentation (?fish
) or in the publication.
What is SIA?
Stable isotope analysis (SIA) is a powerful tool in ecological niche analysis. Many physicochemical and biochemical processes are sensitive to differences in the dissociation energies of molecules, which often depend on the mass of the elements from which these molecules are made. Thus, the isotopic composition of many materials (expressed as
Gradient | Isotope system | High |
Low |
---|---|---|---|
Trophic level | High levels | Low levels | |
C3-C4 Vegetation | C4 plants | C3 plants | |
Marine-terrestrial | High levels | Low levels |
Bayesian Workflow
Bayesian theorem
Van De Schoot et al. (2021) described the foundational idea of the Bayesian statistics as follows:
Bayesian statistics is an approach to data analysis based on Bayes’ theorem, where available knowledge about parameters in a statistical model is updated with the information in observed data. The background knowledge is expressed as a prior distribution and combined with observational data in the form of a likelihood function to determine the posterior distribution.
This means we make an initial estimation of our parameters (prior distribution), then observe data and update the model (posterior distribution). Bayes’ theorem, which this approach is based on, is mathematically stated as following:
where,
and are events and , : the probability of event occurring given that is true. It is also called the posterior probability (briefly posterior) of given . is also a conditional probability: the probability of event occurring given that is true. It can also be interpreted as the likelihood of given a fixed because . ( : likelihood function) and are the probabilities of observing and respectively without any given conditions; they are known as the prior probability and marginal probability.
Why Bayesian Statistics?
Bayesian statistics is a different framework and approach on statistical analysis. Both classical and Bayesian inference, while trying to model data, have own advantages and disadvantages. Bayesian inference is generally employed whenever uncertainty comes into play:
There is also a practical advantage of the Bayesian approach, which is that all its inferences are probabilistic and thus can be represented by random simulations. For this reason, whenever we want to summarize uncertainty in estimation beyond simple confidence intervals, and whenever we want to use regression models for predictions, we go Bayesian.
― Gelman, Hill, and Vehtari (2024), P. 16
Swanson et al. (2015) pointed out a viewpoint from the traditional niche overlap analysis that might introduce problems: the geometric approach doesn’t account for uncertainty and also lacks direction. This is well illustrated by Fig. 1 in the publication:
By inviting the variance of the niche probability distribution, it now also differentiates the overlap in directions, which means the probability of a species A falling into species B’s niche is not equal to that of species B falling into species A’s niche. Since this perspective starts to consider probability in the models, a Bayesian workflow will provide more reliable and robust analysis.
Random Vector and Multivariate Normal Distribution
Given an individual statistical unit, in our case, the niche described by three isotope ratios, a random vector is a group of variables that describes this mathematical system. Which is again, in our case, the three values of the isotope ratio. The distribution of each component of the random vector together forms a multivariate distribution.
When dealing with univariate (one-dimensional) analysis, we often assume that the variable is normally distributed. We do likewise even with multivariate analysis. The multivariate normal distribution is a generalization of the univariate normal distribution into higher dimensions, but it’s not as simple as all components of the random vector being normal. It also requires every linear combination of arbitrary two components in the system to be normally distributed (bivariate normal distribution), which is not always true even if every single variable is normal on its own. This is where the covariance matrix comes into play. Normal distribution is described by two variables: mean and standard deviation (sd), with the latter being simply the square root of the variance.
Covariance and covariance matrix
Covariance is a measure of the joint variability of two variables. The sign of the covariance shows the tendency, i.e. one of the three possible linear relationships between two variables, which are:
- negative linear relationship, covariance is < 0
- no linear relationship, covariance is around 0
- positive linear relationship, covariance is > 0
The variance is a special case of covariance in which the two variables are identical. Covariance itself is sensitive to the scale of the data, and it does not provide information of on the extent to which the two variables correlate with each other and how well the linear correlation fits the data. All these characteristics make covariance difficult to interpret, but it has become an important stepping stone for many subsequent analysis, such as PCA, correlation analysis, etc.
A covariance matrix is a square matrix giving the covariance between each pair of elements, which is always symmetric, and the diagonal elements represent the variance of each variable. The covariance matrix describes the joint behavior of every linear combination of two variables, analogue to univariate normal distribution, a covariance matrix (
Conjugate Prior
Jackson et al. (2011) suggested that the normal distribution is generally appropriate when dealing with stable isotope ratios, and this gives us a way to simplify the analysis by taking advantage of the conjugate prior.
A conjungate prior is a certain prior distribution that, when applied to a certain likelihood function, results in a posterior distribution from the same distribution family. For example, in univaraite analysis, a normal prior and a normal likelihood function results a normal posterior. In our case, the conjugate prior for the multivariate normal is the normal-inverse-wishart (NIW) distribution. It can be considered in two parts: the mean follows a normal distribution and the covariance follows an inverse-Wishart distribution. NIW is commonly used as the prior distribution when modeling a multivariate normal distribution where the means and the covariances are unknown.
The probability of a random observation of a isotope ratio (
Monte Carlo with niw.post()
All the prior knowledge we just went through, from basic Bayesian inference to the NIW distribution, is to understand the very two lines of code that implemented the Bayesian workflow into the analysis.
<- 1000
nsamples <- tapply(1:nrow(fish), fish$species,
fish.par function(ii) niw.post(nsamples = nsamples, X = fish[ii,2:4]))
niw.post()
uses a data matrix (X
) as observation data, calculate the posterior distribution, and then draws a set of means and covariance matrices of a possible multivariate distribution, which includes the prior and our observation data. The main idea is that because analytically working with posterior multivariate distribution is too impractical, so we employ the to have a view of the posterior. We use tapply()
to separately apply 1000 times of niw.post()
on each fish species.
This results in a list of 4 lists (species) of 2 matrices (mu and Sigma), where the mu matrix has a dimension of 1000 (sampling times) x 3 (number of variables), and the Sigma matrix has a dimension of 1000 (sampling times) x 3 x 3 (one slice = one covariance matrix).
This particular data structure is critical for later application of niche.par.plot()
, see ?nichr.par.plot
for more.
Niche Parameters
niche.par.plot()
allows us to have a overview on Monte Carlo simulation of the posterior multivariate distribution of our plot.index
, plot.mu
or plot.Sigma
to select which plot.mu = TRUE, plot.Sigma = TRUE
to display all. See details in the document.
<- c('#348E8C', '#732C02', '#BF0404', '#F29F05') # colors for each species
clrs
# mu1 (del15N), mu2 (del13C), and Sigma12
par(mar = c(4, 5, .5, .1)+.1, oma = c(0,0,0,0), pty="s", mfrow = c(1,3))
niche.par.plot(fish.par, col = clrs, plot.index = 1)
niche.par.plot(fish.par, col = clrs, plot.index = 2)
niche.par.plot(fish.par, col = clrs, plot.index = 1:2)
legend("topleft", legend = names(fish.par), fill = clrs)
2-D Ellipse Projection
To deal with n-dimensional niche regions (NR), Swanson et al. (2015) reduced them into 2-dimensional space using probabilistic projection, which is to differentiate from geometric projection. A 2-dimensional probabilistic projection of an n-dimensional NR is simply the NR calculated by the projected 2 axes (Jackson et al. 2011; Swanson et al. 2015). Fig. 2 from Swanson et al. (2015) clearly demonstrates and explains the different outcomes.
For the sake of clarity, we only use 10 simulations here. niche.plot()
generates three types of plot:
- Univariate distribution density curves, i.e. smoothed histogram.
- Bivariate scatter plots.
- 2-D projection of NR using simulation data, displayed in ellipse plots.
From the first two plot types, we could confirm the assumption that the isotope ratios are generally normal. The ellipse plots give us a better view of each species’ NR and the overlap situation.
<- 10
nsamples <- tapply(1:nrow(fish), fish$species,
fish.par function(ii) niw.post(nsamples = nsamples, X = fish[ii,2:4]))
# format data for plotting function
<- tapply(1:nrow(fish), fish$species, function(ii) X = fish[ii,2:4])
fish.data
niche.plot(niche.par = fish.par, niche.data = fish.data, pfrac = .05,
iso.names = expression(delta^{15}*N, delta^{13}*C, delta^{34}*S),
col = clrs, xlab = expression("Isotope Ratio (per mil)"))
It’s a bit overwhelming plot. Take time to understand it.
Estimate Niche Overlap
With overlap()
we can have a more accurate and robust inference of overlapping among the species. The accuracy of the estimation can be improved by increasing nprob
. The generated plot shows pairwise niche overlaps (now we have directions!), which displays the probability distribution of species A (along rows) fallen inside the niche of species B (along columns), with indication of the mean and 95% credible interval.
Be careful not to confuse two “95%” here - one means that we use 95% of the niche region to calculate overlap, which is specified in overlap()
with alpha = 0.95
; the other is the 95% credible interval of the probability distribution displayed in each plot. There is a difference to spot between plots using 95% or 99% of the niche region.
<- 1000
nsamples <- tapply(1:nrow(fish), fish$species,
fish.par function(ii) niw.post(nsamples = nsamples, X = fish[ii,2:4]))
# 95% niche region
<- overlap(fish.par, nreps = nsamples, nprob = 1000, alpha = 0.95)
over.stat95 overlap.plot(over.stat95, col = clrs, mean.cred.col = "#024873", equal.axis = TRUE, xlab = "Overlap Probability (%) -- Niche Region Size: 95%")
# 99% niche region
<- overlap(fish.par, nreps = nsamples, nprob = 1000, alpha = 0.99)
over.stat99 overlap.plot(over.stat99, col = clrs, mean.cred.col = "#024873", equal.axis = TRUE, xlab = "Overlap Probability (%) -- Niche Region Size: 99%")
By printing out the returned value from overlap()
, numeric results can also be shown.
Penguins
Let’s run the niche overlap analysis on another dataset. There’s a datasets of 3 Pygoscelis species nesting along the Palmer Archipelago near Palmer Station, containing structural measurements, two isotope ratios, and basic metadata. Isotope data was collected from blood samples, and additional records of whether the sampled individuals have completed their clutch were documented.
<- readRDS("biology-research/community-ecology/assets/data/219.5.rds") %>% mutate(Species = "P. adeliae")
adeliae <- readRDS("biology-research/community-ecology/assets/data/220.7.rds") %>% mutate(Species = "P. papua")
papua <- readRDS("biology-research/community-ecology/assets/data/221.8.rds") %>% mutate(Species = "P. antarcticus")
antarcticus
<- bind_rows(adeliae, papua, antarcticus)
penguin
<- penguin %>% select(c(1:9, 13:14, 17, 10:12, 15:16)) %>% filter(complete.cases(.))
comp.penguin
paged_table(comp.penguin)
penguin
dataset
studyName <fct> | Sample.Number <int> | Species <chr> | Region <fct> | Island <fct> | Stage <fct> | Individual.ID <fct> | Clutch.Completion <fct> | Date.Egg <date> | Body.Mass <int> | |
---|---|---|---|---|---|---|---|---|---|---|
PAL0708 | 2 | P. adeliae | Anvers | Torgersen | Adult, 1 Egg Stage | N1A2 | Yes | 2007-11-11 | 3800 | |
PAL0708 | 3 | P. adeliae | Anvers | Torgersen | Adult, 1 Egg Stage | N2A1 | Yes | 2007-11-16 | 3250 | |
PAL0708 | 5 | P. adeliae | Anvers | Torgersen | Adult, 1 Egg Stage | N3A1 | Yes | 2007-11-16 | 3450 | |
PAL0708 | 6 | P. adeliae | Anvers | Torgersen | Adult, 1 Egg Stage | N3A2 | Yes | 2007-11-16 | 3650 | |
PAL0708 | 7 | P. adeliae | Anvers | Torgersen | Adult, 1 Egg Stage | N4A1 | No | 2007-11-15 | 3625 | |
PAL0708 | 8 | P. adeliae | Anvers | Torgersen | Adult, 1 Egg Stage | N4A2 | No | 2007-11-15 | 4675 | |
PAL0708 | 10 | P. adeliae | Anvers | Torgersen | Adult, 1 Egg Stage | N5A2 | Yes | 2007-11-09 | 4250 | |
PAL0708 | 11 | P. adeliae | Anvers | Torgersen | Adult, 1 Egg Stage | N6A1 | Yes | 2007-11-09 | 3300 | |
PAL0708 | 15 | P. adeliae | Anvers | Torgersen | Adult, 1 Egg Stage | N8A1 | Yes | 2007-11-16 | 4400 | |
PAL0708 | 17 | P. adeliae | Anvers | Torgersen | Adult, 1 Egg Stage | N9A1 | Yes | 2007-11-12 | 3450 |
To consider potential sexual dimorphisms within species, we will also differentiate the observations by sex.
<- comp.penguin %>%
comp.penguin filter(Sex != "" & Sex != ".") %>%
mutate(Species = factor(Species), Sex = factor(case_when(
== "FEMALE" ~ "F",
Sex == "MALE" ~ "M"
Sex %>%
))) mutate(cat = factor(paste(Species, Sex, sep = " - ")))
# Take a look at the n in each group
summary(comp.penguin$cat)
P. adeliae - F P. adeliae - M P. antarcticus - F P. antarcticus - M
71 68 34 33
P. papua - F P. papua - M
58 60
As mentioned in Table 2,
Code
# Run the analysis only with isotope data
<- 10
nsamples <- tapply(
pen.par 1:nrow(comp.penguin),
$cat,
comp.penguinfunction(ii) niw.post(nsamples = nsamples, X = comp.penguin[ii, 16:17]))
# format data for plotting function
<- tapply(1:nrow(comp.penguin), comp.penguin$cat, function(ii) X = comp.penguin[ii, 16:17])
pen.data
<- c("#023059", "#0477BF", "#8C3E7F", "#8872A6", "#F24130", "#F2A71B")
clrs
par(mfcol = (c(6, 6)))
niche.plot(niche.par = pen.par, niche.data = pen.data, pfrac = .05,
iso.names = c(expression(delta^{15}*N), expression(delta^{13}*C)),
col = clrs, xlab = expression("Isotope Ratio (per mil)"))
<- 1000
nsamples <- tapply(
pen.par 1:nrow(comp.penguin),
$cat,
comp.penguinfunction(ii) niw.post(nsamples = nsamples, X = comp.penguin[ii, 16:17]))
# 95% niche region
<- overlap(pen.par, nreps = nsamples, nprob = 1000, alpha = 0.95)
over.stat95 overlap.plot(over.stat95, col = clrs, mean.cred.col = "#024873", equal.axis = TRUE, xlab = "Overlap Probability (%) -- Niche Region Size: 95%", species.names = c("P. adeliae - F", "P.adeliae - M", "P. antarcticus - F", "P. antarcticus - M", "P. papua - F", "P. papua - M"))
In Figure 7 we saw no remarkable differences between sexes within each species. We also noticed that P. adeliae has the largest niche region and it overlaps with both the others. This is also shown in Figure 8.
Then, we combine isotope data with two structural measurements, culmen length and culmen depth, whose functionality should be strongly associated with their diet, to run the analysis again.
Code
# Run the analysis with culmen size data and isotope data
<- 10
nsamples <- tapply(
pen.par 1:nrow(comp.penguin),
$cat,
comp.penguinfunction(ii) niw.post(nsamples = nsamples, X = comp.penguin[ii, c(13:14, 16:17)]))
# format data for plotting function
<- tapply(1:nrow(comp.penguin), comp.penguin$cat, function(ii) X = comp.penguin[ii, c(13:14, 16:17)])
pen.data
<- c("#023059", "#0477BF", "#8C3E7F", "#8872A6", "#F24130", "#F2A71B")
clrs
par(mfcol = (c(6, 6)))
niche.plot(niche.par = pen.par, niche.data = pen.data, pfrac = .05,
iso.names = c("Culmen length", "Culmen depth", expression(delta^{15}*N), expression(delta^{13}*C)),
col = clrs, xlab = expression("Isotope Ratio (per mil)"))
<- 1000
nsamples <- tapply(
pen.par 1:nrow(comp.penguin),
$cat,
comp.penguinfunction(ii) niw.post(nsamples = nsamples, X = comp.penguin[ii, c(13:14, 16:17)]))
# 95% niche region
<- overlap(pen.par, nreps = nsamples, nprob = 1000, alpha = 0.95)
over.stat95 overlap.plot(over.stat95, col = clrs, mean.cred.col = "#024873", equal.axis = TRUE, xlab = "Overlap Probability (%) -- Niche Region Size: 95%", species.names = c("P. adeliae - F", "P.adeliae - M", "P. antarcticus - F", "P. antarcticus - M", "P. papua - F", "P. papua - M"))
In Figure 9 we notice that both in culmen length and depth, a notable sexual dimorphism is present, and this has formed different niche regions among the groups. The overlap in niche region defined by this data subset (Figure 10) shows, however, different results than those which only utilized isotope signatures (Figure 8). The overlap in all groups and directions between P. papua and P. adeliae is gone, suggesting that their diets have similar
On the other hand, female individuals of P. antarcticus remain the only group which has a notable overlap with other groups, namely both sexes of P. adeliae. Gorman, Williams, and Fraser (2014), who conducted the study utilizing this data, suggested, regarding their
<- comp.penguin %>% group_by(Species) %>% summarise(complete_clutch = round(sum(Clutch.Completion == "Yes")/sum(!is.na(Clutch.Completion))*100, 2))
clutch clutch
# A tibble: 3 × 2
Species complete_clutch
<fct> <dbl>
1 P. adeliae 90.6
2 P. antarcticus 79.1
3 P. papua 94.1
And at last, these are our adorable guests:
By the way, if anyone ever wondered – yes, this is the exact same dataset they used to produce the palmerpenguins
package. I found the data at EDI Data Portal and didn’t notice until I attached the images, because these three guys showing up at the same time is weirdly familiar to me. The dataset is therefore available by simply attaching the package. If you are interested in what more detailed results people have found with these data, you should check the paper from Gorman, Williams, and Fraser (2014).
Reference
Citation
@online{hu2024,
author = {Hu, Zhehao},
title = {Niche {Overlap} {Analysis} {Using} {nicheROVER} and {Stable}
{Isotope} {Data}},
date = {2024-09-18},
url = {https://zzzhehao.github.io/biology-research/community-ecology/nicheROVER-Niche-Overlap.html},
langid = {en},
abstract = {Stable isotope analysis (SIA) provides numerous
information while assessing species’ ecological niche, the R package
*nicheROVER* offers an insightful analysis using stable isotope
data.}
}