ASV <- read.delim(paste0(path, "data/ASV.txt"), sep = "\t", header = TRUE) # ASV = abundance data of OTUs at sampling locations (OTU x location)
ENV <- read.delim(paste0(path, "data/ENV.txt"), sep = "\t", header = TRUE) # ENV = environmental data of sampling locations
LOC <- read.delim(paste0(path, "data/Sample_Classification.tsv"), sep = "\t", header = TRUE) # LOC = location data for sampling locations, e.g. depth / ocean layer
Descriptive Community Ecology using NMDS
This is the first post of the Mediterranean Microbial Network series.
- This is a meta-analysis for demonstration in the M.Sc. Biology module Network Analysis in Ecology and Beyond, offered by Prof. Dr. Jochen Fründ at the University of Hamburg.
- The data used are credited to Deutschmann et al. (2024) and available in the original repo, as well as in our repo.
- Data wrangling was done by Jan Moritz Fehr.
Ocean Microbiome
We adopted the ocean microbial data processed and published by Deutschmann et al. (2024), and focused on the Mediterranean subset. Thanks to the comprehensive record of ecological parameters, we were able to divide the dataset according to sampling depth. The data were originally categorized into 5 layers:
- Surface (SRF): 0 - 3 m
- Epipelagic (EPI): 12 - 50 m
- Deep-Chlorophyll Maximum (DCM): 40 - 130 m
- Mesopelagic (MES): 200 - 1000 m
- Bathypelagic (BAT): 1100 - 3300 m
We combined EPI and DCM for the sake of simplicity and also because the DCM is recognized as part of the EPI. The EPI is also called the sunlit zone, where sunlight can still be used for photosynthesis. Because red light is absorbed faster than blue light, only dim blue light remains in MES, also known as the twilight zone, and it cannot be used for photosynthesis. The light disappears entirely in BAT, the depth at which most of the Mediterranean Sea reaches the sea floor.
In each layer, the species composition beyond microorganisms is different: in SRF and EPI, phytoplankton import solar energy into the ecosystem, which attracts small zooplankton to feed at this depth. As photosynthesis is no longer possible in MES, it is home to many nekton: fishes, squids, planktonic crustaceans, etc. In BAT, as we approach the sea floor, benthic fauna becomes an important part. Such differences in interaction partners will also affect the microbial communities. But to what extent? And how exactly do the communities differ? The co-occurrence / association data of the microorganisms can help us answer those questions.
Deutschmann et al. (2024) published Amplicon Sequence Variant (ASV) abundance data, comprising an abundance matrix in which each row represents a sample site and each column an OTU (Operational Taxonomic Unit, commonly used in microbiology as a proxy for the species concept). This matrix directly represents an adjacency matrix for a species-location bipartite network.
For more information about ocean layers, kindly see Know Your Ocean from WHOI.
Let's get the data imported and start with some basic data wrangling:
MS is now a data frame with n = 145 rows (sample sites) and p = 3208 OTU columns, storing the abundance data in the form of an adjacency matrix. There are 16 additional columns at the end for ecological data.
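The wrangling itself (credited to Jan Moritz Fehr) is not reproduced in this post. As a purely hypothetical sketch of how such a matrix could be assembled, assuming the transposed ASV table and the metadata tables share the same sample order:
# Hypothetical sketch only -- not the original wrangling code
ab <- as.data.frame(t(as.matrix(ASV))) # ASV is OTU x location, transpose to location x OTU
MS <- cbind(ab, ENV, LOC) # append environmental and classification data (assumes matching sample order)
ab.mtx.ncol <- ncol(ab) # number of OTU columns (p = 3208)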
Non-metric Multi-Dimensional Scaling (NMDS)
The most intuitive way to get an overview of the communities is to plot them, and NMDS is one of the most common ordination methods for this purpose.
library(tidyverse)
library(vegan)
set.seed(1)
ab.mtx <- as.matrix(MS[, 1:ab.mtx.ncol]) # extract abundance matrix
Data Preparation
Taking a brief look at the data, it's very clear that our microbial data are heavily skewed. A few species have extremely high read counts while the vast majority are very infrequent, and almost 85% of the cells in our matrix are zero.
ab.mtx[ab.mtx == 0] %>% length() / ab.mtx %>% length()
[1] 0.8453521
max(ab.mtx)
[1] 59282
hist(ab.mtx)
It’s worth noting that vegan::metaMDS(), the function that performs the NMDS calculation, will try to auto-transform the data unless autotransform = FALSE is specified. In our case, given the extreme skewness of the data and the clear pattern, the choice of transformation method did not make a big difference. This is not always true, however, and these methods should be selected cautiously. The Hellinger transformation is one way to reduce the weight of extremely high values.
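For instance, an explicit Hellinger transformation could be done with vegan::decostand(); this is only a sketch of how the transformed matrix ab.mtx.tf used below could be obtained, not necessarily the exact transformation applied for the poster:
# Sketch: Hellinger transformation down-weights extremely abundant OTUs
ab.mtx.tf <- decostand(ab.mtx, method = "hellinger")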
Perform NMDS
Every community ecologist should be familiar with the basic idea of NMDS, which is dimension reduction. It converts our n x p matrix into an n x n dissimilarity matrix. This step allows a flexible decision on how the distances should be calculated. A common choice when dealing with species composition is the Bray-Curtis dissimilarity, as this semi-metric measure takes both presence/absence and abundance information into account. It is also the default in metaMDS().
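For reference, the Bray-Curtis dissimilarity between two sites j and k, where x_ij denotes the abundance of OTU i at site j, can be written as:

$$BC_{jk} = \frac{\sum_{i} \lvert x_{ij} - x_{ik} \rvert}{\sum_{i} \left( x_{ij} + x_{ik} \right)}$$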
NMDS differs a bit from other ordination methods, like PCA, in how it handles the distances and calculates axes and coordinates. Its non-metric nature means that the distances on the final plot do not represent the actual dissimilarities between samples, but only their rank order. It therefore also does not try to maximize the variability along the axes, which means the axes are essentially arbitrary and the technique is mainly used for visualization (Bakker 2024).
Before starting NMDS, we have to specify k, the number of axes we want. NMDS then adjusts the scaling in an iterative process. After each iteration, it calculates a stress value showing how well the ordination represents the real data. The value ranges from 0 to 1, where 0 means a perfect fit; values above 0.2, however, already indicate that interpreting the ordination is risky. Generally, a stress value under 0.1 is considered a good result (Clarke 1993; Bakker 2024).
NMDS <- metaMDS(ab.mtx.tf, k = 2, autotransform = F, trymax = 50)
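Although not shown in the original workflow, the returned object carries the final stress value, and vegan's Shepard diagram illustrates the rank-based fit discussed above:
NMDS$stress # final stress value, ideally below 0.1
stressplot(NMDS) # Shepard diagram: ordination distance vs. observed dissimilarity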
NMDS Plot using {ggplot2}
There are several plotting functions in {vegan}, but since we were trying to make a fancy poster, I brought out my old script to build a fully controlled plot with {ggplot2}.
# Extract coordinates
siteCoord <- as_tibble(scores(NMDS)$sites)
spCoord <- as_tibble(scores(NMDS)$species)
# Import aesthetics setting
aes_setting <- function() {
  list(
theme_minimal() +
theme(
aspect.ratio = 1,
plot.title = element_text(size = 12, face = "bold", family = "Times New Roman", hjust = 0.5),
axis.ticks.length = unit(-0.05, "in"),
axis.text.y = element_text(margin=unit(c(0.3,0.3,0.3,0.3), "cm")),
axis.text.x = element_text(margin=unit(c(0.3,0.3,0.3,0.3), "cm")),
axis.ticks.x = element_blank(),
plot.background = element_blank(),
legend.background = element_blank(),
panel.background = element_rect(color = NULL, fill = "white"),
panel.border = element_rect(color = "black", fill = "transparent"),
panel.grid = element_line(linewidth = 0.3)
)
)
}
# Plot the species
p.NMDS.sp <- ggplot() +
geom_point(data = spCoord, aes(x = NMDS1, y = NMDS2), size = 2, shape = 3, color = "grey", alpha = 0.25) +
aes_setting() +
theme(aspect.ratio = 1) +
coord_equal()
p.NMDS.sp
As simple as that, we have our species plotted. But we are not really interested in individual species; rather, in the communities at different sites. For that purpose, we first have to attach the environmental data of each site to siteCoord.
# Extract environmental data, combine DCM with EPI
env <- MS[, (ab.mtx.ncol+1):ncol(MS)] %>%
  mutate(layer = case_when(layer == "DCM" ~ "EPI", .default = layer))
siteCoord_env <- cbind(siteCoord, env)
# Order the layer
siteCoord_env_ordered <- siteCoord_env %>% mutate(layer = factor(layer, levels = c("SRF", "EPI", "MES", "BAT")))
Then we can plot them as usual:
# Define colors for the layers
depth.colors <- c(
"SRF" = "#6CCCD6",
"EPI" = "#007896",
"MES" = "#004359",
"BAT" = "#001C25"
)
depth.shapes <- c(
"SRF" = 15,
"EPI" = 16,
"MES" = 17,
"BAT" = 18
)
p.NMDS.st <- ggplot() +
geom_point(data = siteCoord_env_ordered, aes(x = NMDS1, y = NMDS2, color = layer, shape = layer), size = 3.5, alpha = 0.8) +
aes_setting() +
theme(aspect.ratio = 1) +
scale_color_manual(values = depth.colors) +
scale_shape_manual(values = depth.shapes) +
guides(color = guide_legend("Layer"), fill = "none", shape = guide_legend("Layer"))
p.NMDS.st
Now this tells us more: sample sites from the same layer tend to cluster together, with MES and BAT forming a large group distinct from EPI and SRF. A common way to show such categorical data in NMDS is to draw a convex hull around those clusters.
NMDS Hullplot using {ggplot2}
The basic idea is to simply use chull() to identify the points of each cluster that lie on the border of its convex hull and draw a polygon through them. The problem is that we have some outliers here, especially from MES (those neighboring BAT and SRF). These points will inevitably end up defining the convex hull and make it cover much more area than the cluster really occupies.
To address this, I additionally calculate the Euclidean centroid of each cluster and filter out outliers by comparing each point's distance to the centroid with the median distance to the centroid across all points in the same cluster. Points whose distance exceeds a threshold (the median distance multiplied by a defined factor) are excluded.
# Calculate the coordinates of the hull border
outliner <- function(coord, group_var, threshold_factor = 2, threshold_func = median) {
  centroid <- colMeans(coord[, c(1, 2)])
  centroids <- matrix(c(rep(centroid[1], nrow(coord)), rep(centroid[2], nrow(coord))), ncol = 2)

  # Euclidean distances
  coord$distances <- sqrt(rowSums((coord[, c(1, 2)] - centroids)^2))

  # Set threshold
  threshold <- threshold_factor * threshold_func(coord$distances)

  # Filter out points with distance larger than threshold
  coord_in <- coord[coord$distances <= threshold, ]

  outline_indices <- chull(coord_in[, c(1, 2)])

  return(list(
    borders = coord_in[outline_indices, ],
    centroid = c(centroid, group = group_var[[1]])
  ))
}
threshold_factor controls how conservatively or inclusively the convex hull is drawn. threshold_func defines which function is used as the baseline for the threshold; the default here is median(), since mean() can skew the baseline if extreme outliers exist. We then apply this process to each category and extract the hull border coordinates for ggplot.
# The ordered factor from `siteCoord_env_ordered` can't successfully pass the layer name through the pipeline, so we use `siteCoord_env` here
layer.siteCoord <- siteCoord_env %>%
group_by(layer) %>%
group_map(~ outliner(.x, .$layer), .keep = T) %>%
`names<-`(group_keys(siteCoord_env %>% group_by(layer))[[1]]) %>%
do.call(what = rbind, args = .) %>%
as.data.frame()
# Extract border and centroid coordinates
<- layer.siteCoord[["borders"]] %>% bind_rows()
borders.pCoord <- layer.siteCoord[["centroid"]] %>% bind_rows() %>% mutate(NMDS1 = as.numeric(NMDS1), NMDS2 = as.numeric(NMDS2))
centroids.pCoord
p.NMDS.st.hull <- p.NMDS.st +
geom_polygon(data = borders.pCoord, aes(x = NMDS1, y = NMDS2, fill = layer), alpha = 0.5) + # hull
geom_text(data = centroids.pCoord, aes(x = NMDS1, y = NMDS2, label = group)) +
scale_fill_manual(values = depth.colors)
p.NMDS.st.hull
Now each layer forms a cluster with little overlap with the others, except for EPI and SRF. This also makes sense: even though microorganisms in SRF might have different interaction partners than those in the rest of EPI, the environmental conditions and the general species composition, in particular functionally, should be similar. This is also reflected by SRF lying on the right side of EPI, at a larger distance from MES and BAT.
The space between MES and EPI is not enclosed by either hull; it reflects the “bridge” section, probably the depth at which both layers meet. There is a numerical depth variable in env, essentially continuous data that complements our layer categories. While we use a hull plot to visualize categories in NMDS, we can use contour lines to visualize continuous variables.
Fitting smooth surface
If we imagine the variable we want to display as a third dimension of our points in the NMDS space, visualizing the variable simply becomes visualizing a 3-D structure on our 2-D plot. One solution is to fit a smooth surface to our plot using vegan::ordisurf().
The way to extract contour lines from vegan::ordisurf() and the function extract.xyz() are credited to Christopher Chizinski’s post.
library(geomtextpath)
extract.xyz <- function(obj) {
  print(obj)
  xy <- expand.grid(x = obj$grid$x, y = obj$grid$y)
  xyz <- cbind(xy, c(obj$grid$z))
  names(xyz) <- c("x", "y", "z")
  return(xyz)
}

depth.surf <- ordisurf(NMDS, env$depth, permutation = 999, na.rm = T, plot = F)
depth.contour <- extract.xyz(obj = depth.surf)
Family: gaussian
Link function: identity
Formula:
y ~ s(x1, x2, k = 10, bs = "tp", fx = FALSE)
Estimated degrees of freedom:
7.84 total = 8.84
REML score: 1057.237
p.NMDS.st.hull.cont <- p.NMDS.st.hull +
geom_textcontour(data = depth.contour, aes(x, y, z = z), color = "#c4438c")
p.NMDS.st.hull.cont
We see that the points on the MES/EPI border are explained well by depth.
Detecting Density-based Clusters with {dbscan}
Another interesting way to analyze the community structure more objectively is to cluster the points purely based on their location in the NMDS space, without taking the environmental parameters into account. Speaking of clusters, you might be thinking of drawing another convex shape, probably an oval or a circle, around the data points. This is however not applicable in our case, nor in many real-life cases: our data points in the NMDS space clearly form clusters whose shapes are irregular. To address this issue, we use a density-based clustering algorithm called DBSCAN. The main idea of a density-based cluster is that its shape is arbitrary: a cluster is a contiguous region with a high density of data points, which suggests they are similar to each other, and clusters are separated by low-density regions in between. This method can effectively filter out outliers and noise, in a more robust way than just setting a threshold on the distance to the centroid. And since the clusters now have arbitrary shapes, centroids become meaningless anyway.
The main idea behind detecting high-density areas is to categorize data points into core points and border points based on the distances to their neighbors. Chains of neighboring core points represent the contiguous dense area, while border points mark its periphery. All interconnected core and border points belong to the same cluster; points that are not connected to any core point are noise. You can find more details about the algorithm and the package in their vignettes and in a nice graphical introduction by StatQuest.
library(dbscan)
?dbscan
We need to specify two parameters for DBSCAN, which affect how core points are identified: the minimum number of points (including the point itself) required in a point’s epsilon neighborhood for it to become a core point (minPts), and the size of the epsilon neighborhood (eps). The bigger minPts, the more difficult it is for a point to become a core point; a rule of thumb is to choose the number of axes we have + 1, which is 3 in our case. The choice of the eps value, however, depends on how scattered the points are. The bigger the eps, the easier it is for a point to reach the requirement set by minPts and become a core point, and vice versa. To find an eps value that is neither too high nor too low, we can have a peek at how the distances between points look:
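A plot like the one described below could, for instance, be produced with {dbscan}'s kNNdistplot() (a sketch; the original plotting code is not reproduced here):
# Sketch: distance of every point to its 2nd nearest neighbor, sorted ascending
kNNdistplot(siteCoord_env_ordered[, 1:2], k = 2)
abline(h = 0.2, lty = 2) # candidate eps near the knee of the curve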
The plot shows every point’s distance to its k-th nearest neighbor (kNN), ordered ascending. The reason we look at the distances to the kNNs (the 2nd in our case) is that our minPts will be k + 1, so the distances shown are the maximum distances the points’ epsilon neighborhoods need to cover for them to become core points. What we want to identify is the knee of the curve, which lies around y = 0.2, and this will be our eps value. If we set eps lower, we lose many potential core points and prevent nearby dense areas from connecting into bigger clusters, which can be useful when we want to separate clusters more strictly; if we set eps higher, we identify too many core points, which eventually connects all clusters into one.
With the parameters set, we simply run:
siteCoord_scan <- dbscan(siteCoord_env_ordered[, 1:2], eps = 0.2, minPts = 3)
siteCoord_scan
DBSCAN clustering for 145 objects.
Parameters: eps = 0.2, minPts = 3
Using euclidean distances and borderpoints = TRUE
The clustering contains 3 cluster(s) and 18 noise points.
0 1 2 3
18 72 52 3
Available fields: cluster, eps, minPts, metric, borderPoints
siteCoord_env_scan <- siteCoord_env %>% mutate(dbcluster = factor(siteCoord_scan$cluster))
ggplot() +
geom_point(data = siteCoord_env_scan, aes(x = NMDS1, y = NMDS2, color = dbcluster, shape = layer), size = 3.5, alpha = 0.8) +
scale_color_manual(values = c("#CCCCCC", paletteer::paletteer_d("ltc::trio1")) %>% `names<-`(c(0:3))) +
scale_shape_manual(values = depth.shapes) +
aes_setting() +
labs(shape = "Layer") +
guides(color = "none") +
theme(
axis.text.x = element_blank(),
axis.text.y = element_blank(),
legend.position = "bottom",
legend.direction = "horizontal",
legend.box = "vertical"
)
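To complement the plot, a quick cross-tabulation (not part of the original post) shows how the density-based clusters line up with the depth layers; cluster 0 is the noise group:
# Cross-tabulate DBSCAN clusters (0 = noise) against depth layers
table(siteCoord_env_scan$dbcluster, siteCoord_env_scan$layer)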
But what we did so far is basic descriptive work in community ecology. We still want to look into each community's structure using network analysis. I will cover that in the next post. To be continued.
Reference
Citation
@online{hu2025,
author = {Hu, Zhehao},
title = {Descriptive {Community} {Ecology} Using {NMDS}},
date = {2025-07-17},
url = {https://zzzhehao.github.io/biology-research/community-ecology/MediMicro-NMDS.html},
langid = {en},
abstract = {This is the first post of the Mediterranean Microbial
Network series.}
}