Description
My colleague @mavolio and I have noticed that the betadisper function can give a result that differs from Primer-e. This situation arises when the input distance matrix violates the triangle inequality, for example:
> df <- read.csv(text = '
Group, SpA, SpB
1, 0, 2
1, 1, 2
1, 1, 0')
> dst <- vegdist(df[,-1], 'bray')
> dst
1 2
2 0.2
3 1.0 0.5
The "distance" from 1 to 3 is 1.0, but it's shorter to go the 0.2 to 2 and then 0.5 to 3. The distances to the group centroid that betadisper
returns are all real values, due to the abs
call in L130.
> dsp <- betadisper(dst, df$Group, type = 'centroid')
> dsp$distances
[1] 0.4509250 0.2160247 0.5228129
Without that abs
, the result would appear to be meaningless because it is complex valued:
> dsp$distances
[1] 0.4509250+0.0000000i 0.0000000+0.2160247i 0.5228129+0.0000000i
It's not clear to us that a correct resolution of this situation is described in the literature anywhere. We think the result returned by betadisper may be faulty for two reasons:
- It's not what Primer-e returns. Primer-e returns only the real part of the complex array above.
- The sum of squared distances (0.523) does not equal the sum of squared inter-point distances divided by the number of points (0.43).
The second point arises from our reading of page 17 of the Primer-e manual. The sum of squares of the complex array above (0.43+0i) does pass the test, so despite it being complex may be the correct answer. We currently plan, in an upcoming patch to the NCEAS/codyn package, to return NA for centroid distances that fall into this category, because we don't feel the topic has passed through peer-review. How did the vegan developers settle on the current approach of taking an absolute value?