sim9(speciesData, algo, metric, nReps = 1000, saveSeed = FALSE, burn_in = 0, algoOpts = list(), metricOpts = list(), suppressProg = TRUE)
An improved implementation of the sequential swap algorithm.
Generating a set of random matrices with fixed row and column sums is a challenging computational problem. In the ecological literature, these matrices have been created by an MCMC "sequential swap" algorithm (Gotelli 2000). Two rows and two columns are chosen randomly, and if the 4 cells form a 01/10 pattern, the cell values are swapped to 10/01 and then replaced in the matrix. This generates a slightly different matrix with the same row and column totals. If the cells cannot be swapped, the trial is discarded. Because only 4 cells are reshuffled at a time, it takes many successive swaps to eliminate transient effects as the matrix moves away from the original configuration and approaches a stationary distribution. A second disadvantage of the sequential swap is that not all matrices are sampled with equal probability, because the failed swaps are discarded. This bias seems small for the binary matrices typically generated by ecological studies (< 100 x 100), but could be important for "big data" applications.
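The sequential swap described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the EcoSimR source: the function name `sequential_swap` and its arguments `n_swaps` and `max_trials` are hypothetical, and the `max_trials` cap is added here only so the loop terminates on a matrix with no swappable cells.

```r
# Hypothetical helper (not part of EcoSimR): perform n_swaps successful
# checkerboard swaps on a binary matrix m, discarding failed trials.
sequential_swap <- function(m, n_swaps = 100, max_trials = 1e5) {
  done <- 0
  trials <- 0
  while (done < n_swaps && trials < max_trials) {
    trials <- trials + 1
    rows <- sample.int(nrow(m), 2)   # two distinct random rows
    cols <- sample.int(ncol(m), 2)   # two distinct random columns
    sub <- m[rows, cols]
    # Swappable only if the 2 x 2 cells form a 01/10 or 10/01 pattern
    is_checkerboard <- sub[1, 1] == sub[2, 2] &&
      sub[1, 2] == sub[2, 1] &&
      sub[1, 1] != sub[1, 2]
    if (is_checkerboard) {
      m[rows, cols] <- 1 - sub       # flip 01/10 to 10/01 (or the reverse)
      done <- done + 1
    }
  }
  m
}

set.seed(1)
m <- matrix(c(1, 0, 1, 0,
              0, 1, 0, 1,
              1, 1, 0, 0), nrow = 3, byrow = TRUE)
swapped <- sequential_swap(m)
```

Comparing `rowSums(swapped)` and `colSums(swapped)` with those of `m` shows that the marginal totals are preserved, while each successful trial changes only 4 cells.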
EcoSimR uses an unbiased and more efficient algorithm, which Strona et al. (2014) have dubbed the "curveball algorithm". In this algorithm, two rows from the matrix are randomly chosen to create a submatrix. Within the submatrix, the columns whose column sums are equal to 1 (a presence in one row and an absence in the other) are randomly swapped. The resulting submatrix is then returned to the full matrix, with modified values in two of the rows. If no swapping is possible (which is an improbable event for most ecological matrices), the unswapped matrix is still retained. The curveball algorithm is much more efficient than the sequential swap because most iterations reshuffle many elements in the matrix simultaneously. Strona et al. (2014) show empirically that this algorithm gives unbiased results. However, the resulting MCMC chains will still exhibit autocorrelation between consecutive matrices, especially if the matrix is very large. Future versions of EcoSimR will allow for a thinning parameter to avoid using every sequential matrix from the MCMC chain. The current version of EcoSimR allows for control over the burn-in period and generates a burn-in plot so the user can see whether stationarity has been achieved.
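A single iteration of this two-row reshuffle might look like the following. It is a minimal sketch assuming a binary presence-absence matrix; the function name `curveball_step` is hypothetical and the code is not claimed to match the internals of sim9.

```r
# Hypothetical helper (not part of EcoSimR): one curveball-style step.
curveball_step <- function(m) {
  rows <- sample.int(nrow(m), 2)          # two distinct random rows
  sub <- m[rows, , drop = FALSE]          # 2 x ncol submatrix
  # Only columns present in exactly one of the two rows can be exchanged
  free <- which(colSums(sub) == 1)
  if (length(free) > 1) {
    # Permute those columns among themselves; each stays a (1,0) or (0,1)
    # column, so row and column totals are unchanged
    shuffled <- free[sample.int(length(free))]
    sub[, free] <- sub[, shuffled, drop = FALSE]
  }
  m[rows, ] <- sub                        # unchanged if no swap was possible
  m
}

set.seed(42)
m <- matrix(c(1, 0, 1, 0, 1,
              0, 1, 0, 1, 1,
              1, 1, 0, 0, 0,
              0, 0, 1, 1, 0), nrow = 4, byrow = TRUE)
shuffled_m <- m
for (i in 1:200) shuffled_m <- curveball_step(shuffled_m)
```

Because the step permutes every exchangeable column of the two rows at once, one iteration can change many cells, whereas a sequential swap changes at most 4.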
Chen, Y., P. Diaconis, S.P. Holmes, and J.S. Liu. 2005. Sequential Monte Carlo methods for statistical analysis of tables. Journal of the American Statistical Association 100: 109-120.
Cobb, G.W., and Y.-P. Chen. 2003. An application of Markov chain Monte Carlo to community ecology. American Mathematical Monthly 110: 265-288.
Gotelli, N.J. 2000. Null model analysis of species co-occurrence patterns. Ecology 81: 2606-2621.
Strona, G., D. Nappo, F. Boccacci, S. Fattorini, and J. San-Miguel-Ayanz. 2014. A fast and unbiased procedure to randomize ecological binary matrices with fixed row and column totals. Nature Communications 5: 4114. DOI: 10.1038/ncomms5114.
## Not run:
## Run the null model
finchMod <- cooc_null_model(dataWiFinches, algo = "sim9", nReps = 5000, burn_in = 500)
## Summary and plot info
summary(finchMod)
plot(finchMod, type = "burn_in")
plot(finchMod, type = "hist")
plot(finchMod, type = "cooc")
## End(Not run)