Title: | Overlap Detection in n-Dimensional Space |
---|---|
Description: | Uses support vector machines to identify a perfectly separating hyperplane (linear or curvilinear) between two entities in high-dimensional space. If this plane exists, the entities do not overlap. Applications include overlap detection in morphological, resource or environmental dimensions. More details can be found in: Brown et al. (2020) <doi:10.1111/2041-210X.13363>. |
Authors: | Matilda Brown [aut, cre], Greg Jordan [aut] |
Maintainer: | Matilda Brown <[email protected]> |
License: | GPL-3 |
Version: | 1.1.1 |
Built: | 2024-11-04 04:19:26 UTC |
Source: | https://github.com/matildabrown/hyperoverlap |
Uses support vector machines to identify a perfectly separating hyperplane (linear or curvilinear) between two entities in high-dimensional space. If this plane exists, the entities do not overlap. Applications include overlap detection in morphological, resource or environmental dimensions.
More details are available in Brown et al. (2020) <doi:10.1111/2041-210X.13363> and in the package vignette.
Given a matrix containing the ecological data (x) and labels (y) for two entities, a support vector machine is trained and the predicted label of each point is evaluated. If every point has been classified correctly, the entities can be separated and they do not overlap.
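The idea described above can be sketched directly with the e1071 package. This is a minimal illustration, not the package's internal code: it assumes e1071 is installed, and the choice of variables and the cost value are illustrative only.

```r
# Sketch only: assumes the e1071 package is installed.
library(e1071)

# Two entities from the iris data; drop the unused factor level.
dat <- droplevels(iris[iris$Species != "versicolor", ])
x <- as.matrix(dat[, 1:3])
y <- dat$Species

# Train a linear SVM with a fairly hard margin (cost value is illustrative).
model <- svm(x, y, type = "C-classification", kernel = "linear", cost = 50)

# Predict the label of every training point and count the errors.
pred <- predict(model, x)
n_misclassified <- sum(pred != y)

# No misclassified points means a separating hyperplane was found.
result <- if (n_misclassified == 0) "non-overlap" else "overlap"
print(table(y, pred))
print(result)
```

The confusion table printed here corresponds to the kind of true-versus-predicted summary that a perfect separation would leave with zeros off the diagonal.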
hyperoverlap_detect(x, y, kernel = "polynomial", kernel.degree = 3, cost = 50, stoppage.threshold = 0.2, verbose = TRUE, set = FALSE)
x |
A matrix or data.frame containing the variables of interest for both entities. |
y |
A vector of labels. |
kernel |
Character. Either "linear" or "polynomial" (default = "polynomial"). |
kernel.degree |
Parameter needed for kernel = "polynomial" (default = 3). |
cost |
Specifies the SVM margin 'hardness'. Default value is 50, but can be increased for improved accuracy (although this increases runtimes and memory usage). |
stoppage.threshold |
Numeric. If the number of points misclassified using a linear hyperplane exceeds this proportion of the number of observations, non-linear separation is not attempted. Must be between 0 and 1 (default = 0.2). |
verbose |
Logical. If TRUE, prints diagnostic messages. |
set |
Logical. Is this function being called as part of hyperoverlap_set()? |
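The stoppage.threshold rule can be read as a simple proportion check. The helper below is hypothetical (attempt_nonlinear is not a function in the package) and only illustrates the arithmetic, using the 0.2 value given in the argument description.

```r
# Hypothetical helper, not part of the package: would a non-linear fit
# still be attempted after the linear fit misclassified some points?
attempt_nonlinear <- function(n_misclassified, n_obs, stoppage.threshold = 0.2) {
  stopifnot(stoppage.threshold >= 0, stoppage.threshold <= 1)
  # Proportion of observations misclassified by the linear hyperplane.
  (n_misclassified / n_obs) <= stoppage.threshold
}

attempt_nonlinear(10, 100)  # 10% misclassified: below the threshold
attempt_nonlinear(35, 100)  # 35% misclassified: above the threshold
```

Under this reading, a higher threshold makes the function more willing to try a polynomial kernel on entities that a linear hyperplane separates poorly.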
Input data should be preprocessed so that all variables are comparable (e.g. same order of magnitude). Polynomial kernels allow curvilinear decision boundaries to be found between entities (see https://www.cs.cmu.edu/~ggordon/SVMs/new-svms-and-kernels.pdf). Smaller values of kernel.degree permit less complex decision boundaries; biological significance is likely to be lost at values > 5.
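One common way to make variables comparable is to standardise them with base R's scale(). This is an assumption about a reasonable preprocessing step, not a requirement stated by the package; other transformations may suit some data better.

```r
# Two entities from the iris data.
dat <- iris[iris$Species != "versicolor", ]

# Centre each variable on 0 and rescale it to unit standard deviation.
x_scaled <- scale(dat[, 1:4])

# Check: column means are (numerically) 0 and standard deviations are 1.
round(colMeans(x_scaled), 10)
apply(x_scaled, 2, sd)
```

The resulting matrix x_scaled could then be supplied as the x argument, with dat$Species as the labels.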
A Hyperoverlap-class object
data <- iris[iris$Species != "versicolor", ]
x <- hyperoverlap_detect(data[, 1:3], data$Species, kernel = "linear")
Hyperoverlap visualisation using linear discriminant analysis (LDA)
hyperoverlap_lda(x, return.plot=TRUE, visualise3d=FALSE, showlegend=TRUE)
x |
A Hyperoverlap-class object. |
return.plot |
Logical. If TRUE, data are plotted using base graphics. |
visualise3d |
Logical. If FALSE, data are projected onto two axes (LDA1, residualPCA1). If TRUE, data are projected onto three axes (LDA1, residualPCA1, residualPCA2) |
showlegend |
Logical. Used for 3D plots. |
This function provides a way to visualise overlap (or non-overlap) between classes of high dimensional data. For inspection, it is useful to use the base graphics package (implemented by return.plot=TRUE). The transformed coordinates of each point are also returned as a dataframe, which can be plotted with user-defined parameters.
Returns a dataframe with columns "Entity", "LDA1", "residualPCA1", "residualPCA2" (if visualise3d = TRUE)
# using iris dataset reduced to two species
data <- iris[iris$Species != "versicolor", ]
x <- hyperoverlap_detect(data[1:4], data$Species)
hyperoverlap_lda(x)
This function plots a matrix of overlap.
hyperoverlap_pairs_plot(x, cols = pal)
x |
A matrix of the form produced by hyperoverlap_set(). |
cols |
A vector of colours (default: |
Input matrix must contain columns named "entity1", "entity2" and "result".
A ggplot object
hyperoverlap.iris.set <- hyperoverlap_set(iris[1:3], iris$Species, kernel = "linear")
hyperoverlap_pairs_plot(hyperoverlap.iris.set)
Plot the optimal separating hyperplane found by hyperoverlap_detect() in 3D.
hyperoverlap_plot(x)
x |
A Hyperoverlap-class object. |
hyperoverlap_detect, hyperoverlap_lda
data <- iris[iris$Species != "versicolor", ]
x <- hyperoverlap_detect(data[, 1:3], data$Species, kernel = "linear")
hyperoverlap_plot(x)
This function is a wrapper for hyperoverlap_detect for pairwise overlap detection between multiple entities.
hyperoverlap_set(x, y, kernel = "polynomial", kernel.degree = 3, cost = 1000, stoppage.threshold = 0.2, write.to.file = FALSE, path = NULL, sample.dimensionality.omit = "FALSE")
x |
A matrix or data.frame containing the variables of interest for both entities. |
y |
A vector of labels. |
kernel |
Character. Either "linear" or "polynomial" (default = "polynomial"). |
kernel.degree |
Parameter needed for kernel = "polynomial" (default = 3). |
cost |
Specifies the SVM margin 'hardness'. Default value is 1000, but can be increased for improved accuracy (although this increases runtimes and memory usage). |
stoppage.threshold |
Numeric. If the number of points misclassified using a linear hyperplane exceeds this proportion of the number of observations, non-linear separation is not attempted. Must be between 0 and 1 (default = 0.2). |
write.to.file |
Logical. If TRUE, each Hyperoverlap-class object is written to file. |
path |
Character. Path to write .rds files to. Ignored if write.to.file = FALSE. |
sample.dimensionality.omit |
Given as "TRUE" or "FALSE" (default = "FALSE"). If "TRUE", omits any entity pairs with a combined sample size less than n+1, where n is the number of dimensions (see details). |
In n dimensions, any set of up to n+1 points in general position can be separated by a linear hyperplane, whatever their labels. For entity pairs with so few observations, this may produce an artefactual non-overlap result.
The sample.dimensionality.omit parameter gives two options for dealing with these pairs when they form part of a larger analysis.
If sample.dimensionality.omit = "TRUE", this pair is removed from the analysis (result = NA).
If sample.dimensionality.omit = "FALSE", the pair is included, but a warning is printed.
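The claim that n+1 points in n dimensions can always be split by a linear hyperplane can be checked with a small base R sketch. The points and labels below are arbitrary illustrative values; the construction solves a linear system rather than fitting an SVM, so it demonstrates the geometry only, not the package's method.

```r
set.seed(1)
n <- 3                                       # number of dimensions
pts <- matrix(rnorm((n + 1) * n), ncol = n)  # n + 1 random points
labels <- c(1, -1, 1, -1)                    # arbitrary labels for two entities

# Solve [pts, 1] %*% c(w, b) = labels for the hyperplane w . x + b = 0.
A <- cbind(pts, 1)
wb <- solve(A, labels)

# Every point falls on the side of the hyperplane given by its label.
side <- sign(as.vector(A %*% wb))
all(side == labels)
```

Because a separating hyperplane exists for any labelling of so few points, a non-overlap result for such a pair says nothing about the entities themselves.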
A long-form matrix with the following columns:
entity1,
entity2,
shape,
polynomial.order (if kernel="polynomial"),
result,
number.of.points.misclassified.
If specified, individual Hyperoverlap-class objects are written to file.
data(iris)
hyperoverlap.iris.set <- hyperoverlap_set(iris[1:3], iris$Species, kernel = "linear")
Storage class for the description of hyperoverlaps
entity1
A length-one character vector
entity2
A length-one character vector
dimensions
A length n character vector containing the variables used to define the space
occurrences
A matrix containing the labelled input data
shape
shape of the decision boundary; either "linear" or "curvilinear"
polynomial.order
a length-one numeric vector showing the polynomial order of the most accurate kernel function. "0" if linear kernel.
result
a length-one character vector, either "overlap" or "non-overlap"
accuracy
a 2x2 table with the true (y) and predicted (pred) labels
number.of.points.misclassified
a length-one numeric vector
model
svm model used to plot decision boundary