Hyperoverlap can be used to detect and visualise overlap in n-dimensional space.
To explore the functions in hyperoverlap, we’ll use the iris dataset. This dataset contains 150 observations of three species of iris (“setosa”, “versicolor” and “virginica”). These data are four-dimensional (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) and are documented in ?iris. We’ll set up five test datasets to explore the different functions:

1. test1: two entities (setosa, virginica); three dimensions (Sepal.Length, Sepal.Width, Petal.Length)
2. test2: two entities (versicolor, virginica); three dimensions (as above)
3. test3: two entities (setosa, virginica); four dimensions
4. test4: two entities (versicolor, virginica); four dimensions
5. test5: all entities, all dimensions
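# Columns 1:3 are Sepal.Length, Sepal.Width, Petal.Length; column 5 is Species
test1 <- iris[which(iris$Species!="versicolor"),c(1:3,5)]  # setosa + virginica, 3 dimensions
test2 <- iris[which(iris$Species!="setosa"),c(1:3,5)]      # versicolor + virginica, 3 dimensions
test3 <- iris[which(iris$Species!="versicolor"),]          # setosa + virginica, 4 dimensions
test4 <- iris[which(iris$Species!="setosa"),]              # versicolor + virginica, 4 dimensions
test5 <- iris                                              # all species, all dimensions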
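Subsetting rows of a data frame keeps every level of the Species factor, so test1$Species still lists “versicolor” as an empty level. The base-R sketch below rebuilds the five test sets and tabulates how many observations, dimensions and entities each one contains; the droplevels() call is an extra tidying step added here, not something the workflow above requires.

```r
# Rebuild the five test sets (same rows and columns as above)
test1 <- iris[iris$Species != "versicolor", c(1:3, 5)]
test2 <- iris[iris$Species != "setosa", c(1:3, 5)]
test3 <- iris[iris$Species != "versicolor", ]
test4 <- iris[iris$Species != "setosa", ]
test5 <- iris
tests <- list(test1 = test1, test2 = test2, test3 = test3,
              test4 = test4, test5 = test5)

# For each set: drop empty factor levels, then count rows,
# predictor dimensions (every column except Species) and entities
summary_tbl <- do.call(rbind, lapply(names(tests), function(nm) {
  d <- droplevels(tests[[nm]])
  data.frame(dataset  = nm,
             n        = nrow(d),
             dims     = ncol(d) - 1,
             entities = paste(levels(d$Species), collapse = ", "),
             stringsAsFactors = FALSE)
}))
print(summary_tbl)
```

Each two-entity set should contain 100 observations (50 per species), and test5 all 150.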
Note that entities may be species, genera, populations etc.
To plot the decision boundary using hyperoverlap_plot, the data cannot exceed three dimensions. For high-dimensional visualisation, see hyperoverlap_lda.
library(hyperoverlap)
# Test each pair of entities for overlap in three dimensions
setosa_virginica3d <- hyperoverlap_detect(test1[,1:3], test1$Species)
versicolor_virginica3d <- hyperoverlap_detect(test2[,1:3], test2$Species)
To examine the result:
setosa_virginica3d@result #gives us the result: overlap or non-overlap?
#> [1] "non-overlap"
versicolor_virginica3d@result
#> [1] "overlap"
setosa_virginica3d@shape #for the non-overlapping pair, was the decision boundary linear or curvilinear?
#> [1] "linear"
hyperoverlap_plot(setosa_virginica3d) #plot the data and the decision boundary in 3d
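When versicolor and virginica are plotted in the same way (hyperoverlap_plot(versicolor_virginica3d)), note the points on the ‘wrong side’ of the decision boundary: the fitted boundary cannot separate the two species perfectly, which is why this pair is reported as overlapping.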
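A “non-overlap” result means a boundary was found with no points on the wrong side. hyperoverlap_detect tests this with support vector machines and can also fit curvilinear boundaries; purely to illustrate the linear case, the sketch below swaps in a different, simpler technique, the perceptron. A perceptron reaches a full pass with zero errors only when the two entities are linearly separable, and is guaranteed to get there when they are. The helper name perceptron_separable and the max_epochs cap are hypothetical; because of the cap, a FALSE result is evidence of overlap rather than proof, and curvilinear boundaries are not considered at all.

```r
# Hypothetical helper (not part of hyperoverlap): TRUE if a perceptron finds
# a hyperplane with every point strictly on its own entity's side.
perceptron_separable <- function(x, y, max_epochs = 500) {
  y <- droplevels(as.factor(y))
  stopifnot(nlevels(y) == 2)
  x <- cbind(1, scale(as.matrix(x)))          # standardise, then add a bias column
  label <- ifelse(y == levels(y)[1], -1, 1)   # code the two entities as -1 / +1
  w <- numeric(ncol(x))
  for (epoch in seq_len(max_epochs)) {
    mistakes <- 0
    for (i in seq_len(nrow(x))) {
      if (label[i] * sum(w * x[i, ]) <= 0) {  # wrong side of (or on) the boundary
        w <- w + label[i] * x[i, ]            # nudge the hyperplane towards this point
        mistakes <- mistakes + 1
      }
    }
    if (mistakes == 0) return(TRUE)           # a full pass with no errors: separable
  }
  FALSE                                       # no separating hyperplane within max_epochs
}

test1 <- iris[iris$Species != "versicolor", c(1:3, 5)]
test2 <- iris[iris$Species != "setosa", c(1:3, 5)]
perceptron_separable(test1[, 1:3], test1$Species)  # setosa vs virginica: TRUE
perceptron_separable(test2[, 1:3], test2$Species)  # versicolor vs virginica: FALSE
```

The two results agree with the hyperoverlap_detect results above: setosa and virginica admit a linear boundary, while versicolor and virginica are not linearly separable.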
To visualise overlap in n dimensions, we need to use ordination techniques. The function hyperoverlap_lda uses a combination of linear discriminant analysis (LDA) and principal components analysis (PCA) to choose the best two (or three) axes for visualisation. To plot these using other methods (e.g. ggplot2), the point coordinates are returned as output, here named transformed_data.
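With only two entities, LDA yields a single discriminant axis, so a second display axis has to come from elsewhere. The sketch below shows one way the two methods can be combined; it is an approximation written for illustration and does not reproduce hyperoverlap_lda’s exact procedure. It assumes the MASS package (a recommended package distributed with R) and uses a hypothetical helper, lda_pca_axes: take the LD1 direction from MASS::lda(), project that direction out of the centred data, and use the first principal component of what remains as the second axis.

```r
library(MASS)

# Hypothetical helper (not hyperoverlap's implementation): LD1 plus the leading
# principal component of the data after the LD1 direction has been removed.
lda_pca_axes <- function(x, y) {
  x <- as.matrix(x)
  y <- droplevels(as.factor(y))
  fit <- lda(x, grouping = y)
  d <- fit$scaling[, 1] / sqrt(sum(fit$scaling[, 1]^2))  # unit LD1 direction
  xc <- scale(x, center = TRUE, scale = FALSE)            # centre each variable
  ld1 <- drop(xc %*% d)                                   # scores along LD1
  residual <- xc - outer(ld1, d)                          # remove the LD1 component
  pca <- prcomp(residual)                                 # PCA on what is left
  list(coords = data.frame(LD1 = ld1, PC1 = pca$x[, 1], entity = y),
       ld_direction = d,
       pc_direction = pca$rotation[, 1])
}

test3 <- iris[iris$Species != "versicolor", ]
axes <- lda_pca_axes(test3[, 1:4], test3$Species)
plot(axes$coords$LD1, axes$coords$PC1, col = axes$coords$entity,
     xlab = "LD1", ylab = "PC1 (LD1 removed)")
```

Because the LD1 component is removed before the PCA, the second axis is orthogonal to the first, so the plot shows the discriminating direction plus the largest remaining spread.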
setosa_virginica4d <- hyperoverlap_detect(test3[,1:4], test3$Species)
versicolor_virginica4d <- hyperoverlap_detect(test4[,1:4], test4$Species)
To examine the result:
setosa_virginica4d@result #gives us the result: overlap or non-overlap?
#> [1] "non-overlap"
versicolor_virginica4d@result
#> [1] "overlap"
setosa_virginica4d@shape #for the non-overlapping pair, was the decision boundary linear or curvilinear?
#> [1] "linear"
transformed_data <- hyperoverlap_lda(setosa_virginica4d) #plots the best two dimensions for visualising overlap
transformed_data <- hyperoverlap_lda(versicolor_virginica4d)
In three dimensions:
We might want to know which species within an entire genus overlap in a given set of variables. To do this, we can use hyperoverlap_set and visualise the results using hyperoverlap_pairs_plot:
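The pairs plot reports one result per pair of entities, and for k entities there are k(k - 1)/2 unordered pairs, so three for the iris species. A base-R sketch of enumerating those pairs and building each two-entity subset is shown below; it illustrates the pairwise set-up only and is not the package’s internal code.

```r
# All unordered pairs of entities: k * (k - 1) / 2 of them
entities <- levels(iris$Species)
pairs <- combn(entities, 2, simplify = FALSE)

# One two-entity subset per pair, with unused factor levels dropped
pair_data <- lapply(pairs, function(p) {
  droplevels(iris[iris$Species %in% p, ])
})
names(pair_data) <- vapply(pairs, paste, character(1), collapse = " vs ")
sapply(pair_data, nrow)
```

Each element of pair_data has the same layout as test3 and test4, so it could be passed to hyperoverlap_detect(d[,1:4], d$Species) in the same way.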
all_spp <- hyperoverlap_set(test5[,1:4],test5$Species)
all_spp_plot <- hyperoverlap_pairs_plot(all_spp)
all_spp_plot
#> Warning: Use of `x$result` is discouraged.
#> ℹ Use `result` instead.