---
title: "hyperoverlap"
author: "Matilda Brown"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{hyperoverlap}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(knitr)
library(rgl)
knit_hooks$set(webgl = hook_webgl)
if (!requireNamespace("rmarkdown", quietly = TRUE) ||
    !rmarkdown::pandoc_available("1.14")) {
  warning(call. = FALSE, "These vignettes assume rmarkdown and pandoc version 1.14. These were not found. Older versions will not work.")
  knitr::knit_exit()
}
```

Hyperoverlap can be used to detect and visualise overlap in n-dimensional space.

## Data: iris

To explore the functions in hyperoverlap, we'll use the `iris` dataset. This dataset contains 150 observations of three species of iris ("setosa", "versicolor" and "virginica"). These data are four-dimensional (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) and are documented in `?iris`. We'll set up five test datasets to explore the different functions:

1. `test1` two entities (setosa, virginica); three dimensions (Sepal.Length, Sepal.Width, Petal.Length)
1. `test2` two entities (versicolor, virginica); three dimensions (as above)
1. `test3` two entities (setosa, virginica); four dimensions
1. `test4` two entities (versicolor, virginica); four dimensions
1. `test5` all entities, all dimensions

```{r, results='show'}
test1 <- iris[which(iris$Species!="versicolor"),c(1:3,5)]
test2 <- iris[which(iris$Species!="setosa"),c(1:3,5)]
test3 <- iris[which(iris$Species!="versicolor"),]
test4 <- iris[which(iris$Species!="setosa"),]
test5 <- iris
```

Note that entities may be species, genera, populations etc.

## Examining overlap between two entities in 3D

To plot the decision boundary using `hyperoverlap_plot`, the data cannot exceed three dimensions. For high-dimensional visualisation, see `hyperoverlap_lda`.
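Before running the detection, a quick check in base R can confirm that a test dataset has the shape expected here. The sketch below rebuilds `test1` so that it runs on its own; the names `test1_coords` and `n_per_entity` are illustrative and are not part of hyperoverlap.

```r
# Rebuild test1 as defined earlier: setosa and virginica,
# three measurements plus the Species column
test1 <- iris[which(iris$Species != "versicolor"), c(1:3, 5)]

# Coordinates only; hyperoverlap_plot needs at most three dimensions
test1_coords <- test1[, 1:3]
stopifnot(ncol(test1_coords) <= 3)

# Count observations per entity; droplevels() removes the
# unused "versicolor" factor level left over from subsetting
n_per_entity <- table(droplevels(test1$Species))

print(dim(test1))
print(n_per_entity)
```

The same check applies to `test2`; for `test3` and `test4` the coordinate columns are `1:4`, so the three-dimension condition would not hold and the LDA-based visualisation is the relevant one.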
```{r, results='show'}
library(hyperoverlap)
setosa_virginica3d <- hyperoverlap_detect(test1[,1:3], test1$Species)
versicolor_virginica3d <- hyperoverlap_detect(test2[,1:3], test2$Species)
```

To examine the result:

```{r, results='show', fig.height=5,fig.width=7, webgl=TRUE, fig.align="center"}
setosa_virginica3d@result             #gives us the result: overlap or non-overlap?
versicolor_virginica3d@result

setosa_virginica3d@shape              #for the non-overlapping pair, was the decision boundary linear or curvilinear?

hyperoverlap_plot(setosa_virginica3d) #plot the data and the decision boundary in 3d
```

```{r, results='show', fig.height=5,fig.width=7, webgl=TRUE, fig.align="center"}
hyperoverlap_plot(versicolor_virginica3d)
```

Note the points on the 'wrong side' of the boundary when comparing versicolor and virginica.

## Examining overlap between two entities in n-dimensions

To visualise overlap in n-dimensions, we need to use ordination techniques. The function `hyperoverlap_lda` uses a combination of linear discriminant analysis (LDA) and principal components analysis (PCA) to choose the best two (or three) axes for visualisation. The point coordinates are returned as output, here named `transformed_data`, so that they can also be plotted using other methods (e.g. `ggplot2`).

```{r, results='show'}
setosa_virginica4d <- hyperoverlap_detect(test3[,1:4], test3$Species)
versicolor_virginica4d <- hyperoverlap_detect(test4[,1:4], test4$Species)
```

To examine the result:

```{r, results='show', fig.height=4,fig.width=5, fig.show='hold', fig.align='center'}
setosa_virginica4d@result             #gives us the result: overlap or non-overlap?
versicolor_virginica4d@result

setosa_virginica4d@shape              #for the non-overlapping pair, was the decision boundary linear or curvilinear?
transformed_data <- hyperoverlap_lda(setosa_virginica4d) #plots the best two dimensions for visualising overlap
transformed_data <- hyperoverlap_lda(versicolor_virginica4d)
```

In three dimensions:

```{r, results='show', fig.height=5,fig.width=7, webgl = hook_webgl,fig.align="center"}
rgl.close() #close previous device
transformed_data <- hyperoverlap_lda(setosa_virginica4d, visualise3d=TRUE)
```

```{r, results='show', fig.height=5,fig.width=7, webgl = hook_webgl,fig.align="center"}
rgl.close() #close previous device
transformed_data <- hyperoverlap_lda(versicolor_virginica4d, visualise3d=TRUE) #plots the best three dimensions for visualising overlap
```

## Examining patterns of overlap in groups of entities

We might want to know which species from an entire genus overlap in certain variables. To do this, we can use `hyperoverlap_set` and visualise the results using `hyperoverlap_pairs_plot`.

```{r, results='show', fig.dim=c(5,3),fig.align="center"}
all_spp <- hyperoverlap_set(test5[,1:4],test5$Species)
all_spp_plot <- hyperoverlap_pairs_plot(all_spp)
all_spp_plot
```
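For orientation, the unordered pairs that exist among the three iris species can be listed with base R. This sketch assumes, without checking the package internals, that a set-level analysis considers each pair of entities; `species_pairs` and its column names are illustrative and do not come from hyperoverlap.

```r
# All unordered pairs of entities in the iris data (base R only)
species <- levels(iris$Species)

# combn() returns one pair per column; transpose to get one pair per row
species_pairs <- t(combn(species, 2))
colnames(species_pairs) <- c("entity_a", "entity_b")

print(species_pairs)
print(nrow(species_pairs))  # equals choose(length(species), 2)
```

With n entities there are `choose(n, 2)` such pairs, so the number of pairwise comparisons grows quadratically with the number of entities.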