Containers — iv-containers • ivs

This family of functions revolves around computing interval containers. A container is an interval in x that isn't contained within any other interval in x.
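To make the definition concrete, here is a minimal base R sketch of it. It is not how ivs computes containers; `find_containers()` is a hypothetical helper written for illustration, and it assumes half-open intervals `[start, end)` with no missing values.

```r
# Hypothetical helper, not part of ivs: finds containers among
# half-open intervals [start, end) using base R only.
find_containers <- function(start, end) {
  # Drop exact duplicates so identical intervals don't rule each other out
  tbl <- unique(data.frame(start = start, end = end))

  is_container <- vapply(seq_len(nrow(tbl)), function(i) {
    # Interval j covers interval i when it starts no later and ends no earlier
    covers_i <- tbl$start <= tbl$start[i] & tbl$end >= tbl$end[i]
    covers_i[i] <- FALSE # an interval doesn't count as covering itself
    !any(covers_i)
  }, logical(1))

  # Keep only the containers, sorted in ascending order
  out <- tbl[is_container, , drop = FALSE]
  out <- out[order(out$start, out$end), , drop = FALSE]
  rownames(out) <- NULL
  out
}

res <- find_containers(
  start = c(4, 1, 2, 9, 9),
  end   = c(6, 5, 3, 12, 14)
)
res
```

Here `[2, 3)` is dropped because `[1, 5)` covers it, and `[9, 12)` is dropped because `[9, 14)` covers it, leaving `[1, 5)`, `[4, 6)`, and `[9, 14)`.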

iv_containers() returns all of the containers found within x.
iv_identify_containers() identifies the containers that each interval in x falls in. It replaces x with a list of the same size where each element of the list contains the containers that the corresponding interval in x falls in. This is particularly useful alongside tidyr::unnest().
iv_identify_container() is similar in spirit to iv_identify_containers(), but is useful when you suspect that each interval in x is contained within exactly 1 container. It replaces x with an iv of the same size where each interval is the container that the corresponding interval in x falls in. If any interval falls in more than one container, an error is thrown.
iv_locate_containers() returns a two-column data frame with a key column containing the result of iv_containers() and a loc list-column containing integer vectors that give, for each container, the locations of the intervals in x that fall in it.
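Because iv_identify_container() throws an error when an interval falls in more than one container, it can help to check up front which variant applies. The sketch below is one possible way to do that; `falls_in_single_containers()` is a hypothetical helper, not part of ivs, and it treats any error from iv_identify_container() as "more than one container", which is a simplifying assumption.

```r
library(ivs)

# Hypothetical helper, not part of ivs: TRUE when every interval in `x`
# falls in exactly one container. Any error is treated as "multiple containers".
falls_in_single_containers <- function(x) {
  result <- tryCatch(iv_identify_container(x), error = function(cnd) NULL)
  !is.null(result)
}

# [2, 3) falls only in [1, 5)
nested <- iv_pairs(c(1, 5), c(2, 3), c(4, 6))
ok_nested <- falls_in_single_containers(nested)

# [2, 3) falls in both [0, 3) and [1, 5)
overlapping <- iv_pairs(c(0, 3), c(1, 5), c(2, 3))
ok_overlapping <- falls_in_single_containers(overlapping)
```

When the check fails, iv_identify_containers() is the variant to reach for, since it returns every container for each interval.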

Usage

iv_containers(x)

iv_identify_containers(x)

iv_identify_container(x)

iv_locate_containers(x)

Arguments

x

[iv]

An interval vector.

Value

For iv_containers(), an iv with the same type as x.
For iv_identify_containers(), a list-of with the same size as x, where each element is an iv.
For iv_identify_container(), an iv with the same type and size as x.
For iv_locate_containers(), a two-column data frame with a key column containing the result of iv_containers() and a loc list-column containing integer vectors of locations in x.

Examples

library(dplyr, warn.conflicts = FALSE)
library(tidyr)

x <- iv_pairs(
  c(4, 6),
  c(1, 5),
  c(2, 3),
  c(NA, NA),
  c(NA, NA),
  c(9, 12),
  c(9, 14)
)
x
#> <iv<double>[7]>
#> [1] [4, 6)   [1, 5)   [2, 3)   [NA, NA) [NA, NA) [9, 12)  [9, 14) 

# Containers are intervals which aren't contained in any other interval.
# They are always returned in ascending order.
# If any missing intervals are present, a single one is retained.
iv_containers(x)
#> <iv<double>[4]>
#> [1] [1, 5)   [4, 6)   [9, 14)  [NA, NA)

# `iv_identify_container()` is useful alongside `group_by()` and
# `summarize()` if you know that each interval is contained within exactly
# 1 container
df <- tibble(x = x)
df <- mutate(df, container = iv_identify_container(x))
df
#> # A tibble: 7 × 2
#>           x container
#>   <iv<dbl>> <iv<dbl>>
#> 1    [4, 6)    [4, 6)
#> 2    [1, 5)    [1, 5)
#> 3    [2, 3)    [1, 5)
#> 4  [NA, NA)  [NA, NA)
#> 5  [NA, NA)  [NA, NA)
#> 6   [9, 12)   [9, 14)
#> 7   [9, 14)   [9, 14)

df %>%
  group_by(container) %>%
  summarize(n = n())
#> # A tibble: 4 × 2
#>   container     n
#>   <iv<dbl>> <int>
#> 1    [1, 5)     2
#> 2    [4, 6)     1
#> 3   [9, 14)     2
#> 4  [NA, NA)     2

# If any interval is contained within multiple containers,
# then you can't use `iv_identify_container()`
y <- c(x, iv_pairs(c(0, 3), c(8, 13)))
y
#> <iv<double>[9]>
#> [1] [4, 6)   [1, 5)   [2, 3)   [NA, NA) [NA, NA) [9, 12)  [9, 14)  [0, 3)  
#> [9] [8, 13) 

try(iv_identify_container(y))
#> Error in iv_identify_container(y) : 
#>   Intervals in `x` can't fall within multiple containers.
#> ℹ Location 3 falls within multiple containers.
#> ℹ Use `iv_identify_containers()` to identify all of the containers that a particular interval is contained by.

# Instead, use `iv_identify_containers()` to identify every container
# that each interval falls in
df <- tibble(y = y, container = iv_identify_containers(y))
df
#> # A tibble: 9 × 2
#>           y       container
#>   <iv<dbl>> <list<iv<dbl>>>
#> 1    [4, 6)             [1]
#> 2    [1, 5)             [1]
#> 3    [2, 3)             [2]
#> 4  [NA, NA)             [1]
#> 5  [NA, NA)             [1]
#> 6   [9, 12)             [2]
#> 7   [9, 14)             [1]
#> 8    [0, 3)             [1]
#> 9   [8, 13)             [1]

# You can use `tidyr::unchop()` to see the containers that each interval
# falls in
df %>%
  mutate(row = row_number(), .before = 1) %>%
  unchop(container)
#> # A tibble: 11 × 3
#>      row         y container
#>    <int> <iv<dbl>> <iv<dbl>>
#>  1     1    [4, 6)    [4, 6)
#>  2     2    [1, 5)    [1, 5)
#>  3     3    [2, 3)    [0, 3)
#>  4     3    [2, 3)    [1, 5)
#>  5     4  [NA, NA)  [NA, NA)
#>  6     5  [NA, NA)  [NA, NA)
#>  7     6   [9, 12)   [8, 13)
#>  8     6   [9, 12)   [9, 14)
#>  9     7   [9, 14)   [9, 14)
#> 10     8    [0, 3)    [0, 3)
#> 11     9   [8, 13)   [8, 13)

# A more programmatic interface to `iv_identify_containers()` is
# `iv_locate_containers()`, which returns the containers you get from
# `iv_containers()` alongside the locations in the input that they contain.
iv_locate_containers(y)
#>        key  loc
#> 1   [0, 3) 3, 8
#> 2   [1, 5) 2, 3
#> 3   [4, 6)    1
#> 4  [8, 13) 6, 9
#> 5  [9, 14) 6, 7
#> 6 [NA, NA) 4, 5
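
The loc column printed above maps each container to the input locations it contains. As a rough illustration of how it relates to iv_identify_containers(), the base R sketch below inverts that mapping to get, for each input location, the row numbers of the containers it falls in. The `loc` list is copied by hand from the printed output, and the variable names are made up for this example.

```r
# `loc` copied by hand from the iv_locate_containers(y) output above;
# element i holds the locations in `y` that fall in container i
loc <- list(c(3L, 8L), c(2L, 3L), 1L, c(6L, 9L), c(6L, 7L), c(4L, 5L))

# One entry per (container, location) pair
container_id <- rep(seq_along(loc), lengths(loc))
position <- unlist(loc)

# Group container ids by input location; `levels` keeps all 9 locations in order
per_interval <- split(container_id, factor(position, levels = seq_len(9)))

# Locations 3 and 6 each fall in two containers
per_interval[[3]]
per_interval[[6]]
```

The element sizes of `per_interval` line up with the sizes shown in the container column of the iv_identify_containers() example.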