
This family of functions locates different types of relationships between two ivs. It works similarly to base::match(): each needles[i] is checked for a relationship against all of haystack. Unlike match(), all matching relationships are returned, rather than just the first.

  • iv_locate_overlaps() locates a specific type of overlap between the two ivs.

  • iv_locate_precedes() locates where needles[i] precedes (i.e. comes before) any interval in haystack.

  • iv_locate_follows() locates where needles[i] follows (i.e. comes after) any interval in haystack.

These functions return a two column data frame. The needles column is an integer vector pointing to locations in needles. The haystack column is an integer vector pointing to locations in haystack with a matching relationship.
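As a minimal sketch of the shape of this result (assuming the ivs package is installed; the numeric intervals are made up for illustration), a needle with several matches yields several rows, and a needle with none yields a single NA row by default:

```r
library(ivs)

# Hypothetical data: half-open numeric intervals [start, end)
needles <- iv_pairs(c(1, 5), c(10, 12))
haystack <- iv_pairs(c(0, 2), c(4, 6), c(20, 25))

# needles[1] overlaps haystack[1] and haystack[2]; needles[2] overlaps nothing
loc <- iv_locate_overlaps(needles, haystack)
loc
```

Here needles[1] contributes two rows and needles[2] contributes one row whose haystack value is NA.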

Usage

iv_locate_overlaps(
  needles,
  haystack,
  ...,
  type = "any",
  missing = "equals",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all"
)

iv_locate_precedes(
  needles,
  haystack,
  ...,
  closest = FALSE,
  missing = "equals",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all"
)

iv_locate_follows(
  needles,
  haystack,
  ...,
  closest = FALSE,
  missing = "equals",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all"
)

Arguments

needles, haystack

[iv]

Interval vectors used for relation matching.

  • Each element of needles represents the interval to search for.

  • haystack represents the intervals to search in.

Prior to comparison, needles and haystack are coerced to the same type.

...

These dots are for future extensions and must be empty.

type

[character(1)]

The type of relationship to find. One of:

  • "any": Finds any overlap whatsoever between an interval in needles and an interval in haystack.

  • "within": Finds when an interval in needles is completely within (or equal to) an interval in haystack.

  • "contains": Finds when an interval in needles completely contains (or equals) an interval in haystack.

  • "equals": Finds when an interval in needles is exactly equal to an interval in haystack.

  • "starts": Finds when the start of an interval in needles matches the start of an interval in haystack.

  • "ends": Finds when the end of an interval in needles matches the end of an interval in haystack.
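The stricter relationship types can be compared side by side. This is an illustrative sketch with made-up numeric intervals, assuming ivs is installed; no_match = "drop" is used only to keep the results short.

```r
library(ivs)

# Hypothetical data chosen so that each type matches something different
needles <- iv_pairs(c(2, 4), c(0, 10), c(5, 8))
haystack <- iv_pairs(c(1, 5), c(5, 8), c(5, 20))

# needles[1] lies within haystack[1]; needles[3] lies within haystack[2] and [3]
within <- iv_locate_overlaps(needles, haystack, type = "within", no_match = "drop")

# needles[2] contains haystack[1] and [2]; needles[3] contains (equals) haystack[2]
contains <- iv_locate_overlaps(needles, haystack, type = "contains", no_match = "drop")

# Only needles[3] and haystack[2] are exactly equal
equals <- iv_locate_overlaps(needles, haystack, type = "equals", no_match = "drop")

within
contains
equals
```

Because "within" and "contains" both include equality, the equal pair appears in all three results.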

missing

[integer(1) / "equals" / "drop" / "error"]

Handling of missing intervals in needles.

  • "equals" considers missing intervals in needles as exactly equal to missing intervals in haystack when determining if there is a matching relationship between them.

  • "drop" drops missing intervals in needles from the result.

  • "error" throws an error if any intervals in needles are missing.

  • If a single integer is provided, this represents the value returned in the haystack column for intervals in needles that are missing.
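The options above can be sketched with one missing interval on each side. The data is hypothetical and the example assumes ivs is installed.

```r
library(ivs)

# Hypothetical data: the second interval of each vector is missing
needles <- iv(c(1, NA), c(3, NA))
haystack <- iv(c(2, NA), c(4, NA))

# Default: the missing needle matches the missing haystack interval
as_equal <- iv_locate_overlaps(needles, haystack, missing = "equals")

# The missing needle is removed from the result entirely
dropped <- iv_locate_overlaps(needles, haystack, missing = "drop")

# The missing needle is kept, but reported with NA in the haystack column
as_na <- iv_locate_overlaps(needles, haystack, missing = NA_integer_)

# "error" signals a condition instead of returning a result
failed <- tryCatch(
  iv_locate_overlaps(needles, haystack, missing = "error"),
  error = function(e) e
)

as_equal
dropped
as_na
inherits(failed, "error")
```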

no_match

Handling of needles without a match.

  • "drop" drops needles with zero matches from the result.

  • "error" throws an error if any needles have zero matches.

  • If a single integer is provided, this represents the value returned in the haystack column for elements of needles that have zero matches. The default, NA_integer_, represents an unmatched needle with NA.
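A small sketch of the three kinds of no_match handling, using made-up intervals and assuming ivs is installed. The sentinel value 0L is an arbitrary choice for illustration.

```r
library(ivs)

# Hypothetical data: needles[2] overlaps nothing in haystack
needles <- iv_pairs(c(1, 3), c(10, 12))
haystack <- iv_pairs(c(2, 4))

# Default: the unmatched needle is kept with NA in the haystack column
kept <- iv_locate_overlaps(needles, haystack)

# The unmatched needle is removed
dropped <- iv_locate_overlaps(needles, haystack, no_match = "drop")

# The unmatched needle is flagged with a caller-chosen integer
flagged <- iv_locate_overlaps(needles, haystack, no_match = 0L)

kept
dropped
flagged
```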

remaining

Handling of haystack values that needles never matched.

  • "drop" drops remaining haystack values from the result. Typically, this is the desired behavior if you only care when needles has a match.

  • "error" throws an error if there are any remaining haystack values.

  • If a single integer is provided (often NA), this represents the value returned in the needles column for the remaining haystack values that needles never matched. Remaining haystack values are always returned at the end of the result.
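To see where remaining haystack values end up, here is a minimal sketch with hypothetical data, assuming ivs is installed:

```r
library(ivs)

# Hypothetical data: haystack[2] is never matched by any needle
needles <- iv_pairs(c(1, 3))
haystack <- iv_pairs(c(2, 4), c(10, 12))

# Keep the unmatched haystack value, marked with NA in the needles column
loc <- iv_locate_overlaps(needles, haystack, remaining = NA_integer_)
loc
```

The row for haystack[2] is appended after the rows for needles, which makes the result resemble a full join on the overlap relationship.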

multiple

Handling of needles with multiple matches. For each needle:

  • "all" returns all matches detected in haystack.

  • "any" returns any match detected in haystack with no guarantees on which match will be returned. It is often faster than "first" and "last" if you just need to detect if there is at least one match.

  • "first" returns the first match detected in haystack.

  • "last" returns the last match detected in haystack.

  • "warning" throws a warning if multiple matches are detected, but otherwise falls back to "all".

  • "error" throws an error if multiple matches are detected.
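The difference between "all", "first", and "last" can be sketched with a single needle that overlaps every interval in a made-up haystack (assuming ivs is installed):

```r
library(ivs)

# Hypothetical data: the needle overlaps all three haystack intervals
needles <- iv_pairs(c(1, 10))
haystack <- iv_pairs(c(0, 2), c(4, 6), c(8, 12))

all_matches <- iv_locate_overlaps(needles, haystack, multiple = "all")
first_match <- iv_locate_overlaps(needles, haystack, multiple = "first")
last_match <- iv_locate_overlaps(needles, haystack, multiple = "last")

all_matches$haystack
first_match$haystack
last_match$haystack
```

"first" and "last" refer to position in haystack, not to the earliest or latest interval on the number line.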

closest

[TRUE / FALSE]

Should only the closest relationship be returned?

If TRUE, only the closest interval(s) in haystack that the current value of needles either precedes or follows are returned. Note that multiple intervals can still be returned if there are ties, which can be resolved using multiple.
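A compact sketch of closest with made-up numeric intervals, assuming ivs is installed. Because the intervals are half-open, an interval that ends exactly where another starts counts as preceding it.

```r
library(ivs)

# Hypothetical data: one needle, with haystack intervals on both sides of it
needles <- iv_pairs(c(1, 3))
haystack <- iv_pairs(c(3, 5), c(8, 9), c(0, 1))

# The needle precedes haystack[1] and haystack[2]
precedes_all <- iv_locate_precedes(needles, haystack)

# Of those, haystack[1] starts soonest after the needle ends
precedes_closest <- iv_locate_precedes(needles, haystack, closest = TRUE)

# The needle follows only haystack[3]
follows_all <- iv_locate_follows(needles, haystack)

precedes_all$haystack
precedes_closest$haystack
follows_all$haystack
```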

Value

A data frame containing two integer columns named needles and haystack.

Examples

x <- iv_pairs(
  as.Date(c("2019-01-05", "2019-01-10")),
  as.Date(c("2019-01-07", "2019-01-15")),
  as.Date(c("2019-01-20", "2019-01-31"))
)

y <- iv_pairs(
  as.Date(c("2019-01-01", "2019-01-03")),
  as.Date(c("2019-01-04", "2019-01-08")),
  as.Date(c("2019-01-07", "2019-01-09")),
  as.Date(c("2019-01-10", "2019-01-20")),
  as.Date(c("2019-01-15", "2019-01-20"))
)

x
#> <iv<date>[3]>
#> [1] [2019-01-05, 2019-01-10) [2019-01-07, 2019-01-15) [2019-01-20, 2019-01-31)
y
#> <iv<date>[5]>
#> [1] [2019-01-01, 2019-01-03) [2019-01-04, 2019-01-08) [2019-01-07, 2019-01-09)
#> [4] [2019-01-10, 2019-01-20) [2019-01-15, 2019-01-20)

# Find any overlap between `x` and `y`
loc <- iv_locate_overlaps(x, y)
loc
#>   needles haystack
#> 1       1        2
#> 2       1        3
#> 3       2        2
#> 4       2        3
#> 5       2        4
#> 6       3       NA

iv_align(x, y, locations = loc)
#>                    needles                 haystack
#> 1 [2019-01-05, 2019-01-10) [2019-01-04, 2019-01-08)
#> 2 [2019-01-05, 2019-01-10) [2019-01-07, 2019-01-09)
#> 3 [2019-01-07, 2019-01-15) [2019-01-04, 2019-01-08)
#> 4 [2019-01-07, 2019-01-15) [2019-01-07, 2019-01-09)
#> 5 [2019-01-07, 2019-01-15) [2019-01-10, 2019-01-20)
#> 6 [2019-01-20, 2019-01-31)                 [NA, NA)

# Find where `x` contains `y` and drop results when there isn't a match
loc <- iv_locate_overlaps(x, y, type = "contains", no_match = "drop")
loc
#>   needles haystack
#> 1       1        3
#> 2       2        3

iv_align(x, y, locations = loc)
#>                    needles                 haystack
#> 1 [2019-01-05, 2019-01-10) [2019-01-07, 2019-01-09)
#> 2 [2019-01-07, 2019-01-15) [2019-01-07, 2019-01-09)

# Find where `x` precedes `y`
loc <- iv_locate_precedes(x, y)
loc
#>   needles haystack
#> 1       1        4
#> 2       1        5
#> 3       2        5
#> 4       3       NA

iv_align(x, y, locations = loc)
#>                    needles                 haystack
#> 1 [2019-01-05, 2019-01-10) [2019-01-10, 2019-01-20)
#> 2 [2019-01-05, 2019-01-10) [2019-01-15, 2019-01-20)
#> 3 [2019-01-07, 2019-01-15) [2019-01-15, 2019-01-20)
#> 4 [2019-01-20, 2019-01-31)                 [NA, NA)

# Of all the intervals in `y` that `x` precedes, keep only the closest one
loc <- iv_locate_precedes(x, y, closest = TRUE)

iv_align(x, y, locations = loc)
#>                    needles                 haystack
#> 1 [2019-01-05, 2019-01-10) [2019-01-10, 2019-01-20)
#> 2 [2019-01-07, 2019-01-15) [2019-01-15, 2019-01-20)
#> 3 [2019-01-20, 2019-01-31)                 [NA, NA)

# Note that `closest` can result in duplicates if there is a tie.
# `2019-01-20` appears as an end date twice in `haystack`.
loc <- iv_locate_follows(x, y, closest = TRUE)
loc
#>   needles haystack
#> 1       1        1
#> 2       2        1
#> 3       3        4
#> 4       3        5

iv_align(x, y, locations = loc)
#>                    needles                 haystack
#> 1 [2019-01-05, 2019-01-10) [2019-01-01, 2019-01-03)
#> 2 [2019-01-07, 2019-01-15) [2019-01-01, 2019-01-03)
#> 3 [2019-01-20, 2019-01-31) [2019-01-10, 2019-01-20)
#> 4 [2019-01-20, 2019-01-31) [2019-01-15, 2019-01-20)

# Force just one of the ties to be returned by using `multiple`.
# Here we just request any of the ties, with no guarantee on which one.
loc <- iv_locate_follows(x, y, closest = TRUE, multiple = "any")
loc
#>   needles haystack
#> 1       1        1
#> 2       2        1
#> 3       3        4

iv_align(x, y, locations = loc)
#>                    needles                 haystack
#> 1 [2019-01-05, 2019-01-10) [2019-01-01, 2019-01-03)
#> 2 [2019-01-07, 2019-01-15) [2019-01-01, 2019-01-03)
#> 3 [2019-01-20, 2019-01-31) [2019-01-10, 2019-01-20)

# ---------------------------------------------------------------------------

a <- iv(NA, NA)
b <- iv(c(NA, NA), c(NA, NA))

# By default, missing intervals in `needles` are seen as exactly equal to
# missing intervals in `haystack`, which means that they overlap
iv_locate_overlaps(a, b)
#>   needles haystack
#> 1       1        1
#> 2       1        2

# If you'd like missing intervals in `needles` to always be considered
# unmatched, set `missing = NA`
iv_locate_overlaps(a, b, missing = NA)
#>   needles haystack
#> 1       1       NA