
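iv_locate_between() locates where needles, a vector, fall between the bounds of haystack, an iv. It works similarly to base::match(), where needles[i] checks for a match in all of haystack. Unlike match(), all matches are returned, rather than just the first. Because intervals are right-open, [start, end), a needle falls between an interval's bounds when it is greater than or equal to the start and strictly less than the end.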

This function returns a two-column data frame. The needles column is an integer vector pointing to locations in needles. The haystack column is an integer vector pointing to locations in haystack with a match.
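To make the matching rule concrete, here is a brute-force sketch in base R. `locate_between_sketch()` is a hypothetical helper, not part of ivs, and it takes the interval bounds as two plain vectors (`starts`, `ends`) instead of an iv. It assumes right-open intervals, returns every match for each needle, and treats a missing needle as unmatched (similar to `missing = NA`); the other arguments are not modeled.

```r
# Hypothetical helper (not part of ivs): a brute-force sketch of the
# "return every match" rule, assuming half-open intervals [start, end).
locate_between_sketch <- function(needles, starts, ends) {
  rows <- lapply(seq_along(needles), function(i) {
    # Indices of every haystack interval that contains needles[i]
    hits <- which(starts <= needles[i] & needles[i] < ends)
    # Unmatched (or missing) needles get a single row with an NA location
    if (length(hits) == 0L) hits <- NA_integer_
    data.frame(needles = i, haystack = hits)
  })
  do.call(rbind, rows)
}

# The example data from this page, with the interval bounds split out
x <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-07", "2019-01-20"))
starts <- as.Date(c("2019-01-01", "2019-01-04", "2019-01-07", "2019-01-10", "2019-01-15"))
ends <- as.Date(c("2019-01-03", "2019-01-08", "2019-01-09", "2019-01-20", "2019-01-20"))

loc <- locate_between_sketch(x, starts, ends)
loc
```

On these dates the sketch should produce the same needle/haystack pairs as the first `iv_locate_between(x, y)` call in the Examples. It compares every needle against every interval, so it is only meant to illustrate the semantics, not to be efficient.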

Usage

iv_locate_between(
  needles,
  haystack,
  ...,
  missing = "equals",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all"
)

Arguments

needles, haystack

[vector, iv]

needles should be a vector and haystack should be an iv. needles should have the same type as the start/end components of haystack.

  • Each element of needles represents the value to search for.

  • haystack represents the intervals to search in.

...

These dots are for future extensions and must be empty.

missing

[integer(1) / "equals" / "drop" / "error"]

Handling of missing values in needles.

  • "equals" considers missing values in needles as exactly equal to missing intervals in haystack when determining if there is a matching relationship between them.

  • "drop" drops missing values in needles from the result.

  • "error" throws an error if any values in needles are missing.

  • If a single integer is provided, this represents the value returned in the haystack column for values in needles that are missing.
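The four `missing` options can be sketched by extending the brute-force matcher with a branch for missing needles. `locate_missing_sketch()` is a hypothetical base R helper, not part of ivs, and the interval bounds are again passed as plain `starts`/`ends` vectors. It assumes a missing interval is one whose bounds are both `NA`.

```r
# Hypothetical helper (not part of ivs): brute-force matching with a sketch
# of the `missing` rules for missing values in `needles`.
locate_missing_sketch <- function(needles, starts, ends, missing = "equals") {
  rows <- lapply(seq_along(needles), function(i) {
    value <- needles[i]
    if (is.na(value)) {
      if (identical(missing, "error")) stop("`needles` contains a missing value.")
      if (identical(missing, "drop")) return(NULL)
      hits <- if (identical(missing, "equals")) {
        # A missing needle "equals" only intervals whose bounds are both missing
        which(is.na(starts) & is.na(ends))
      } else {
        # A single integer (e.g. NA) is used directly as the haystack location
        as.integer(missing)
      }
    } else {
      # Non-missing needles use the usual [start, end) containment test
      hits <- which(starts <= value & value < ends)
    }
    if (length(hits) == 0L) hits <- NA_integer_
    data.frame(needles = i, haystack = hits)
  })
  do.call(rbind, rows)
}

# The `a` and `b` values from the second example, with `b` split into bounds
a <- c(1, NA)
b_starts <- c(NA_real_, NA_real_)
b_ends <- c(NA_real_, NA_real_)

locate_missing_sketch(a, b_starts, b_ends)
locate_missing_sketch(a, b_starts, b_ends, missing = NA)
```

With the default `"equals"`, the missing needle matches both missing intervals; with `missing = NA`, it is reported as unmatched, mirroring the last two examples on this page.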

no_match

[integer(1) / "drop" / "error"]

Handling of needles without a match.

  • "drop" drops needles with zero matches from the result.

  • "error" throws an error if any needles have zero matches.

  • If a single integer is provided, this represents the value returned in the haystack column for observations of needles that have zero matches. The default represents an unmatched needle with NA.
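The `no_match` rule only affects needles whose list of matches is empty. This base R sketch starts from the matches in the first example (the fourth date matches nothing) and builds the location data frame under each rule. `build_locations()` is a hypothetical helper, not part of ivs.

```r
# Hypothetical helper (not part of ivs): turns per-needle match vectors into
# the two-column location data frame, applying a `no_match` rule.
build_locations <- function(hits_per_needle, no_match = NA_integer_) {
  rows <- lapply(seq_along(hits_per_needle), function(i) {
    hits <- hits_per_needle[[i]]
    if (length(hits) == 0L) {
      if (identical(no_match, "drop")) return(NULL)
      if (identical(no_match, "error")) stop("Needle ", i, " has no match.")
      # A single integer (NA by default) stands in for the missing location
      hits <- as.integer(no_match)
    }
    data.frame(needles = i, haystack = hits)
  })
  do.call(rbind, rows)
}

# Matches for the four dates in the first example; the 4th date has none
hits_per_needle <- list(2L, 4L, c(2L, 3L), integer(0))

build_locations(hits_per_needle)
build_locations(hits_per_needle, no_match = "drop")
```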

remaining

[integer(1) / "drop" / "error"]

Handling of haystack values that needles never matched.

  • "drop" drops remaining haystack values from the result. Typically, this is the desired behavior if you only care about the needles that have a match.

  • "error" throws an error if there are any remaining haystack values.

  • If a single integer is provided (often NA), this represents the value returned in the needles column for the remaining haystack values that needles never matched. Remaining haystack values are always returned at the end of the result.
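In the first example, intervals 1 and 5 of `y` are never matched. This base R sketch shows how a `remaining` rule could append them to an existing location data frame. `append_remaining()` and its `haystack_size` argument are hypothetical, not part of ivs.

```r
# Hypothetical helper (not part of ivs): appends haystack locations that no
# needle matched, following a `remaining` rule.
append_remaining <- function(loc, haystack_size, remaining = "drop") {
  leftover <- setdiff(seq_len(haystack_size), loc$haystack)
  if (identical(remaining, "drop") || length(leftover) == 0L) return(loc)
  if (identical(remaining, "error")) {
    stop("Haystack locations never matched: ", toString(leftover))
  }
  # A single integer (often NA) fills the needles column for leftover rows
  extra <- data.frame(
    needles = rep(as.integer(remaining), length(leftover)),
    haystack = leftover
  )
  # Remaining haystack values always go at the end of the result
  rbind(loc, extra)
}

# Locations from the first example; intervals 1 and 5 were never matched
loc <- data.frame(needles = c(1L, 2L, 3L, 3L, 4L), haystack = c(2L, 4L, 2L, 3L, NA))
append_remaining(loc, haystack_size = 5L, remaining = NA)
```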

multiple

["all" / "any" / "first" / "last" / "warning" / "error"]

Handling of needles with multiple matches. For each needle:

  • "all" returns all matches detected in haystack.

  • "any" returns any match detected in haystack with no guarantees on which match will be returned. It is often faster than "first" and "last" if you just need to detect whether there is at least one match.

  • "first" returns the first match detected in haystack.

  • "last" returns the last match detected in haystack.

  • "warning" throws a warning if multiple matches are detected, but otherwise falls back to "all".

  • "error" throws an error if multiple matches are detected.
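In the first example, the third date matches two intervals. This base R sketch reduces an "all matches" location data frame according to each `multiple` option. `reduce_multiple()` is a hypothetical helper, not part of ivs, and it assumes the rows for each needle are already in increasing haystack order, as in the example output on this page.

```r
# Hypothetical helper (not part of ivs): reduces an "all matches" location
# data frame according to a `multiple` rule. Assumes each needle's rows are
# sorted by increasing haystack location.
reduce_multiple <- function(loc, multiple = "all") {
  has_multiple <- anyDuplicated(loc$needles) > 0L
  if (multiple %in% c("all", "warning", "error")) {
    if (has_multiple && multiple == "error") stop("Multiple matches detected.")
    if (has_multiple && multiple == "warning") warning("Multiple matches detected.")
    # "all" and "warning" both keep every match
    return(loc)
  }
  keep <- switch(multiple,
    first = !duplicated(loc$needles),
    last = !duplicated(loc$needles, fromLast = TRUE),
    # "any" makes no promise about which match is kept; this sketch keeps the first
    any = !duplicated(loc$needles),
    stop("Unknown `multiple` value: ", multiple)
  )
  loc[keep, , drop = FALSE]
}

# Locations from the first example; needle 3 matches intervals 2 and 3
loc <- data.frame(needles = c(1L, 2L, 3L, 3L, 4L), haystack = c(2L, 4L, 2L, 3L, NA))
reduce_multiple(loc, "first")
reduce_multiple(loc, "last")
```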

Value

A data frame containing two integer columns named needles and haystack.

Examples

x <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-07", "2019-01-20"))

y <- iv_pairs(
  as.Date(c("2019-01-01", "2019-01-03")),
  as.Date(c("2019-01-04", "2019-01-08")),
  as.Date(c("2019-01-07", "2019-01-09")),
  as.Date(c("2019-01-10", "2019-01-20")),
  as.Date(c("2019-01-15", "2019-01-20"))
)

x
#> [1] "2019-01-05" "2019-01-10" "2019-01-07" "2019-01-20"
y
#> <iv<date>[5]>
#> [1] [2019-01-01, 2019-01-03) [2019-01-04, 2019-01-08) [2019-01-07, 2019-01-09)
#> [4] [2019-01-10, 2019-01-20) [2019-01-15, 2019-01-20)

# Find all locations where `x` falls between the bounds of the intervals in `y`
loc <- iv_locate_between(x, y)
loc
#>   needles haystack
#> 1       1        2
#> 2       2        4
#> 3       3        2
#> 4       3        3
#> 5       4       NA

iv_align(x, y, locations = loc)
#>      needles                 haystack
#> 1 2019-01-05 [2019-01-04, 2019-01-08)
#> 2 2019-01-10 [2019-01-10, 2019-01-20)
#> 3 2019-01-07 [2019-01-04, 2019-01-08)
#> 4 2019-01-07 [2019-01-07, 2019-01-09)
#> 5 2019-01-20                 [NA, NA)

# Drop values in `x` without a match
loc <- iv_locate_between(x, y, no_match = "drop")
loc
#>   needles haystack
#> 1       1        2
#> 2       2        4
#> 3       3        2
#> 4       3        3

iv_align(x, y, locations = loc)
#>      needles                 haystack
#> 1 2019-01-05 [2019-01-04, 2019-01-08)
#> 2 2019-01-10 [2019-01-10, 2019-01-20)
#> 3 2019-01-07 [2019-01-04, 2019-01-08)
#> 4 2019-01-07 [2019-01-07, 2019-01-09)

# ---------------------------------------------------------------------------

a <- c(1, NA)
b <- iv(c(NA, NA), c(NA, NA))

# By default, missing values in `needles` are treated as being exactly
# equal to missing intervals in `haystack`, so the missing value in `a` is
# considered between the missing interval in `b`.
iv_locate_between(a, b)
#>   needles haystack
#> 1       1       NA
#> 2       2        1
#> 3       2        2

# If you'd like missing values in `needles` to always be considered
# unmatched, set `missing = NA`
iv_locate_between(a, b, missing = NA)
#>   needles haystack
#> 1       1       NA
#> 2       2       NA