This family of functions locates different types of relationships between a vector and an iv. It works similarly to base::match(), where needles[i] checks for a match in all of haystack. Unlike match(), all matches are returned, rather than just the first.

  • iv_locate_between() locates where needles, a vector, falls between the bounds of haystack, an iv.

  • iv_locate_includes() locates where needles, an iv, includes the values of haystack, a vector.

These functions return a two-column data frame. The needles column is an integer vector pointing to locations in needles. The haystack column is an integer vector pointing to locations in haystack with a match.

Usage

iv_locate_between(
  needles,
  haystack,
  ...,
  missing = "equals",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all",
  relationship = "none"
)

iv_locate_includes(
  needles,
  haystack,
  ...,
  missing = "equals",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all",
  relationship = "none"
)

Arguments

needles, haystack

[vector, iv]

For iv_*_between(), needles should be a vector and haystack should be an iv.

For iv_*_includes(), needles should be an iv and haystack should be a vector.

  • Each element of needles represents the value / interval to match.

  • haystack represents the values / intervals to match against.

...

These dots are for future extensions and must be empty.

missing

[integer(1) / "equals" / "drop" / "error"]

Handling of missing values in needles.

  • "equals" considers missing values in needles as exactly equal to missing values in haystack when determining if there is a matching relationship between them.

  • "drop" drops missing values in needles from the result.

  • "error" throws an error if any values in needles are missing.

  • If a single integer is provided, this represents the value returned in the haystack column for values in needles that are missing.
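
The non-default `missing` options can be sketched as follows. This is a minimal illustration, assuming the ivs package is attached and reusing the `a` and `b` inputs from the Examples section; the variable names `dropped` and `err` are only illustrative.

```r
library(ivs)

a <- c(1, NA)
b <- iv(c(NA, NA), c(NA, NA))

# "drop" removes the missing needle (location 2) from the result,
# leaving only the unmatched needle at location 1
dropped <- iv_locate_between(a, b, missing = "drop")
dropped

# "error" signals an error because `a` contains a missing value;
# tryCatch() is used here only to capture the condition for inspection
err <- tryCatch(
  iv_locate_between(a, b, missing = "error"),
  error = function(cnd) cnd
)
inherits(err, "error")
```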

no_match

Handling of needles without a match.

  • "drop" drops needles with zero matches from the result.

  • "error" throws an error if any needles have zero matches.

  • If a single integer is provided, this represents the value returned in the haystack column for values of needles that have zero matches. The default represents an unmatched needle with NA.
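
As a rough sketch of the `no_match` options, assuming the ivs package is attached and reusing `x` and `y` from the Examples section: the sentinel value `0L` is an arbitrary choice made for illustration, not a recommended convention.

```r
library(ivs)

x <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-07", "2019-01-20"))

y <- iv_pairs(
  as.Date(c("2019-01-01", "2019-01-03")),
  as.Date(c("2019-01-04", "2019-01-08")),
  as.Date(c("2019-01-07", "2019-01-09")),
  as.Date(c("2019-01-10", "2019-01-20")),
  as.Date(c("2019-01-15", "2019-01-20"))
)

# An integer sentinel replaces the default NA for the unmatched
# needle `x[4]` (intervals are right-open, so 2019-01-20 matches nothing)
sentinel <- iv_locate_between(x, y, no_match = 0L)
sentinel

# "error" signals an error because `x[4]` has zero matches
err <- tryCatch(
  iv_locate_between(x, y, no_match = "error"),
  error = function(cnd) cnd
)
inherits(err, "error")
```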

remaining

Handling of haystack values that needles never matched.

  • "drop" drops remaining haystack values from the result. Typically, this is the desired behavior if you only care when needles has a match.

  • "error" throws an error if there are any remaining haystack values.

  • If a single integer is provided (often NA), this represents the value returned in the needles column for the remaining haystack values that needles never matched. Remaining haystack values are always returned at the end of the result.
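
The following sketch shows how `remaining` can surface intervals that no needle fell into. It assumes the ivs package is attached and reuses `x` and `y` from the Examples section.

```r
library(ivs)

x <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-07", "2019-01-20"))

y <- iv_pairs(
  as.Date(c("2019-01-01", "2019-01-03")),
  as.Date(c("2019-01-04", "2019-01-08")),
  as.Date(c("2019-01-07", "2019-01-09")),
  as.Date(c("2019-01-10", "2019-01-20")),
  as.Date(c("2019-01-15", "2019-01-20"))
)

# Intervals 1 and 5 of `y` contain none of the values in `x`, so they
# are appended at the end of the result with NA in the needles column
loc <- iv_locate_between(x, y, remaining = NA_integer_)
loc

# Locations of haystack intervals that were never matched
loc$haystack[is.na(loc$needles)]
```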

multiple

Handling of needles with multiple matches. For each needle:

  • "all" returns all matches detected in haystack.

  • "any" returns any match detected in haystack with no guarantees on which match will be returned. It is often faster than "first" and "last" if you just need to detect if there is at least one match.

  • "first" returns the first match detected in haystack.

  • "last" returns the last match detected in haystack.
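
A short sketch of how `multiple` changes the result, assuming the ivs package is attached and reusing `x` and `y` from the Examples section. Only `x[3]` (2019-01-07) falls in more than one interval, so it is the only needle affected.

```r
library(ivs)

x <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-07", "2019-01-20"))

y <- iv_pairs(
  as.Date(c("2019-01-01", "2019-01-03")),
  as.Date(c("2019-01-04", "2019-01-08")),
  as.Date(c("2019-01-07", "2019-01-09")),
  as.Date(c("2019-01-10", "2019-01-20")),
  as.Date(c("2019-01-15", "2019-01-20"))
)

# `x[3]` matches intervals 2 and 3; "first" keeps location 2
first <- iv_locate_between(x, y, multiple = "first")
first

# "last" keeps location 3 for the same needle
last <- iv_locate_between(x, y, multiple = "last")
last

# "any" returns one row per needle, but which of the two matches is
# reported for `x[3]` is not guaranteed
any_match <- iv_locate_between(x, y, multiple = "any")
nrow(any_match)
```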

relationship

Handling of the expected relationship between needles and haystack. If the expectations chosen from the list below are violated, an error is thrown.

  • "none" doesn't perform any relationship checks.

  • "one-to-one" expects:

    • Each value in needles matches at most 1 value in haystack.

    • Each value in haystack matches at most 1 value in needles.

  • "one-to-many" expects:

    • Each value in needles matches any number of values in haystack.

    • Each value in haystack matches at most 1 value in needles.

  • "many-to-one" expects:

    • Each value in needles matches at most 1 value in haystack.

    • Each value in haystack matches any number of values in needles.

  • "many-to-many" expects:

    • Each value in needles matches any number of values in haystack.

    • Each value in haystack matches any number of values in needles.

    This performs no checks, and is identical to "none", but is provided to allow you to be explicit about this relationship if you know it exists.

  • "warn-many-to-many" doesn't assume there is any known relationship, but will warn if needles and haystack have a many-to-many relationship (which is typically unexpected), encouraging you to either take a closer look at your inputs or make this relationship explicit by specifying "many-to-many".

relationship is applied after multiple to allow potential multiple matches to be filtered out first.

relationship doesn't handle cases where there are zero matches. For that, see no_match and remaining.
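
A minimal sketch of a relationship check, assuming the ivs package is attached and reusing `x` and `y` from the Examples section. The subset `x[1:2]` is chosen only because each of those values falls in exactly one interval.

```r
library(ivs)

x <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-07", "2019-01-20"))

y <- iv_pairs(
  as.Date(c("2019-01-01", "2019-01-03")),
  as.Date(c("2019-01-04", "2019-01-08")),
  as.Date(c("2019-01-07", "2019-01-09")),
  as.Date(c("2019-01-10", "2019-01-20")),
  as.Date(c("2019-01-15", "2019-01-20"))
)

# `x[3]` falls in two intervals, which violates the "at most 1 value
# in haystack" expectation of "many-to-one", so an error is signaled
err <- tryCatch(
  iv_locate_between(x, y, relationship = "many-to-one"),
  error = function(cnd) cnd
)
inherits(err, "error")

# The first two values each fall in exactly one interval, and no
# interval is shared between them, so "one-to-one" passes
ok <- iv_locate_between(x[1:2], y, relationship = "one-to-one")
ok
```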

Value

A data frame containing two integer columns named needles and haystack.
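
Because both columns are plain integer locations, they can be used to index the original inputs directly. The sketch below assumes the ivs package is attached and reuses `x` and `y` from the Examples section; it slices by hand only to show what the locations mean, whereas iv_align() performs the same alignment in one step.

```r
library(ivs)

x <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-07", "2019-01-20"))

y <- iv_pairs(
  as.Date(c("2019-01-01", "2019-01-03")),
  as.Date(c("2019-01-04", "2019-01-08")),
  as.Date(c("2019-01-07", "2019-01-09")),
  as.Date(c("2019-01-10", "2019-01-20")),
  as.Date(c("2019-01-15", "2019-01-20"))
)

# Drop unmatched needles so every row refers to a real match
loc <- iv_locate_between(x, y, no_match = "drop")

# Slice each input by its own column of locations
matched_values <- x[loc$needles]
matched_intervals <- y[loc$haystack]

matched_values
matched_intervals
```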

Examples

x <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-07", "2019-01-20"))

y <- iv_pairs(
  as.Date(c("2019-01-01", "2019-01-03")),
  as.Date(c("2019-01-04", "2019-01-08")),
  as.Date(c("2019-01-07", "2019-01-09")),
  as.Date(c("2019-01-10", "2019-01-20")),
  as.Date(c("2019-01-15", "2019-01-20"))
)

x
#> [1] "2019-01-05" "2019-01-10" "2019-01-07" "2019-01-20"
y
#> <iv<date>[5]>
#> [1] [2019-01-01, 2019-01-03) [2019-01-04, 2019-01-08) [2019-01-07, 2019-01-09)
#> [4] [2019-01-10, 2019-01-20) [2019-01-15, 2019-01-20)

# Find any location where `x` is between the intervals in `y`
loc <- iv_locate_between(x, y)
loc
#>   needles haystack
#> 1       1        2
#> 2       2        4
#> 3       3        2
#> 4       3        3
#> 5       4       NA

iv_align(x, y, locations = loc)
#>      needles                 haystack
#> 1 2019-01-05 [2019-01-04, 2019-01-08)
#> 2 2019-01-10 [2019-01-10, 2019-01-20)
#> 3 2019-01-07 [2019-01-04, 2019-01-08)
#> 4 2019-01-07 [2019-01-07, 2019-01-09)
#> 5 2019-01-20                 [NA, NA)

# Find any location where `y` includes the values in `x`
loc <- iv_locate_includes(y, x)
loc
#>   needles haystack
#> 1       1       NA
#> 2       2        1
#> 3       2        3
#> 4       3        3
#> 5       4        2
#> 6       5       NA

iv_align(y, x, locations = loc)
#>                    needles   haystack
#> 1 [2019-01-01, 2019-01-03)       <NA>
#> 2 [2019-01-04, 2019-01-08) 2019-01-05
#> 3 [2019-01-04, 2019-01-08) 2019-01-07
#> 4 [2019-01-07, 2019-01-09) 2019-01-07
#> 5 [2019-01-10, 2019-01-20) 2019-01-10
#> 6 [2019-01-15, 2019-01-20)       <NA>

# Drop values in `x` without a match
loc <- iv_locate_between(x, y, no_match = "drop")
loc
#>   needles haystack
#> 1       1        2
#> 2       2        4
#> 3       3        2
#> 4       3        3

iv_align(x, y, locations = loc)
#>      needles                 haystack
#> 1 2019-01-05 [2019-01-04, 2019-01-08)
#> 2 2019-01-10 [2019-01-10, 2019-01-20)
#> 3 2019-01-07 [2019-01-04, 2019-01-08)
#> 4 2019-01-07 [2019-01-07, 2019-01-09)

# ---------------------------------------------------------------------------

a <- c(1, NA)
b <- iv(c(NA, NA), c(NA, NA))

# By default, missing values in `needles` are treated as being exactly
# equal to missing values in `haystack`, so the missing value in `a` is
# considered between the missing interval in `b`.
iv_locate_between(a, b)
#>   needles haystack
#> 1       1       NA
#> 2       2        1
#> 3       2        2
iv_locate_includes(b, a)
#>   needles haystack
#> 1       1        2
#> 2       2        2

# If you'd like missing values in `needles` to always be considered
# unmatched, set `missing = NA`
iv_locate_between(a, b, missing = NA)
#>   needles haystack
#> 1       1       NA
#> 2       2       NA
iv_locate_includes(b, a, missing = NA)
#>   needles haystack
#> 1       1       NA
#> 2       2       NA