`iv_locate_between()` locates where `needles`, a vector, falls between the bounds of `haystack`, an iv. It works similarly to `base::match()`, where `needles[i]` checks for a match in all of `haystack`. Unlike `match()`, all matches are returned, rather than just the first.
This function returns a two-column data frame. The `needles` column is an integer vector pointing to locations in `needles`. The `haystack` column is an integer vector pointing to locations in `haystack` with a match.
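The contrast with `match()` and the shape of the returned data frame can be sketched as follows. This is a minimal illustration that assumes the ivs package is installed; the numeric values are made up for the sketch and are not part of the examples on this page.

```r
library(ivs)

# Illustrative values (an assumption of this sketch)
needles <- c(2, 10)
haystack <- iv_pairs(c(1, 5), c(0, 3), c(6, 8))

# match() reports only the first matching position for each value
match(c(5, 2), c(5, 2, 5))

# iv_locate_between() reports one row per (needle, interval) match
loc <- iv_locate_between(needles, haystack)

# Both columns hold integer positions, not the values themselves
str(loc)

# Translate the positions back into the values they point to
iv_align(needles, haystack, locations = loc)
```

Here the first needle lies inside the first two intervals, so it occupies two rows, while the second needle lies in none of them and is kept as a single unmatched row.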
Usage
iv_locate_between(
  needles,
  haystack,
  ...,
  missing = "equals",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all"
)
Arguments
- needles, haystack
[vector, iv]
`needles` should be a vector and `haystack` should be an iv. `needles` should have the same type as the start/end components of `haystack`.
Each element of `needles` represents the value to search for. `haystack` represents the intervals to search in.
- ...
These dots are for future extensions and must be empty.
- missing
[integer(1) / "equals" / "drop" / "error"]
Handling of missing values in `needles`.
  - `"equals"` considers missing values in `needles` as exactly equal to missing intervals in `haystack` when determining if there is a matching relationship between them.
  - `"drop"` drops missing values in `needles` from the result.
  - `"error"` throws an error if any values in `needles` are missing.
If a single integer is provided, this represents the value returned in the `haystack` column for values in `needles` that are missing.
- no_match
Handling of `needles` without a match.
  - `"drop"` drops `needles` with zero matches from the result.
  - `"error"` throws an error if any `needles` have zero matches.
If a single integer is provided, this represents the value returned in the `haystack` column for observations of `needles` that have zero matches. The default represents an unmatched needle with `NA`.
- remaining
Handling of `haystack` values that `needles` never matched.
  - `"drop"` drops remaining `haystack` values from the result. Typically, this is the desired behavior if you only care when `needles` has a match.
  - `"error"` throws an error if there are any remaining `haystack` values.
If a single integer is provided (often `NA`), this represents the value returned in the `needles` column for the remaining `haystack` values that `needles` never matched. Remaining `haystack` values are always returned at the end of the result.
- multiple
Handling of `needles` with multiple matches. For each needle:
  - `"all"` returns all matches detected in `haystack`.
  - `"any"` returns any match detected in `haystack` with no guarantees on which match will be returned. It is often faster than `"first"` and `"last"` if you just need to detect if there is at least one match.
  - `"first"` returns the first match detected in `haystack`.
  - `"last"` returns the last match detected in `haystack`.
  - `"warning"` throws a warning if multiple matches are detected, but otherwise falls back to `"all"`.
  - `"error"` throws an error if multiple matches are detected.
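The interaction of `multiple`, `no_match`, and `remaining` is easier to see on a small case. The sketch below assumes the ivs package is installed and uses made-up numeric values; the variable names `first`, `rest`, and `msg` are only for illustration.

```r
library(ivs)

# Illustrative values (an assumption of this sketch)
needles <- c(2, 10)
haystack <- iv_pairs(c(1, 5), c(0, 3), c(6, 8))

# Keep only the first matching interval for each needle
first <- iv_locate_between(needles, haystack, multiple = "first")

# Drop unmatched needles, but report intervals that no needle fell into.
# Those remaining intervals are appended at the end of the result.
rest <- iv_locate_between(
  needles,
  haystack,
  no_match = "drop",
  remaining = NA_integer_
)

# Treat an unmatched needle as an error and capture the message
msg <- tryCatch(
  iv_locate_between(needles, haystack, no_match = "error"),
  error = function(cnd) conditionMessage(cnd)
)

first
rest
```

In `rest`, the third interval is never matched, so it appears as a final row whose `needles` entry is `NA`.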
Examples
x <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-07", "2019-01-20"))
y <- iv_pairs(
  as.Date(c("2019-01-01", "2019-01-03")),
  as.Date(c("2019-01-04", "2019-01-08")),
  as.Date(c("2019-01-07", "2019-01-09")),
  as.Date(c("2019-01-10", "2019-01-20")),
  as.Date(c("2019-01-15", "2019-01-20"))
)
x
#> [1] "2019-01-05" "2019-01-10" "2019-01-07" "2019-01-20"
y
#> <iv<date>[5]>
#> [1] [2019-01-01, 2019-01-03) [2019-01-04, 2019-01-08) [2019-01-07, 2019-01-09)
#> [4] [2019-01-10, 2019-01-20) [2019-01-15, 2019-01-20)
# Find any location where `x` is between the intervals in `y`
loc <- iv_locate_between(x, y)
loc
#>   needles haystack
#> 1       1        2
#> 2       2        4
#> 3       3        2
#> 4       3        3
#> 5       4       NA
iv_align(x, y, locations = loc)
#>      needles                 haystack
#> 1 2019-01-05 [2019-01-04, 2019-01-08)
#> 2 2019-01-10 [2019-01-10, 2019-01-20)
#> 3 2019-01-07 [2019-01-04, 2019-01-08)
#> 4 2019-01-07 [2019-01-07, 2019-01-09)
#> 5 2019-01-20                 [NA, NA)
# Drop values in `x` without a match
loc <- iv_locate_between(x, y, no_match = "drop")
loc
#>   needles haystack
#> 1       1        2
#> 2       2        4
#> 3       3        2
#> 4       3        3
iv_align(x, y, locations = loc)
#>      needles                 haystack
#> 1 2019-01-05 [2019-01-04, 2019-01-08)
#> 2 2019-01-10 [2019-01-10, 2019-01-20)
#> 3 2019-01-07 [2019-01-04, 2019-01-08)
#> 4 2019-01-07 [2019-01-07, 2019-01-09)
# ---------------------------------------------------------------------------
a <- c(1, NA)
b <- iv(c(NA, NA), c(NA, NA))
# By default, missing values in `needles` are treated as being exactly
# equal to missing intervals in `haystack`, so the missing value in `a` is
# considered between the missing interval in `b`.
iv_locate_between(a, b)
#>   needles haystack
#> 1       1       NA
#> 2       2        1
#> 3       2        2
# If you'd like missing values in `needles` to always be considered
# unmatched, set `missing = NA`
iv_locate_between(a, b, missing = NA)
#>   needles haystack
#> 1       1       NA
#> 2       2       NA