This family of functions locates different types of relationships between a
vector and an iv. It works similarly to base::match(), where needles[i]
checks for a match in all of haystack. Unlike match(), all matches are
returned, rather than just the first.
- iv_locate_between() locates where needles, a vector, falls between the bounds of haystack, an iv.
- iv_locate_includes() locates where needles, an iv, includes the values of haystack, a vector.
These functions return a two column data frame. The needles column is an
integer vector pointing to locations in needles. The haystack column is
an integer vector pointing to locations in haystack with a match.
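Because both columns hold integer locations, each input can be sliced by its own column to line the matches up by hand. The following is a minimal sketch with made-up numeric inputs (the values and the `matched` data frame are illustrative assumptions, not part of the package); `iv_start()` and `iv_end()` are used only to pull the bounds back out of the iv.

```r
library(ivs)

# Hypothetical inputs chosen for illustration
needles <- c(1, 5, 12)
haystack <- iv_pairs(c(0, 4), c(3, 8), c(10, 11))

# Drop unmatched needles so every row points at a real match
loc <- iv_locate_between(needles, haystack, no_match = "drop")

# Slice each input by its own column to place matches side by side
matched <- data.frame(
  value = needles[loc$needles],
  start = iv_start(haystack[loc$haystack]),
  end = iv_end(haystack[loc$haystack])
)

matched
```

In practice iv_align() does this slicing for you and also copes with NA locations, so the manual version is mainly useful for seeing what the locations mean.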
Usage
iv_locate_between(
  needles,
  haystack,
  ...,
  missing = "equals",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all",
  relationship = "none"
)

iv_locate_includes(
  needles,
  haystack,
  ...,
  missing = "equals",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all",
  relationship = "none"
)
Arguments
- needles, haystack
[vector, iv]
For iv_*_between(), needles should be a vector and haystack should be an iv.
For iv_*_includes(), needles should be an iv and haystack should be a vector.
Each element of needles represents the value / interval to match.
haystack represents the values / intervals to match against.
- ...
These dots are for future extensions and must be empty.
- missing
[integer(1) / "equals" / "drop" / "error"]
Handling of missing values in needles.
  - "equals" considers missing values in needles as exactly equal to missing values in haystack when determining if there is a matching relationship between them.
  - "drop" drops missing values in needles from the result.
  - "error" throws an error if any values in needles are missing.
  - If a single integer is provided, this represents the value returned in the haystack column for values in needles that are missing.
- no_match
Handling of needles without a match.
  - "drop" drops needles with zero matches from the result.
  - "error" throws an error if any needles have zero matches.
  - If a single integer is provided, this represents the value returned in the haystack column for values of needles that have zero matches. The default represents an unmatched needle with NA.
- remaining
Handling of haystack values that needles never matched.
  - "drop" drops remaining haystack values from the result. Typically, this is the desired behavior if you only care when needles has a match.
  - "error" throws an error if there are any remaining haystack values.
  - If a single integer is provided (often NA), this represents the value returned in the needles column for the remaining haystack values that needles never matched. Remaining haystack values are always returned at the end of the result.
- multiple
Handling of needles with multiple matches. For each needle:
  - "all" returns all matches detected in haystack.
  - "any" returns any match detected in haystack with no guarantees on which match will be returned. It is often faster than "first" and "last" if you just need to detect if there is at least one match.
  - "first" returns the first match detected in haystack.
  - "last" returns the last match detected in haystack.
- relationship
Handling of the expected relationship between needles and haystack. If the expectations chosen from the list below are invalidated, an error is thrown.
  - "none" doesn't perform any relationship checks.
  - "one-to-one" expects:
    - Each value in needles matches at most 1 value in haystack.
    - Each value in haystack matches at most 1 value in needles.
  - "one-to-many" expects:
    - Each value in needles matches any number of values in haystack.
    - Each value in haystack matches at most 1 value in needles.
  - "many-to-one" expects:
    - Each value in needles matches at most 1 value in haystack.
    - Each value in haystack matches any number of values in needles.
  - "many-to-many" expects:
    - Each value in needles matches any number of values in haystack.
    - Each value in haystack matches any number of values in needles.
    This performs no checks, and is identical to "none", but is provided to allow you to be explicit about this relationship if you know it exists.
  - "warn-many-to-many" doesn't assume there is any known relationship, but will warn if needles and haystack have a many-to-many relationship (which is typically unexpected), encouraging you to either take a closer look at your inputs or make this relationship explicit by specifying "many-to-many".
relationship is applied after multiple to allow potential multiple matches to be filtered out first.
relationship doesn't handle cases where there are zero matches. For that, see no_match and remaining.
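To make the interaction between multiple, remaining, and relationship more concrete, here is a small sketch that reuses the x and y inputs from the Examples section. The variable names first, with_remaining, and res are illustrative only, and the error is captured with tryCatch() so the script keeps running; the exact wording of the error message is not shown because it may differ between versions.

```r
library(ivs)

x <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-07", "2019-01-20"))
y <- iv_pairs(
  as.Date(c("2019-01-01", "2019-01-03")),
  as.Date(c("2019-01-04", "2019-01-08")),
  as.Date(c("2019-01-07", "2019-01-09")),
  as.Date(c("2019-01-10", "2019-01-20")),
  as.Date(c("2019-01-15", "2019-01-20"))
)

# Keep only the first matching interval for each needle
first <- iv_locate_between(x, y, multiple = "first")

# Append the intervals in `y` that no value of `x` fell into,
# marking their `needles` location with NA
with_remaining <- iv_locate_between(x, y, remaining = NA_integer_)

# `x[3]` falls in two intervals, so expecting at most one match per
# needle should signal an error; capture its message instead of stopping
res <- tryCatch(
  iv_locate_between(x, y, relationship = "many-to-one"),
  error = function(cnd) conditionMessage(cnd)
)
```

With multiple = "first", the needle that previously matched two intervals contributes a single row, which is also why the same call would pass a "many-to-one" check once multiple has trimmed the extra match.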
Examples
x <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-07", "2019-01-20"))
y <- iv_pairs(
  as.Date(c("2019-01-01", "2019-01-03")),
  as.Date(c("2019-01-04", "2019-01-08")),
  as.Date(c("2019-01-07", "2019-01-09")),
  as.Date(c("2019-01-10", "2019-01-20")),
  as.Date(c("2019-01-15", "2019-01-20"))
)
x
#> [1] "2019-01-05" "2019-01-10" "2019-01-07" "2019-01-20"
y
#> <iv<date>[5]>
#> [1] [2019-01-01, 2019-01-03) [2019-01-04, 2019-01-08) [2019-01-07, 2019-01-09)
#> [4] [2019-01-10, 2019-01-20) [2019-01-15, 2019-01-20)
# Find any location where `x` is between the intervals in `y`
loc <- iv_locate_between(x, y)
loc
#>   needles haystack
#> 1       1        2
#> 2       2        4
#> 3       3        2
#> 4       3        3
#> 5       4       NA
iv_align(x, y, locations = loc)
#>      needles                 haystack
#> 1 2019-01-05 [2019-01-04, 2019-01-08)
#> 2 2019-01-10 [2019-01-10, 2019-01-20)
#> 3 2019-01-07 [2019-01-04, 2019-01-08)
#> 4 2019-01-07 [2019-01-07, 2019-01-09)
#> 5 2019-01-20                 [NA, NA)
# Find any location where `y` includes the values in `x`
loc <- iv_locate_includes(y, x)
loc
#>   needles haystack
#> 1       1       NA
#> 2       2        1
#> 3       2        3
#> 4       3        3
#> 5       4        2
#> 6       5       NA
iv_align(y, x, locations = loc)
#>                    needles   haystack
#> 1 [2019-01-01, 2019-01-03)       <NA>
#> 2 [2019-01-04, 2019-01-08) 2019-01-05
#> 3 [2019-01-04, 2019-01-08) 2019-01-07
#> 4 [2019-01-07, 2019-01-09) 2019-01-07
#> 5 [2019-01-10, 2019-01-20) 2019-01-10
#> 6 [2019-01-15, 2019-01-20)       <NA>
# Drop values in `x` without a match
loc <- iv_locate_between(x, y, no_match = "drop")
loc
#>   needles haystack
#> 1       1        2
#> 2       2        4
#> 3       3        2
#> 4       3        3
iv_align(x, y, locations = loc)
#>      needles                 haystack
#> 1 2019-01-05 [2019-01-04, 2019-01-08)
#> 2 2019-01-10 [2019-01-10, 2019-01-20)
#> 3 2019-01-07 [2019-01-04, 2019-01-08)
#> 4 2019-01-07 [2019-01-07, 2019-01-09)
# ---------------------------------------------------------------------------
a <- c(1, NA)
b <- iv(c(NA, NA), c(NA, NA))
# By default, missing values in `needles` are treated as being exactly
# equal to missing values in `haystack`, so the missing value in `a` is
# considered between the missing interval in `b`.
iv_locate_between(a, b)
#>   needles haystack
#> 1       1       NA
#> 2       2        1
#> 3       2        2
iv_locate_includes(b, a)
#>   needles haystack
#> 1       1        2
#> 2       2        2
# If you'd like missing values in `needles` to always be considered
# unmatched, set `missing = NA`
iv_locate_between(a, b, missing = NA)
#>   needles haystack
#> 1       1       NA
#> 2       2       NA
iv_locate_includes(b, a, missing = NA)
#>   needles haystack
#> 1       1       NA
#> 2       2       NA
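The remaining two options for missing, "drop" and "error", can be sketched with the same a and b inputs. This is an illustrative addition rather than part of the original examples; the names dropped and err are made up here, and the error is caught with tryCatch() so that only the fact that it was signaled is recorded.

```r
library(ivs)

a <- c(1, NA)
b <- iv(c(NA, NA), c(NA, NA))

# Remove the missing needle from the result entirely; the non-missing
# needle is kept even though it has no match
dropped <- iv_locate_between(a, b, missing = "drop")

# Treat a missing needle as a failure instead of a value to match
err <- tryCatch(
  iv_locate_between(a, b, missing = "error"),
  error = function(cnd) "missing value detected"
)
```

"drop" is a reasonable choice when missing values carry no meaning for the comparison, while "error" suits pipelines where a missing needle indicates an upstream data problem.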