This family of functions locates different types of relationships between a
vector and an iv. It works similarly to base::match(), where needles[i]
checks for a match in all of haystack. Unlike match(), all matches are
returned, rather than just the first.
- iv_locate_between() locates where needles, a vector, falls between the bounds of haystack, an iv.
- iv_locate_includes() locates where needles, an iv, includes the values of haystack, a vector.
These functions return a two-column data frame. The needles column is an
integer vector pointing to locations in needles. The haystack column is
an integer vector pointing to locations in haystack with a match.
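As a rough illustration of the return shape, here is a minimal base R sketch of the "between" relationship. It assumes right-open intervals [start, end), which is how the intervals print in the examples on this page. The helper locate_between_sketch() is hypothetical and is not part of ivs; it is a brute-force loop that ignores the missing, remaining, multiple, and relationship arguments, and it says nothing about how the package computes matches internally.

```r
# Hypothetical helper, not part of ivs: brute-force "between" matching for
# right-open intervals [start, end).
locate_between_sketch <- function(needles, start, end) {
  out <- lapply(seq_along(needles), function(i) {
    # All haystack locations whose interval contains needles[i]
    hits <- which(start <= needles[i] & needles[i] < end)
    if (length(hits) == 0L) {
      hits <- NA_integer_  # mirrors the default `no_match = NA_integer_`
    }
    data.frame(needles = i, haystack = hits)
  })
  do.call(rbind, out)
}

x <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-07", "2019-01-20"))
start <- as.Date(c("2019-01-01", "2019-01-04", "2019-01-07", "2019-01-10", "2019-01-15"))
end <- as.Date(c("2019-01-03", "2019-01-08", "2019-01-09", "2019-01-20", "2019-01-20"))

loc <- locate_between_sketch(x, start, end)
loc
```

Each row pairs one location in needles with one matching location in haystack, so a needle with two matches appears on two rows and a needle with no match appears once with NA.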
Usage
iv_locate_between(
  needles,
  haystack,
  ...,
  missing = "equals",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all",
  relationship = "none"
)
iv_locate_includes(
  needles,
  haystack,
  ...,
  missing = "equals",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all",
  relationship = "none"
)
Arguments
- needles, haystack
[vector, iv]
For iv_*_between(), needles should be a vector and haystack should be an iv.
For iv_*_includes(), needles should be an iv and haystack should be a vector.
Each element of needles represents the value / interval to match. haystack represents the values / intervals to match against.
- ...
These dots are for future extensions and must be empty.
- missing
[integer(1) / "equals" / "drop" / "error"]
Handling of missing values in needles.
  - "equals" considers missing values in needles as exactly equal to missing values in haystack when determining if there is a matching relationship between them.
  - "drop" drops missing values in needles from the result.
  - "error" throws an error if any values in needles are missing.
  - If a single integer is provided, this represents the value returned in the haystack column for values in needles that are missing.
- no_match
Handling of needles without a match.
  - "drop" drops needles with zero matches from the result.
  - "error" throws an error if any needles have zero matches.
  - If a single integer is provided, this represents the value returned in the haystack column for values of needles that have zero matches. The default represents an unmatched needle with NA.
- remaining
Handling of haystack values that needles never matched.
  - "drop" drops remaining haystack values from the result. Typically, this is the desired behavior if you only care when needles has a match.
  - "error" throws an error if there are any remaining haystack values.
  - If a single integer is provided (often NA), this represents the value returned in the needles column for the remaining haystack values that needles never matched. Remaining haystack values are always returned at the end of the result.
- multiple
Handling of needles with multiple matches. For each needle:
  - "all" returns all matches detected in haystack.
  - "any" returns any match detected in haystack with no guarantees on which match will be returned. It is often faster than "first" and "last" if you just need to detect if there is at least one match.
  - "first" returns the first match detected in haystack.
  - "last" returns the last match detected in haystack.
- relationship
Handling of the expected relationship between needles and haystack. If the expectations chosen from the list below are invalidated, an error is thrown.
  - "none" doesn't perform any relationship checks.
  - "one-to-one" expects:
    - Each value in needles matches at most 1 value in haystack.
    - Each value in haystack matches at most 1 value in needles.
  - "one-to-many" expects:
    - Each value in needles matches any number of values in haystack.
    - Each value in haystack matches at most 1 value in needles.
  - "many-to-one" expects:
    - Each value in needles matches at most 1 value in haystack.
    - Each value in haystack matches any number of values in needles.
  - "many-to-many" expects:
    - Each value in needles matches any number of values in haystack.
    - Each value in haystack matches any number of values in needles.
    This performs no checks, and is identical to "none", but is provided to allow you to be explicit about this relationship if you know it exists.
  - "warn-many-to-many" doesn't assume there is any known relationship, but will warn if needles and haystack have a many-to-many relationship (which is typically unexpected), encouraging you to either take a closer look at your inputs or make this relationship explicit by specifying "many-to-many".
relationship is applied after filter and multiple to allow potential multiple matches to be filtered out first.
relationship doesn't handle cases where there are zero matches. For that, see no_match and remaining.
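To make the relationship expectations concrete, the following base R sketch labels the tightest relationship that a given locations data frame would satisfy. The helper classify_relationship() is hypothetical and not part of ivs. It only inspects a two-column result by hand and is not how the package performs its checks. Rows with NA in either column are ignored, since zero matches are the concern of no_match and remaining.

```r
# Hypothetical helper, not part of ivs: label the tightest relationship
# implied by a two-column locations data frame.
classify_relationship <- function(loc) {
  # Rows with NA represent zero matches, which `relationship` does not handle
  matched <- loc[!is.na(loc$needles) & !is.na(loc$haystack), ]
  # A repeated needle location means that needle matched several haystack values
  needle_repeats <- anyDuplicated(matched$needles) > 0L
  # A repeated haystack location means that value was matched by several needles
  haystack_repeats <- anyDuplicated(matched$haystack) > 0L
  if (needle_repeats && haystack_repeats) {
    "many-to-many"
  } else if (needle_repeats) {
    "one-to-many"
  } else if (haystack_repeats) {
    "many-to-one"
  } else {
    "one-to-one"
  }
}

# Needle 3 matches haystack 2 and 3, and haystack 2 is matched by needles 1 and 3
loc_all <- data.frame(
  needles = c(1L, 2L, 3L, 3L, 4L),
  haystack = c(2L, 4L, 2L, 3L, NA)
)
classify_relationship(loc_all)
# "many-to-many"

# One needle matching two distinct haystack values
loc_fan <- data.frame(needles = c(3L, 3L), haystack = c(2L, 3L))
classify_relationship(loc_fan)
# "one-to-many"
```

Under this reading, a result labelled "many-to-many" would be expected to error with relationship = "one-to-one", "one-to-many", or "many-to-one".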
Examples
x <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-07", "2019-01-20"))
y <- iv_pairs(
  as.Date(c("2019-01-01", "2019-01-03")),
  as.Date(c("2019-01-04", "2019-01-08")),
  as.Date(c("2019-01-07", "2019-01-09")),
  as.Date(c("2019-01-10", "2019-01-20")),
  as.Date(c("2019-01-15", "2019-01-20"))
)
x
#> [1] "2019-01-05" "2019-01-10" "2019-01-07" "2019-01-20"
y
#> <iv<date>[5]>
#> [1] [2019-01-01, 2019-01-03) [2019-01-04, 2019-01-08) [2019-01-07, 2019-01-09)
#> [4] [2019-01-10, 2019-01-20) [2019-01-15, 2019-01-20)
# Find any location where `x` is between the intervals in `y`
loc <- iv_locate_between(x, y)
loc
#> needles haystack
#> 1 1 2
#> 2 2 4
#> 3 3 2
#> 4 3 3
#> 5 4 NA
iv_align(x, y, locations = loc)
#> needles haystack
#> 1 2019-01-05 [2019-01-04, 2019-01-08)
#> 2 2019-01-10 [2019-01-10, 2019-01-20)
#> 3 2019-01-07 [2019-01-04, 2019-01-08)
#> 4 2019-01-07 [2019-01-07, 2019-01-09)
#> 5 2019-01-20 [NA, NA)
# Find any location where `y` includes the values in `x`
loc <- iv_locate_includes(y, x)
loc
#> needles haystack
#> 1 1 NA
#> 2 2 1
#> 3 2 3
#> 4 3 3
#> 5 4 2
#> 6 5 NA
iv_align(y, x, locations = loc)
#> needles haystack
#> 1 [2019-01-01, 2019-01-03) <NA>
#> 2 [2019-01-04, 2019-01-08) 2019-01-05
#> 3 [2019-01-04, 2019-01-08) 2019-01-07
#> 4 [2019-01-07, 2019-01-09) 2019-01-07
#> 5 [2019-01-10, 2019-01-20) 2019-01-10
#> 6 [2019-01-15, 2019-01-20) <NA>
# Drop values in `x` without a match
loc <- iv_locate_between(x, y, no_match = "drop")
loc
#> needles haystack
#> 1 1 2
#> 2 2 4
#> 3 3 2
#> 4 3 3
iv_align(x, y, locations = loc)
#> needles haystack
#> 1 2019-01-05 [2019-01-04, 2019-01-08)
#> 2 2019-01-10 [2019-01-10, 2019-01-20)
#> 3 2019-01-07 [2019-01-04, 2019-01-08)
#> 4 2019-01-07 [2019-01-07, 2019-01-09)
# ---------------------------------------------------------------------------
a <- c(1, NA)
b <- iv(c(NA, NA), c(NA, NA))
# By default, missing values in `needles` are treated as being exactly
# equal to missing values in `haystack`, so the missing value in `a` is
# considered between the missing interval in `b`.
iv_locate_between(a, b)
#> needles haystack
#> 1 1 NA
#> 2 2 1
#> 3 2 2
iv_locate_includes(b, a)
#> needles haystack
#> 1 1 2
#> 2 2 2
# If you'd like missing values in `needles` to always be considered
# unmatched, set `missing = NA`
iv_locate_between(a, b, missing = NA)
#> needles haystack
#> 1 1 NA
#> 2 2 NA
iv_locate_includes(b, a, missing = NA)
#> needles haystack
#> 1 1 NA
#> 2 2 NA
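The remaining argument is not exercised in the examples above. As a hedged sketch of what passing a single integer such as NA is described to do, the base R code below takes the locations printed for iv_locate_between(x, y), finds the haystack locations that no needle matched, and appends them at the end with NA in the needles column. This is done by hand for illustration and does not call ivs.

```r
# Locations as printed for iv_locate_between(x, y) in the examples above
loc <- data.frame(
  needles = c(1L, 2L, 3L, 3L, 4L),
  haystack = c(2L, 4L, 2L, 3L, NA)
)
haystack_size <- 5L  # `y` holds five intervals

# Haystack locations that no needle matched
unmatched <- setdiff(seq_len(haystack_size), loc$haystack)

# Append them at the end of the result, with NA in the needles column
with_remaining <- rbind(
  loc,
  data.frame(
    needles = rep(NA_integer_, length(unmatched)),
    haystack = unmatched
  )
)
with_remaining
```

Here the first and fifth intervals in y are the ones left over, so they form the two trailing rows.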