This family of functions locates different types of relationships between two ivs. It works similarly to base::match(), where needles[i] checks for a relationship in all of haystack. Unlike match(), all matching relationships are returned, rather than just the first.
- iv_locate_overlaps() locates a specific type of overlap between the two ivs.
- iv_locate_precedes() locates where needles[i] precedes (i.e. comes before) any interval in haystack.
- iv_locate_follows() locates where needles[i] follows (i.e. comes after) any interval in haystack.
These functions return a two column data frame. The needles column is an integer vector pointing to locations in needles. The haystack column is an integer vector pointing to locations in haystack with a matching relationship.
Usage
iv_locate_overlaps(
  needles,
  haystack,
  ...,
  type = "any",
  missing = "equals",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all",
  relationship = "none"
)

iv_locate_precedes(
  needles,
  haystack,
  ...,
  closest = FALSE,
  missing = "equals",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all",
  relationship = "none"
)

iv_locate_follows(
  needles,
  haystack,
  ...,
  closest = FALSE,
  missing = "equals",
  no_match = NA_integer_,
  remaining = "drop",
  multiple = "all",
  relationship = "none"
)
Arguments
- needles, haystack
  [iv]
  Interval vectors used for relation matching.
  - Each element of needles represents the interval to search for.
  - haystack represents the intervals to search in.
  Prior to comparison, needles and haystack are coerced to the same type.
- ...
  These dots are for future extensions and must be empty.
- type
  [character(1)]
  The type of relationship to find. One of:
  - "any": Finds any overlap whatsoever between an interval in needles and an interval in haystack.
  - "within": Finds when an interval in needles is completely within (or equal to) an interval in haystack.
  - "contains": Finds when an interval in needles completely contains (or equals) an interval in haystack.
  - "equals": Finds when an interval in needles is exactly equal to an interval in haystack.
  - "starts": Finds when the start of an interval in needles matches the start of an interval in haystack.
  - "ends": Finds when the end of an interval in needles matches the end of an interval in haystack.
- missing
  [integer(1) / "equals" / "drop" / "error"]
  Handling of missing intervals in needles.
  - "equals" considers missing intervals in needles as exactly equal to missing intervals in haystack when determining if there is a matching relationship between them.
  - "drop" drops missing intervals in needles from the result.
  - "error" throws an error if any intervals in needles are missing.
  - If a single integer is provided, this represents the value returned in the haystack column for intervals in needles that are missing.
- no_match
  Handling of needles without a match.
  - "drop" drops needles with zero matches from the result.
  - "error" throws an error if any needles have zero matches.
  - If a single integer is provided, this represents the value returned in the haystack column for values of needles that have zero matches. The default represents an unmatched needle with NA.
- remaining
  Handling of haystack values that needles never matched.
  - "drop" drops remaining haystack values from the result. Typically, this is the desired behavior if you only care when needles has a match.
  - "error" throws an error if there are any remaining haystack values.
  - If a single integer is provided (often NA), this represents the value returned in the needles column for the remaining haystack values that needles never matched. Remaining haystack values are always returned at the end of the result.
- multiple
  Handling of needles with multiple matches. For each needle:
  - "all" returns all matches detected in haystack.
  - "any" returns any match detected in haystack with no guarantees on which match will be returned. It is often faster than "first" and "last" if you just need to detect if there is at least one match.
  - "first" returns the first match detected in haystack.
  - "last" returns the last match detected in haystack.
- relationship
  Handling of the expected relationship between needles and haystack. If the expectations chosen from the list below are invalidated, an error is thrown.
  - "none" doesn't perform any relationship checks.
  - "one-to-one" expects:
    - Each value in needles matches at most 1 value in haystack.
    - Each value in haystack matches at most 1 value in needles.
  - "one-to-many" expects:
    - Each value in needles matches any number of values in haystack.
    - Each value in haystack matches at most 1 value in needles.
  - "many-to-one" expects:
    - Each value in needles matches at most 1 value in haystack.
    - Each value in haystack matches any number of values in needles.
  - "many-to-many" expects:
    - Each value in needles matches any number of values in haystack.
    - Each value in haystack matches any number of values in needles.
    This performs no checks, and is identical to "none", but is provided to allow you to be explicit about this relationship if you know it exists.
  - "warn-many-to-many" doesn't assume there is any known relationship, but will warn if needles and haystack have a many-to-many relationship (which is typically unexpected), encouraging you to either take a closer look at your inputs or make this relationship explicit by specifying "many-to-many".
  relationship is applied after filter and multiple to allow potential multiple matches to be filtered out first.
  relationship doesn't handle cases where there are zero matches. For that, see no_match and remaining.
- closest
  [TRUE / FALSE]
  Should only the closest relationship be returned?
  If TRUE, will only return the closest interval(s) in haystack that the current value of needles either precedes or follows. Note that multiple intervals can still be returned if there are ties, which can be resolved using multiple.
Examples
x <- iv_pairs(
  as.Date(c("2019-01-05", "2019-01-10")),
  as.Date(c("2019-01-07", "2019-01-15")),
  as.Date(c("2019-01-20", "2019-01-31"))
)
y <- iv_pairs(
  as.Date(c("2019-01-01", "2019-01-03")),
  as.Date(c("2019-01-04", "2019-01-08")),
  as.Date(c("2019-01-07", "2019-01-09")),
  as.Date(c("2019-01-10", "2019-01-20")),
  as.Date(c("2019-01-15", "2019-01-20"))
)
x
#> <iv<date>[3]>
#> [1] [2019-01-05, 2019-01-10) [2019-01-07, 2019-01-15) [2019-01-20, 2019-01-31)
y
#> <iv<date>[5]>
#> [1] [2019-01-01, 2019-01-03) [2019-01-04, 2019-01-08) [2019-01-07, 2019-01-09)
#> [4] [2019-01-10, 2019-01-20) [2019-01-15, 2019-01-20)
# Find any overlap between `x` and `y`
loc <- iv_locate_overlaps(x, y)
loc
#> needles haystack
#> 1 1 2
#> 2 1 3
#> 3 2 2
#> 4 2 3
#> 5 2 4
#> 6 3 NA
iv_align(x, y, locations = loc)
#> needles haystack
#> 1 [2019-01-05, 2019-01-10) [2019-01-04, 2019-01-08)
#> 2 [2019-01-05, 2019-01-10) [2019-01-07, 2019-01-09)
#> 3 [2019-01-07, 2019-01-15) [2019-01-04, 2019-01-08)
#> 4 [2019-01-07, 2019-01-15) [2019-01-07, 2019-01-09)
#> 5 [2019-01-07, 2019-01-15) [2019-01-10, 2019-01-20)
#> 6 [2019-01-20, 2019-01-31) [NA, NA)
# Find where `x` contains `y` and drop results when there isn't a match
loc <- iv_locate_overlaps(x, y, type = "contains", no_match = "drop")
loc
#> needles haystack
#> 1 1 3
#> 2 2 3
iv_align(x, y, locations = loc)
#> needles haystack
#> 1 [2019-01-05, 2019-01-10) [2019-01-07, 2019-01-09)
#> 2 [2019-01-07, 2019-01-15) [2019-01-07, 2019-01-09)
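# A further sketch of `type`, reusing `x` and `y` from above (output not
# shown). `"starts"` looks for needles whose start matches the start of an
# interval in `haystack`. In this data, `x[2]` and `y[3]` both start on
# 2019-01-07, even though they end on different dates.
loc <- iv_locate_overlaps(x, y, type = "starts", no_match = "drop")
iv_align(x, y, locations = loc)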
# Find where `x` precedes `y`
loc <- iv_locate_precedes(x, y)
loc
#> needles haystack
#> 1 1 4
#> 2 1 5
#> 3 2 5
#> 4 3 NA
iv_align(x, y, locations = loc)
#> needles haystack
#> 1 [2019-01-05, 2019-01-10) [2019-01-10, 2019-01-20)
#> 2 [2019-01-05, 2019-01-10) [2019-01-15, 2019-01-20)
#> 3 [2019-01-07, 2019-01-15) [2019-01-15, 2019-01-20)
#> 4 [2019-01-20, 2019-01-31) [NA, NA)
# Filter down to find only the closest interval in `y` of all the intervals
# where `x` preceded it
loc <- iv_locate_precedes(x, y, closest = TRUE)
iv_align(x, y, locations = loc)
#> needles haystack
#> 1 [2019-01-05, 2019-01-10) [2019-01-10, 2019-01-20)
#> 2 [2019-01-07, 2019-01-15) [2019-01-15, 2019-01-20)
#> 3 [2019-01-20, 2019-01-31) [NA, NA)
# Note that `closest` can result in duplicates if there is a tie.
# `2019-01-20` appears as an end date twice in `haystack`.
loc <- iv_locate_follows(x, y, closest = TRUE)
loc
#> needles haystack
#> 1 1 1
#> 2 2 1
#> 3 3 4
#> 4 3 5
iv_align(x, y, locations = loc)
#> needles haystack
#> 1 [2019-01-05, 2019-01-10) [2019-01-01, 2019-01-03)
#> 2 [2019-01-07, 2019-01-15) [2019-01-01, 2019-01-03)
#> 3 [2019-01-20, 2019-01-31) [2019-01-10, 2019-01-20)
#> 4 [2019-01-20, 2019-01-31) [2019-01-15, 2019-01-20)
# Force just one of the ties to be returned by using `multiple`.
# Here we just request any of the ties, with no guarantee on which one.
loc <- iv_locate_follows(x, y, closest = TRUE, multiple = "any")
loc
#> needles haystack
#> 1 1 1
#> 2 2 1
#> 3 3 4
iv_align(x, y, locations = loc)
#> needles haystack
#> 1 [2019-01-05, 2019-01-10) [2019-01-01, 2019-01-03)
#> 2 [2019-01-07, 2019-01-15) [2019-01-01, 2019-01-03)
#> 3 [2019-01-20, 2019-01-31) [2019-01-10, 2019-01-20)
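# The `remaining`, `multiple`, and `relationship` arguments can be sketched
# with the same `x` and `y` (illustrative calls; output not shown).
# Giving `remaining` a value keeps the `haystack` intervals that no needle
# overlapped. They are returned at the end of the result, with that value in
# the `needles` column.
loc <- iv_locate_overlaps(x, y, remaining = NA_integer_)
loc
# Keep only the first overlap detected in `haystack` for each needle
loc <- iv_locate_overlaps(x, y, multiple = "first")
loc
# State an expected relationship. `x[1]` overlaps more than one interval in
# `y`, so a "one-to-one" expectation is invalidated and an error is thrown.
# `try()` is used here only so that the remaining examples keep running.
try(iv_locate_overlaps(x, y, relationship = "one-to-one"))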
# ---------------------------------------------------------------------------
a <- iv(NA, NA)
b <- iv(c(NA, NA), c(NA, NA))
# By default, missing intervals in `needles` are seen as exactly equal to
# missing intervals in `haystack`, which means that they overlap
iv_locate_overlaps(a, b)
#> needles haystack
#> 1 1 1
#> 2 1 2
# If you'd like missing intervals in `needles` to always be considered
# unmatched, set `missing = NA`
iv_locate_overlaps(a, b, missing = NA)
#> needles haystack
#> 1 1 NA
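# Two more ways to handle missing intervals in `needles`, sketched with the
# same `a` and `b` (output not shown). `"drop"` removes them from the result,
# and `"error"` throws an error instead. `try()` is used here only so that a
# script containing this call keeps running.
iv_locate_overlaps(a, b, missing = "drop")
try(iv_locate_overlaps(a, b, missing = "error"))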