Temporal Reasoning
Temporal reasoning adds native support for temporal intervals to the data. It supports the operators of DatalogMTL and includes some additional functionality.
The following restrictions apply:
-
Joins: Only NestedLoopJoins are supported
-
Temporal Operators: You cannot nest them; split nested operators across multiple rules instead, e.g.
B :- [-]<t1,t2><-><t3,t4> A
is forbidden and has to be written as
B1 :- <-><t3,t4> A.
B :- [-]<t1,t2> B1.
Temporal Intervals
To support these operations, there are two types of intervals:
-
absolute (datetime) intervals of the form <YYYY-MM-DD HH:mm:ss,YYYY-MM-DD HH:mm:ss> for facts. As an alternative, absolute intervals with integer or double endpoints are supported for facts, in case you want to work with numbers (e.g., timestamps). You are not allowed to mix different data types of intervals in a program.
-
relative intervals of the form <x,y> for operators, where x and y are numeric and x <= y.
Note that < is either open "(" or closed "[" and > is either open ")" or closed "]", for example:
[2020-01-01,2021-01-01)
(3,8)
(2.5,4]
[2020-01-01,2020-01-31 23:59:59]
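The bracket convention above can be made concrete with a small Python sketch. It is not part of Vadalog: `IntervalLiteral` and `parse_interval` are hypothetical names introduced only for illustration, and the endpoints are kept as plain text rather than converted to dates or numbers.

```python
from typing import NamedTuple


class IntervalLiteral(NamedTuple):
    left: str            # left endpoint, kept as text (date, integer or double)
    right: str           # right endpoint, kept as text
    left_closed: bool    # True for "[", False for "("
    right_closed: bool   # True for "]", False for ")"


def parse_interval(text: str) -> IntervalLiteral:
    """Split an interval literal into its two endpoints and two brackets."""
    text = text.strip()
    if len(text) < 2 or text[0] not in "([" or text[-1] not in ")]":
        raise ValueError(f"not an interval literal: {text!r}")
    left, comma, right = text[1:-1].partition(",")
    if not comma or not left.strip() or not right.strip():
        raise ValueError(f"expected two endpoints in: {text!r}")
    return IntervalLiteral(left.strip(), right.strip(),
                           text[0] == "[", text[-1] == "]")


print(parse_interval("[2020-01-01,2021-01-01)"))
print(parse_interval("(2.5,4]"))
```

The first call describes an interval that is closed on the left and open on the right; the second one is open on the left and closed on the right.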
Temporal Facts
In order to make a fact temporal, one has to annotate the fact with an absolute interval.
Such a fact has the form A@<t1,t2>
A@[2020-01-01,2021-01-01).
Temporal Input/Output
When data is retrieved via or sent to record managers, one has to specify which columns should be mapped
to the temporal interval.
This is done with one of the following annotations:
* @temporalMapping annotation, one for each temporal metadata element (left/right bracket, left/right endpoint);
* @temporalMappings annotation, one for the complete set of temporal metadata (all brackets and endpoints).
@temporalMapping (one element = one annotation)
The syntax is the following:
@temporalMapping(predicateName, position, temporalMappingType, defaultValue).
Additionally, the annotation can have a fifth, optional parameter, columnName:
@temporalMapping(predicateName, position, temporalMappingType, defaultValue, columnName).
Where:
- predicateName is the name of the predicate
- position is the column position in the record manager, or -1 if the defaultValue is used or if the value does not need to be mapped to an output
- temporalMappingType is the element of the temporal metadata the field represents. It can be:
  - LEFT_ENDPOINT
  - RIGHT_ENDPOINT
  - LEFT_BRACKET
  - RIGHT_BRACKET
- defaultValue is the default value: a date in case of [LEFT|RIGHT]_ENDPOINT, a boolean value (#T or #F) in case of [LEFT|RIGHT]_BRACKET, or -1 if a position is provided
- columnName indicates the name of the database column to use for that field (most useful in output bindings)
Please note that a value that is read as temporal metadata by the @temporalMapping annotation will not be available for use as data.
This annotation, with a different syntax, was once called @timeMapping. This name is no longer supported.
In the following example, the input file has the following fields, in this order:
ID, Name, ExtraDate, StartDate, Surname, EndDate
The output file has the following fields, in this order:
StartDate, ID, EndDate, Name, isClosedEnd
@input("a").
@bind("a","csv useHeaders=true","programs_data","time_mapping.csv").
@temporalMapping("a",3,"LEFT_ENDPOINT",-1).
@temporalMapping("a",5,"RIGHT_ENDPOINT",-1).
@temporalMapping("a",-1,"LEFT_BRACKET",#T).
@temporalMapping("a",-1,"RIGHT_BRACKET",#T).
@output("b").
@temporalMapping("b",0,"LEFT_ENDPOINT",-1).
@temporalMapping("b",2,"RIGHT_ENDPOINT",-1).
@temporalMapping("b",-1,"LEFT_BRACKET",#T).
@temporalMapping("b",4,"RIGHT_BRACKET",#T).
@bind("b","csv useHeaders=true","programs_data","time_mapping_output.csv").
b(ID,Name) :- a(ID,Name,ExtraDate,Surname).
@temporalMappings (all elements in one annotation)
The @temporalMappings annotation (please note the final s, making it plural) works as an alternative to @temporalMapping: it specifies the mapping of all temporal metadata elements in a single annotation.
@temporalMappings(predicateName, posStart, posEnd, posStartClosed, posEndClosed, defaultTemplate).
Where:
- predicateName is the name of the predicate
- posStart is the column position of the left endpoint in the record manager, or -1 if the default value is used or if the value does not need to be mapped to an output
- posEnd is the column position of the right endpoint in the record manager, or -1 if the default value is used or if the value does not need to be mapped to an output
- posStartClosed is the column position of the left bracket in the record manager, or -1 if the default value is used or if the value does not need to be mapped to an output
- posEndClosed is the column position of the right bracket in the record manager, or -1 if the default value is used or if the value does not need to be mapped to an output
- defaultTemplate is a special string that specifies the defaults as a template of the form (LEFT_BRACKET)(LEFT_ENDPOINT),(RIGHT_ENDPOINT)(RIGHT_BRACKET):
  - LEFT_BRACKET can be < for not specified, ( for open and [ for closed
  - LEFT_ENDPOINT can be _ (not specified), a default date in the yyyy-MM-dd or yyyy-MM-dd HH:mm:ss format, an int or a double
  - RIGHT_ENDPOINT can be _ (not specified), a default date in the yyyy-MM-dd or yyyy-MM-dd HH:mm:ss format, an int or a double
  - RIGHT_BRACKET can be > for not specified, ) for open and ] for closed
Go-to templates for the most common cases are:
- <_,_> for a blank template: all the pos values are different from -1, so no defaults are needed
- [_,_] for an interval closed on both sides (where the left and right endpoints are mapped in the record manager)
- (_,_) for an interval open on both sides (where the left and right endpoints are mapped in the record manager)
- [_,_) for an interval that is only left-closed (normal form)
- (_,_] for an interval that is only right-closed
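To illustrate how a defaultTemplate string breaks down into its four elements, here is a minimal Python sketch. It only mirrors the template grammar described above; `parse_default_template` is a hypothetical helper, not an engine API, and it keeps default endpoints as text.

```python
from typing import Dict, Optional

# Bracket characters mapped to "closed?": None means "not specified".
_BRACKETS = {"<": None, ">": None, "(": False, ")": False, "[": True, "]": True}


def parse_default_template(template: str) -> Dict[str, Optional[object]]:
    """Split a template such as "[_,_>" into brackets and default endpoints."""
    if len(template) < 2 or template[0] not in "<([" or template[-1] not in ">)]":
        raise ValueError(f"malformed template: {template!r}")
    left, comma, right = template[1:-1].partition(",")
    if not comma or not left or not right:
        raise ValueError(f"expected two endpoints in: {template!r}")
    return {
        "left_closed": _BRACKETS[template[0]],
        "left_endpoint": None if left == "_" else left,    # "_" = not specified
        "right_endpoint": None if right == "_" else right,
        "right_closed": _BRACKETS[template[-1]],
    }


print(parse_default_template("[_,_>"))
print(parse_default_template("[2020-01-01,_)"))
```

For "[_,_>" the sketch reports a closed left bracket as the only default; the endpoints and the right bracket are left unspecified, so they would have to come from mapped columns.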
In the following example, the input file has the following fields, in this order:
ID, Name, ExtraDate, StartDate, Surname, EndDate
The output file has the following fields, in this order:
StartDate, ID, EndDate, Name, isClosedEnd
@input("a").
@bind("a","csv useHeaders=true","programs_data","time_mapping.csv").
@temporalMappings("a",3,5,-1,-1,"[_,_]").
@output("b").
@temporalMappings("b",0,2,-1,4,"[_,_>").
@bind("b","csv useHeaders=true","programs_data","time_mapping_output.csv").
b(ID,Name) :- a(ID,Name,ExtraDate,Surname).
@temporalType
In order to work with temporal metadata of different types, we need to specify the type in use for each program.
This is done with the annotation @temporalType.
@temporalType(type).
Where type can be:
* "date" for dates. This is also the default setting: if no @temporalType is specified, Vadalog assumes that time is expressed in dates;
* "int" for integers;
* "double" for doubles.
Please note that you can use only one temporal type in a program, so all temporal predicates should have the same type.
@input("a").
@bind("a","csv useHeaders=true","programs_data","time_mapping.csv").
@temporalMappings("a",3,5,-1,-1,"[_,_]").
@temporalType("int").
@output("b").
b(ID,Name) :- a(ID,Name,ExtraDate,Surname).
@timeGranularity
This annotation specifies the granularity for relative validity intervals. The following values are available:
- milliseconds
- seconds
- minutes
- hours
- days
- months
- years
- no_date (use this if facts are annotated with integers or doubles instead of dates)
The conversion factor between days and months is an estimated average number of days per month (a year has 365.2425 days). Therefore, we do not recommend mixing days (and smaller time units) with months (and larger time units).
@timeGranularity("months").
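As a rough illustration of why mixing days and months is discouraged, the following Python sketch works out an average month length from the 365.2425 days per year quoted above. Dividing by 12 is an assumption made here for illustration; the exact factor used by the engine is not stated in this document.

```python
DAYS_PER_YEAR = 365.2425                   # value quoted in the text
AVG_DAYS_PER_MONTH = DAYS_PER_YEAR / 12    # assumption: average month = year / 12


def months_to_days(months: float) -> float:
    """Convert a number of months into days using the average month length."""
    return months * AVG_DAYS_PER_MONTH


print(round(months_to_days(1), 6))   # average length of one month in days
print(round(months_to_days(12), 4))  # twelve average months give one average year
```

An average month of about 30.44 days does not coincide with any calendar month (28 to 31 days), so a relative interval in months can land a few hours or days away from the calendar boundary one might expect.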
Operations with granularity in Months and Years
Since the duration of months and years is not fixed (months can have 28-31 days, years can have 365-366 days), the property temporal.standardDurations determines whether standard durations are assumed for months and years, which makes operations with a granularity of months and years possible.
temporal.standardDurations=true
temporal.standardMonthDuration=30
temporal.standardYearDuration=365
If temporal.standardDurations is set to false, the Vadalog engine throws an exception when trying to run operations with a time granularity of months or years.
To switch between different granularities of operations, use relative validity intervals expressed as ISO 8601 durations instead.
ISO 8601 Relative Validity Intervals
In order not to be tied to a specific granularity, and to easily specify complex relative intervals such as "6 years, 2 months, 13 days and 10 minutes", Temporal Vadalog allows the user to specify such intervals as standard ISO 8601 durations, preceded by a hash ('#'), in the following format:
#P[n]Y[n]M[n]DT[n]H[n]M[n]S
Where:
- # is a leading hash that differentiates the duration from other tokens
- P stands for Period
- [n]Y, [n]M, and [n]D are the blocks corresponding to years, months and days (date blocks). [n] is replaced with the value of the date element that follows (e.g. 9Y = 9 years). [n] can be a decimal number (using either '.' or ',' as a separator). Each block is optional.
- T stands for Time and separates the date blocks from the time blocks; it is included only if there is a time part.
- [n]H, [n]M, and [n]S are the blocks corresponding to hours, minutes and seconds (time blocks). They follow the same rules as the date blocks.
[More information on ISO 8601 Duration on Wikipedia](https://en.wikipedia.org/wiki/ISO_8601#Durations)
For example, this is how we would specify "6 years, 2 months, 13 days and 10 minutes" in a relative interval:
#P6Y2M13DT10M
Some other examples, for ease of reference:
- #P1Y: 1 year
- #P1M: 1 month
- #P1D: 1 day
- #P1Y1D: 1 year and 1 day
- #PT1H: 1 hour
- #PT1M: 1 minute
- #PT1S: 1 second
- #P1Y1DT1S: 1 year, 1 day, and 1 second
- #PT2.5H: 2 and a half hours
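The block structure of the format can be checked with a short Python sketch. It is a reading aid only, not the parser used by the engine: `parse_relative_duration` is a hypothetical name, and the regular expression is one possible encoding of the #P[n]Y[n]M[n]DT[n]H[n]M[n]S and #P[n]W formats described in this section.

```python
import re

_NUM = r"\d+(?:[.,]\d+)?"   # integer or decimal, with '.' or ',' as separator
_DURATION = re.compile(
    rf"#P(?:(?P<weeks>{_NUM})W|"
    rf"(?:(?P<years>{_NUM})Y)?(?:(?P<months>{_NUM})M)?(?:(?P<days>{_NUM})D)?"
    rf"(?:T(?=\d)(?:(?P<hours>{_NUM})H)?(?:(?P<minutes>{_NUM})M)?"
    rf"(?:(?P<seconds>{_NUM})S)?)?)"
)


def parse_relative_duration(token: str) -> dict:
    """Return the blocks that are present in a '#P...' duration as floats."""
    match = _DURATION.fullmatch(token)
    if match is None:
        raise ValueError(f"not a valid duration: {token!r}")
    blocks = {name: float(value.replace(",", "."))
              for name, value in match.groupdict().items() if value is not None}
    if not blocks:
        raise ValueError(f"duration has no blocks: {token!r}")
    return blocks


print(parse_relative_duration("#P6Y2M13DT10M"))
print(parse_relative_duration("#PT2,5H"))
```

Note how the position relative to T decides whether M means months or minutes: in the first call the 2M before T is read as months and the 10M after T as minutes.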
Additionally, relative intervals can also be specified in weeks, using the week format:
#P[n]W
Where the only block present is [n]W, which specifies the number of weeks.
For example, 4 weeks is specified as:
#P4W
The datetime format and the week format can be mixed in the same relative interval (e.g. [#P1D, #P1W)), but they cannot be mixed with the other time granularities.
@temporal Annotation
In order to limit the reasoning to a certain time interval, the @temporal annotation is used.
@temporal takes as parameters the start and end of an interval (absolute values, in the same format as the intervals used for facts) and limits the reasoning to that interval, returning only results that also lie in it.
Relative:
@temporal(1,3).
Absolute:
@temporal(2021-02-01,2021-02-10).
@config Annotation
The @config annotation allows the programmer to choose the merge strategy for intervals and other settings for reasoning on time.
It takes two parameters: the property key and the property value.
Possible property keys:
- "mergeTemporalDataBeforeOutput". Values: "true" or "false" (default). If true, adjacent intervals are merged before producing the output.
- "mergeTemporalDataAfterInput". Values: "true" or "false" (default). If true, adjacent intervals are merged directly after the input.
- "enableTemporalMerge" - Enables merging for operations that do not require it (e.g., the diamond operation). Values: "true" or "false" (default).
- "discreteDomain" - Tells the engine that we work on an integer domain, which allows for optimizations. At the moment it is only used to convert facts to a closed interval representation.
- "temporalMergeStrategy" - One of the following values can be chosen:
  - "default" - Uses the default value chosen by the engine (for example, an all scan for the output, since we only want to produce the biggest intervals).
  - "all" - Uses the default value for the all strategy. This scan is a merge scan that merges all entries that can currently be fetched before forwarding anything to the following scans. It then forwards only the biggest intervals directly to the next scan. Note that this scan is not fully optimized yet, since the change tracking is not fully integrated and depends on buffer cache filtering.
  - "forward" - Uses the default value for the forward strategy. This strategy uses the usual streaming mode of Vadalog, i.e., it reads a fact, merges it with the intervals read so far and directly forwards the merged interval.
- "temporalMergePlacement" - Decides where to merge intervals in the pipeline:
  - a) "ONLY_REQUIRED": only when required
  - b) "EARLIEST_WHEN_REQUIRED": in the earliest position in the pipeline when there is an operator that needs it
  - c) "ALWAYS": always, whether there is an operator that requires it or not, in the earliest position a merge can be done
@config("discreteDomain","true").
The defaults for these settings can be set in the vada.properties file. However, as the behavior of these configurations depends heavily on the program, they should normally remain as follows:
# TEMPORAL CONFIG
temporal.mergeTemporalDataBeforeOutput=false
temporal.enableTemporalMerge=false
temporal.temporalMergeStrategy=forward
temporal.temporalMergeStrategy.output=all
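What "merging adjacent intervals" means can be pictured with the following Python sketch. It is an illustration of interval coalescing in general, not the engine's merge scan: `Interval` and `merge_intervals` are hypothetical names, and the sketch assumes that two intervals are merged when they overlap or when they touch at a point that at least one of them includes.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    left: float
    right: float
    left_closed: bool
    right_closed: bool


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Coalesce overlapping or adjacent intervals into maximal intervals."""
    # Sort by left endpoint; for equal endpoints, closed brackets come first.
    ordered = sorted(intervals, key=lambda i: (i.left, not i.left_closed))
    merged: List[Interval] = []
    for current in ordered:
        if merged:
            last = merged[-1]
            touches = current.left < last.right or (
                current.left == last.right
                and (current.left_closed or last.right_closed))
            if touches:
                # Extend the previous interval if the current one reaches further.
                if (current.right, current.right_closed) > (last.right, last.right_closed):
                    merged[-1] = Interval(last.left, current.right,
                                          last.left_closed, current.right_closed)
                continue
        merged.append(current)
    return merged


facts = [Interval(1, 3, True, False), Interval(3, 5, True, False),
         Interval(7, 8, True, True)]
print(merge_intervals(facts))
```

Here [1,3) and [3,5) are adjacent and collapse into [1,5), while [7,8] stays separate. Two intervals such as [1,3) and (3,5] would not be merged in this sketch, because the point 3 belongs to neither of them.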
Temporal Operators
Temporal operators are applied to temporal predicates.
In general, a rule with a temporal operator has the form
B :- <-><t1,t2> A
B :- [-]<t1,t2> A
B :- <+><t1,t2> A
B :- [+]<t1,t2> A
B :- (c) A
where A is a temporal predicate, and <t1,t2> is a relative temporal interval.
The operators are interpreted as follows:
Closing:
- (c): If Fact A is valid in the interval <x,y>, then B is valid in [x,y]
Forward Propagating (derive facts in the future):
- <->: If Fact A is valid in the interval <x,y>, then B is valid in <x+t1,y+t2>
- [-]: If Fact A is valid in the interval <x,y>, B is valid in <x+t2,y+t1> iff x+t2 <= y+t1.
- Since, B :- A1 S<t1,t2> A2: You can define the since operator as follows:
S1 :- (c) A1, A2.
S2 :- <-><t1,t2> S1.
B :- S2, (c) A1.
Backward Propagating (derive facts in the past):
- <+>: If Fact A is valid in the interval <x,y>, then B is valid in <x-t2,y-t1>
- [+]: If Fact A is valid in the interval <x,y>, B is valid in <x-t1,y-t2> iff x-t1 <= y-t2.
- Until, B :- A1 U<t1,t2> A2: You can define the until operator as follows:
S1 :- (c) A1, A2.
S2 :- <+><t1,t2> S1.
B :- S2, (c) A1.
The endpoints are defined as follows, unless they are infinite (in which case they are always open):
* <->: The interval is left-closed (right-closed) if both the fact and the operator are left-closed (right-closed)
* <+>: The interval is left-closed (right-closed) if the fact is left-closed (right-closed) and the operator is right-closed (left-closed)
* [-]: The interval is left-closed (right-closed) if the fact is left-closed (right-closed) or the operator is right-open (left-open)
* [+]: The interval is left-closed (right-closed) if the fact is left-closed (right-closed) or the operator is left-open (right-open)
* (c): The interval is left-closed and right-closed
a(1)@[2020-01-01,2020-01-06).
b(X) :- <->(3,7.5] a(X).
c(X) :- [-](2,4] a(X).
@output("b").
@output("c").
@timeGranularity("days").
The above example outputs b(1)@(2020-01-04,2020-01-13 12:00:00) and c(1)@[2020-01-05,2020-01-08].
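The endpoint and bracket rules for the four propagating operators can be restated as a small Python sketch. It only transcribes the rules listed in this section for numeric intervals; the names `Interval`, `diamond_minus`, `box_minus`, `diamond_plus` and `box_plus` are hypothetical, and infinite endpoints are not handled.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    left: float
    right: float
    left_closed: bool
    right_closed: bool


def diamond_minus(fact: Interval, op: Interval) -> Interval:
    """<-><t1,t2>: <x+t1, y+t2>, closed where fact and operator are closed."""
    return Interval(fact.left + op.left, fact.right + op.right,
                    fact.left_closed and op.left_closed,
                    fact.right_closed and op.right_closed)


def box_minus(fact: Interval, op: Interval) -> Optional[Interval]:
    """[-]<t1,t2>: <x+t2, y+t1>, only if x+t2 <= y+t1."""
    left, right = fact.left + op.right, fact.right + op.left
    if left > right:
        return None
    return Interval(left, right,
                    fact.left_closed or not op.right_closed,
                    fact.right_closed or not op.left_closed)


def diamond_plus(fact: Interval, op: Interval) -> Interval:
    """<+><t1,t2>: <x-t2, y-t1>."""
    return Interval(fact.left - op.right, fact.right - op.left,
                    fact.left_closed and op.right_closed,
                    fact.right_closed and op.left_closed)


def box_plus(fact: Interval, op: Interval) -> Optional[Interval]:
    """[+]<t1,t2>: <x-t1, y-t2>, only if x-t1 <= y-t2."""
    left, right = fact.left - op.left, fact.right - op.right
    if left > right:
        return None
    return Interval(left, right,
                    fact.left_closed or not op.left_closed,
                    fact.right_closed or not op.right_closed)


fact = Interval(0, 5, True, False)                         # [0,5)
print(diamond_minus(fact, Interval(3, 7.5, False, True)))  # operator (3,7.5]
print(box_minus(fact, Interval(2, 4, False, True)))        # operator (2,4]
```

With the fact [0,5), the diamond operator with (3,7.5] yields (3,12.5) and the box operator with (2,4] yields [4,7]. The box operators return None when the shifted interval would be empty according to the iff conditions.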
Additional Operations
In addition, we support the following operators for facts of type date:
Triangle Up Operator
B :- /\{x} {units} [offset] A
where x is a number and units is a time granularity (plural), or no_date when the operator is used to aggregate temporal data with integers or doubles.
This operator extends the interval to the length of the provided unit. offset shifts the grouping of time units with respect to the default one. (See Offset for more info.)
If x > 1, the operator extends the interval to a multiple of the unit (e.g., to handle periods). For example, 4 months creates the intervals (Jan, Feb, Mar, Apr), (May, Jun, Jul, Aug), (Sep, Oct, Nov, Dec), and 3 months creates the intervals (Jan, Feb, Mar), (Apr, May, Jun), (Jul, Aug, Sep), (Oct, Nov, Dec). x must be a divisor of the time granularity.
That is, for:
-
years: any value >= 1
-
months: 1, 2, 3, 4, 6, 12
-
days: 1
-
hours: 1, 2, 3, 4, 6, 8, 12, 24
-
minutes: divisors of 60
-
seconds: divisors of 60
B :-/\3 months A
Offset
To make the operator more flexible, it is possible to tell it to offset the default intervals. For example, to make the 3-month intervals (Mar-May), (Jun-Aug) instead of (Jan-Mar), (Apr-Jun), use the offset 2.
B :-/\3 months 2 A
a(1,2)@(2020-02-10,2020-03-11].
c(X,Y) :- /\1 months a(X,Y).
d(X,Y) :- /\4 months a(X,Y).
@output("c").
@output("d").
@timeGranularity("days").
The above example outputs c(1,2)@[2020-02-01,2020-04-01) and d(1,2)@[2020-01-01,2020-05-01).
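The way Triangle Up rounds an interval outwards to blocks of x months can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the engine's implementation: it only covers the months granularity, ignores whether the input brackets are open or closed, and `month_index`, `month_start` and `triangle_up_months` are hypothetical names.

```python
from datetime import date
from typing import Tuple


def month_index(day: date) -> int:
    """Months counted from year 0, so blocks can be found with integer arithmetic."""
    return day.year * 12 + (day.month - 1)


def month_start(index: int) -> date:
    """First day of the month with the given month index."""
    return date(index // 12, index % 12 + 1, 1)


def triangle_up_months(start: date, end: date, x: int,
                       offset: int = 0) -> Tuple[date, date]:
    """Smallest run of x-month blocks covering [start, end], as [left, right)."""
    # Start of the block that contains each endpoint; blocks begin at month
    # indexes congruent to `offset` modulo x.
    first_block = (month_index(start) - offset) // x * x + offset
    last_block = (month_index(end) - offset) // x * x + offset
    return month_start(first_block), month_start(last_block + x)


print(triangle_up_months(date(2020, 2, 10), date(2020, 3, 11), 1))
print(triangle_up_months(date(2020, 2, 10), date(2020, 3, 11), 4))
```

For the interval from 2020-02-10 to 2020-03-11, the sketch returns February 1 to April 1 for 1-month blocks and January 1 to May 1 for 4-month blocks, read as left-closed, right-open intervals.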
Triangle Down Operator
B :- \/{x} {units} [offset] A
where x is a number and units is a time granularity, or no_date in case the temporal type is integer or double.
This operator reduces the interval to the length of the provided unit. offset shifts the grouping of time units with respect to the default one. (See Offset for more info.)
If the Triangle Up can be seen as a sort of ceiling operator, the Triangle Down is the corresponding flooring operator: it returns atoms with the largest interval made of blocks of x units included in the interval provided by the original atoms.
It is useful, for example, to get the full trimesters covered in a large interval of time.
If x > 1, the operator reduces the interval to a multiple of the unit (e.g., to handle periods). For example, 4 months creates the intervals (Jan, Feb, Mar, Apr), (May, Jun, Jul, Aug), (Sep, Oct, Nov, Dec), and 3 months creates the intervals (Jan, Feb, Mar), (Apr, May, Jun), (Jul, Aug, Sep), (Oct, Nov, Dec). x must be a divisor of the time granularity.
That is, for:
-
years: any value >= 1
-
months: 1,2,3,4,6,12
-
days: 1
-
hours: 1,2,3,4,6,8,12,24
-
minutes: divisors of 60
-
seconds: divisors of 60
a(1,2)@(2020-02-10,2020-03-11].
c(X,Y) :- \/1 months a(X,Y).
d(X,Y) :- \/4 months a(X,Y).
@output("c").
@output("d").
@timeGranularity("days").
In the above example, the interval (2020-02-10,2020-03-11] contains no complete calendar month, so, following the definition above, no facts are expected to be derived for c or d.
Offset
To make the operator more flexible, it is possible to tell it to offset the default intervals. For example, to make the 3-month intervals (Mar-May), (Jun-Aug) instead of (Jan-Mar), (Apr-Jun), use the offset 2.
Usage:
B :-\/3 months 2 A
a(1,2)@(2020-03-10,2021-02-11].
c(X,Y) :- \/3 months a(X,Y).
d(X,Y) :- \/3 months 1 a(X,Y).
@output("c").
@output("d").
@timeGranularity("days").
The above example outputs c(1,2)@[2020-04-01,2021-01-01) and d(1,2)@[2020-05-01,2021-02-01).
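The flooring behaviour of Triangle Down, including the effect of the offset, can be sketched in Python as follows. As before, this is an illustration under simplifying assumptions rather than the engine's implementation: it only covers the months granularity, ignores whether the input brackets are open or closed, and `month_index`, `month_start` and `triangle_down_months` are hypothetical names.

```python
from datetime import date
from typing import Optional, Tuple


def month_index(day: date) -> int:
    """Months counted from year 0, so blocks can be found with integer arithmetic."""
    return day.year * 12 + (day.month - 1)


def month_start(index: int) -> date:
    """First day of the month with the given month index."""
    return date(index // 12, index % 12 + 1, 1)


def triangle_down_months(start: date, end: date, x: int,
                         offset: int = 0) -> Optional[Tuple[date, date]]:
    """Largest run of complete x-month blocks inside [start, end], as [left, right)."""
    # First block boundary that is not before `start`.
    first_block = (month_index(start) - offset) // x * x + offset
    if month_start(first_block) < start:
        first_block += x
    # Last block boundary that is not after `end`.
    last_block = (month_index(end) - offset) // x * x + offset
    if first_block >= last_block:
        return None  # no complete block fits inside the interval
    return month_start(first_block), month_start(last_block)


print(triangle_down_months(date(2020, 3, 10), date(2021, 2, 11), 3))
print(triangle_down_months(date(2020, 3, 10), date(2021, 2, 11), 3, offset=1))
```

For the interval from 2020-03-10 to 2021-02-11, the first call returns April 1, 2020 to January 1, 2021 (the complete quarters) and the second, with offset 1, returns May 1, 2020 to February 1, 2021, which matches the intervals quoted for c and d in the example above.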
Conversion Between Temporal And Non-Temporal Reasoning
@temporalAtom
In case you want to manipulate time outside of temporal reasoning, it is possible to convert temporal predicates to non-temporal ones and non-temporal predicates to temporal ones.
For this, annotate the predicate with temporalAtom:
Unwrapping (Temporal to Non-Temporal):
b(X,A,B,C,D) :- a(X)@temporalAtom(A,B,C,D).
Wrapping (Non-Temporal to Temporal):
d(X)@temporalAtom(A,B,C,D) :- c(X,A,B,C,D).