Temporal Notes
Outline: Introduction; Time Ontology; Temporal Conceptual Modeling; Manipulating Temporal Databases with SQL-92; Summary
Temporal DBMS
Provide mechanisms to store and query time-varying information.
Example: media planning
- Which advertisements are to appear?
- When should the advertisements appear?
- What is the advertising budget for the different media?
Applications with temporal aspects abound.
Applications
Academic
- Register the courses taken by students in previous and current semesters, and the grades for previous semesters
Accounting
- What bills were sent out and when? What payments were received and when?
- Cash flow over time
- Money-management software shows, e.g., account balance over time
Budget
- Previous and projected budgets
Data warehousing
- Historical trend analysis for decision support
Applications, cont.
Financial
- Stock market data
- Audit analysis: why were financial decisions made, and with what information available?
Geographical Information Systems
- Land use over time: parcel boundaries change over time as parcels get partitioned and merged
- Road planning
- Deforestation trends
Insurance
- Which policy was in effect at each point in time, and what time periods did that policy cover?
Planning
Applications, cont.
- Use of present and past schedules for designing new schedules
- Network management
Process monitoring
- Chemical, electrical, nuclear power installations
Reservation systems
- Hotels, trains, airlines
- Configuration of new routes
Scientific databases
- Recording experiments
- Dating archeological findings
- Timestamping satellite images
Applications, cont.
Inventory
- Inventory over time, for analysis and accounting
Law
- Validity period for laws
Medical records
- Patient records, drug regimes, lab tests
- Tracking course of disease
- Epidemiology
Payroll
- Past employees, employee salary history, salaries for future months, records of withholding requested by employees
Planning
- Attribution of tasks, schedules
Applications: Conclusion
It is difficult to identify applications that do not need to manage temporal data. These applications would benefit from built-in temporal support in the DBMS:
- More efficient application development
- Potential increase in performance
Case Study
Personnel management in a database: Employee(Name, Salary, Title)
It is easy to know the salary of an employee:
SELECT Salary FROM Employee WHERE Name = 'John'
It is then necessary to add the date of birth: Employee(Name, Salary, Title, BirthDate DATE)
It is also easy to know the date of birth of an employee:
SELECT BirthDate FROM Employee WHERE Name = 'John'
For the data model, new columns are identical to attribute BirthDate
SQL Code
CREATE TABLE Temp(Salary, FromDate, ToDate) AS
  SELECT Salary, FromDate, ToDate FROM Employee WHERE Name = 'John'
repeat
  UPDATE Temp T1
  SET (T1.ToDate) = (SELECT MAX(T2.ToDate)
                     FROM Temp AS T2
                     WHERE T1.Salary = T2.Salary
                       AND T1.FromDate < T2.FromDate
                       AND T1.ToDate >= T2.FromDate
                       AND T1.ToDate < T2.ToDate)
  WHERE EXISTS (SELECT *
                FROM Temp AS T2
                WHERE T1.Salary = T2.Salary
                  AND T1.FromDate < T2.FromDate
                  AND T1.ToDate >= T2.FromDate
                  AND T1.ToDate < T2.ToDate)
until no tuples updated
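The repeat ... until loop is not SQL. It has to be driven by a host program. Below is a minimal sketch of such a driver using Python's built-in sqlite3 module. It makes several assumptions:
- Periods are closed-open and stored as ISO date strings.
- The four sample rows use their FromDate values from the slides. Their ToDate values are hypothetical.
- The table is called TempSal rather than Temp, to stay clear of SQLite's TEMP keyword.
- The final DELETE of rows subsumed by a longer row is an assumed completion step. It is not part of the code above.

```python
import sqlite3

# Hypothetical rows for John; the ToDate values are invented for illustration.
ROWS = [
    ("John", 60000, "1995-01-01", "1995-06-01"),
    ("John", 70000, "1995-06-01", "1995-10-01"),
    ("John", 70000, "1995-10-01", "1996-02-01"),
    ("John", 70000, "1996-02-01", "1997-01-01"),
]

# One pass of the UPDATE: extend ToDate when a same-salary row starts inside
# the period (or exactly at its end) and finishes later.
EXTEND = """
UPDATE TempSal
SET ToDate = (SELECT MAX(T2.ToDate) FROM TempSal AS T2
              WHERE TempSal.Salary = T2.Salary AND TempSal.FromDate < T2.FromDate
                AND TempSal.ToDate >= T2.FromDate AND TempSal.ToDate < T2.ToDate)
WHERE EXISTS (SELECT * FROM TempSal AS T2
              WHERE TempSal.Salary = T2.Salary AND TempSal.FromDate < T2.FromDate
                AND TempSal.ToDate >= T2.FromDate AND TempSal.ToDate < T2.ToDate)
"""

# Assumed clean-up: drop rows strictly contained in a same-salary row.
DROP_SUBSUMED = """
DELETE FROM TempSal
WHERE EXISTS (SELECT * FROM TempSal AS T2
              WHERE TempSal.Salary = T2.Salary
                AND ((TempSal.FromDate > T2.FromDate AND TempSal.ToDate <= T2.ToDate)
                  OR (TempSal.FromDate >= T2.FromDate AND TempSal.ToDate < T2.ToDate)))
"""


def salary_history(name):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee(Name, Salary, FromDate, ToDate)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", ROWS)
    conn.execute("CREATE TABLE TempSal(Salary, FromDate, ToDate)")
    conn.execute("INSERT INTO TempSal SELECT Salary, FromDate, ToDate "
                 "FROM Employee WHERE Name = ?", (name,))
    # repeat ... until no tuples updated
    while conn.execute(EXTEND).rowcount > 0:
        pass
    conn.execute(DROP_SUBSUMED)
    return conn.execute("SELECT DISTINCT Salary, FromDate, ToDate "
                        "FROM TempSal ORDER BY FromDate").fetchall()


print(salary_history("John"))
```

Each UPDATE strictly increases the ToDate of every row it touches. The loop therefore stops once no row can be extended any further.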
A linked list is not necessary if the cursor is declared with ORDER BY Salary, FromDate.
Alternative 4: use transitive closure or triggers in SQL3.
TSQL2: SELECT Salary FROM Employee WHERE Name = 'Bob'
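$$
\begin{aligned}
\{\, f.\mathit{FromDate},\ l.\mathit{ToDate} \mid\ & \mathit{Temp}(f) \land \mathit{Temp}(l) \land f.\mathit{FromDate} < l.\mathit{ToDate} \land f.\mathit{Salary} = l.\mathit{Salary} \\
\land\ & (\forall t)\big(\mathit{Temp}(t) \land t.\mathit{Salary} = f.\mathit{Salary} \land f.\mathit{FromDate} < t.\mathit{FromDate} \land t.\mathit{FromDate} < l.\mathit{ToDate} \\
& \qquad \Rightarrow (\exists t_1)(\mathit{Temp}(t_1) \land t_1.\mathit{Salary} = f.\mathit{Salary} \land t_1.\mathit{FromDate} < t.\mathit{FromDate} \land t.\mathit{FromDate} \le t_1.\mathit{ToDate})\big) \\
\land\ & \neg(\exists t_2)\big(\mathit{Temp}(t_2) \land t_2.\mathit{Salary} = f.\mathit{Salary} \land ((t_2.\mathit{FromDate} < f.\mathit{FromDate} \land f.\mathit{FromDate} \le t_2.\mathit{ToDate}) \\
& \qquad \lor (t_2.\mathit{FromDate} \le l.\mathit{ToDate} \land l.\mathit{ToDate} < t_2.\mathit{ToDate}))\big) \,\}
\end{aligned}
$$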
Another Possibility
Alternative 3: Use SQL to open a cursor in the table
Maintain a linked list of intervals for each salary;
initialize this linked list to empty;
DECLARE emp_cursor CURSOR FOR
  SELECT Salary, FromDate, ToDate FROM Employee WHERE Name = 'Bob';
OPEN emp_cursor;
loop:
  FETCH emp_cursor INTO :salary, :FromDate, :ToDate;
  if no data is returned, go to finish;
  find the position in the list and insert this information;
  go to loop;
finish:
  CLOSE emp_cursor;
  iterate through the linked list, printing dates and salary
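Below is a sketch of the cursor alternative using Python's sqlite3. It assumes the cursor is declared with ORDER BY Salary, FromDate. With that ordering, one pass over the fetched rows merges overlapping or meeting periods, and no linked list is needed. The Employee rows are hypothetical, and periods are assumed to be closed-open ISO date strings.

```python
import sqlite3


def coalesce_with_cursor(conn, name):
    """Single sweep over rows sorted by (Salary, FromDate), merging periods of
    the same salary that overlap or meet."""
    cursor = conn.execute(
        "SELECT Salary, FromDate, ToDate FROM Employee "
        "WHERE Name = ? ORDER BY Salary, FromDate", (name,))
    history = []
    current = None  # (salary, from_date, to_date) currently being extended
    for salary, from_date, to_date in cursor:  # plays the role of FETCH
        if current is not None and current[0] == salary and from_date <= current[2]:
            current = (salary, current[1], max(current[2], to_date))
        else:
            if current is not None:
                history.append(current)
            current = (salary, from_date, to_date)
    if current is not None:
        history.append(current)
    return history


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee(Name, Salary, FromDate, ToDate)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
    ("Bob", 60000, "1995-01-01", "1995-06-01"),
    ("Bob", 70000, "1995-06-01", "1995-10-01"),
    ("Bob", 70000, "1995-09-01", "1996-02-01"),  # overlaps the previous row
    ("Bob", 60000, "1997-01-01", "1998-01-01"),  # same salary, after a gap
])
print(coalesce_with_cursor(conn, "Bob"))
```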
Employee1, Employee2
Salary   Title       FromDate
60.000   Assistant   1/1/95
70.000   Assistant   1/6/95
70.000   Lecturer    1/10/95
70.000   Professor   1/2/96
Introduction: Summary
Applications managing temporal data abound.
Classical DBMSs are not adequate.
If a temporal DBMS is used:
- Schemas are simpler
- SQL queries are much simpler
- Much less procedural code is necessary
Benefits:
- Application code is less complex: easier to understand, to produce, to ensure correctness of, and to maintain
- Performance may be increased by relegating functionality to the DBMS
Notions of time
- Structure
- Density
- Boundedness
TSQL2 time ontology
Time data types
Clocks
Times and facts
Time Ontology
Time Structure
Linear: total order on instants
Directed acyclic graph
Periodic/cyclic time: weeks, months, ..., for recurrent processes
Boundedness of Time
Assume a linear time structure.
Boundedness
- Unbounded
- Time origin exists (bounded from the left)
- Bounded time (bounds on both ends)
Nature of the bound
- Unspecified
- Specified
Some physicists believe that the universe is bounded by the Big Bang (12-18 billion years ago) and by the Big Crunch (? billion years in the future).
Time Density
Discrete
- Time line is isomorphic to the integers
- Time line is composed of a sequence of non-decomposable time periods, of some fixed minimal duration, termed chronons
- Between each pair of chronons is a finite number of other chronons
Dense
- Time line is isomorphic to the rational numbers
- Infinite number of instants between each pair of instants
Continuous
- Time line is isomorphic to the real numbers
- Infinite number of instants between each pair of instants
Distance may optionally be defined.
Clocks
A clock is a physical process coupled with a method of measuring that process. The units of measurement are the chronons of the clock.
Examples
- Year clocks
- Day clocks
- Second clocks
- Other clocks
May be modified; used for static queries.
What is John's title?
SELECT Title FROM Faculty WHERE Name = 'John'
Append-only: correction of previous snapshot states is not permitted. Allows retrospective (rollback) queries.
What did we believe John's rank was on October 1st, 1984?
SELECT Title FROM Faculty WHERE Name = 'John' AND TRANSACTION(Faculty) OVERLAPS DATE '01-10-1984'
[Figure: timeline of John's career with marks at Jan. 84, Dec. 87, March 89 and July 89; transaction-time rows include (John, Assistant, 1-11-87) and (John, 1st Assistant, 1-12-87)]
On January 1st, 1984, John is hired as assistant. On December 1st, 1987, John finishes his doctorate and is promoted to 1st Assistant, retroactively as of July 1st, 1987. On March 1st, 1989, John is promoted to Lecturer, proactively as of July 1st, 1989.
Bitemporal Tables
Valid-time tables: may be modified; allow historical queries.
What was John's title on October 1st, 1984 (as best known)?
SELECT Title FROM Faculty WHERE Name = 'John' AND VALID(Faculty) OVERLAPS DATE '01-10-1984'
Bitemporal tables (transaction and valid time): append-only; allow coupled historical and retrospective queries.
On October 1st, 1984, what did we think John's rank was at that date?
SELECT Title FROM Faculty AS E WHERE Name = 'John' AND VALID(E) OVERLAPS DATE '01-10-1984' AND TRANSACTION(E) OVERLAPS DATE '01-10-1984'
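SQL-92 has no VALID() or TRANSACTION() functions, so a bitemporal table is often emulated with four date columns. Below is a minimal sketch using Python's sqlite3. It makes several assumptions:
- The column names are hypothetical: VStart/VEnd hold valid time and TStart/TEnd hold transaction time.
- Periods are closed-open.
- 9999-12-31 stands for "until changed".

The rows encode John's career as narrated above: hired in January 1984, retroactive promotion recorded in December 1987, proactive promotion recorded in March 1989.

```python
import sqlite3

FOREVER = "9999-12-31"  # assumed stand-in for "until changed"

# (Name, Title, VStart, VEnd, TStart, TEnd); updates close the old row's
# transaction time and insert corrected rows instead of overwriting.
ROWS = [
    ("John", "Assistant",     "1984-01-01", FOREVER,      "1984-01-01", "1987-12-01"),
    ("John", "Assistant",     "1984-01-01", "1987-07-01", "1987-12-01", FOREVER),
    ("John", "1st Assistant", "1987-07-01", FOREVER,      "1987-12-01", "1989-03-01"),
    ("John", "1st Assistant", "1987-07-01", "1989-07-01", "1989-03-01", FOREVER),
    ("John", "Lecturer",      "1989-07-01", FOREVER,      "1989-03-01", FOREVER),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Faculty(Name, Title, VStart, VEnd, TStart, TEnd)")
conn.executemany("INSERT INTO Faculty VALUES (?, ?, ?, ?, ?, ?)", ROWS)


def title(name, valid_day, as_known_on):
    """On day as_known_on, what did we think the title was on valid_day?"""
    row = conn.execute(
        "SELECT Title FROM Faculty WHERE Name = ? "
        "AND VStart <= ? AND ? < VEnd "   # stands in for VALID(E) OVERLAPS ...
        "AND TStart <= ? AND ? < TEnd",   # stands in for TRANSACTION(E) OVERLAPS ...
        (name, valid_day, valid_day, as_known_on, as_known_on)).fetchone()
    return row[0] if row else None


print(title("John", "1987-10-01", "1987-10-01"))  # belief before the Dec. 1987 update
print(title("John", "1987-10-01", "1988-01-01"))  # after the retroactive promotion
```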
[Figure: two recorded states of John's titles: (1) Assistant from January 1984, 1st Assistant from July 1987; (2) Assistant from January 1984, 1st Assistant from July 1987, Lecturer from July 1989]
Temporal Requirements
- Valid time, transaction time, user-defined time
- Imprecise, future, relative, branching time
- Time series
- Integration of DBs with different granularities
- Coexistence with non-temporal data
- Legacy applications
- Open architecture
- Temporal reasoners
Major Conclusions
- there seems to be a gap between the goals assumed by the temporal DB community and the needs...
- users could not say what a temporal database is
- the glossary was couched in the language of temporal DB researchers
- the time-varying semantics is obscured in the representation schemes by other considerations of presentation and implementation
- we therefore advocate a separation of concerns, i.e. adopting a very simple conceptual data model ....
Interaction Requirements
Graphical information
- information visualization
- graphical queries
Multiple users, multiple needs, multiple functions
Practical Requirements
Huge data sets
- Collecting new data is expensive
- Reusing highly heterogeneous existing data sets is a must... but is very difficult!
Integration requires understanding, hence a conceptual model.
Simple (understandable) data model
- few clean concepts, with standard, well-known semantics
No artificial time objects
Time orthogonal to data structures
Various granularities
Clean, visual notations
Intuitive icons/symbols
Counter Examples
[Figure: counter-example schema with Employee, Salary, Project, a "Works for" relationship, an Id and an ENDstamp attribute]
Orthogonality
[Figure: Employee (EName) linked to Project (P#) by WorksOn and to Department (D#) by EmpDep]
Temporal relationships
- transformation/generation of objects
- temporal links
- time-based aggregations
- temporal integrity constraints
Temporal Objects
[Figure: a temporal object with attributes name, birthdate, address, salary, projects, and life cycle active [7/94-6/96], suspended [7/96-6/97], active [7/97-6/98]]
Non-Temporal Objects?
No life cycle, or a default life cycle:
- (active, [0, now])
- (active, [now, now])
- (active, [0, ∞])
Coexistence
- temporal → non-temporal (snapshot)
- non-temporal → temporal (default life cycle)
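A life cycle can be represented as a list of (status, period) pairs. Below is a minimal Python sketch. It assumes closed periods expressed as yyyymm integers, and the class and method names are hypothetical. The sketch uses the life cycle active [7/94-6/96], suspended [7/96-6/97], active [7/97-6/98], together with the default life cycle (active, [0, ∞]).

```python
import math
from dataclasses import dataclass


@dataclass
class LifeCycle:
    # (status, start, end) triples; closed periods as yyyymm integers (assumption)
    periods: list

    def status_at(self, t):
        """Status of the object at time t, or None outside its lifespan."""
        for status, start, end in self.periods:
            if start <= t <= end:
                return status
        return None

    def lifespan(self):
        return (min(p[1] for p in self.periods), max(p[2] for p in self.periods))


# Temporal object: active, then suspended, then active again.
employee = LifeCycle([("active", 199407, 199606),
                      ("suspended", 199607, 199706),
                      ("active", 199707, 199806)])

# Default life cycle given to a non-temporal object joined with temporal data.
DEFAULT = LifeCycle([("active", 0, math.inf)])

print(employee.status_at(199608), employee.lifespan(), DEFAULT.status_at(202401))
```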
TSQL2 Policy
Temporal operators are not allowed on non-temporal relations
- no life cycle
Joins between temporal and non-temporal relations are allowed
- default life cycle = (active, [0, ∞])
SELECT Department.Name, COUNT(PID)
FROM Department, Employee
WHERE Employee.dept# = Department.dept#
  AND VALID(Employee) OVERLAPS PERIOD '[1/1/96-31/12/96]'
GROUP BY Department.dept#, Department.Name
Temporal Attributes
[Figure: temporal attributes of object o2 (Peter, born 8/9/64): addresses Bd St Germain, Bd St Michel, Rue de la Paix; salaries 4000, 5000; periods [7/94-7/98], [1/85-12/87], [1/88-12/94], [1/95-now]]
Updating manager: add an element to the manager history.
Updating projName (the name of the project has changed): update the name.
Updating project (the laboratory changed project): update the name and start a new history for manager.
[Figure: Employee with subclasses Temporary and Permanent]
Temporary and Permanent are implicitly temporal:
- they inherit their life cycle from Employee
Temporal Generalization
[Figure: Person (name, birthdate, address, salary, projects) specialized into Temporary/Student and Permanent/Faculty; object o2 (Peter, born 8/9/64, addresses East Terrace and Flinders Street); life-cycle periods [3/93-2/95], [3/95-now], [1/87-6/94], [7/94-now]]
Student and Faculty have two life cycles:
- an inherited one (that of Person)
- a redefined one (that of Student/Faculty)
[Figure: Employee o2 with lifespan [7/94-now], salary 4000 [7/94-7/95] then 5000 [8/95-now], projects {MADS} [7/94-8/95] then {MADS, HELIOS} [9/95-now]]
The redefined life cycle has to be included in that of the corresponding Person:
- lifespan and active periods
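The inclusion rule can be checked mechanically. Below is a minimal Python sketch with hypothetical function names and data. It assumes closed periods as yyyymm integers, and a fixed NOW value standing in for "now". It requires each redefined active period to fit inside a single inherited active period. That is stricter than coverage by a union of adjacent periods.

```python
NOW = 202406  # hypothetical value standing in for "now" (yyyymm)


def inside(period, periods):
    """True if the closed period (start, end) lies inside one of `periods`."""
    start, end = period
    return any(s <= start and end <= e for s, e in periods)


def valid_redefinition(sub_lifespan, sub_active, super_lifespan, super_active):
    """Redefined life cycle included in the inherited one: the lifespan in the
    superclass lifespan, and every active period in an inherited active period."""
    return (inside(sub_lifespan, [super_lifespan])
            and all(inside(p, super_active) for p in sub_active))


# Hypothetical Person life cycle and two candidate redefinitions.
person_lifespan, person_active = (198701, NOW), [(198701, NOW)]
print(valid_redefinition((199303, NOW), [(199303, 199502)], person_lifespan, person_active))
print(valid_redefinition((198501, NOW), [(198501, 198612)], person_lifespan, person_active))
```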
[Figure: instances (e1, p1) and (e2, p2) of a temporal relationship whose attribute takes the values 25 and 35 over periods [x/x/x - x/x/x]]
Dynamic Relationships
Express processes or time links
Transformation of an object
- a student becomes an alumnus
Generation of objects
- a parcel is split into several parcels
Temporal relationships between objects
- ancestor
Coalescence
- an object and its versions/snapshots
[Figure: transformation of a Student into an Alumnus, relating Student.status (A|S|D) to Alumnus.status]
A (set of) source object(s) generates a (set of) target object(s): yields
Coalescence (1)
[Figure: River (name, flowrate) linked by the coalescence relationship Versions (C, cardinalities 0:n and 1:1) to RiverBed (name, year, averageflow)]
A temporal object is linked to its (non-temporal) versions: has versions. Attributes may be derived.
Coalescence (2)
They are not temporal.
They are aggregation relationships:
- they bear cardinalities
- they can have attributes (temporal or not)
Constraint: an employee has only one position at any point in time. In the corresponding non-temporal table, the key is (SSN, PCN).
Incumbents is a valid-time table:
- FromDate indicates when the information in the row became valid, i.e., when the employee was assigned to that position
- ToDate indicates when the information in the row was no longer valid
A data type for periods is not available in SQL-92: a period is simulated with two DATE columns.
Candidate keys on Incumbents: (SSN, PCN, FromDate), (SSN, PCN, ToDate), and (SSN, PCN, FromDate, ToDate).
None captures the constraint: there may be overlapping periods associated with the same SSN.
What is needed is a sequenced constraint, applied at each point in time.
All constraints specified on a snapshot table have sequenced counterparts, specified on the analogous valid-time table.
The special date 3000-01-01 denotes "currently valid".
Closed-open periods are used, e.g., the validity of the first tuple is [1996-01-01, 1996-06-01).
The table can be viewed as a compact representation of a sequence of snapshot tables, each valid on a particular day.
Constraint: employees do not have gaps in their position history.
The last two rows may be replaced with a single row valid over [1996-06-01, 3000-01-01).
I1.FromDate < I2.ToDate AND I2.FromDate < I1.ToDate tests whether two (closed-open) periods overlap.
If a closed-closed representation is used for the period of validity, the predicate must be changed to I1.FromDate <= I2.ToDate AND I2.FromDate <= I1.ToDate.
The test 1 < COUNT(...) ensures that I1 and I2 are not the same row: every row overlaps itself, so a violation requires at least one other overlapping row.
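The two overlap tests can be checked in isolation. Below is a small Python sketch using `datetime.date` values; the function names are illustrative. Two periods that only meet at 1996-06-01 share no instant when they are read as closed-open, but they do share one when read as closed-closed.

```python
from datetime import date


def overlaps_closed_open(s1, e1, s2, e2):
    """[s1, e1) and [s2, e2) share at least one instant."""
    return s1 < e2 and s2 < e1


def overlaps_closed_closed(s1, e1, s2, e2):
    """[s1, e1] and [s2, e2] share at least one instant."""
    return s1 <= e2 and s2 <= e1


a = (date(1996, 1, 1), date(1996, 6, 1))
b = (date(1996, 6, 1), date(1996, 10, 1))
print(overlaps_closed_open(*a, *b))    # the periods only meet
print(overlaps_closed_closed(*a, *b))  # 1996-06-01 belongs to both
```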
Types of Duplicates
Incumbents
Row  SSN        PCN     FromDate    ToDate
1    111223333  120033  1996-01-01  1996-06-01
2    111223333  120033  1996-04-01  1996-10-01
3    111223333  120033  1996-04-01  1996-10-01
4    111223333  120033  1996-10-01  1998-01-01
5    111223333  120033  1997-12-01  1998-01-01
Handling Now
What should the timestamp be for current data?
One alternative is to use NULL. It makes current records easy to identify: WHERE Incumbents.ToDate IS NULL
Disadvantages
- users get confused by a date of NULL
- in SQL any comparison with a null value evaluates to unknown, which WHERE treats as false, so rows with null values will be absent from the result of many queries
- other uses of NULL are no longer available
Another approach is to set the end date to the largest value in the timestamp domain, e.g., 3000-01-01
Disadvantages
- the DB states that something will be true in the far future
- now and forever are represented in the same way
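Below is a small sqlite3 sketch of the NULL pitfall. It assumes a hypothetical pair of one-row tables that differ only in how "now" is encoded. The usual closed-open overlap test drops the NULL-terminated row, while IS NULL still finds it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WithNull(SSN, PCN, FromDate, ToDate)")
conn.execute("CREATE TABLE WithSentinel(SSN, PCN, FromDate, ToDate)")
conn.execute("INSERT INTO WithNull VALUES ('111223333', 120033, '1996-01-01', NULL)")
conn.execute("INSERT INTO WithSentinel VALUES "
             "('111223333', 120033, '1996-01-01', '3000-01-01')")

# "Who held a position on 1997-06-01?" using the closed-open overlap test.
QUERY = ("SELECT COUNT(*) FROM {} "
         "WHERE FromDate <= '1997-06-01' AND '1997-06-01' < ToDate")
null_count = conn.execute(QUERY.format("WithNull")).fetchone()[0]
sentinel_count = conn.execute(QUERY.format("WithSentinel")).fetchone()[0]

# Current rows are still easy to find with IS NULL.
current_null = conn.execute(
    "SELECT COUNT(*) FROM WithNull WHERE ToDate IS NULL").fetchone()[0]
print(null_count, sentinel_count, current_null)
```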
Two rows are value equivalent if the values of their non-timestamp columns are equivalent.
Two rows are sequenced duplicates if they are duplicates at some instant: rows 1 and 2 (the employee holds position 120033 twice during April and May 1996).
Two rows are current duplicates if they are sequenced duplicates at the current instant: rows 4 and 5 (in December 1997 a current duplicate will suddenly appear).
Two rows are nonsequenced duplicates if the values of all columns are identical: rows 2 and 3.
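The four definitions translate directly into predicates. Below is a Python sketch over the five Incumbents rows above. It assumes closed-open periods and ISO date strings, which compare correctly as strings. The helper names are hypothetical.

```python
# Incumbents rows: (SSN, PCN, FromDate, ToDate), closed-open periods.
ROWS = {
    1: ("111223333", 120033, "1996-01-01", "1996-06-01"),
    2: ("111223333", 120033, "1996-04-01", "1996-10-01"),
    3: ("111223333", 120033, "1996-04-01", "1996-10-01"),
    4: ("111223333", 120033, "1996-10-01", "1998-01-01"),
    5: ("111223333", 120033, "1997-12-01", "1998-01-01"),
}


def value_equivalent(r1, r2):
    return r1[:2] == r2[:2]  # non-timestamp columns only


def sequenced_duplicates(r1, r2):
    # value equivalent, and the two periods share at least one day
    return value_equivalent(r1, r2) and r1[2] < r2[3] and r2[2] < r1[3]


def current_duplicates(r1, r2, now):
    # sequenced duplicates at the instant `now`
    return value_equivalent(r1, r2) and all(r[2] <= now < r[3] for r in (r1, r2))


def nonsequenced_duplicates(r1, r2):
    return r1 == r2  # every column identical, timestamps included


print(sequenced_duplicates(ROWS[1], ROWS[2]),
      current_duplicates(ROWS[4], ROWS[5], "1997-12-15"),
      nonsequenced_duplicates(ROWS[2], ROWS[3]))
```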
Uniqueness (2)
Nonsequenced constraint: an employee cannot have more than one position over two identical periods, i.e., Incumbents.SSN is nonsequenced unique:
UNIQUE(SSN, FromDate, ToDate)
Current constraint: an employee has at most one position now, i.e., Incumbents.SSN is current unique:
CREATE TRIGGER Current_Primary_Key ON Incumbents
FOR INSERT, UPDATE, DELETE AS
IF EXISTS (SELECT I1.SSN
           FROM Incumbents AS I1
           WHERE I1.FromDate <= CURRENT_TIMESTAMP
             AND CURRENT_TIMESTAMP < I1.ToDate
             AND 1 < (SELECT COUNT(I2.SSN)
                      FROM Incumbents AS I2
                      WHERE I1.SSN = I2.SSN
                        AND I2.FromDate <= CURRENT_TIMESTAMP
                        AND CURRENT_TIMESTAMP < I2.ToDate))
BEGIN
  RAISERROR('Transaction violates current constraint', 1, 2)
  ROLLBACK TRANSACTION
END
Uniqueness (1)
Constraint: each employee has at most one position.
Snapshot table: UNIQUE(SSN)
Sequenced constraint: at any time each employee has at most one position, i.e., Incumbents.SSN is sequenced unique:
CREATE TRIGGER Seq_Primary_Key ON Incumbents
FOR INSERT, UPDATE, DELETE AS
IF EXISTS (SELECT I1.SSN
           FROM Incumbents AS I1
           WHERE 1 < (SELECT COUNT(I2.SSN)
                      FROM Incumbents AS I2
                      WHERE I1.SSN = I2.SSN
                        AND I1.FromDate < I2.ToDate
                        AND I2.FromDate < I1.ToDate))
   OR EXISTS (SELECT *
              FROM Incumbents AS I
              WHERE I.SSN IS NULL)
BEGIN
  RAISERROR('Transaction violates sequenced constraint', 1, 2)
  ROLLBACK TRANSACTION
END
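SQLite has no statement-level triggers and no RAISERROR. The same sequenced check can be approximated with a row-level BEFORE INSERT trigger and RAISE(ABORT, ...). Below is a minimal sketch via Python's sqlite3. It covers INSERT only (an UPDATE trigger would be analogous) and checks each new row against the rows already stored. The trigger and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Incumbents(SSN TEXT, PCN INTEGER, FromDate TEXT, ToDate TEXT);

-- Row-level variant of Seq_Primary_Key (INSERT only): reject a NULL SSN or a
-- period overlapping another period of the same SSN (closed-open periods).
CREATE TRIGGER Seq_Primary_Key_Insert BEFORE INSERT ON Incumbents
WHEN NEW.SSN IS NULL OR EXISTS (
    SELECT * FROM Incumbents AS I
    WHERE I.SSN = NEW.SSN
      AND I.FromDate < NEW.ToDate AND NEW.FromDate < I.ToDate)
BEGIN
    SELECT RAISE(ABORT, 'Transaction violates sequenced constraint');
END;
""")


def try_insert(ssn, pcn, from_date, to_date):
    """Insert one row; return False if the trigger rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO Incumbents VALUES (?, ?, ?, ?)",
                         (ssn, pcn, from_date, to_date))
        return True
    except sqlite3.DatabaseError:  # RAISE(ABORT) surfaces as an integrity error
        return False


results = [
    try_insert("111223333", 120033, "1996-01-01", "1996-06-01"),  # accepted
    try_insert("111223333", 821197, "1996-04-01", "1996-10-01"),  # overlaps: rejected
    try_insert("111223333", 821197, "1996-06-01", "1996-10-01"),  # only meets: accepted
]
print(results)
```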
Case 2: both tables are temporal. The PCN of every current incumbent must be listed in the current Positions:
CREATE TRIGGER Current_Referential_Integrity ON Incumbents
FOR INSERT, UPDATE, DELETE AS
IF EXISTS (SELECT *
           FROM Incumbents AS I
           WHERE I.ToDate = '3000-01-01'
             AND NOT EXISTS (SELECT *
                             FROM Positions AS P
                             WHERE I.PCN = P.PCN
                               AND P.ToDate = '3000-01-01'))
BEGIN
  RAISERROR('Violation of current referential integrity', 1, 2)
  ROLLBACK TRANSACTION
END
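The condition inside the trigger can also be run as a plain query that lists the violations. Below is a sketch in Python's sqlite3 with hypothetical Positions and Incumbents rows; the job titles are invented. It uses 3000-01-01 as the "currently valid" marker.

```python
import sqlite3

NOW_MARK = "3000-01-01"  # "currently valid", as in the Incumbents examples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Positions(PCN, JobTitle, FromDate, ToDate)")
conn.execute("CREATE TABLE Incumbents(SSN, PCN, FromDate, ToDate)")
conn.executemany("INSERT INTO Positions VALUES (?, ?, ?, ?)", [
    (120033, "Clerk", "1996-01-01", NOW_MARK),
    (455332, "Department head", "1995-01-01", "1997-01-01"),  # no longer current
])
conn.executemany("INSERT INTO Incumbents VALUES (?, ?, ?, ?)", [
    ("111223333", 120033, "1996-01-01", NOW_MARK),
    ("222334444", 455332, "1996-01-01", NOW_MARK),  # refers to a non-current position
])

# Same condition as the trigger: current incumbents without a current position.
violations = conn.execute("""
    SELECT I.SSN, I.PCN FROM Incumbents AS I
    WHERE I.ToDate = ? AND NOT EXISTS (
        SELECT * FROM Positions AS P
        WHERE I.PCN = P.PCN AND P.ToDate = ?)
""", (NOW_MARK, NOW_MARK)).fetchall()
print(violations)
```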
Contiguous History
Positions.PCN defines a contiguous history:
CREATE TRIGGER Contiguous_History ON Positions
FOR INSERT, UPDATE, DELETE AS
IF EXISTS (SELECT *
           FROM Positions AS P1, Positions AS P2
           WHERE P1.PCN = P2.PCN
             AND P1.ToDate < P2.FromDate
             AND NOT EXISTS (SELECT *
                             FROM Positions AS P3
                             WHERE P3.PCN = P1.PCN
                               AND ((P3.FromDate <= P1.ToDate AND P1.ToDate < P3.ToDate)
                                 OR (P3.FromDate < P2.FromDate AND P2.FromDate <= P3.ToDate))))
BEGIN
  RAISERROR('Transaction violates contiguous history', 1, 2)
  ROLLBACK TRANSACTION
END
This is a nonsequenced constraint: it requires examining the table at multiple points in time.
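Running the trigger's condition as a query returns the PCNs whose history has a gap. Below is a sketch in Python's sqlite3 with hypothetical Positions rows. Closed-open periods are assumed, so a row that ends on the day the next one starts counts as contiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Positions(PCN, FromDate, ToDate)")
conn.executemany("INSERT INTO Positions VALUES (?, ?, ?)", [
    (120033, "1996-01-01", "1996-06-01"),
    (120033, "1996-06-01", "3000-01-01"),   # meets the previous row: contiguous
    (455332, "1996-01-01", "1996-03-01"),
    (455332, "1996-05-01", "3000-01-01"),   # gap from 1996-03-01 to 1996-05-01
    (777777, "1996-01-01", "1996-02-01"),
    (777777, "1996-02-01", "1996-04-01"),
    (777777, "1996-04-01", "1996-09-01"),   # three rows, no gap
])

# Condition of the Contiguous_History trigger, returning the offending PCNs.
GAPS = """
SELECT DISTINCT P1.PCN FROM Positions AS P1, Positions AS P2
WHERE P1.PCN = P2.PCN AND P1.ToDate < P2.FromDate
  AND NOT EXISTS (
      SELECT * FROM Positions AS P3
      WHERE P3.PCN = P1.PCN
        AND ((P3.FromDate <= P1.ToDate AND P1.ToDate < P3.ToDate)
          OR (P3.FromDate < P2.FromDate AND P2.FromDate <= P3.ToDate)))
"""
print(conn.execute(GAPS).fetchall())
```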
As for constraints, queries and modifications can be of three kinds: current, sequenced, and nonsequenced.
Extracting the current state: what is Bob's current position?
SELECT JobTitle
FROM Employees, Incumbents, Positions
WHERE FirstName = 'Bob'
  AND Employees.SSN = Incumbents.SSN
  AND Incumbents.PCN = Positions.PCN
  AND Incumbents.ToDate = '3000-01-01'
Sequenced Queries
Queries whose result is a valid-time table.
Use sequenced variants of the basic operations: selection, projection, union, sorting, join, difference, and duplicate elimination.
Sequenced selection: no change is necessary.
Who makes or has made more than 50K annually?
SELECT * FROM Salary WHERE Amount > 50000
Sequenced projection: include the timestamp columns in the select list.
List the social security numbers of current and past employees:
SELECT SSN, FromDate, ToDate FROM Salary
Duplicates resulting from the projection are retained. To eliminate them, coalescing is needed.
Sequenced Sort
Requires the result to be ordered at each point in time.
This can be accomplished by appending the start and end time columns to the ORDER BY clause.
Sequenced sort of Incumbents on the position code (first version):
SELECT * FROM Incumbents ORDER BY PCN, FromDate, ToDate
Sequenced sorting can also be accomplished by omitting the timestamp columns:
SELECT * FROM Incumbents ORDER BY PCN
Coalescing: select those start and end dates such that
- there are no gaps between these dates
- no value-equivalent row overlaps the period between the selected start and end dates and has an earlier start date or a later end date
Search for two value-equivalent rows F(irst) and L(ast) defining the start and end points of a coalesced row.
The first NOT EXISTS ensures that there are no gaps between F.ToDate and L.FromDate.
The second NOT EXISTS ensures that only maximal periods result, i.e., F and L cannot be part of a larger value-equivalent row.
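Below is a sketch of a coalescing query along these lines, run with Python's sqlite3. It is applied to a sequenced projection on SSN of a hypothetical Salary table. It follows the F/L formulation just described. The right-end test uses T2.FromDate <= L.ToDate. With closed-open periods, this means a row starting exactly where L ends still prevents L from being reported as an end point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salary(SSN, Amount, FromDate, ToDate)")
conn.executemany("INSERT INTO Salary VALUES (?, ?, ?, ?)", [
    ("111", 40000, "1995-01-01", "1995-06-01"),
    ("111", 45000, "1995-06-01", "1996-01-01"),  # meets the previous row
    ("111", 50000, "1996-03-01", "1997-01-01"),  # after a gap
    ("222", 60000, "1995-01-01", "1996-01-01"),
    ("222", 65000, "1995-06-01", "1996-06-01"),  # overlaps the previous row
])

# F supplies the start and L the end of a maximal period for one SSN.
COALESCE = """
SELECT DISTINCT F.SSN, F.FromDate, L.ToDate
FROM Salary AS F, Salary AS L
WHERE F.FromDate < L.ToDate AND F.SSN = L.SSN
  AND NOT EXISTS (                 -- no gap between F and L
      SELECT * FROM Salary AS M
      WHERE M.SSN = F.SSN
        AND F.FromDate < M.FromDate AND M.FromDate < L.ToDate
        AND NOT EXISTS (
            SELECT * FROM Salary AS T1
            WHERE T1.SSN = F.SSN
              AND T1.FromDate < M.FromDate AND M.FromDate <= T1.ToDate))
  AND NOT EXISTS (                 -- the period cannot be extended
      SELECT * FROM Salary AS T2
      WHERE T2.SSN = F.SSN
        AND ((T2.FromDate < F.FromDate AND F.FromDate <= T2.ToDate)
          OR (T2.FromDate <= L.ToDate AND L.ToDate < T2.ToDate)))
ORDER BY F.SSN, F.FromDate
"""
print(conn.execute(COALESCE).fetchall())
```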
Four possible cases should be taken into account Each of these cases requires a separate SELECT statement in the sequenced version
Difference
Implemented in SQL with EXCEPT, NOT EXISTS, or NOT IN.
List the employees who are department heads (PCN = 455332) but are not also professors (PCN = 821197). Nontemporal version:
SELECT SSN
FROM Incumbents I1
WHERE I1.PCN = 455332
  AND NOT EXISTS (SELECT *
                  FROM Incumbents I2
                  WHERE I1.SSN = I2.SSN AND I2.PCN = 821197)
Using EXCEPT (not available in older versions of SQL Server):
SELECT SSN FROM Incumbents WHERE PCN = 455332
EXCEPT
SELECT SSN FROM Incumbents WHERE PCN = 821197
Sequenced version: identify when the department heads were not professors.
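The four-case SQL formulation of the sequenced difference is not reproduced here. As a swapped-in procedural illustration of what it has to compute for one employee, below is a Python sketch that subtracts the professor periods from the department-head periods. Closed-open periods and the sample dates are assumptions.

```python
def subtract(periods, holes):
    """Parts of each closed-open (start, end) period not covered by any hole."""
    result = []
    for start, end in periods:
        pieces = [(start, end)]
        for h_start, h_end in holes:
            next_pieces = []
            for s, e in pieces:
                if h_end <= s or e <= h_start:   # hole does not touch this piece
                    next_pieces.append((s, e))
                    continue
                if s < h_start:                  # part before the hole survives
                    next_pieces.append((s, h_start))
                if h_end < e:                    # part after the hole survives
                    next_pieces.append((h_end, e))
            pieces = next_pieces
        result.extend(pieces)
    return sorted(result)


# Hypothetical periods for one employee.
department_head = [("1996-01-01", "1998-01-01")]
professor = [("1996-06-01", "1997-01-01")]
print(subtract(department_head, professor))
```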
Eliminating Duplicates
Remove nonsequenced duplicates from Incumbents:
SELECT DISTINCT * FROM Incumbents
Remove value-equivalent rows from Incumbents:
SELECT DISTINCT SSN, PCN FROM Incumbents
Remove current duplicates from Incumbents:
SELECT DISTINCT SSN, PCN FROM Incumbents WHERE ToDate = '3000-01-01'
Nonsequenced Variants
Nonsequenced operators (selection, join, ...) are straightforward: they ignore the time-varying nature of tables.
List all the salaries, past and present, of employees who have been a hazardous waste specialist at some time:
SELECT Amount
FROM Incumbents, Positions, Salary
WHERE Incumbents.SSN = Salary.SSN
  AND Incumbents.PCN = Positions.PCN
  AND JobTitle = 20730
When did employees receive raises?
SELECT S2.SSN, S2.FromDate AS RAISE_DATE
FROM Salary AS S1, Salary AS S2
WHERE S2.Amount > S1.Amount
  AND S1.SSN = S2.SSN
  AND S1.ToDate = S2.FromDate
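The raise query can be tried on a few hypothetical Salary rows with Python's sqlite3. Because of the equality S1.ToDate = S2.FromDate, only raises between contiguous salary rows are reported. A higher salary that starts after a gap is not counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salary(SSN, Amount, FromDate, ToDate)")
conn.executemany("INSERT INTO Salary VALUES (?, ?, ?, ?)", [
    ("111", 40000, "1995-01-01", "1995-06-01"),
    ("111", 45000, "1995-06-01", "1996-01-01"),  # raise on 1995-06-01
    ("111", 42000, "1996-01-01", "1997-01-01"),  # pay cut: not a raise
    ("222", 60000, "1995-01-01", "1996-01-01"),
    ("222", 65000, "1996-03-01", "1997-01-01"),  # after a gap: not matched
])

RAISES = """
SELECT S2.SSN, S2.FromDate AS RAISE_DATE
FROM Salary AS S1, Salary AS S2
WHERE S2.Amount > S1.Amount AND S1.SSN = S2.SSN AND S1.ToDate = S2.FromDate
ORDER BY S2.SSN, S2.FromDate
"""
print(conn.execute(RAISES).fetchall())
```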
SQL provides aggregation functions: COUNT, MIN, MAX, AVG, ...
List the maximum salary (non-temporal version):
SELECT MAX(Amount) FROM Salary
List the maximum salary by department (non-temporal version):
SELECT DNumber, MAX(Amount)
FROM Affiliation A, Salary S
WHERE A.SSN = S.SSN
GROUP BY DNumber
First step: compute the periods on which a maximum must be calculated:
CREATE VIEW SalChanges(Day) AS
  SELECT DISTINCT FromDate FROM Salary
  UNION
  SELECT DISTINCT ToDate FROM Salary
CREATE VIEW SalPeriods(FromDate, ToDate) AS
  SELECT P1.Day, P2.Day
  FROM SalChanges P1, SalChanges P2
  WHERE P1.Day < P2.Day
    AND NOT EXISTS (SELECT *
                    FROM SalChanges P3
                    WHERE P1.Day < P3.Day AND P3.Day < P2.Day)
Second step: compute the number of employees for these periods:
CREATE VIEW TempCount(NbEmp, FromDate, ToDate) AS
  SELECT COUNT(*), P.FromDate, P.ToDate
  FROM Salary S, SalPeriods P
  WHERE S.FromDate <= P.FromDate AND P.ToDate <= S.ToDate
  GROUP BY P.FromDate, P.ToDate
  UNION ALL
  SELECT 0, P.FromDate, P.ToDate
  FROM SalPeriods P
  WHERE NOT EXISTS (SELECT *
                    FROM Salary S
                    WHERE S.FromDate <= P.FromDate AND P.ToDate <= S.ToDate)
Third step: coalesce the above view (as seen before).
Second step: compute the maximum salary for these periods:
CREATE VIEW TempMax(MaxSalary, FromDate, ToDate) AS
  SELECT MAX(E.Amount), I.FromDate, I.ToDate
  FROM Salary E, SalPeriods I
  WHERE E.FromDate <= I.FromDate AND I.ToDate <= E.ToDate
  GROUP BY I.FromDate, I.ToDate
Third step: coalesce the above view (as seen before).
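The first two steps can be run end to end in SQLite, whose CREATE VIEW accepts the column-name lists used above. Below is a sketch via Python's sqlite3 with two hypothetical salary rows. The third (coalescing) step would merge the last two result rows, which carry the same maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Salary(SSN, Amount, FromDate, ToDate);
INSERT INTO Salary VALUES ('111', 20000, '1995-01-01', '1995-05-01');
INSERT INTO Salary VALUES ('222', 30000, '1995-03-01', '1995-08-01');

-- First step: every day on which some salary starts or ends ...
CREATE VIEW SalChanges(Day) AS
    SELECT DISTINCT FromDate FROM Salary
    UNION SELECT DISTINCT ToDate FROM Salary;

-- ... and the elementary periods between consecutive change days.
CREATE VIEW SalPeriods(FromDate, ToDate) AS
    SELECT P1.Day, P2.Day FROM SalChanges P1, SalChanges P2
    WHERE P1.Day < P2.Day AND NOT EXISTS (
        SELECT * FROM SalChanges P3 WHERE P1.Day < P3.Day AND P3.Day < P2.Day);

-- Second step: maximum over the salaries valid during each elementary period.
CREATE VIEW TempMax(MaxSalary, FromDate, ToDate) AS
    SELECT MAX(E.Amount), I.FromDate, I.ToDate FROM Salary E, SalPeriods I
    WHERE E.FromDate <= I.FromDate AND I.ToDate <= E.ToDate
    GROUP BY I.FromDate, I.ToDate;
""")
print(conn.execute("SELECT * FROM TempMax ORDER BY FromDate").fetchall())
```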
[Figure: salary values over time for employees of departments D1 and D2]
Hypothesis: employees have a salary only while they are affiliated to a department.
Implemented in SQL with two nested NOT EXISTS.
List the employees who work in all projects of the department to which they are affiliated (non-temporal version):
SELECT SSN
FROM Affiliation A
WHERE NOT EXISTS (SELECT *
                  FROM Controls C
                  WHERE A.DNumber = C.DNumber
                    AND NOT EXISTS (SELECT *
                                    FROM WorksOn W
                                    WHERE C.PNumber = W.PNumber
                                      AND A.SSN = W.SSN))
[Figure: WorksOn periods W1 (E, P1) and W2 (E, P2) and the resulting period]
CREATE VIEW TempUnivQuant(SSN, FromDate, ToDate) AS
  SELECT DISTINCT W1.SSN, W1.FromDate, W2.ToDate
  FROM WorksOn W1, WorksOn W2, Affiliation A
  WHERE W1.SSN = W2.SSN
    AND W1.SSN = A.SSN
    AND W1.FromDate < W2.ToDate
    AND NOT EXISTS (SELECT *
                    FROM Controls C
                    WHERE A.DNumber = C.DNumber
                      AND NOT EXISTS (SELECT *
                                      FROM WorksOn W
                                      WHERE C.PNumber = W.PNumber
                                        AND A.SSN = W.SSN
                                        AND W.FromDate <= W1.FromDate
                                        AND W2.ToDate <= W.ToDate))
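Below is a small sqlite3 run of the view. It assumes hypothetical data, with integer day numbers instead of dates. Employee E, affiliated to department D1, works on P1 over [1, 10) and on P2 over [4, 12). E therefore works on all of D1's projects only over [4, 10).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Affiliation(SSN, DNumber);
CREATE TABLE Controls(DNumber, PNumber);
CREATE TABLE WorksOn(SSN, PNumber, FromDate, ToDate);
INSERT INTO Affiliation VALUES ('E', 'D1');
INSERT INTO Controls VALUES ('D1', 'P1');
INSERT INTO Controls VALUES ('D1', 'P2');
INSERT INTO WorksOn VALUES ('E', 'P1', 1, 10);
INSERT INTO WorksOn VALUES ('E', 'P2', 4, 12);

-- Candidate periods [W1.FromDate, W2.ToDate) during which every project of the
-- employee's department is covered by a single WorksOn row.
CREATE VIEW TempUnivQuant(SSN, FromDate, ToDate) AS
SELECT DISTINCT W1.SSN, W1.FromDate, W2.ToDate
FROM WorksOn W1, WorksOn W2, Affiliation A
WHERE W1.SSN = W2.SSN AND W1.SSN = A.SSN AND W1.FromDate < W2.ToDate
  AND NOT EXISTS (
      SELECT * FROM Controls C
      WHERE A.DNumber = C.DNumber AND NOT EXISTS (
          SELECT * FROM WorksOn W
          WHERE C.PNumber = W.PNumber AND A.SSN = W.SSN
            AND W.FromDate <= W1.FromDate AND W2.ToDate <= W.ToDate));
""")
print(conn.execute("SELECT * FROM TempUnivQuant").fetchall())
```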