06 Transactional Recovery

Transactional Recovery: ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-ahead Logging

1. EECS 262a Advanced Topics in Computer Systems Today’s Paper Lecture 6 • ARIES: A Transaction Recovery Method Supporting Fine- Granularity Locking and Partial Rollbacks Using Write-ahead ARIES: Logging and Recovery Logging, C. Mohan, Don Haderle, Bruce Lindsay, Hamid Pirahesh and Slides derived from Joe Hellerstein; Peter Schwarz. Appears in Transactions on Database Updated by A. Fekete Systems, Vol 17, No. 1, March 1992, Pages 94-162 If you are going to be in the logging • Thoughts? business, one of the things that you have to do is to learn about heavy • Huge state-of-the-art/historical survey (Ch. 10) equipment. - Robert VanNatta, Logging History of Columbia County 9/11/2018 Cs262a-F18 Lecture-06 2 Review: The ACID properties Motivation • Atomicity: • Atomicity: All actions in the Transaction happen, or – Transactions may abort (“Rollback”) none happen • Durability: • Consistency: If each Transaction is consistent, and the – What if DBMS stops running? (Causes?) DB starts consistent, it ends up consistent • Isolation: Execution of one Transaction is isolated  Desired Behavior after system restarts: crash! from that of other Transactions T1 – T1, T2 & T3 should be • Durability: If a Transaction commits, its effects persist durable T2 T3 – T4 & T5 should be T4 • The Recovery Manager guarantees Atomicity & Durability aborted (effects not seen) T5 9/11/2018 Cs262a-F18 Lecture-06 3 9/11/2018 Cs262a-F18 Lecture-06 4

2. Intended Functionality Assumptions • At any time, each data item contains the value • Essential concurrency control is in effect produced by the most recent update done by a – For read/write items: Write locks taken and held till transaction that committed commit • E.g., Strict 2PL, but read locks not important for recovery – For more general types: operations of concurrent transactions commute • Updates are happening “in place” – i.e. data is overwritten on (deleted from) its location • Unlike multiversion (e.g., shadow pages) approaches • Buffer in volatile memory; data persists on disk 9/11/2018 Cs262a-F18 Lecture-06 5 9/11/2018 Cs262a-F18 Lecture-06 6 Challenge: REDO Challenge: UNDO Action Buffer Disk Action Buffer Disk Initially 0 Initially 0 T1 writes 1 1 0 T1 writes 1 1 0 T1 commits 1 0 Page flushed 1 CRASH 0 CRASH 1 • Need to restore value 1 to item • Need to restore value 0 to item – Last value written by a committed transaction – Last value from a committed transaction 9/11/2018 Cs262a-F18 Lecture-06 7 9/11/2018 Cs262a-F18 Lecture-06 8

3. Handling the Buffer Pool More on Steal and Force • STEAL (why enforcing Atomicity is hard) • Can you think of a simple scheme – To steal frame F: Current page in F (say P) is to guarantee Atomicity & written to disk; some Transaction holds lock on P Durability? • What if the Transaction with the lock on P aborts? • Force write to disk at commit? • Must remember the old value of P at steal time (to No Steal Steal – Poor response time support UNDOing the write to page P) – But provides durability Force Trivial • NO FORCE (why enforcing Durability is hard) • No Steal of buffer-pool frames from – What if system crashes before a modified page is uncommited Transactions (“pin”)? written to disk? – Poor throughput No Force Desired – Write as little as possible, in a convenient place, at – But easily ensure atomicity commit time, to support REDOing modifications 9/11/2018 Cs262a-F18 Lecture-06 9 9/11/2018 Cs262a-F18 Lecture-06 10 Basic Idea: Logging Write-Ahead Logging (WAL) • Record REDO and UNDO information, for every • The Write-Ahead Logging Protocol: update, in a log 1. Must force the log record for an update before the – Sequential writes to log (put it on a separate disk) corresponding data page gets to disk – Minimal info (diff) written to log, so multiple updates fit 2. Must write all log records for a Transaction before in a single log page commit • Log: An ordered list of REDO/UNDO actions • #1 (undo rule) allows system to have Atomicity – Log record contains: <XID, pageID, offset, length, old data, new data> • #2 (redo rule) allows system to have Durability – and additional control info (which we’ll see soon) – For abstract types, have operation(args) instead of old value new value 9/11/2018 Cs262a-F18 Lecture-06 11 9/11/2018 Cs262a-F18 Lecture-06 12

4. ARIES Key ideas of ARIES • Exactly how is logging (and recovery!) done? • Log every change (even UNDOs during – Many approaches (traditional ones used in Transaction abort) relational systems of 1980s) • In restart, first repeat history without – ARIES algorithms developed by IBM used many backtracking of the same ideas, and some novelties that were – Even REDO the actions of loser transactions quite radical at the time • Then UNDO actions of losers • Research report in 1989; conference paper on an extension in 1989; comprehensive journal publication in 1992 • LSNs in pages used to coordinate state between • 10 Year VLDB Award 1999 log, buffer, disk Novel features of ARIES in italics 9/11/2018 Cs262a-F18 Lecture-06 13 9/11/2018 Cs262a-F18 Lecture-06 14 DB RAM WAL & the Log LSNs pageLSNs flushedLSN WAL constraints • Each log record has a unique Log Sequence Number (LSN) Log records • Before a page is written, – LSNs always increasing flushed to disk – pageLSN flushedLSN • Each data page contains a • Commit record included in log; all related update pageLSN log records precede it in log – The LSN of the most recent log record for an update to that page pageLSN “Log tail” in RAM • System keeps track of flushedLSN – The max LSN flushed so far 9/11/2018 Cs262a-F18 Lecture-06 15 9/11/2018 Cs262a-F18 Lecture-06 16

5. Log Records Other Log-Related State Possible log record types: LogRecord fields: • Update • Transaction Table: prevLSN • Commit – One entry per active Transaction XID • Abort – Contains XID, status (running/commited/aborted), type • End (signifies end of commit and lastLSN pageID or abort) update length • Compensation Log Records • Dirty Page Table: records offset (CLRs) – One entry per dirty page in buffer pool only before-image – for UNDO actions – Contains recLSN – the LSN of the log record which after-image – (and some other tricks!) first caused the page to be dirty 9/11/2018 Cs262a-F18 Lecture-06 17 9/11/2018 Cs262a-F18 Lecture-06 18 Normal Execution of a Transaction Checkpointing • Periodically, the DBMS creates a checkpoint, in order • Series of reads & writes, followed by commit or abort to minimize the time taken to recover in the event of a – We will assume that page write is atomic on disk system crash. Write to log: • In practice, additional details to deal with non-atomic writes – begin_checkpoint record: Indicates when chkpt began. – end_checkpoint record: Contains current Transaction • Strict 2PL (at least for writes) table and dirty page table. This is a `fuzzy checkpoint’: • Other Transactions continue to run; so these tables only known to reflect some mix of state after the time of the • STEAL, NO-FORCE buffer management, with Write- begin_checkpoint record. Ahead Logging • No attempt to force dirty pages to disk; effectiveness of checkpoint limited by oldest unwritten change to a dirty page. (So it’s a good idea to periodically flush dirty pages to disk!) – Store LSN of chkpt record in a safe place (master 9/11/2018 Cs262a-F18 Lecture-06 19 9/11/2018 record) Cs262a-F18 Lecture-06 20

6. The Big Picture: What’s Stored Where Simple Transaction Abort • For now, consider an explicit abort of a LOG RAM DB Transaction – No crash involved LogRecords prevLSN Transaction Table XID Data pages lastLSN • We want to “play back” the log in reverse order, type each status UNDOing updates. with a pageID – Get lastLSN of Transaction from Transaction table length pageLSN Dirty Page Table recLSN – Can follow chain of log records backward via the offset master record prevLSN field before-image after-image flushedLSN – Note: before starting UNDO, could write an Abort log record • Why bother? 9/11/2018 Cs262a-F18 Lecture-06 21 9/11/2018 Cs262a-F18 Lecture-06 22 Abort, cont. Transaction Commit • Write commit record to log • To perform UNDO, must have a lock on data! – No problem! • All log records up to Transaction’s lastLSN are • Before restoring old value of a page, write a CLR: flushed – You continue logging while you UNDO!! – Guarantees that flushedLSN  lastLSN – CLR has one extra field: undonextLSN – Note that log flushes are sequential, synchronous • Points to the next LSN to undo (i.e. the prevLSN of the record we’re currently undoing) writes to disk – CLR contains REDO info – Many log records per log page – CLRs never Undone • Make transaction visible • Undo needn’t be idempotent (>1 UNDO won’t happen) • But they might be Redone when repeating history (=1 UNDO – Commit() returns, locks dropped, etc. guaranteed) • Write end record to log • At end of all UNDOs, write an “end” log record 9/11/2018 Cs262a-F18 Lecture-06 23 9/11/2018 Cs262a-F18 Lecture-06 24

7. Crash Recovery: Big Picture Recovery: The Analysis Phase Oldest log rec. of Xact  Start from a checkpoint (found • Reconstruct state at checkpoint active at crash via master record) – via end_checkpoint record Smallest  Three phases. Need to: • Scan log forward from begin_checkpoint recLSN in dirty page – Figure out which Xacts – End record: Remove Xact from Xact table table after committed since checkpoint, – Other records: Add Xact to Xact table, set Analysis which failed (Analysis) lastLSN=LSN, change Xact status on commit – REDO all actions – Update record: If P not in Dirty Page Table, Last chkpt  (repeat history) • Add P to D.P.T., set its recLSN=LSN – UNDO effects of failed Xacts. CRASH This phase could be skipped; A R U information can be regained in subsequent REDO pass 9/11/2018 Cs262a-F18 Lecture-06 25 9/11/2018 Cs262a-F18 Lecture-06 26 Recovery: The REDO Phase Invariant • We repeat History to reconstruct state at crash: – Reapply all updates (even of aborted Xacts!), redo • State of page P is the outcome of all changes of CLRs relevant log records whose LSN is <= P.pageLSN • Scan forward from log rec containing smallest recLSN • During redo phase, every page P has P.pageLSN in D.P.T. For each CLR or update log rec LSN, REDO the >= redoLSN action unless page is already more up-to-date than this record: • Thus at end of redo pass, the database has a – REDO when Affected page is in D.P.T., and has state that reflects exactly everything on the pageLSN (in DB) < LSN. [if page has recLSN > LSN no (stable) log need to read page in from disk to check pageLSN] • To REDO an action: – Reapply logged action 9/11/2018 – Set pageLSN to LSN. No additional logging! Cs262a-F18 Lecture-06 27 9/11/2018 Cs262a-F18 Lecture-06 28

8. Recovery: The UNDO Phase UndoNextLSN • Key idea: Similar to simple transaction abort, for each loser transaction (that was in flight or aborted at time of crash) – Process each loser transaction’s log records backwards; undoing each record in turn and generating CLRs • But: loser may include partial (or complete) rollback actions • Avoid to undo what was already undone – undoNextLSN field in each CLR equals prevLSN field from the original action From Mohan et al, TODS 17(1):94-162 9/11/2018 Cs262a-F18 Lecture-06 29 9/11/2018 Cs262a-F18 Lecture-06 30 Recovery: The UNDO Phase Restart Recovery Example ToUndo={ l | l a lastLSN of a “loser” Xact} Repeat: – Choose largest LSN among ToUndo. – If this LSN is a CLR and undonextLSN==NULL • Write an End record for this Transaction – If this LSN is a CLR, and undonextLSN != NULL • Add undonextLSN to ToUndo • (Q: what happens to other CLRs?) – Else this LSN is an update. Undo the update, write a CLR, add prevLSN to ToUndo Until ToUndo is empty From Mohan et al, TODS 17(1):94-162 9/11/2018 Cs262a-F18 Lecture-06 31 9/11/2018 Cs262a-F18 Lecture-06 32

9. Example of Recovery Example: Crash During Restart! LSN LOG LSN LOG 00,05 begin_checkpoint, end_checkpoint RAM 00 begin_checkpoint RAM 10 update: T1 writes P5 05 end_checkpoint 20 update T2 writes P3 undonextLSN Xact Table 10 update: T1 writes P5 prevLSNs Xact Table 30 T1 abort lastLSN 20 update T2 writes P3 lastLSN 40,45 CLR: Undo T1 LSN 10, T1 End status status 30 T1 abort 50 update: T3 writes P1 Dirty Page Table Dirty Page Table recLSN 40 CLR: Undo T1 LSN 10 recLSN 60 update: T2 writes P5 flushedLSN 45 T1 End flushedLSN CRASH, RESTART 50 update: T3 writes P1 70 CLR: Undo T2 LSN 60 ToUndo 60 update: T2 writes P5 ToUndo 80,85 CLR: Undo T3 LSN 50, T3 end CRASH, RESTART CRASH, RESTART 90 CLR: Undo T2 LSN 20, T2 end 9/11/2018 Cs262a-F18 Lecture-06 33 9/11/2018 Cs262a-F18 Lecture-06 34 Additional Crash Issues Parallelism during restart • What happens if system crashes during Analysis? During REDO? • Activities on a given page must be processed in • How do you limit the amount of work in REDO? sequence – Flush asynchronously in the background. • Activities on different pages can be done in parallel – Watch “hot spots”! • How do you limit the amount of work in UNDO? – Avoid long-running Xacts. 9/11/2018 Cs262a-F18 Lecture-06 35 9/11/2018 Cs262a-F18 Lecture-06 36

10. Log record contents Physical logging • Describe the bits (optimization: only those that • What is actually stored in a log record, to allow change) REDO and UNDO to occur? • Example – OLD STATE: 0x47A90E…. • Many choices, 3 main types – NEW STATE: 0x632F00… – PHYSICAL – So REDO: set to NEW; UNDO: set to OLD – LOGICAL • Or just delta (OLD XOR NEW) – PHYSIOLOGICAL – DELTA: 0x24860E… – So REDO=UNDO=xor with delta • Ponder: XOR is not idempotent, but redo and undo must be; why is this OK? 9/11/2018 Cs262a-F18 Lecture-06 37 9/11/2018 Cs262a-F18 Lecture-06 38 Logical Logging Physiological Logging • Describe the operation and arguments • Describe changes to a specified page, logically • E.g., Update field 3 of record whose key is 37, by within that page adding 32 • Goes with common page layout, with records • We need a programmer supplied inverse indexed from a page header operation to undo this • Allows movement within the page (important for records whose length varies over time) • E.g., on page 298, replace record at index 17 from old state to new state • E.g., on page 35, insert new record at index 20 9/11/2018 Cs262a-F18 Lecture-06 39 9/11/2018 Cs262a-F18 Lecture-06 40

11. ARIES logging Interactions • ARIES allows different log approaches; common • Recovery is designed with deep awareness of choice is: access methods (eg B-trees) and concurrency • Physiological REDO logging control – Independence of REDO (e.g. indexes & tables) • And vice versa • Can have concurrent commutative logical operations like • Need to handle failure during page split, increment/decrement (“escrow transactions”) reobtaining locks for prepared transactions • Logical UNDO during recovery, etc – To allow for simple management of physical structures that are invisible to users • CLR may act on different page than original action – To allow for escrow 9/11/2018 Cs262a-F18 Lecture-06 41 9/11/2018 Cs262a-F18 Lecture-06 42 Nested Top Actions Summary of Logging/Recovery • Trick to support physical operations you do not want to ever be undone • Recovery Manager guarantees Atomicity & – Example? Durability. • Basic idea • Use WAL to allow STEAL/NO-FORCE w/o sacrificing – At end of the nested actions, write a dummy CLR correctness. • Nothing to REDO in this CLR • LSNs identify log records; linked into backwards – Its UndoNextLSN points to the step before the nested chains per transaction (via prevLSN). action • pageLSN allows comparison of data page and log records. 9/11/2018 Cs262a-F18 Lecture-06 43 9/11/2018 Cs262a-F18 Lecture-06 44

12. Summary, Cont. Further Readings • Checkpointing: A quick way to limit the amount • Repeating History Beyond ARIES, of log to scan on recovery. – C. Mohan, Proc VLDB’99 • Recovery works in 3 phases: – Reflections on the work 10 years later – Analysis: Forward from checkpoint. • Model and Verification of a Data Manager Based – Redo: Forward from oldest recLSN. on ARIES – Undo: Backward from end to first LSN of oldest – D. Kuo, ACM TODS 21(4):427-479 Xact alive at crash. – Proof of a substantial subset • Upon Undo, write CLRs. • A Survey of B-Tree Logging and Recovery • Redo “repeats history”: Simplifies the logic! Techniques – G. Graefe, ACM TODS 37(1), article 1 9/11/2018 Cs262a-F18 Lecture-06 45 9/11/2018 Cs262a-F18 Lecture-06 46 Is this a good paper? • What were the authors’ goals? • What about the performance metrics? • Did they convince you that this was a good system? • Were there any red-flags? • What mistakes did they make? • Does the system meet the “Test of Time” challenge? • How would you review this paper today? 9/11/2018 Cs262a-F18 Lecture-06 47