AliHB Real-time cold data backup


1. AliHB Real-Time Cold Data Backup
孟庆义 (mengqingyi)

2. Contents
01 HBase Backup State
02 Alibaba’s Requirements on Backup
03 AliHB Real-Time Cold Data Backup
04 Future Works

3. HBase Backup State
Method                            Against hardware failure  Against user/application error  RPO      RTO
Snapshot                          NO                        YES                             N/A      N/A
Replication                       YES                       NO                              seconds  seconds
HBase Backup Restore              YES                       YES                             minutes  Increases with data size
AliHB Real-Time Cold Data Backup  YES                       YES                             seconds  minutes

4. Alibaba’s Requirements for Backup
•  RPO < 1 minute
•  Predictable RTO for PB-scale data
•  Low cost
•  No impact on online service
•  Easy management

5. AliHB Real-Time Cold Data Backup
•  Real-time incremental backup
•  Independent of HBase
   -  No need for snapshots
•  Stateless worker nodes
•  Backup to heterogeneous storage maintained by another team

6. Backup Overview
[Diagram components: Source Cluster, Backup Cluster, Target Cluster (pangu)]
•  Full backup: Region Copy moves HFiles
•  Incremental backup: Log Tracker and Log Copy move logs

7. Full Backup
•  One job copies one table
•  One task copies one region
•  Challenge: a region’s file list keeps changing
   -  Compaction removes old files
   -  Split removes the entire region
   -  Merge removes the entire region

8. Compaction
•  At first the region has files 1, 2, 3, 4, 5
•  When the copy reaches file 4, the file is missing
•  After refreshing the list, the region has files 1, 2, 6
•  Copy file 6
[Diagram: file list 1 2 3 4 5 before compaction, 1 2 6 after]

9. Split
•  We are the parent region
   -  The region is found missing: reload meta and resubmit tasks
•  We are the child region
   -  Copy the reference file and its original file
   -  If the referenced file is missing, refresh the file list and continue
•  Merge works like split

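10. Algorithm
[Flowchart]
1. Start.
2. All files copied? If yes, end.
3. If no, select the next file.
4. File exists? If yes, copy the file and return to step 2.
5. If no, refresh the file list. Region exists? If yes, return to step 2.
6. If no, reload meta and submit a new task, then end.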
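
The flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the AliHB implementation: `FakeSource`, `RegionGone`, `copy_region`, and the `copy_file` callback are hypothetical names, and an in-memory dictionary stands in for the source cluster.

```python
from dataclasses import dataclass, field


class RegionGone(Exception):
    """Raised when the region no longer exists (for example after a split or merge)."""


@dataclass
class FakeSource:
    """In-memory stand-in for the source cluster (hypothetical, for illustration only)."""
    regions: dict = field(default_factory=dict)  # region name -> set of file names

    def list_files(self, region):
        if region not in self.regions:
            raise RegionGone(region)
        return sorted(self.regions[region])

    def file_exists(self, region, name):
        return name in self.regions.get(region, set())


def copy_region(source, region, copy_file):
    """Copy every file of `region`; return "done" or "resubmit".

    `copy_file(region, name)` performs the actual copy and is supplied by the caller.
    """
    copied = set()
    try:
        pending = source.list_files(region)
        while pending:
            name = pending.pop(0)
            if source.file_exists(region, name):
                copy_file(region, name)
                copied.add(name)
            else:
                # The file vanished (e.g. compaction): refresh the list and
                # continue with whatever has not been copied yet.
                pending = [f for f in source.list_files(region) if f not in copied]
    except RegionGone:
        # The region vanished: the caller reloads meta and submits new tasks.
        return "resubmit"
    return "done"


# Replay of the compaction slide: files 3, 4, 5 are compacted into 6
# right after file 3 has been copied.
src = FakeSource({"r1": {"1", "2", "3", "4", "5"}})
copied_order = []

def demo_copy(region, name):
    copied_order.append(name)
    if name == "3":
        src.regions[region] = {"1", "2", "6"}

result = copy_region(src, "r1", demo_copy)  # copied_order ends as ["1", "2", "3", "6"]
```

The sketch refreshes the list only when a file is missing, so the common case costs one listing per region.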

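11. Incremental Backup
[Diagram components: Source Cluster, Backup Cluster, HBase, HDFS, ZooKeeper, Log Tracker, Workers]
•  Log Tracker scans the logs in HBase
•  Log Tracker registers each new log in ZooKeeper as <logName, state, offset>
•  Workers copy the logs from HDFS
•  Latency < 10 seconds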
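
The <logName, state, offset> record suggests that a worker can resume a copy from the last recorded offset. The sketch below illustrates that idea under stated assumptions: `LogRecord` and `copy_increment` are hypothetical names, and plain dictionaries of bytes stand in for HDFS and the backup storage.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    """Mirror of the <logName, state, offset> record shown on the slide."""
    log_name: str
    state: str = "WRITING"
    offset: int = 0  # number of bytes already copied to the backup


def copy_increment(record, source_logs, backup_logs):
    """Copy the bytes past `record.offset` to the backup; return how many were copied."""
    data = source_logs[record.log_name]
    new_bytes = data[record.offset:]
    backup_logs[record.log_name] = backup_logs.get(record.log_name, b"") + new_bytes
    record.offset += len(new_bytes)
    return len(new_bytes)


# Hypothetical log name and contents, for illustration only.
source_logs = {"wal.1": b"abc"}
backup_logs = {}
record = LogRecord("wal.1")

first = copy_increment(record, source_logs, backup_logs)   # copies b"abc"
source_logs["wal.1"] += b"de"                              # the log keeps growing
second = copy_increment(record, source_logs, backup_logs)  # copies only b"de"
```

Because the progress lives in the record rather than in the worker, any worker can pick up the next increment, which fits the stateless worker nodes mentioned earlier.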

12. Log Lifecycle
•  Writing
   -  Log Tracker scans periodically and finds new logs
•  Closed
   -  The log is not the latest log of its region server, or it is in “.oldlogs”
•  Finished
   -  A worker has copied the whole closed log
•  Deleted
   -  Log Tracker can no longer find the log in HBase and it is finished on the backup; the log record is then deleted from the backup system
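
The four states on this slide form a simple one-way state machine. A minimal sketch follows; `LogState`, `next_state`, and the boolean parameter names are hypothetical and only encode the conditions listed on the slide.

```python
from enum import Enum


class LogState(Enum):
    WRITING = "writing"
    CLOSED = "closed"
    FINISHED = "finished"
    DELETED = "deleted"


def next_state(state, *, is_latest_log, in_oldlogs, fully_copied, exists_in_hbase):
    """Advance a log record by at most one lifecycle step."""
    if state is LogState.WRITING and (not is_latest_log or in_oldlogs):
        # No longer the newest log of its region server, or moved to .oldlogs.
        return LogState.CLOSED
    if state is LogState.CLOSED and fully_copied:
        # A worker has copied the whole closed log.
        return LogState.FINISHED
    if state is LogState.FINISHED and not exists_in_hbase:
        # HBase has removed the log, so the backup record can go too.
        return LogState.DELETED
    return state


# A log that was rolled, fully copied, and then removed from HBase.
s = LogState.WRITING
s = next_state(s, is_latest_log=False, in_oldlogs=False, fully_copied=False, exists_in_hbase=True)
s = next_state(s, is_latest_log=False, in_oldlogs=False, fully_copied=True, exists_in_hbase=True)
s = next_state(s, is_latest_log=False, in_oldlogs=True, fully_copied=True, exists_in_hbase=False)
```

Advancing one step per call keeps each transition tied to exactly one observation by the Log Tracker or a worker.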

13. Data Consistency
•  Full comparison
   -  Compare samples
   -  Sample every region
   -  Balanced sampling: use the index of the largest file in each region
•  Incremental comparison
   -  Compare recent logs
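
The slide does not say how the index is used. One possible reading is that keys taken at even intervals from a file's sorted key index give a sample spread across the whole region. The sketch below illustrates only that reading: `pick_sample_keys` and `find_mismatches` are hypothetical names, and dictionaries stand in for the source table and the backup.

```python
def pick_sample_keys(index_keys, sample_size):
    """Pick up to `sample_size` keys spread evenly across a sorted key index."""
    if sample_size <= 0 or not index_keys:
        return []
    if sample_size >= len(index_keys):
        return list(index_keys)
    step = len(index_keys) / sample_size
    return [index_keys[int(i * step)] for i in range(sample_size)]


def find_mismatches(sample_keys, source, backup):
    """Return the sampled keys whose values differ between source and backup."""
    return [k for k in sample_keys if source.get(k) != backup.get(k)]


# Hypothetical region with ten rows; row 4 is corrupted in the backup.
index_keys = list(range(10))
source = {k: f"value-{k}" for k in index_keys}
backup = dict(source)
backup[4] = "corrupted"

sample = pick_sample_keys(index_keys, 5)            # [0, 2, 4, 6, 8]
mismatches = find_mismatches(sample, source, backup)  # [4]
```

A sample can miss a mismatch that falls between the sampled keys, which is presumably why the slide pairs it with an incremental comparison of recent logs.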

14. Restore Scenarios
•  Cluster level
   -  Restore the whole cluster
•  Table level
   -  Restore one table or a list of tables
•  Region level
   -  Restore a range of data from a table
•  Restore to a given point in time

15. Restore Tools
•  Bulkload the full backup
   -  Filter HFiles by table name and range
•  Use the LogRestore tool to restore logs
   -  Filter by table name
   -  Filter by range
   -  Filter by timestamp
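
The three log filters can be combined into a single predicate applied to each log entry. The sketch below is an assumption about how such a filter might look, not the LogRestore tool's actual interface: `LogEntry`, `keep_entry`, and all parameter names are hypothetical, and the row range is taken as start-inclusive, stop-exclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    table: str
    row: bytes
    timestamp: int  # milliseconds since the epoch


def keep_entry(entry, tables=None, start_row=None, stop_row=None, max_timestamp=None):
    """Return True if `entry` passes every filter that is set (None means no filter)."""
    if tables is not None and entry.table not in tables:
        return False
    if start_row is not None and entry.row < start_row:
        return False
    if stop_row is not None and entry.row >= stop_row:
        return False
    if max_timestamp is not None and entry.timestamp > max_timestamp:
        return False
    return True


# Hypothetical entries: restore table "t1", rows [b"b", b"d"), up to time 200.
entries = [
    LogEntry("t1", b"a", 100),  # row before the range
    LogEntry("t1", b"b", 150),  # kept
    LogEntry("t1", b"c", 250),  # written after the restore point
    LogEntry("t2", b"b", 150),  # other table
]
kept = [e for e in entries
        if keep_entry(e, tables={"t1"}, start_row=b"b", stop_row=b"d", max_timestamp=200)]
```

Setting only `max_timestamp` corresponds to the "restore to a given point in time" scenario, while the table and range filters correspond to table-level and region-level restores.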

16. Restore Runtime
•  HFiles
   -  Split by region: one region, one task
•  Logs
   -  Each log is a task
[Diagram: Restore Manager submits tasks to Workers, which run Bulkload and LogRestore]

17. Real-Time Cold Data Backup
[Architecture diagram]
•  Master: Log Tracker, Backup Manager, Data Cleaner, Restore Manager
•  Workers: Copy Log, Copy Region, Log Restore, Bulkload


19. Performance
•  Backup system: 200 nodes
•  HBase: 377 nodes
•  Data: 110 TB
•  Backup: 22 minutes
•  Restore: 53 minutes

20. Conclusion
•  AliHB Real-Time Cold Data Backup
   -  Real-time incremental backup keeps latency within seconds
   -  Scales out to provide more capacity for restore
   -  Uses fewer resources during normal backup
   -  Independent of HBase; easy to deploy and upgrade

21. Future Works
•  Incremental restore
   -  Recognize hot and cold data
   -  Resume the HBase service once the hot data is restored
   -  Access the cold data through reference files
   -  Restore the cold data in the background
•  Move log lifecycle management into HBase
   -  Periodic scans of .oldlogs put pressure on the NameNode (NN)
   -  Keep only the necessary logs in ZooKeeper
•  Compact HLogs into HFiles
   -  Saves storage space
   -  Speeds up restore

22. Thanks for watching