08_AstroServer: A Real-time Analysis System for GWAC

1. AstroServer: A Real-time Analysis System for GWAC
Wei Ren, RUC, 10/11/2017

2. Scientific Big Data System: Accelerating Scientific Discovery
• Background
  ➢ The Scientific Big Data System is funded by the 'National Key R&D Plan: Cloud Computing and Big Data'.
  ➢ Led by the Chinese Academy of Sciences, jointly with 16 universities and institutions.
• Goals
  ➢ Astronomy: efficient storage and analysis of astronomical catalogs with 100 billion rows
  ➢ High-energy physics: high-efficiency storage and retrieval of trillion-event data
  ➢ Bioscience: retrieval of multi-level correlations in RDF knowledge graphs with 10 billion edges

3. AstroServer: Big Astronomy Data Analytics
• GWAC (the Ground-based Wide-Angle Camera array)
• Covers a large field at a high sampling frequency

  Sky survey field     5000 square degrees
  Sampling period      15 s
  Observed stars       1.58 million
  Generated data       2.5 TB/day
  Service life         10 years
  Total data           8 PB

4. Real-time Analysis
• Filter
• Data Organization
• Online Analysis

5. Data Modeling
[Figure: at each time step t1, t2, t3, ..., tn, the camera array (CCD1 to CCD4) produces records id 1, id 2, id 3, ..., id n. Data format: each record is a set of key-value pairs (Key1 to Key4, each with a Value).]
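The data model on this slide can be sketched in Python as follows. This is only an illustration: the class name `StarRecord`, the function `snapshot`, and the sample values are hypothetical, and the slide does not say what the keys Key1 to Key4 actually contain.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StarRecord:
    """One observed object at one time step (hypothetical layout)."""
    star_id: int                     # "id 1 ... id n" on the slide
    ccd: str                         # which camera produced the record, e.g. "CCD1"
    t: int                           # time step index (t1 ... tn)
    values: Dict[str, float] = field(default_factory=dict)  # Key -> Value pairs


def snapshot(ccd: str, t: int, measurements: Dict[int, Dict[str, float]]) -> List[StarRecord]:
    """Turn one exposure of one CCD into a list of key-value records."""
    return [
        StarRecord(star_id=star_id, ccd=ccd, t=t, values=dict(kv))
        for star_id, kv in measurements.items()
    ]


# Made-up sample data for a single exposure of CCD1 at time step t1.
frame = snapshot("CCD1", t=1, measurements={
    1: {"Key1": 0.5, "Key2": 1.2},
    2: {"Key1": 0.7},
})
```

Each exposure yields one record per object, so a night of observation is a sequence of such snapshots, one per sampling period.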

7. Filter
• Compression: high resource consumption
• Filter-1
  ➢ filters tuples
• Filter-2
  ➢ stores transient sources (original data filtered)
[Figure: for each time step t1, t2, t3, ..., tn, tuples (id, x, y, t, d1, d2, d3, ..., dm) for ids 1 to n are reduced to (id, x, y, t, d1, ..., dc), with c ≤ m.]
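One possible reading of the two filter stages is a row filter followed by a column projection. The sketch below assumes exactly that; the function names `filter_1` and `filter_2`, the predicate on `d1`, and the sample rows are hypothetical, since the slide does not state the actual filtering criteria.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def filter_1(rows: List[Row], is_candidate: Callable[[Row], bool]) -> List[Row]:
    """Drop tuples that do not satisfy the (assumed) candidate predicate."""
    return [row for row in rows if is_candidate(row)]


def filter_2(rows: List[Row], c: int) -> List[Row]:
    """Keep id, x, y, t and only the first c data columns d1..dc (c <= m)."""
    kept = ["id", "x", "y", "t"] + [f"d{i}" for i in range(1, c + 1)]
    return [{name: row[name] for name in kept} for row in rows]


# Made-up tuples with m = 3 data columns.
rows = [
    {"id": 1, "x": 10.0, "y": 20.0, "t": 1, "d1": 0.1, "d2": 5.0, "d3": 7.0},
    {"id": 2, "x": 11.0, "y": 21.0, "t": 1, "d1": 2.4, "d2": 6.0, "d3": 8.0},
]

# Assumed criterion: a large d1 marks a transient-source candidate.
candidates = filter_1(rows, lambda row: row["d1"] > 1.0)
stored = filter_2(candidates, c=2)
```

Here only the tuple with id 2 survives, and it is stored with two of its three data columns.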

9. Data Organization
• Question: how to find all transient sources within a given time period?
• SEPI index (Single Endpoint Index)
  ➢ inverted index <oid | stime, etime>
  ➢ high update and distribution capacity
[Figure: timeline t1 to t10]
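The slides do not describe how SEPI works internally, so the sketch below is not SEPI. It is a plain start-time-sorted list that answers the same question (which objects are active at some point in a query period) over entries of the form (oid, stime, etime). The class name `StartSortedIndex` and the sample entries are hypothetical.

```python
import bisect
from typing import List, Tuple


class StartSortedIndex:
    """Naive interval lookup: entries are kept sorted by start time."""

    def __init__(self) -> None:
        self._stimes: List[int] = []                     # sorted start times
        self._entries: List[Tuple[int, int, str]] = []   # parallel (stime, etime, oid)

    def insert(self, oid: str, stime: int, etime: int) -> None:
        pos = bisect.bisect_right(self._stimes, stime)
        self._stimes.insert(pos, stime)
        self._entries.insert(pos, (stime, etime, oid))

    def query(self, q_start: int, q_end: int) -> List[str]:
        """Return oids whose [stime, etime] overlaps [q_start, q_end]."""
        # Only entries that start no later than q_end can overlap.
        hi = bisect.bisect_right(self._stimes, q_end)
        return sorted(
            oid for stime, etime, oid in self._entries[:hi] if etime >= q_start
        )


index = StartSortedIndex()
index.insert("a", 1, 3)
index.insert("b", 2, 5)
index.insert("c", 6, 8)
index.insert("d", 4, 4)
hits = index.query(4, 6)  # "a" ended at t3, so it is excluded
```

The binary search bounds the candidates by start time, but the end-time check still scans them linearly; a production index would need to avoid that scan.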

11. Online Analysis
• Question: how many transient sources exist within a given time period?
[Figure: timeline t1 to t10; the queried period starts with 2 events, and the following time steps add +1, +2, +0, +1 events.]
Count: 2 + 1 + 2 + 0 + 1 = 6
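The count on this slide can be read as the number of events already active at the start of the period plus the number of new events that begin at each later time step. A minimal sketch under that reading, with integer time steps; the function name `count_in_period` and the sample intervals are assumptions chosen to reproduce the slide's 2 + 1 + 2 + 0 + 1 = 6.

```python
from collections import Counter
from typing import List, Tuple


def count_in_period(
    intervals: List[Tuple[int, int]], t_start: int, t_end: int
) -> Tuple[int, List[int], int]:
    """Count events active at any time in [t_start, t_end].

    Returns (active at t_start, new events per later step, total).
    """
    active_at_start = sum(1 for s, e in intervals if s <= t_start <= e)
    new_per_step = Counter(s for s, _ in intervals if t_start < s <= t_end)
    increments = [new_per_step.get(t, 0) for t in range(t_start + 1, t_end + 1)]
    return active_at_start, increments, active_at_start + sum(increments)


# Made-up (stime, etime) pairs on a t1..t10 timeline.
events = [(1, 4), (2, 3), (4, 6), (5, 5), (5, 9), (7, 8), (9, 10)]
initial, increments, total = count_in_period(events, t_start=3, t_end=7)
# initial == 2, increments == [1, 2, 0, 1], total == 6
```

Every event overlapping the period is either active at its first time step or starts later inside it, so each event is counted exactly once.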

12. Experiment
• Storage time
  ➢ 0.81 s (working duration: 4.16 h)
• Analysis time
  ➢ Interval query = 2.5 s
  ➢ Count analysis = 0.112 s
[Figure: chart of running times in seconds]

14. Next Steps

  GWAC                 Now             Future
  Sampling period      15 s            1 s
  Observed stars       1.6 million     24 million
  Generated data       2.5 TB/day      37.5 TB/day
  Service life         10 years        10 years
  Total data           8 PB            120 PB

• Future requirement
  ➢ time scale < 1 s
• Future work
  ➢ use new hardware to accelerate storage processing
  ➢ query rewriting
  ➢ adaptive compression with high performance
  ➢ GPU processing
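The "Future" figures are numerically consistent with scaling each "Now" figure by the ratio of the sampling periods, 15 s / 1 s = 15. The slides do not state how the projections were derived, so the check below is only an observation about the numbers; the variable names are illustrative.

```python
# Ratio of the current sampling period to the future one.
factor = 15 / 1

now = {
    "observed_stars": 1.6e6,     # 1.6 million
    "generated_tb_per_day": 2.5,
    "total_pb": 8,
}

# Scale every current figure by the same factor.
future = {name: value * factor for name, value in now.items()}
# future["observed_stars"]       == 24_000_000.0  (24 million)
# future["generated_tb_per_day"] == 37.5
# future["total_pb"]             == 120.0
```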

15. Thank You!
weiren@ruc.edu.cn
http://idke.ruc.edu.cn