High-cadence Astronomy: Challenges & Applications for Big Databases

1. High-cadence Astronomy: Challenges & Applications for Big Databases
Bart Scheers, Centrum Wiskunde & Informatica (CWI), Amsterdam
XLDB 2017, Clermont-Ferrand, 10–12 October 2017
Steven Bloemen (RU), Paul Groot (RU), Arjen van Elteren (Leiden), Pim Schellart (RU, Princeton), Martin Kersten (CWI), Hannes Mühleisen (CWI)
Bart Scheers | XLDB 2017 High-cadence Astronomy: Challenges & Appl. for Big DBs

2. High-cadence Astronomy
◮ … when a telescope automatically produces large field-of-view (FoV) images at a high rate (seconds–minutes)

3. High-cadence Astronomy
◮ … when a telescope automatically produces large field-of-view (FoV) images at a high rate (seconds–minutes)
◮ Currently
  ⊲ Radio: all-sky image (~10³ sources) every second to minute

4. High-cadence Astronomy – LOFAR

5. High-cadence Astronomy
◮ … when a telescope automatically produces large field-of-view (FoV) images at a high rate (seconds–minutes)
◮ Currently
  ⊲ Radio: all-sky image (~10³ sources) every second to minute
  ⊲ Optical: "large" image (FoV of several deg², 10⁶ sources) every one to five minutes

6. High-cadence Astronomy – MeerLICHT

7. High-cadence Astronomy
◮ … when a telescope automatically produces large field-of-view (FoV) images at a high rate (seconds–minutes)
◮ Currently
  ⊲ Radio: all-sky image (~10³ sources) every second to minute
  ⊲ Optical: "large" image (FoV of several deg², 10⁶ sources) every one to five minutes
◮ Combining radio and optical
  ⊲ Comparable resolution
  ⊲ (True) multi-wavelength astronomy

8. High-cadence Astronomy – MeerKAT

9. High-cadence Astronomy – MeerKAT + MeerLICHT

10. Direct detections of Gravitational Waves – LIGO (Hanford & Livingston)

11. Direct detections of Gravitational Waves

12. Direct detections of Gravitational Waves – Virgo

13. Direct detections of Gravitational Waves
From Abbott et al. (2017)

14. Predicted EM signals (NS+NS → BH)
◮ First < 1 s: gamma/X-ray; beamed
◮ Hours to days: optical and IR; kilonova due to decay of r-process elements in neutrino-driven wind + jet–ISM shock
◮ After weeks to months: radio; ejecta–ISM shock
◮ Optical and IR are ideal: isotropically emitted, immediately visible

15. Hunt for Optical Counterparts

16. High-cadence Astronomy – New instruments → More data
◮ Radio: LOFAR → MeerKAT, ASKAP → SKA
  ⊲ Raw: 1 TB/s, 100 PFLOPS
  ⊲ Source counts: 10⁶/deg²; cataloged sources: 10¹⁰ = tens of EB
◮ Optical: MeerLICHT → BlackGEM → LSST
  ⊲ Raw: 15 TB/night, 1 PFLOPS
  ⊲ Measurements: 40 × 10¹² sources; catalog: 40 × 10⁹ objects = hundreds of PB
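
To put these volumes in perspective, a back-of-envelope conversion helps. This sketch assumes decimal units (1 TB = 10¹² bytes, 1 PB = 10¹⁵ bytes), takes "hundreds of PB" at its 100 PB lower bound, and assumes that figure is spread over all 40 × 10¹² measurements; those readings of the slide are assumptions, not statements from the talk.

```python
# Back-of-envelope conversions of the rates quoted on the slide.
# Assumptions: decimal units; "hundreds of PB" taken at its 100 PB lower bound
# and divided over all source measurements.
TB = 1e12
PB = 1e15
SECONDS_PER_DAY = 86_400


def radio_raw_pb_per_day(rate_tb_per_s=1.0):
    """Raw data volume per day, in PB, for a sustained rate given in TB/s."""
    return rate_tb_per_s * TB * SECONDS_PER_DAY / PB


def catalog_bytes_per_measurement(catalog_pb=100.0, n_measurements=40e12):
    """Average catalog storage per source measurement, in bytes."""
    return catalog_pb * PB / n_measurements


if __name__ == "__main__":
    print(f"radio raw: {radio_raw_pb_per_day():.1f} PB/day")                   # 86.4 PB/day
    print(f"optical: {catalog_bytes_per_measurement():.0f} bytes/measurement")  # 2500
```

A sustained 1 TB/s raw stream is thus tens of PB per day, which is why raw radio data is reduced in the pipeline and the database stores extracted sources rather than raw data.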

17. High-cadence Astronomy – New instruments → More data
◮ Radio: LOFAR → MeerKAT, ASKAP → SKA
  ⊲ Raw: 1 TB/s, 100 PFLOPS
  ⊲ Source counts: 10⁶/deg²; cataloged sources: 10¹⁰ = tens of EB
◮ Optical: MeerLICHT → BlackGEM → LSST
  ⊲ Raw: 15 TB/night, 1 PFLOPS
  ⊲ Measurements: 40 × 10¹² sources; catalog: 40 × 10⁹ objects = hundreds of PB
◮ Common challenges and overlapping strategies
  ⊲ Automated pipeline between telescope and database
  ⊲ Inspect data streams for transient and variable events
  ⊲ The data(base) archive is a rich laboratory for complementary science
◮ Integrate DB technologies to reach science goals
  ⊲ Full-source database for complementary science
  ⊲ Move algorithms and statistics inside the database engine

18. Big Data → Big Databases
◮ Database architecture: column store
  ⊲ No expensive parsing, no extra I/O, hard-coded operators, fast data localisation
◮ Partition
  ⊲ Partition according to declination and time (scale up)
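
The "no extra I/O" point of a column store can be illustrated with a toy in-memory layout in which every attribute lives in its own packed array, so a query touches only the columns it names. This is a minimal sketch, not MonetDB's implementation (which keeps columns as separate on-disk/memory-mapped arrays); the class and the column names `id`, `decl` and `flux` are hypothetical.

```python
from array import array


class ColumnTable:
    """Toy column store: each attribute is kept in its own contiguous typed array."""

    def __init__(self, schema):
        # schema maps column name -> array typecode ("q" = 64-bit int, "d" = double)
        self.columns = {name: array(code) for name, code in schema.items()}

    def append(self, **values):
        # Inserting a row writes one value into every column array.
        for name, col in self.columns.items():
            col.append(values[name])

    def select_where(self, out_col, pred_col, pred):
        # A selection reads only the two columns it references; all other
        # columns stay untouched.
        keys = self.columns[pred_col]
        out = self.columns[out_col]
        return [out[i] for i, key in enumerate(keys) if pred(key)]

    def total(self, col):
        # An aggregate is a tight loop over a single packed array.
        return sum(self.columns[col])


if __name__ == "__main__":
    t = ColumnTable({"id": "q", "decl": "d", "flux": "d"})
    t.append(id=1, decl=-30.0, flux=1.5)
    t.append(id=2, decl=12.5, flux=0.5)
    t.append(id=3, decl=-45.0, flux=2.0)
    print(t.select_where("id", "decl", lambda d: d < 0))  # [1, 3]
    print(t.total("flux"))                                 # 4.0
```

In a row store the same selection would have to walk every record, including the `flux` field it never uses.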

19. Partition according to declination
From lsst.org
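
A partition key along these lines could map each source measurement to a declination strip and a time bucket. A cone search (for example, during cross-matching) then only has to visit the strips overlapping its declination range. The strip height (1°) and bucket length (one day) below are illustrative assumptions, not values from the talk.

```python
import math
from datetime import datetime, timezone

ZONE_HEIGHT_DEG = 1.0    # assumed height of one declination strip
BUCKET_SECONDS = 86_400  # assumed time-bucket length: one day


def dec_zone(decl_deg, zone_height=ZONE_HEIGHT_DEG):
    """Index of the declination strip holding decl_deg, counted from the south pole."""
    if not -90.0 <= decl_deg <= 90.0:
        raise ValueError("declination must lie in [-90, 90] degrees")
    n_zones = math.ceil(180.0 / zone_height)
    # Clamp so that decl = +90 lands in the top strip instead of one past it.
    return min(int((decl_deg + 90.0) // zone_height), n_zones - 1)


def time_bucket(t, bucket_seconds=BUCKET_SECONDS):
    """Bucket number of a timezone-aware datetime, counted from the Unix epoch."""
    return int(t.timestamp() // bucket_seconds)


def partition_key(decl_deg, t):
    """(declination zone, time bucket) pair used to place a source measurement."""
    return dec_zone(decl_deg), time_bucket(t)


def zones_for_cone(decl_deg, radius_deg):
    """Zones a cone search of the given radius must visit; all others are pruned."""
    lo = max(-90.0, decl_deg - radius_deg)
    hi = min(90.0, decl_deg + radius_deg)
    return list(range(dec_zone(lo), dec_zone(hi) + 1))


if __name__ == "__main__":
    obs_time = datetime(2017, 10, 10, 22, 0, tzinfo=timezone.utc)
    print(partition_key(-30.2, obs_time))
    print(zones_for_cone(0.0, 0.5))  # [89, 90]
```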

20. Big Data → Big Databases
◮ Database architecture: column store
  ⊲ No expensive parsing, no extra I/O, hard-coded operators, fast data localisation
◮ Partition
  ⊲ Partition according to declination and time (scale up)
◮ Distribute
  ⊲ Spread DB data over multiple nodes by declination (scale out)
◮ Database system: MonetDB
  ⊲ Open source
  ⊲ RDBMS, SQL, many APIs (Python, C, R, Java, etc.)
  ⊲ Standalone & master–worker configurations
  ⊲ Integrated statistical querying using SQL ↔ Python, R
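
Scaling out by declination could, for instance, give each worker a contiguous run of declination strips, so a cross-match near a given declination usually hits one node, or two at a boundary. The contiguous, equal-count layout and the 180-strip/10-node numbers below are assumptions for illustration; the talk does not describe MonetDB's actual placement policy.

```python
def assign_zones(n_zones, n_nodes):
    """Split zones 0..n_zones-1 into n_nodes contiguous, near-equal ranges."""
    if n_nodes < 1 or n_zones < n_nodes:
        raise ValueError("need at least one zone per node")
    base, extra = divmod(n_zones, n_nodes)
    ranges, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)  # the first `extra` nodes take one more
        ranges.append(range(start, start + size))
        start += size
    return ranges


def node_for_zone(zone, ranges):
    """Worker node that stores a given declination zone."""
    for node, zone_range in enumerate(ranges):
        if zone in zone_range:
            return node
    raise ValueError(f"zone {zone} is not assigned to any node")


def nodes_for_zones(zones, ranges):
    """Set of workers a query must contact, e.g. for the zones a cone search visits."""
    return {node_for_zone(z, ranges) for z in zones}


if __name__ == "__main__":
    layout = assign_zones(180, 10)  # e.g. 1-degree strips over 10 workers
    print([(r.start, r.stop) for r in layout])
    print(nodes_for_zones([17, 18], layout))  # straddles a boundary: {0, 1}
```

Equal strip counts do not imply equal load: source density varies across the sky and a given telescope sees only part of it, so a real layout might size ranges by row count instead.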

21. Integrated statistical querying – embedded Python
CREATE FUNCTION chisq_prob(chisq DOUBLE, dof INT)
RETURNS DOUBLE
LANGUAGE PYTHON {
    from scipy import stats
    return 1 - stats.chi2.cdf(chisq, dof)
};
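
The UDF returns the upper-tail probability P(χ² > x) for k degrees of freedom, the same quantity SciPy also exposes directly as `stats.chi2.sf(x, k)` (which avoids the cancellation in `1 - cdf` for very small p-values). For checking results without SciPy, the tail has a closed form when k is an integer; the function below is a pure-Python stand-in under that assumption, not part of the original pipeline. Note that MonetDB's embedded Python passes whole columns to the UDF as NumPy arrays, whereas this sketch is scalar.

```python
import math


def chisq_prob(chisq, dof):
    """P(X > chisq) for X ~ chi-square with a positive integer number of dof.

    Same quantity as 1 - scipy.stats.chi2.cdf(chisq, dof), evaluated in closed
    form. Each summand is computed in log space so large dof do not overflow.
    """
    if dof < 1 or int(dof) != dof:
        raise ValueError("dof must be a positive integer")
    dof = int(dof)
    if chisq <= 0.0:
        return 1.0
    h = chisq / 2.0
    log_h = math.log(h)
    if dof % 2 == 0:
        # Even dof: Q = sum_{j=0}^{dof/2-1} e^{-h} h^j / j!  (a Poisson lower tail)
        q = sum(math.exp(j * log_h - h - math.lgamma(j + 1)) for j in range(dof // 2))
    else:
        # Odd dof: Q = erfc(sqrt(h)) + sum_{j=1}^{(dof-1)/2} e^{-h} h^(j-1/2) / Gamma(j+1/2)
        q = math.erfc(math.sqrt(h)) + sum(
            math.exp((j - 0.5) * log_h - h - math.lgamma(j + 0.5))
            for j in range(1, (dof - 1) // 2 + 1)
        )
    return min(q, 1.0)


if __name__ == "__main__":
    # 95th-percentile points of chi-square with 1, 2 and 3 dof: each gives ~0.05.
    for x, k in [(3.841458820694124, 1), (5.991464547107979, 2), (7.814727903251178, 3)]:
        print(k, round(chisq_prob(x, k), 6))
```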

22. Integrated statistical querying – embedded Python
SELECT t.*
  FROM (SELECT a.runcat
              ,a.xtrsrc
              ,x.image
              ,i.band
              ,a.type
              ,eta_int
              ,f_datapoints
              ,chisq_prob(eta_int * f_datapoints, f_datapoints) AS chisqp
          FROM assocxtrsource a
              ,extractedsource x
              ,image i
         WHERE a.xtrsrc = x.id
           AND x.image = i.id
       ) t
 WHERE chisqp < 0.05
   AND f_datapoints > 20
ORDER BY runcat, xtrsrc
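
The same selection can be reproduced outside MonetDB to sanity-check it. This sketch uses Python's built-in `sqlite3` as a stand-in engine, registers a Python scalar function under the same name with `create_function`, and runs the query over a few made-up rows in a minimal, hypothetical subset of the schema. The p-value here uses the Wilson–Hilferty normal approximation to the χ² tail, swapped in for SciPy; it is reasonable for the more than 20 degrees of freedom the query keeps, but it is not the exact computation of the original UDF.

```python
import math
import sqlite3


def chisq_prob(chisq, dof):
    """Approximate P(X > chisq) for X ~ chi-square(dof) (Wilson-Hilferty)."""
    if chisq <= 0:
        return 1.0
    k = float(dof)
    z = ((chisq / k) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * k))) / math.sqrt(2.0 / (9.0 * k))
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of a standard normal


QUERY = """
SELECT t.*
  FROM (SELECT a.runcat, a.xtrsrc, x.image, i.band, a.type,
               eta_int, f_datapoints,
               chisq_prob(eta_int * f_datapoints, f_datapoints) AS chisqp
          FROM assocxtrsource a, extractedsource x, image i
         WHERE a.xtrsrc = x.id
           AND x.image = i.id) t
 WHERE chisqp < 0.05
   AND f_datapoints > 20
 ORDER BY runcat, xtrsrc
"""


def toy_db():
    """In-memory database holding a hypothetical, minimal subset of the schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE image (id INTEGER PRIMARY KEY, band INTEGER);
        CREATE TABLE extractedsource (id INTEGER PRIMARY KEY, image INTEGER);
        CREATE TABLE assocxtrsource (runcat INTEGER, xtrsrc INTEGER, type INTEGER,
                                     eta_int REAL, f_datapoints INTEGER);
        INSERT INTO image VALUES (1, 7);
        INSERT INTO extractedsource VALUES (10, 1), (11, 1), (12, 1);
        -- 100: large eta, 30 points, kept; 101: eta = 1, rejected;
        -- 102: large eta but only 5 points, rejected by f_datapoints > 20
        INSERT INTO assocxtrsource VALUES (100, 10, 3, 5.0, 30),
                                          (101, 11, 3, 1.0, 30),
                                          (102, 12, 3, 5.0, 5);
    """)
    return conn


def variable_candidates(conn):
    # Register the Python function under the name the SQL expects.
    # SQLite calls it once per row; MonetDB hands the UDF whole columns.
    conn.create_function("chisq_prob", 2, chisq_prob)
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in variable_candidates(toy_db()):
        print(row)
```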

23. Big Data → Big Databases
◮ Database architecture: column store
  ⊲ No expensive parsing, no extra I/O, hard-coded operators, fast data localisation
◮ Partition
  ⊲ Partition according to declination and time (scale up)
◮ Distribute
  ⊲ Spread DB data over multiple nodes by declination (scale out)
◮ Database system: MonetDB
  ⊲ Open source
  ⊲ RDBMS, SQL, many APIs (Python, C, R, Java, etc.)
  ⊲ Standalone & master–worker configurations
  ⊲ Integrated statistical querying using SQL ↔ Python, R
◮ Monitor all queries for sublinear pipeline performance as the database grows
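
One way to monitor whether a query stays sublinear as the database grows is to fit a power law t ≈ c·N^α to (database size, run time) pairs by least squares in log–log space and flag queries whose exponent α approaches or exceeds 1. This is a sketch under that assumption: the talk does not say how (or whether) an exponent was computed, and the tolerance separating "linear" from the other classes is arbitrary.

```python
import math


def scaling_exponent(sizes, times):
    """Least-squares slope alpha of log(time) vs log(size), i.e. t ~ c * N**alpha."""
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs")
    if min(sizes) <= 0 or min(times) <= 0:
        raise ValueError("sizes and times must be positive")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def classify(alpha, tol=0.1):
    """Label an exponent relative to linear scaling (alpha = 1)."""
    if alpha < 1.0 - tol:
        return "sublinear"
    if alpha <= 1.0 + tol:
        return "linear"
    return "superlinear"


def flag_queries(runs, tol=0.1):
    """runs: {query_name: (sizes, times)} -> {query_name: (alpha, label)}."""
    result = {}
    for name, (sizes, times) in runs.items():
        alpha = scaling_exponent(sizes, times)
        result[name] = (alpha, classify(alpha, tol))
    return result


if __name__ == "__main__":
    # Synthetic toy timings, not measurements from the talk.
    sizes = [1e6, 1e7, 1e8, 1e9]
    runs = {
        "Q_sqrt": (sizes, [2.0 * n ** 0.5 for n in sizes]),
        "Q_linear": (sizes, [3e-8 * n for n in sizes]),
    }
    for name, (alpha, label) in flag_queries(runs).items():
        print(f"{name}: alpha = {alpha:.2f} ({label})")
```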

24. High-cadence Astronomy – Pipeline performance (optical data) – All queries summed

25. High-cadence Astronomy – Pipeline performance (optical data) – Attaching & loading
[Figure: query run time [s] vs. number of sources in database [×10⁶] for queries Q9, Q10 and Q11]

26. High-cadence Astronomy – Pipeline performance (optical data) – Sublinear queries

27. High-cadence Astronomy – Pipeline performance (optical data) – Cross-matching query

28. High-cadence Astronomy – Pipeline performance (optical data) – Linear queries

29. Cracking the LSST Database
[Figure: query time [s], log scale, for LSST queries Q01–Q13, comparing MySQL, MonetDB single and MonetDB distrib]
MySQL: 33 TB over 25 nodes¹
MonetDB single: 1.3 TB on a single node²
MonetDB distrib: 1.3 TB over 10 nodes³
¹ 16 GB RAM; 8 × 1 TB HDD; dual-socket 4-core 1.8 GHz
² 256 GB RAM; 3 × 3 TB; dual-socket 16-core 2.6 GHz
³ 16 GB RAM; 1 × 6 TB; single-socket 8-core 3.4 GHz