- 微博 QQ QQ空间 贴吧
- 视频嵌入链接 文档嵌入链接
Accelerating Astronomical Discoveries with Apache Spark
Our research group is investigating how to leverage Apache Spark (batch, streaming & real-time) to analyse current and future data sets in astronomy. Among the future large experiments, the Large Synoptic Survey Telescope (LSST) will start soon collecting terabytes of data per observation night, and the efficient processing and analysis of both real-time and historical data remains a major challenge. In this talk we will expose the main challenges and explore the latest developments tailored for big data problems in astronomy.
On the one hand we designed a new Data Source API extension to natively manipulate telescope images and astronomical tables within Apache Spark. We then extended the functionalities of the Apache Spark SQL module to ease the manipulation of 3D data sets and perform efficient queries: partitioning, data sets join and cross-match, nearest neighbors search, spatial queries, and more.
On the other hand we are using the new possibilities offered by Structured Streaming APIs in recent Apache Spark versions to enable real-time decisions by rapidly accessing and analysing the alerts sent by telescopes every