- 微博 QQ QQ空间 贴吧
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
Shuffle in Apache Spark is an intermediate phrase redistributing data across computing units, which has one important primitive that the shuffle data is persisted on local disks. This architecture suffers from some scalability and reliability issues. Moreover, the assumptions of collocated storage do not always hold in today’s data centers. The hardware trend is moving to disaggregated storage and compute architecture for better cost efficiency and scalability.
To address the issues of Spark shuffle and support disaggregated storage and compute architecture, we implemented a new remote Spark shuffle manager. This new architecture writes shuffle data to a remote cluster with different Hadoop-compatible filesystem backends.
Firstly, the failure of compute nodes will no longer cause shuffle data recomputation. Spark executors can also be allocated and recycled dynamically which results in better resource utilization.
Secondly, for most customers currently running Spark with collocated storage, it is usually challenging for them to upgrade the disks on every node to latest hardware like NVMe SSD and persistent memory because of cost consideration and system compatibility. With this new shuffle manager, they are free to build a separated cluster storing and serving the shuffle data, leveraging the latest hardware to improve the performance and reliability.
Thirdly, in HPC world, more customers are trying Spark as their high performance data analytics tools, while storage and compute in HPC clusters are typically disaggregated. This work will make their life easier.
In this talk, we will present an overview of the issues of the current Spark shuffle implementation, the design of new remote shuffle manager, and a performance study of the work.