申请试用
HOT
登录
注册
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at Facebook

Scaling Apache Spark at Facebook

Spark开源社区
/
发布于
/
8713
人观看
Spark started at Facebook as an experiment when the project was still in its early phases. Spark’s appeal stemmed from its ease of use and an integrated environment to run SQL, MLlib, and custom applications. At that time the system was used by a handful of people to process small amounts of data. However, we’ve come a long way since then. Currently, Spark is one of the primary SQL engines at Facebook in addition to being the primary system for writing custom batch applications. This talk will cover the story of how we optimized, tuned and scaled Apache Spark at Facebook to run on 10s of thousands of machines, processing 100s of petabytes of data, and used by 1000s of data scientists, engineers and product analysts every day. In this talk, we’ll focus on three areas: * *Scaling Compute*: How Facebook runs Spark efficiently and reliably on tens of thousands of heterogenous machines in disaggregated (shared-storage) clusters. * *Optimizing Core Engine*: How we continuously tune, optimize and add features to the core engine in order to maximize the useful work done per second. * *Scaling Users:* How we make Spark easy to use, and faster to debug to seamlessly onboard new users.
0点赞
1收藏
3下载
确认
3秒后跳转登录页面
去登陆