
Scaling Apache Spark on Kubernetes at Lyft

Posted by Spark开源社区 (Spark Open Source Community)
Lyft is on a mission to improve people's lives with the world's best transportation. As part of this mission, Lyft invests heavily in open source infrastructure and tooling. At Lyft, Kubernetes has emerged as the next generation of cloud native infrastructure, supporting a wide variety of distributed workloads. Apache Spark at Lyft has evolved to handle both machine learning and large-scale ETL workloads. By combining the flexibility of Kubernetes with the data processing power of Apache Spark, Lyft is able to take ETL data processing to a new level.

In this talk, Li Gao and Rohit Menon discuss the challenges the Lyft team faced and the solutions they developed to support Apache Spark on Kubernetes in production and at scale.

Topics include:
- Key traits of Apache Spark on Kubernetes.
- A deep dive into Lyft's multi-cluster setup and the operations needed to handle petabytes of production data.
- How Lyft extends and enhances Apache Spark to support capabilities such as Spark pod life cycle metrics and state management, resource prioritization, and queuing and throttling.
- Dynamic job scale estimation and runtime dynamic job configuration.
- How Lyft powers internal data scientists, business analysts, and data engineers via a multi-cluster setup.