Machine Learning at Scale with MLflow and Apache Spark

Published by Spark开源社区 (the Spark open source community)

Société Générale is one of the major banks in France and has many data science teams across the globe. After years of exploration and prototyping, the company is now ready to deploy machine learning projects to production at scale.

To achieve that goal, we have defined a standard process for collaboration between data engineers and data scientists, and we have designed and deployed an infrastructure for productionizing machine learning.

In this presentation, we will cover the following parts of our journey:

  1. The difficulties we faced in putting ML applications into production, such as the lack of a model registry, the difficulty of deploying ML libraries to our Hadoop cluster, and collaboration between data scientists and data engineers.
  2. How we deployed MLflow as a key technical component of our production Hadoop environment, given its various security constraints.
  3. How we built a CI/CD pipeline to deploy ML applications automatically. MLflow plays an important role in this pipeline.
  4. A first concrete production project built on top of this infrastructure with MLflow, Spark Streaming, scikit-learn and CI/CD.

The key takeaways of this presentation are:

  1. To productionize machine learning in a large organization like Société Générale, the collaboration process must be clearly defined.
  2. An ML model registry is key to ML productionization. MLflow is the best solution we found.
  3. A CI/CD pipeline is essential to the success of a machine learning application.
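
To make the idea of automated deployment more concrete, here is a minimal sketch of the kind of promotion check a CI/CD job might run before pushing a new model version to production. It is not the pipeline described in the talk, and it does not call MLflow: `ModelVersion`, `should_promote`, `min_gain`, the `demo-model` name and the accuracy figures are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical record of one model version, as a registry might expose it."""
    name: str
    version: int
    accuracy: float  # assumed to be computed by the CI job on a held-out set


def should_promote(
    candidate: ModelVersion,
    production: Optional[ModelVersion],
    min_gain: float = 0.0,
) -> bool:
    """Return True if the candidate should replace the production model.

    Assumed rule: promote when nothing is in production yet, or when the
    candidate's accuracy exceeds production's by at least `min_gain`.
    """
    if production is None:
        return True
    return candidate.accuracy - production.accuracy >= min_gain


if __name__ == "__main__":
    current = ModelVersion(name="demo-model", version=1, accuracy=0.90)
    new = ModelVersion(name="demo-model", version=2, accuracy=0.92)
    print(should_promote(new, current, min_gain=0.01))
```

In a real setup the two records would presumably come from the model registry, and the pipeline would only run its deployment step when the check passes. The threshold and the metric are design choices that each team would need to settle on.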