
Distributed Models Over Distributed Data with MLflow

Spark Open Source Community

Does more data always improve ML models? Is it better to use distributed ML instead of single-node ML?

In this talk I will show that while more data often improves DL models in high-variance problem spaces (with semi-structured or unstructured data) such as NLP, image, and video, more data does not significantly improve models in high-bias problem spaces, where traditional ML is more appropriate. Additionally, even in the deep learning domain, single-node models can still outperform distributed models via transfer learning.
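The high-bias claim can be illustrated with a small learning-curve experiment. The sketch below is not taken from the talk: it assumes a hypothetical target y = x^2 plus Gaussian noise and fits a straight line to it, so the model is too simple for the data by construction. The function names (`fit_line`, `sample`, `learning_curve`) and the sample sizes are illustrative choices.

```python
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sample(n, rng):
    """Draw n points from the assumed nonlinear target y = x^2 + noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def mse(a, b, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def learning_curve(sizes, seed=0):
    """Train on increasing sample sizes, score on one fixed held-out set."""
    rng = random.Random(seed)
    test_xs, test_ys = sample(5000, rng)
    curve = {}
    for n in sizes:
        a, b = fit_line(*sample(n, rng))
        curve[n] = mse(a, b, test_xs, test_ys)
    return curve


curve = learning_curve([10, 100, 1000, 10000])
for n, err in curve.items():
    print(f"n={n:>6}  test MSE={err:.4f}")
```

Under these assumptions the best possible line is the constant 1/3, so the expected test error cannot fall below roughly 4/45 + 0.01 ≈ 0.099 however many rows are added. Going from 1,000 to 10,000 training points barely moves the error, which is the sense in which more data does not help a high-bias model.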

Data scientists have several pain points: running many models in parallel, automating the experimental setup, and getting others (especially analysts) within an organization to use their models. Databricks solves these problems using pandas UDFs, the ML Runtime, and MLflow.
