Tactical Data Science Tips - Python and Spark Together

Posted by the Spark open source community (Spark开源社区)

Running Spark and Python data science workloads can be challenging given the complexity of the tools in the ecosystem, such as scikit-learn, TensorFlow, Spark, Pandas, and MLlib. These tools and architectures present important trade-offs to consider when building proofs of concept and when going to production. While a proof of concept may be relatively straightforward, moving to production can be challenging because it is difficult to understand not just the short-term effort to develop a solution, but also the long-term cost of supporting it.

This talk will discuss important tactical patterns for evaluating projects, running proofs of concept to inform the move to production, and finally the key tactics we use internally at Databricks to take data and machine learning projects into production. The session will cover some architectural choices involving Spark, PySpark, Pandas, notebooks, and various machine learning toolkits, as well as the frameworks and technologies needed to support them.

Key takeaways will include:

  1. how best to organize projects given a variety of tools,
  2. how to better understand the trade-offs between single-node and distributed training of machine learning models, and
  3. how we organize and execute data science projects internally at Databricks.