Zipline—Airbnb’s Declarative Feature Engineering Framework

Spark Open Source Community

Zipline is Airbnb’s data management platform designed specifically for ML use cases. Previously, ML practitioners at Airbnb spent roughly 60% of their time collecting data and writing transformations for machine learning tasks. Zipline reduces this work from months to days by making the process declarative: data scientists define features in a simple configuration language, and the framework provides access to point-in-time correct features for both offline model training and online inference. In this talk we describe the architecture of our system and the algorithm that makes efficient point-in-time correct feature generation tractable.
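
To make "point-in-time correct" concrete, here is a minimal Python sketch. It is not Zipline's API or algorithm; the function name, the toy data, and the choice to include events at exactly the query time are all assumptions made for illustration. The idea is that a feature computed for a training example at time t may only use events observed up to t, so that offline training sees the same values online inference would have seen.

```python
from bisect import bisect_right
from itertools import accumulate


def point_in_time_sum(event_times, event_values, query_times):
    """For each query time, sum the values of events at or before that time.

    Hypothetical helper for illustration; events after the query time are
    never used, which is what prevents label leakage in training data.
    """
    # Sort events by timestamp so a binary search can find the cutoff.
    order = sorted(range(len(event_times)), key=lambda i: event_times[i])
    times = [event_times[i] for i in order]
    # prefix[k] holds the sum of the first k events in time order.
    prefix = [0] + list(accumulate(event_values[i] for i in order))
    return [prefix[bisect_right(times, q)] for q in query_times]


# Toy data (assumed): amounts for one user, keyed by event time.
event_times = [10, 20, 30]
event_values = [5, 7, 11]
features = point_in_time_sum(event_times, event_values, [5, 20, 25, 40])
```

A query at time 25 sees only the events at times 10 and 20; the event at time 30 is excluded even though it already exists in the warehouse when the training set is built.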

Attendees will learn:

The importance of point-in-time correct features for achieving better ML model performance
The importance of using change data capture for generating feature views
An algorithm to efficiently generate features over change data. We use interval trees to efficiently compress time series features, and the algorithm generates feature aggregates over this compressed representation.
A lambda architecture that enables using the above algorithm for online feature generation
A framework, based on category theory, for understanding how feature aggregations can be distributed and independently composed

While the talk is fairly technical, we will introduce all the concepts from first principles with examples. A basic understanding of data-parallel distributed computation and machine learning might help, but is not required.
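
As a small first-principles illustration of why some aggregations distribute well, the sketch below models an average as a partial state with an associative merge and an empty identity (a monoid). This is an assumption about the kind of structure the category-theory framing refers to, not a description of Zipline's implementation; `AvgState` and `partial` are hypothetical names.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class AvgState:
    """Partial state for an average: enough information to merge later."""
    total: float = 0.0
    count: int = 0

    def merge(self, other):
        # Associative and commutative, with AvgState() as the identity,
        # so partial results can be combined in any grouping or order.
        return AvgState(self.total + other.total, self.count + other.count)

    def finalize(self):
        # The average itself is only computed at the very end.
        return self.total / self.count if self.count else None


def partial(values):
    """Aggregate one chunk of data independently (e.g. on one worker)."""
    return AvgState(float(sum(values)), len(values))


# Toy data (assumed): three chunks processed independently, then merged.
chunks = [[1, 2, 3], [4, 5], [6]]
partials = [partial(c) for c in chunks]
merged = reduce(AvgState.merge, partials, AvgState())
average = merged.finalize()
```

Averaging the per-chunk averages directly would give the wrong answer when chunks differ in size; carrying `(total, count)` and merging those is what lets the work be split across machines or time windows and composed afterwards.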
