Stream, Stream, Stream - Different Streaming Methods with Apache Spark & Kafka

Posted by Spark开源社区 (Spark open source community)

At NMC (Nielsen Marketing Cloud) we provide our customers (marketers and publishers) with real-time analytics tools to profile their target audiences. To achieve that, we need to ingest billions of events per day into our big data stores, and we need to do it in a scalable yet cost-efficient manner.

In this session, we will discuss how we continuously transform our data infrastructure to support these goals. Specifically, we will review how we went from CSV files and standalone Java applications all the way to multiple Kafka and Spark clusters, performing a mixture of streaming and batch ETLs, and supporting 10x data growth. We will share our experience as early adopters of Spark Streaming and Spark Structured Streaming, and how we overcame technical barriers (and there were plenty). We will present a rather unique solution of using Kafka to imitate streaming over our Data Lake, while significantly reducing our cloud services’ costs. Topics include:

Kafka and Spark Streaming for stateless and stateful use-cases
Spark Structured Streaming as a possible alternative
Combining Spark Streaming with batch ETLs
“Streaming” over Data Lake using Kafka
