- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
Database Streaming with Apache Pulsar
展开查看详情
1 .Database Streaming with Apache Pulsar Sijie Guo — twitter: @sijieg wechat: guosijie_
2 .Who am I • Sijie Guo • Pulsar, BookKeeper, Hive, HBase, … • Streamlio Cofounder • Yahoo -> Twitter -> Streamlio • 华中科⼤ -> 中科院计算所
3 .Agenda 1. The Beauty of Change Data Capture 2. Streaming MySQL - Connectors/Functions, Schema 3. Interactive Queries - Presto SQL
4 .Agenda 1. The Beauty of Change Data Capture 2. Streaming MySQL - Connectors/Functions, Schema 3. Interactive Queries - Presto SQL
5 .Traditional Batch ETL Pipeline
6 .Traditional Batch ETL Pipeline High Latency Huge number of Jobs Error prone Manual Schema Updates
7 .Current Mess
8 .Pulsar as the backbone
9 .Getting data to Pulsar
10 .Option 1 - Dural Writes
11 .Option 1 - Dural Writes Data consistency??? :(
12 .Option 2 - Event Sourcing
13 .Option 2 - Event Sourcing Violates read-your-writes consistency :(
14 .Option 3 - Change Data Capture
15 . “In databases, a design pattern that captures individual database changes to a stream of changes”
16 .Option 3 - Change Data Capture Yeah ! :)
17 .Agenda 1. The Beauty of Change Data Capture 2. Streaming MySQL - Connectors/Functions, Schema 3. Interactive Queries - Presto SQL
18 .Pulsar IO Framework
19 .Mysql —> HDFS
20 .Mysql —> Pulsar
21 .Function Overview Light weight Compute Framework Language Native (Java, Python) Flexible deployment (Thread, Process, Container, K8S) Pulsar Functions Managed State ETL, Filtering, Routing, …
22 .Function in Action
23 .Function Example public class WordCountFunction implements Function<String, Void> { @Override public Void process(String input, Context context) { Arrays.asList(input.split(“\\s+”)) .forEach(word -> context.incrCounter(word, 1)); return null; } }
24 .Function Submission $ bin/pulsar-admin functions create \ --jar target/my-functions.jar \ --classname org.example.functions.MyFunction \ Submission --cpu 8 \ --ram 8589934592 \ --disk 10737418240 \ —-inputs input_topic —-output output_topic $ bin/pulsar-admin functions local run \ Local Run --jar target/my-functions.jar \ --classname org.example.functions.MyFunction \ —-inputs input_topic —-output output_topic
25 .Function Deployment Function Function Function wordcount-1 wordcount-1 transform-2 Function Function Pod 1 wordcount-1 transform-2 Worker Function dataroute-1 Worker Node 1 Pod 2 Broker 1 Broker 1 Broker 1 Node 1 Node 2 Pod 3 Thread Process K8S
26 .IO Connectors A light weight connector framework built on functions runtime
27 .Running Connectors $ ./bin/pulsar-admin source create \ --tenant test \ --namespace ns1 \ Source --name twitter-source \ --destinationTopicName twitter_data \ --sourceConfigFile examples/twitter.yml \ --source-type twitter $ ./bin/pulsar-admin sink create \ Sink --tenant public \ --namespace default \ --name cassandra-test-sink \ --sink-type cassandra \ --sinkConfigFile examples/cassandra-sink.yml \ --inputs test_cassandra
28 .Connector Config configs: roots: "localhost:9042" keyspace: "pulsar_test_keyspace" columnFamily: "pulsar_test_table" keyname: "key" columnName: "col"
29 .Debezium Overview Open source CDC Records row-level changes via WAL Supports MYSQL, MongoDB, PostgreSQL, Oracle, Debezium SQL Server