1. HBase on Beam
2. Apache Beam
- Apache Beam is an open-source, unified programming model for defining both batch and streaming data-parallel processing pipelines.
- It was initiated by Google and contributed to the Apache Software Foundation.
- The first stable release was published on May 17, 2017.
3. Apache Beam
Figure: Beam architecture, https://beam.apache.org/images/beam_architecture.png
4. Apache Beam
- A unified model for batch and streaming applications.
- Runners for popular open-source batch and streaming engines, such as Spark and Flink.
- Multiple SDK languages let end users build their own pipelines; Java and Python are currently supported.
- Implement once, run almost everywhere.
5. Apache Beam
- Pipeline: the processing pipeline, which includes data input, transforms, and output.
- PCollection: the representation of both bounded and unbounded data.
- Transform
  - ParDo
  - GroupByKey
  - Combine
  - Flatten
  - …
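
As a rough illustration of what the four core transforms compute, the sketch below models them on plain Python lists standing in for PCollections. It is not the Beam API: the helper names (`par_do`, `group_by_key`, `combine_per_key`, `flatten`) and the sample data are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain


def par_do(pcoll, fn):
    """ParDo: apply fn to every element; fn may emit zero or more outputs."""
    return [out for element in pcoll for out in fn(element)]


def group_by_key(pcoll):
    """GroupByKey: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pcoll:
        groups[key].append(value)
    return dict(groups)


def combine_per_key(pcoll, combine_fn):
    """Combine: reduce the values of each key to a single result."""
    return {key: combine_fn(values) for key, values in group_by_key(pcoll).items()}


def flatten(*pcolls):
    """Flatten: merge several collections of the same type into one."""
    return list(chain.from_iterable(pcolls))


# Hypothetical sample data: (user, clicks) pairs from two sources.
clicks_eu = [("alice", 3), ("bob", 1)]
clicks_us = [("alice", 2), ("carol", 5)]

merged = flatten(clicks_eu, clicks_us)
doubled = par_do(merged, lambda kv: [(kv[0], kv[1] * 2)])
grouped = group_by_key(merged)
totals = combine_per_key(merged, sum)
```

In a real pipeline these steps run in parallel on a runner; the list-based version only shows the input/output shape of each transform.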
6. Data Sources
- In-memory data: Array, Collection, Map
- Text
- HDFS
- Kafka
- HBase
- …
7. Windowing
- Fixed time windows
- Sliding time windows
- Session windows
- Single global window
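
The window types above differ in how an element's timestamp is mapped to a window. The sketch below shows one way to compute that mapping for non-negative integer timestamps. The function names and the `(start, end)` tuple representation are assumptions for illustration, not Beam's windowing classes. The single global window needs no code: every element lands in the same window.

```python
def fixed_window(ts, size):
    """Fixed windows: each timestamp belongs to exactly one [start, start + size)."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts, size, period):
    """Sliding windows: a new window of length `size` starts every `period`,
    so one timestamp can belong to several overlapping windows."""
    windows = []
    start = ts - ts % period  # latest window start that is <= ts
    while start > ts - size:  # keep windows whose end is still after ts
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def session_windows(timestamps, gap):
    """Session windows: timestamps closer together than `gap` share a session."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            # Falls inside the current session: extend its end.
            sessions[-1] = (sessions[-1][0], ts + gap)
        else:
            sessions.append((ts, ts + gap))
    return sessions


fixed = fixed_window(65, size=60)
sliding = sliding_windows(65, size=60, period=30)
sessions = session_windows([1, 3, 20, 22, 50], gap=10)
```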
8. Serialization
- Every Transform must be serializable!
- CustomCoder
- Register a coder for classes
- Register a coder for the output of a transform
- Serializable
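
A coder turns an element into bytes and back so that it can cross worker boundaries. The sketch below illustrates the idea of writing a custom coder and registering it for a class. The `UserEvent` class, the byte layout, and the `register_coder` / `coder_for` helpers are all hypothetical and do not correspond to Beam's coder registry API.

```python
import struct


class UserEvent:
    """Hypothetical element type carried through a pipeline."""

    def __init__(self, user, count):
        self.user = user
        self.count = count

    def __eq__(self, other):
        return (isinstance(other, UserEvent)
                and (self.user, self.count) == (other.user, other.count))


class UserEventCoder:
    """Custom coder: length-prefixed UTF-8 name followed by a 64-bit count."""

    def encode(self, event):
        name = event.user.encode("utf-8")
        return struct.pack(">I", len(name)) + name + struct.pack(">q", event.count)

    def decode(self, data):
        (name_len,) = struct.unpack_from(">I", data, 0)
        user = data[4:4 + name_len].decode("utf-8")
        (count,) = struct.unpack_from(">q", data, 4 + name_len)
        return UserEvent(user, count)


# A minimal registry: look up the coder by the element's class.
CODERS = {}


def register_coder(cls, coder):
    CODERS[cls] = coder


def coder_for(value):
    return CODERS[type(value)]


register_coder(UserEvent, UserEventCoder())
event = UserEvent("alice", 42)
payload = coder_for(event).encode(event)
restored = coder_for(event).decode(payload)
```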
9. Example: Count the Words
Figure: word-count pipeline, https://beam.apache.org/images/wordcount-pipeline.png
10. Example: Count the Words
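
The code from the original slide is not preserved in this transcript. As a stand-in, here is a plain-Python sketch of the stages a word-count pipeline typically has (read lines, extract words, count, format). It does not use the Beam SDK, and the function names and sample input are assumptions.

```python
import re
from collections import Counter


def extract_words(line):
    """Split one line into lowercase words."""
    return re.findall(r"[A-Za-z']+", line.lower())


def count_words(lines):
    """Count how often each word occurs across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(extract_words(line))
    return counts


def format_counts(counts):
    """Render each (word, count) pair as a line of output text."""
    return [f"{word}: {n}" for word, n in sorted(counts.items())]


lines = ["to be or not to be", "that is the question"]
formatted = format_counts(count_words(lines))
```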
11. Capability Matrix
https://beam.apache.org/documentation/runners/capability-matrix/
12. HBase + Beam
- Inspired by HBase + Spark.
- Offers similar functions; Beam SQL is not supported yet.
- Use HBase as a bounded data source and as a target data store in both batch and streaming applications.
- Custom transforms for HBase bulk operations, with HBasePipelineFunctions as the entry point for starting the pipeline.
13. Operations
- Operations for both batch and streaming modes
- Scan (already implemented in Beam)
- BulkGet
- BulkPut
- BulkDelete
- MapPartitions
- ForeachPartition
- BulkLoad
- BulkLoadThinRows
14. Examples: Scan
- Read data from an HBase table with a scan.
15. Examples: BulkGet
- Implement MakeFunctions to convert each input to a Get and each Result to an output.
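
The slide's code is not preserved here, so the following is only a sketch of the pattern it describes: one function turns each input into a Get, the Gets are sent in batches, and a second function turns each Result into an output. The in-memory `TABLE`, the dict-based Get, and every function name are assumptions, not the HBase client or the project's actual signatures.

```python
from itertools import islice

# Stand-in for an HBase table: row key -> {column: value}.
TABLE = {
    "row1": {"cf:name": "alice"},
    "row2": {"cf:name": "bob"},
    "row3": {"cf:name": "carol"},
}


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def multi_get(table, gets):
    """One round trip for a whole batch of Gets; missing rows give an empty result."""
    return [(g["row"], table.get(g["row"], {})) for g in gets]


def bulk_get(table, inputs, make_get, convert_result, batch_size=2):
    """Convert inputs to Gets, fetch them batch by batch, convert each Result."""
    outputs = []
    for batch in batches(inputs, batch_size):
        gets = [make_get(x) for x in batch]
        for result in multi_get(table, gets):
            outputs.append(convert_result(result))
    return outputs


def make_get(user_id):
    """Hypothetical MakeFunction: input -> Get."""
    return {"row": f"row{user_id}"}


def convert_result(result):
    """Hypothetical MakeFunction: Result -> output."""
    _row, columns = result
    return columns.get("cf:name")


names = bulk_get(TABLE, [1, 2, 3, 4], make_get, convert_result)
```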
16. Examples: BulkPut
- Implement a MakeFunction to convert each input to a Put.
17. Examples: BulkDelete
- Implement a MakeFunction to convert each input to a Delete.
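
BulkPut and BulkDelete follow the same shape: a user-supplied function converts each input into a mutation, and mutations are applied in batches. The sketch below models this against a plain dict. The tuple-based Put, the row-key-only Delete, and the function names are illustrative assumptions rather than the project's API.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_put(table, inputs, make_put, batch_size=2):
    """Apply Puts batch by batch; returns the number of batches sent."""
    batch_count = 0
    for batch in chunked(list(inputs), batch_size):
        for row, columns in (make_put(x) for x in batch):
            table.setdefault(row, {}).update(columns)
        batch_count += 1
    return batch_count


def bulk_delete(table, inputs, make_delete, batch_size=2):
    """Apply Deletes batch by batch; returns the number of batches sent."""
    batch_count = 0
    for batch in chunked(list(inputs), batch_size):
        for row in (make_delete(x) for x in batch):
            table.pop(row, None)
        batch_count += 1
    return batch_count


table = {}
users = [("u1", "alice"), ("u2", "bob"), ("u3", "carol")]

# Hypothetical MakeFunctions: input -> Put, and input -> Delete.
put_batches = bulk_put(table, users, lambda u: (u[0], {"cf:name": u[1]}))
after_put = dict(table)
delete_batches = bulk_delete(table, ["u1", "u3"], lambda user_id: user_id)
```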
18. Examples: MapPartitions
19. Examples: MapPartitions
20. Examples: ForeachPartition
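
The slide code for these operations is not preserved. In the HBase + Spark module that inspired this work, both operations hand the user function an iterator over one partition together with a shared connection, so the connection is opened once per partition rather than once per element. The sketch below assumes the Beam versions follow the same idea; `FakeConnection` and all function names are hypothetical.

```python
class FakeConnection:
    """Stand-in for an HBase connection; counts how often one is opened."""

    opened = 0

    def __init__(self, table):
        FakeConnection.opened += 1
        self.table = table

    def get(self, row):
        return self.table.get(row)

    def put(self, row, value):
        self.table[row] = value

    def close(self):
        pass


def map_partitions(partitions, table, fn):
    """Call fn(iterator, connection) once per partition and collect its outputs."""
    outputs = []
    for partition in partitions:
        conn = FakeConnection(table)
        try:
            outputs.extend(fn(iter(partition), conn))
        finally:
            conn.close()
    return outputs


def foreach_partition(partitions, table, fn):
    """Like map_partitions, but fn is run only for its side effects."""
    for partition in partitions:
        conn = FakeConnection(table)
        try:
            fn(iter(partition), conn)
        finally:
            conn.close()


def lookup(rows, conn):
    for row in rows:
        yield row, conn.get(row)


def increment(rows, conn):
    for row in rows:
        conn.put(row, conn.get(row) + 1)


table = {"a": 1, "b": 2, "c": 3}
partitions = [["a", "b"], ["c"]]

looked_up = map_partitions(partitions, table, lookup)
foreach_partition(partitions, table, increment)
```

Two partitions processed by two operations open four connections in total, regardless of how many rows each partition holds.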
21. Examples: BulkLoad
- Implement a MakeFunction to convert each input into a Cell.
22. Examples: BulkLoad
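
HBase bulk loading writes HFiles directly, which requires cells to be sorted and split along region boundaries. The sketch below shows that preparation step under simplifying assumptions: a single column family, string row keys, and a sorted list of region start keys. The `Cell` tuple, `make_cell`, and `bulk_load` are illustrative names and not the project's actual code.

```python
from bisect import bisect_right
from collections import namedtuple

Cell = namedtuple("Cell", ["row", "family", "qualifier", "value"])


def make_cell(record):
    """Hypothetical MakeFunction: one input record -> one Cell."""
    user_id, field, value = record
    return Cell(row=user_id, family="cf", qualifier=field, value=value)


def bulk_load(records, make_cell, region_start_keys):
    """Assign each cell to the region that owns its row, then sort every
    region's cells by (row, family, qualifier), the order an HFile expects."""
    regions = {start: [] for start in region_start_keys}
    for record in records:
        cell = make_cell(record)
        # The owning region is the last one whose start key is <= the row key.
        index = bisect_right(region_start_keys, cell.row) - 1
        regions[region_start_keys[index]].append(cell)
    return {
        start: sorted(cells, key=lambda c: (c.row, c.family, c.qualifier))
        for start, cells in regions.items()
    }


records = [
    ("u7", "name", "gina"),
    ("u2", "name", "bob"),
    ("u2", "age", "31"),
    ("u5", "name", "eve"),
]
files = bulk_load(records, make_cell, region_start_keys=["", "u5"])
```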
23. Example: BulkLoadThinRows
- Implement MakeFunctions to convert each input into row keys and cells.
24. Example: BulkLoadThinRows
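
For thin rows (rows with relatively few cells), all cells of a row can be kept together so that whole rows, rather than individual cells, are sorted and written. The sketch below illustrates that grouping. The `make_row` function, the `(family, qualifier, value)` tuples, and the sample records are assumptions made for this example.

```python
from collections import defaultdict


def make_row(record):
    """Hypothetical MakeFunction: one input -> (row key, list of cells),
    where each cell is a (family, qualifier, value) tuple."""
    user_id, fields = record
    return user_id, [("cf", qualifier, value) for qualifier, value in fields.items()]


def bulk_load_thin_rows(records, make_row):
    """Gather every cell of a row, then emit rows in key order with the
    cells of each row sorted by (family, qualifier)."""
    rows = defaultdict(list)
    for record in records:
        row_key, cells = make_row(record)
        rows[row_key].extend(cells)
    return [(row_key, sorted(rows[row_key])) for row_key in sorted(rows)]


records = [
    ("u2", {"name": "bob", "age": "31"}),
    ("u1", {"name": "alice"}),
    ("u2", {"city": "oslo"}),
]
hfile_rows = bulk_load_thin_rows(records, make_row)
```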
25. Future
- Contribute the code to Apache Beam
- Support Beam SQL in HBase
26. Thank You!