HBase_on_Beam

1.HBase on Beam

2.Apache Beam
- Apache Beam is an open-source, unified programming model for defining both batch and streaming data-parallel processing pipelines.
- It was initiated and contributed by Google.
- The first stable release was published on May 17, 2017.

3.Apache Beam
Architecture diagram: https://beam.apache.org/images/beam_architecture.png

4.Apache Beam
- A unified model for batch and streaming applications.
- Runners for popular open-source batch and streaming engines, such as Spark and Flink.
- SDKs in multiple languages let end users build their own pipelines; Java and Python are currently supported.
- Implement once, run almost everywhere.

5.Apache Beam
- Pipeline: the processing pipeline, which includes data input, transforms, and output.
- PCollection: the representation of both bounded and unbounded data.
- Transform
  - ParDo
  - GroupByKey
  - Combine
  - Flatten
  - …
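The transforms listed above can be modeled on plain Python lists to show what each one does. This is a minimal sketch of the semantics only, not the Beam SDK: the helper names `par_do`, `group_by_key`, `combine_per_key`, and `flatten` are hypothetical, and a real PCollection is distributed and may be unbounded.

```python
from collections import defaultdict
from itertools import chain


def par_do(fn, pcoll):
    """Apply fn to every element; fn may emit zero or more outputs."""
    return [out for element in pcoll for out in fn(element)]


def group_by_key(pcoll):
    """Collect the values of (key, value) pairs under each key."""
    grouped = defaultdict(list)
    for key, value in pcoll:
        grouped[key].append(value)
    return sorted(grouped.items())


def combine_per_key(combine_fn, pcoll):
    """Reduce the values of each key to a single value."""
    return [(key, combine_fn(values)) for key, values in group_by_key(pcoll)]


def flatten(*pcolls):
    """Merge several collections of the same element type into one."""
    return list(chain(*pcolls))


# Illustrative data: two small keyed collections.
batch_a = [("user1", 3), ("user2", 5)]
batch_b = [("user1", 4)]

merged = flatten(batch_a, batch_b)
doubled = par_do(lambda kv: [(kv[0], kv[1] * 2)], merged)
totals = combine_per_key(sum, doubled)
print(totals)
```

In this model every step takes a list and returns a new list, which mirrors how a pipeline chains transforms from one PCollection to the next.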

6.Data Sources
- In-memory data: Array, Collection, Map
- Text
- HDFS
- Kafka
- HBase
- …

7.Windowing
- Fixed time windows
- Sliding time windows
- Session windows
- Single global window
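The window types above differ in how an element's timestamp maps to one or more time intervals. The sketch below works through that assignment in plain Python, assuming integer timestamps in seconds and half-open windows `[start, end)`; the function names are hypothetical and this is not the Beam windowing API.

```python
def fixed_window(ts, size):
    """The single fixed window of length `size` that contains ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts, size, period):
    """Every window of length `size`, starting at a multiple of `period`, that contains ts."""
    windows = []
    start = ts - ts % period
    while start > ts - size:
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def session_windows(timestamps, gap):
    """Merge timestamps into sessions that close after `gap` seconds of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            # Still inside the open session: extend its end.
            sessions[-1] = (sessions[-1][0], ts + gap)
        else:
            sessions.append((ts, ts + gap))
    return sessions


print(fixed_window(65, 60))
print(sliding_windows(65, 60, 30))
print(session_windows([1, 3, 20, 22, 50], 10))
```

A fixed window assigns each element to exactly one interval, a sliding window to several overlapping ones, and a session window depends on the other elements seen for the same key. The single global window needs no function: every element belongs to the same window.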

8.Serialization
- Every Transform must be serializable!
- CustomCoder
- Register a coder for classes
- Register a coder for the output of a transform
- Serializable
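To make the coder idea concrete, the sketch below encodes a small value type to bytes and back, and looks the coder up by class. It is a plain-Python model under stated assumptions: `WordCount`, `WordCountCoder`, and `coder_registry` are hypothetical names, and the byte layout is chosen for illustration rather than taken from Beam.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class WordCount:
    word: str
    count: int


class WordCountCoder:
    """Hypothetical coder: 4-byte word length, UTF-8 word, 8-byte signed count."""

    def encode(self, value):
        word_bytes = value.word.encode("utf-8")
        return (
            struct.pack(">I", len(word_bytes))
            + word_bytes
            + struct.pack(">q", value.count)
        )

    def decode(self, data):
        (length,) = struct.unpack_from(">I", data, 0)
        word = data[4:4 + length].decode("utf-8")
        (count,) = struct.unpack_from(">q", data, 4 + length)
        return WordCount(word, count)


# Stand-in for a coder registry: maps a class to the coder used for it.
coder_registry = {WordCount: WordCountCoder()}

original = WordCount("hbase", 3)
coder = coder_registry[type(original)]
encoded = coder.encode(original)
restored = coder.decode(encoded)
print(len(encoded), restored)
```

The round trip is the property that matters: whatever a transform emits must survive being encoded on one worker and decoded on another.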

9.Example: Count the Words
Pipeline diagram: https://beam.apache.org/images/wordcount-pipeline.png

10.Examples: Count the Words
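The code shown on this slide did not survive extraction. As a stand-in, here is a minimal plain-Python sketch of the usual word-count stages (extract words, count per word, format the output). It does not use the Beam SDK, and the function names are hypothetical.

```python
import re
from collections import Counter


def extract_words(line):
    """ParDo-style step: one line in, zero or more lowercase words out."""
    return re.findall(r"[a-z']+", line.lower())


def count_words(lines):
    """Per-element count step: word -> number of occurrences."""
    counts = Counter()
    for line in lines:
        counts.update(extract_words(line))
    return counts


def format_counts(counts):
    """Map-style step: render each pair as 'word: count', sorted by word."""
    return [f"{word}: {count}" for word, count in sorted(counts.items())]


lines = ["To be or not to be", "that is the question"]
formatted = format_counts(count_words(lines))
for row in formatted:
    print(row)
```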

11.Capability Matrix
https://beam.apache.org/documentation/runners/capability-matrix/

12.HBase + Beam
- Inspired by HBase + Spark.
- Similar functions, except that Beam SQL is not supported yet.
- Use HBase as a bounded data source, and as a target data store in both batch and streaming applications.
- Customized Transforms for HBase bulk operations, with HBasePipelineFunctions as the entry point to start the pipeline.

13.Operations
- Operations are available in both batch and streaming modes
- Scan (already implemented in Beam)
- BulkGet
- BulkPut
- BulkDelete
- MapPartitions
- ForeachPartition
- BulkLoad
- BulkLoadThinRows

14.Examples: Scan
- Read data from an HBase table with a scan.
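The slide's code is not in the transcript. As an illustration of what a scan returns, the sketch below models a table as a dict and reads the rows whose keys fall in a range, with an inclusive start row and an exclusive stop row. The `scan` helper and the sample table are hypothetical; this is not the Beam or HBase client API.

```python
from bisect import bisect_left


def scan(table, start_row="", stop_row=None):
    """Return (row key, columns) pairs with start_row <= key < stop_row, in key order."""
    keys = sorted(table)
    begin = bisect_left(keys, start_row)
    end = len(keys) if stop_row is None else bisect_left(keys, stop_row)
    return [(key, table[key]) for key in keys[begin:end]]


# In-memory stand-in for an HBase table: row key -> {column: value}.
table = {
    "user#3": {"info:name": "Cid"},
    "user#1": {"info:name": "Ann"},
    "user#2": {"info:name": "Bob"},
}

rows = scan(table, start_row="user#1", stop_row="user#3")
print(rows)
```

Because rows come back in key order over a finite range, the result is a bounded collection, which matches the deck's use of HBase as a bounded source.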

15.Examples: BulkGet
- Implement MakeFunctions to convert each input to a Get, and each Result to an output.
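The pair of conversion functions described above can be sketched as follows. This is a plain-Python model, not the library's API: `bulk_get`, `make_get`, `make_output`, the `batch_size` parameter, and the dict standing in for the table are all assumptions, as is the choice to skip rows that do not exist.

```python
def make_get(user_id):
    """Hypothetical MakeFunction: input element -> row key to fetch."""
    return f"user#{user_id}"


def make_output(row_key, result):
    """Hypothetical MakeFunction: fetched row -> output element."""
    return f"{row_key}={result['info:name']}"


def bulk_get(table, inputs, make_get, make_output, batch_size=2):
    """Fetch rows in batches and convert each found row to an output."""
    inputs = list(inputs)
    outputs = []
    for i in range(0, len(inputs), batch_size):
        row_keys = [make_get(item) for item in inputs[i:i + batch_size]]
        # A real client would send one multi-get per batch; here it is a dict lookup.
        for key in row_keys:
            result = table.get(key)
            if result is not None:  # assumption: missing rows are skipped
                outputs.append(make_output(key, result))
    return outputs


# In-memory stand-in for an HBase table: row key -> {column: value}.
table = {"user#1": {"info:name": "Ann"}, "user#2": {"info:name": "Bob"}}

found = bulk_get(table, [1, 2, 3], make_get, make_output)
print(found)
```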

16.Examples: BulkPut
- Implement a MakeFunction to convert each input to a Put.

17.Examples: BulkDelete
- Implement a MakeFunction to convert each input to a Delete.
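BulkPut and BulkDelete follow the same pattern: the caller supplies one function that turns an input element into a mutation, and the operation applies it. A minimal plain-Python sketch, assuming a dict as the table and hypothetical helper names (`bulk_put`, `bulk_delete`, `make_put`, `make_delete`):

```python
def make_put(record):
    """Hypothetical MakeFunction: input record -> (row key, {column: value})."""
    user_id, name = record
    return f"user#{user_id}", {"info:name": name}


def make_delete(user_id):
    """Hypothetical MakeFunction: input element -> row key to delete."""
    return f"user#{user_id}"


def bulk_put(table, inputs, make_put):
    """Write one row mutation per input element."""
    for item in inputs:
        row_key, columns = make_put(item)
        table.setdefault(row_key, {}).update(columns)


def bulk_delete(table, inputs, make_delete):
    """Delete one row per input element; missing rows are ignored."""
    for item in inputs:
        table.pop(make_delete(item), None)


# In-memory stand-in for an HBase table: row key -> {column: value}.
table = {}
bulk_put(table, [(1, "Ann"), (2, "Bob")], make_put)
bulk_delete(table, [2, 7], make_delete)
print(table)
```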

18.Examples: MapPartitions

19.Examples: MapPartitions

20.Examples: ForeachPartition
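The code for the MapPartitions and ForeachPartition slides is missing from the transcript. The sketch below assumes semantics similar to the HBase + Spark functions of the same name, where a user function receives the elements of one partition together with a shared connection. `FakeConnection` and every function name here are hypothetical.

```python
class FakeConnection:
    """Stand-in for an HBase connection; counts how many were opened."""

    opened = 0

    def __init__(self, table):
        FakeConnection.opened += 1
        self.table = table

    def put(self, row_key, value):
        self.table[row_key] = value

    def get(self, row_key):
        return self.table.get(row_key)


def foreach_partition(partitions, table, fn):
    """Run fn(elements, connection) once per partition, for side effects only."""
    for partition in partitions:
        fn(iter(partition), FakeConnection(table))


def map_partitions(partitions, table, fn):
    """Like foreach_partition, but collect whatever fn yields."""
    outputs = []
    for partition in partitions:
        outputs.extend(fn(iter(partition), FakeConnection(table)))
    return outputs


def write_all(elements, conn):
    for key in elements:
        conn.put(key, key.upper())


def read_all(elements, conn):
    for key in elements:
        yield key, conn.get(key)


table = {}
partitions = [["a", "b"], ["c"]]
foreach_partition(partitions, table, write_all)
pairs = map_partitions(partitions, table, read_all)
print(pairs, FakeConnection.opened)
```

The point of working per partition is that the connection cost is paid once per partition rather than once per element: three elements are processed twice here, but only four connections are opened.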

21.Examples: BulkLoad
- Implement a MakeFunction to convert each input into a Cell.

22.Examples: BulkLoad

23.Example: BulkLoadThinRows
- Implement MakeFunctions to convert each input into a row key and its cells.

24.Example: BulkLoadThinRows
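The difference between the two bulk-load variants can be sketched in plain Python: BulkLoad maps each input to one cell, while BulkLoadThinRows maps each input to a row key plus all of that row's cells. The sketch assumes that cells must be ordered by row, family, and qualifier before they are written, and it stops short of producing any files. All names and the fixed `cf` column family are hypothetical.

```python
from collections import defaultdict


def make_cell(record):
    """Hypothetical MakeFunction: input -> one (row, family, qualifier, value) cell."""
    row, qualifier, value = record
    return (row, "cf", qualifier, value)


def bulk_load(inputs, make_cell):
    """One cell per input, sorted by (row, family, qualifier)."""
    return sorted(make_cell(item) for item in inputs)


def make_row(record):
    """Hypothetical MakeFunction: input -> (row key, list of that row's cells)."""
    row, columns = record
    return row, [("cf", qualifier, value) for qualifier, value in columns.items()]


def bulk_load_thin_rows(inputs, make_row):
    """Group cells by row key, then sort rows and the cells inside each row."""
    rows = defaultdict(list)
    for item in inputs:
        row, cells = make_row(item)
        rows[row].extend(cells)
    return [(row, sorted(cells)) for row, cells in sorted(rows.items())]


cells = bulk_load([("r2", "b", "2"), ("r1", "a", "1"), ("r1", "b", "3")], make_cell)
thin = bulk_load_thin_rows([("r2", {"b": "2"}), ("r1", {"b": "3", "a": "1"})], make_row)
print(cells)
print(thin)
```

Both variants end up with the same cells in the same order. The thin-row form keeps a whole row together, which only makes sense when each row has few enough columns to sort in memory.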

25.Future
- Contribute the code to Apache Beam
- Support Beam SQL on HBase

26.Thank You!