1.HBase on Beam
2.Apache Beam
- Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines.
- It was initiated and contributed by Google.
- The first stable release was published on May 17, 2017.
3.Apache Beam https://beam.apache.org/images/beam_architecture.png
4.Apache Beam
- A unified model for batch and streaming applications.
- Runners for popular open-source batch and streaming engines, for instance Spark and Flink.
- SDKs in multiple languages let end users build their own pipelines; Java and Python are currently supported.
- Implement once, run almost everywhere.
5.Apache Beam
- Pipeline: the processing pipeline, which includes data input, transforms, and output.
- PCollection: the representation of both bounded and unbounded data.
- Transform
  - ParDo
  - GroupByKey
  - Combine
  - Flatten
  - …
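To make the four core transforms concrete, here is a minimal plain-Python model of what each one does to a collection. It is not the Beam API: the function names (`par_do`, `group_by_key`, `combine_per_key`, `flatten`) are illustrative only, and ordinary lists stand in for PCollections.

```python
from collections import defaultdict
from itertools import chain


def par_do(elements, fn):
    """ParDo: apply fn to every element; fn may emit zero or more outputs."""
    return [out for element in elements for out in fn(element)]


def group_by_key(pairs):
    """GroupByKey: collect the values of (key, value) pairs under each key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def combine_per_key(pairs, combine_fn):
    """Combine: reduce the grouped values of each key to a single value."""
    return {key: combine_fn(values) for key, values in group_by_key(pairs).items()}


def flatten(*collections):
    """Flatten: merge several collections of the same type into one."""
    return list(chain.from_iterable(collections))
```

In a real pipeline these steps run in parallel across workers, which is why the functions passed to them must be side-effect free and serializable.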
6.Data Sources
- In-memory data: Array, Collection, Map
- Text
- HDFS
- Kafka
- HBase
- …
7.Windowing
- Fixed time windows
- Sliding time windows
- Session windows
- Single global window
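The window types differ in how an element's timestamp maps to one or more time intervals. The sketch below models that assignment in plain Python with integer timestamps and half-open `(start, end)` intervals; the function names are illustrative, not Beam API. The single global window is the trivial case where every element lands in the same window.

```python
def fixed_window(ts, size):
    """Fixed windows: each timestamp belongs to exactly one window."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts, size, period):
    """Sliding windows: a new window of length `size` starts every `period`,
    so one timestamp can belong to several overlapping windows."""
    windows = []
    start = ts - ts % period  # latest window start that is <= ts
    while start > ts - size:  # keep only windows that still contain ts
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def session_windows(timestamps, gap):
    """Session windows: each element opens [ts, ts + gap); overlapping
    windows are merged, so a session ends after `gap` of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            sessions[-1] = (sessions[-1][0], ts + gap)  # extend the open session
        else:
            sessions.append((ts, ts + gap))  # start a new session
    return sessions
```

For example, with 60-second windows sliding every 30 seconds, a timestamp of 65 falls into the two windows starting at 30 and at 60.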
8.Serialization
- Every Transform must be serializable!
- CustomCoder
  - Register coder for classes
  - Register coder for the output of transform
- Serializable
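A coder is essentially a pair of functions that turn an element into bytes and back, so that elements can move between workers. The following is a minimal sketch of that idea in plain Python, assuming elements are `(word, count)` tuples; `WordCountCoder` and `CODER_REGISTRY` are hypothetical names and do not correspond to Beam's own coder classes or registry.

```python
import struct


class WordCountCoder:
    """Hypothetical coder for (word, count) tuples.

    Layout: 4-byte big-endian length, UTF-8 word bytes, 8-byte signed count.
    """

    def encode(self, value):
        word, count = value
        word_bytes = word.encode("utf-8")
        return struct.pack(">I", len(word_bytes)) + word_bytes + struct.pack(">q", count)

    def decode(self, data):
        (length,) = struct.unpack_from(">I", data, 0)
        word = data[4:4 + length].decode("utf-8")
        (count,) = struct.unpack_from(">q", data, 4 + length)
        return (word, count)


# Hypothetical registry: look up the coder by a type name chosen for this sketch.
CODER_REGISTRY = {"WordCount": WordCountCoder()}
```

The property that matters is the round trip: decoding an encoded element must give back an equal element, whatever the byte layout.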
9.Example: Count the Words https://beam.apache.org/images/wordcount-pipeline.png
10.Examples: Count the Words
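The code for this slide did not survive in the transcript. As a stand-in, here is a plain-Python sketch of the usual word-count steps (split lines into words, count each word, format the results). It is not Beam code; the function names and the word-matching pattern are assumptions made for illustration.

```python
import re
from collections import Counter


def extract_words(line):
    """Split one line of text into words (letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", line)


def count_words(lines):
    """Count how often each lower-cased word occurs across all lines."""
    return Counter(word.lower() for line in lines for word in extract_words(line))


def format_counts(counts):
    """Render each (word, count) pair as a text line, sorted by word."""
    return [f"{word}: {count}" for word, count in sorted(counts.items())]
```

In Beam the same three steps would typically map to a ParDo, a per-element count, and another ParDo before the output is written.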
11.Capability Matrix https://beam.apache.org/documentation/runners/capability-matrix/
12.HBase + Beam
- Inspired by HBase + Spark.
- Similar functions; Beam SQL is not supported yet.
- Use HBase as a bounded data source and as a target data store in both batch and streaming applications.
- Customized Transforms for HBase bulk operations, with HBasePipelineFunctions as the entry point to start the pipeline.
13.Operations
- Operations for both batch and streaming modes
  - Scan (already implemented in Beam)
  - BulkGet
  - BulkPut
  - BulkDelete
  - MapPartitions
  - ForeachPartition
  - BulkLoad
  - BulkLoadThinRows
14.Examples: Scan
- Read data from an HBase table with a Scan.
15.Examples: BulkGet
- Implement MakeFunctions to convert each input to a Get and each Result to an output.
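The original code for this slide is not in the transcript. The sketch below only illustrates the shape of the pattern: one function builds a get request from each input, requests are sent in batches, and a second function converts each result. Everything here is hypothetical: `FakeTable` is an in-memory stand-in for an HBase table, and `bulk_get`, `make_get`, `convert_result`, and `batch_size` are illustrative names, with batching assumed by analogy to the HBase + Spark module.

```python
class FakeTable:
    """In-memory stand-in for an HBase table: row key -> value."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.calls = 0  # number of batched round trips made

    def get_many(self, row_keys):
        """Return (row_key, value) results for one batch of gets."""
        self.calls += 1
        return [(key, self.rows.get(key)) for key in row_keys]


def bulk_get(inputs, table, make_get, convert_result, batch_size=100):
    """Convert inputs to gets, fetch them in batches, convert the results."""
    outputs, batch = [], []
    for item in inputs:
        batch.append(make_get(item))
        if len(batch) == batch_size:
            outputs.extend(convert_result(r) for r in table.get_many(batch))
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        outputs.extend(convert_result(r) for r in table.get_many(batch))
    return outputs
```

Batching keeps the number of round trips to the table low: three inputs with a batch size of two need two calls instead of three.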
16.Examples: BulkPut
- Implement MakeFunction to convert each input to a Put.
17.Examples: BulkDelete
- Implement MakeFunction to convert each input to a Delete.
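BulkPut and BulkDelete follow the same shape: a user-supplied function turns each input into a mutation, and the mutations are applied in batches. The sketch below is a plain-Python illustration under stated assumptions: a `dict` stands in for the HBase table, and `bulk_put`, `bulk_delete`, `make_put`, `make_delete`, and `batch_size` are hypothetical names, not the API presented in the talk.

```python
def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_put(inputs, table, make_put, batch_size=100):
    """make_put maps an input to a (row_key, value) pair; apply puts in batches."""
    puts = [make_put(item) for item in inputs]
    for batch in batches(puts, batch_size):
        table.update(batch)  # one batched write per slice


def bulk_delete(inputs, table, make_delete, batch_size=100):
    """make_delete maps an input to the row key to remove; apply in batches."""
    deletes = [make_delete(item) for item in inputs]
    for batch in batches(deletes, batch_size):
        for row_key in batch:
            table.pop(row_key, None)  # deleting a missing row is a no-op
```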
21.Examples: BulkLoad
- Implement MakeFunction to convert each input into a Cell.
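The slide's code is missing from the transcript. As an illustration of what a bulk load has to prepare, the sketch below converts inputs into cells, sorts them in key order, and partitions them by region so that each region's cells could be written to its own file. It assumes cells are `(row, family, qualifier, value)` byte tuples and that region boundaries are given as sorted start keys; `prepare_bulk_load`, `make_cell`, and `region_start_keys` are hypothetical names.

```python
from bisect import bisect_right


def prepare_bulk_load(inputs, make_cell, region_start_keys):
    """Sort cells and group them by the region whose key range holds the row.

    region_start_keys must be sorted and begin with b"" (the first region).
    Returns {region_index: [cells in sorted order]}.
    """
    # Tuple ordering sorts by row, then family, then qualifier (value last).
    cells = sorted(make_cell(item) for item in inputs)
    per_region = {}
    for cell in cells:
        # The owning region is the last one whose start key is <= the row key.
        region = bisect_right(region_start_keys, cell[0]) - 1
        per_region.setdefault(region, []).append(cell)
    return per_region
```

Real HBase cells also carry a timestamp and a type, which this sketch leaves out.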
23.Examples: BulkLoadThinRows
- Implement MakeFunctions to convert each input into row keys and cells.
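The thin-rows variant works a row at a time: each input becomes a row key plus all of that row's cells, which suits rows with few columns because a whole row can be sorted in memory. The sketch below illustrates that grouping in plain Python; `prepare_thin_rows` and `make_row` are hypothetical names, and cells are assumed to be `(family, qualifier, value)` byte tuples.

```python
def prepare_thin_rows(inputs, make_row):
    """Group cells by row key, then sort rows and the cells inside each row.

    make_row maps an input to (row_key, [(family, qualifier, value), ...]).
    Returns [(row_key, sorted cells)] in row-key order.
    """
    rows = {}
    for item in inputs:
        row_key, cells = make_row(item)
        rows.setdefault(row_key, []).extend(cells)  # merge cells of the same row
    return [(row_key, sorted(rows[row_key])) for row_key in sorted(rows)]
```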
25.Future
- Contribute the code to Apache Beam
- Support Beam SQL in HBase