Spatio-temporal Data Management based on HBase Ganos and its Spark Extension

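The speaker is Fei Xiao, a technical expert at Alibaba. He began with background on spatio-temporal data.
Alibaba currently pursues two internal lines of spatio-temporal database work: one built on relational databases such as PolarDB or PostgreSQL (PG), the other on non-relational databases. The former is feature-complete but limited in the data volume and concurrency it supports; the latter scales better but is less feature-complete. He then introduced the spatio-temporal database built on HBase and explained the principles of spatio-temporal indexing in detail. Spatio-temporal data is highly specialized and large in volume, so it needs encoding and decoding to improve efficiency, and working with it also requires some domain-specific knowledge.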


1.

2.Alibaba Cloud Intelligence TST

3.

4.• SimpleFeature Geometry • SimpleFeatureType • WKT (Well-known text) SimpleFeature

5.

6.

7.

8.

9.• HBase Ganos is a new-generation Spatio-temporal Data Engine (SDE) built on GeoMesa and the Ali-HBase storage platform. • Enables large-scale geospatial analytics on cloud and distributed computing systems • Supports data analysis on Apache Spark with HBase Ganos as the backend datastore • Supports hot and cold data separation

10.HBase Ganos HBase Ganos Spark

11.

12.GeoHash uses interval halving on latitude and longitude to build up a bit-string of alternating dimensions.
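As a rough illustration of the interval halving described on slide 12, the following minimal Python sketch builds the alternating bit-string and packs it into the usual base-32 text form. The function names `geohash_bits` and `geohash` are illustrative only and are not part of HBase Ganos or GeoMesa; the sketch assumes the public GeoHash convention of taking the first bit from longitude.

```python
# Standard GeoHash base-32 alphabet (digits plus letters without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bits(lat, lon, nbits):
    """Return nbits bits produced by alternately halving lon and lat intervals."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    for i in range(nbits):
        if i % 2 == 0:
            # Even positions refine the longitude interval.
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1)
                lon_lo = mid
            else:
                bits.append(0)
                lon_hi = mid
        else:
            # Odd positions refine the latitude interval.
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
    return bits


def geohash(lat, lon, precision=6):
    """Encode a point as a GeoHash string of `precision` base-32 characters."""
    bits = geohash_bits(lat, lon, precision * 5)
    chars = []
    for i in range(0, len(bits), 5):
        value = 0
        for b in bits[i:i + 5]:
            value = (value << 1) | b
        chars.append(BASE32[value])
    return "".join(chars)
```

For example, `geohash(57.64911, 10.40744, 5)` gives `"u4pru"`. A shorter precision yields a prefix of the longer hash, so points in the same cell share a key prefix, which is what makes such codes usable as sortable row keys.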

13.a: Define query region b: Spatial partition c: Calculate query range
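The three steps on slide 13 can be sketched as follows. This is a minimal illustration that assumes a Z-order (Morton) curve over a small integer grid and a quadtree-style recursive partition; the slide does not say which curve or partitioning HBase Ganos uses, and the names `interleave` and `z_ranges` are hypothetical.

```python
def interleave(x, y, bits):
    """Morton code of cell (x, y): x bits in even positions, y bits in odd ones."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_ranges(qxmin, qymin, qxmax, qymax, bits, max_depth=None):
    """Cover an inclusive query box (a) with merged Z-value scan ranges (c)."""
    if max_depth is None:
        max_depth = bits  # recurse down to single cells, so the cover is exact
    found = []

    def visit(x0, y0, size, zmin, depth):
        # (b) Spatial partition: this square cell spans size x size grid cells
        # and owns the contiguous Z-values [zmin, zmin + size*size - 1].
        x1, y1 = x0 + size - 1, y0 + size - 1
        if x1 < qxmin or x0 > qxmax or y1 < qymin or y0 > qymax:
            return  # cell does not touch the query box
        inside = qxmin <= x0 and x1 <= qxmax and qymin <= y0 and y1 <= qymax
        if inside or depth == max_depth:
            found.append((zmin, zmin + size * size - 1))
            return
        half = size // 2
        for q in range(4):  # quadrants in increasing Z order
            dx, dy = q & 1, q >> 1
            visit(x0 + dx * half, y0 + dy * half, half,
                  zmin + q * half * half, depth + 1)

    visit(0, 0, 1 << bits, 0, 0)

    # Merge ranges that are adjacent on the curve to reduce the number of scans.
    merged = []
    for lo, hi in found:
        if merged and lo == merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], hi)
        else:
            merged.append((lo, hi))
    return merged
```

Each resulting `(lo, hi)` pair would correspond to one key-range scan. Choosing a `max_depth` smaller than `bits` trades fewer, coarser ranges for some false positives that a later filter would have to remove.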

14.

15.

16.Testing Environment: Huabei-2 • Master node: 2 CPU, 4 GB (hbase.n1.medium) • Core nodes: 3 nodes, 4 CPU, 8 GB (hbase.sn1.2xlarge) • Writing thread number: 10 • Batch size: 1000 • Reading thread number: 10

17.

18.

19.

20.• GeoMesa Spark allows for execution of jobs on Apache Spark using data stored in HBase Ganos. • The library allows creation of Spark RDDs and DataFrames, and writing of Spark RDDs and DataFrames to HBase Ganos.

21.Global Index: • Grid • RTree • QuadTree • KDBTree Local Index: • QuadTree • RTree

22.

23.SELECT * FROM aispoint WHERE st_contains(st_makeBBOX(114.00000,22.00000,115.00000,23.00000), geom) AND dtg BETWEEN cast('2018-09-08T01:00:00Z' AS timestamp) AND cast('2018-09-13T01:00:00Z' AS timestamp)
SELECT ship_id, ST_PointToTrajectory(ship_id,dtg,geom) AS traj FROM point GROUP BY point.ship_id

24.Amount of points: 985,800,104 • 1. Spatial query: time 53 ms, result count 15 • 2. ID and time query: time 145 ms, result count 7

25.