Spatio-temporal Data Management based on HBase Ganos and its Spark Extension
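The speaker is Fei Xiao, a technical expert at Alibaba. He begins with background on spatio-temporal data.
Alibaba currently pursues two internal lines of work on spatio-temporal databases: one built on relational databases such as PolarDB or PG (PostgreSQL), the other on non-relational databases. The former is feature-complete but limited in the data volume and concurrency it can support; the latter scales better but is less feature-complete. He then introduces the spatio-temporal database built on HBase and explains in detail how spatio-temporal indexing works. Spatio-temporal data is highly specialized and large in volume, so it needs encoding and decoding to improve efficiency, and it also calls for some domain knowledge.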
2 .Alibaba Cloud Intelligence TST
4 .• SimpleFeature Geometry • SimpleFeatureType • WKT (Well-known text) SimpleFeature
9 .• HBase Ganos is a new generation of Spatio-temporal Data Engine (SDE) based on GeoMesa and the Ali-HBase storage platform. • Enables large-scale geospatial analytics on cloud and distributed computing systems • Supports data analysis based on Apache Spark using HBase Ganos as the backend datastore. • Supports hot and cold data separation
10 .HBase Ganos HBase Ganos Spark
12 .GeoHash uses interval halving on latitude and longitude to build up a bit-string of alternating dimensions.
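The interval-halving idea can be illustrated with a minimal Python sketch of the standard public GeoHash encoding. It is not taken from HBase Ganos or GeoMesa; the function names `geohash_bits` and `geohash` are hypothetical and chosen only for this example.

```python
# Standard GeoHash base-32 alphabet (no a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bits(lon, lat, nbits):
    """Return nbits alternating longitude/latitude bits, longitude first."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    bits = []
    for i in range(nbits):
        if i % 2 == 0:
            # Even position: halve the longitude interval.
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1)
                lon_lo = mid
            else:
                bits.append(0)
                lon_hi = mid
        else:
            # Odd position: halve the latitude interval.
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
    return bits


def geohash(lon, lat, length=6):
    """Pack the bit-string into base-32 characters, 5 bits per character."""
    bits = geohash_bits(lon, lat, length * 5)
    chars = []
    for i in range(0, len(bits), 5):
        value = 0
        for b in bits[i:i + 5]:
            value = (value << 1) | b
        chars.append(BASE32[value])
    return "".join(chars)
```

Points that share a long GeoHash prefix fall in the same cell, which is what makes the string usable as a sortable row-key prefix.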
13 .a: Define query region b: spatial partition c: calculate query range
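The three steps (query region, spatial partition, query ranges) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the HBase Ganos implementation: it uses a coarse Z-order grid and brute-force enumeration of cells, whereas production index libraries typically decompose the region recursively. The names `interleave`, `cell_index`, and `query_ranges` are hypothetical.

```python
def interleave(ix, iy, bits):
    """Interleave the bits of two cell indices into one Z-order value.

    Longitude bits take the higher position of each pair, mirroring GeoHash.
    """
    z = 0
    for i in range(bits):
        z |= ((ix >> i) & 1) << (2 * i + 1)
        z |= ((iy >> i) & 1) << (2 * i)
    return z


def cell_index(value, lo, hi, bits):
    """Map a coordinate to its cell index on a grid with 2**bits cells per axis."""
    n = 1 << bits
    idx = int((value - lo) / (hi - lo) * n)
    return min(max(idx, 0), n - 1)


def query_ranges(min_lon, min_lat, max_lon, max_lat, bits=4):
    """Return sorted, merged (start, end) Z-value ranges covering the box."""
    # a: the query region is the bounding box passed in.
    # b: spatial partition, i.e. find the grid cells the box overlaps.
    x0 = cell_index(min_lon, -180.0, 180.0, bits)
    x1 = cell_index(max_lon, -180.0, 180.0, bits)
    y0 = cell_index(min_lat, -90.0, 90.0, bits)
    y1 = cell_index(max_lat, -90.0, 90.0, bits)
    zs = sorted(
        interleave(x, y, bits)
        for x in range(x0, x1 + 1)
        for y in range(y0, y1 + 1)
    )
    # c: merge consecutive Z-values into contiguous scan ranges.
    ranges = []
    for z in zs:
        if ranges and z == ranges[-1][1] + 1:
            ranges[-1][1] = z
        else:
            ranges.append([z, z])
    return [tuple(r) for r in ranges]
```

A spatially contiguous box usually maps to several disjoint key ranges, so a single spatial query turns into multiple range scans on the key-value store, followed by a fine-grained filter on the returned rows.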
16 .Testing Environment: Huabei-2 • Master node: 2 CPU, 4 GB (hbase.n1.medium) • Core nodes: 3 nodes, 4 CPU, 8 GB (hbase.sn1.2xlarge) • Writing threads: 10 • Batch size: 1000 • Reading threads: 10
20 .• GeoMesa Spark allows for execution of jobs on Apache Spark using data stored in HBase Ganos. • The library allows creation of Spark RDDs and DataFrames, writing of Spark RDDs and DataFrames to HBase Ganos.
21 .Global Index: • Grid • RTree • QuadTree • KDBTree Local Index: • QuadTree • RTree
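To make the two index levels concrete, here is a small Python sketch of the general idea: a global index decides which partition a record belongs to, and a local structure inside each partition answers the query. It assumes a uniform grid as the global index and a plain list scan in place of a local tree index. The function names and the grid size are hypothetical, and this is not how GeoMesa Spark itself is implemented.

```python
from collections import defaultdict


def grid_cell(lon, lat, nx, ny):
    """Global index: map a point to a cell of an nx-by-ny grid over the globe."""
    cx = min(int((lon + 180.0) / 360.0 * nx), nx - 1)
    cy = min(int((lat + 90.0) / 180.0 * ny), ny - 1)
    return cx, cy


def partition_points(points, nx=8, ny=4):
    """Group (lon, lat) points into partitions keyed by grid cell."""
    parts = defaultdict(list)
    for lon, lat in points:
        parts[grid_cell(lon, lat, nx, ny)].append((lon, lat))
    return parts


def query(parts, bbox, nx=8, ny=4):
    """Visit only partitions overlapping the box, then filter locally."""
    min_lon, min_lat, max_lon, max_lat = bbox
    cx0, cy0 = grid_cell(min_lon, min_lat, nx, ny)
    cx1, cy1 = grid_cell(max_lon, max_lat, nx, ny)
    hits = []
    for cx in range(cx0, cx1 + 1):
        for cy in range(cy0, cy1 + 1):
            # A local index (e.g. a tree) would replace this linear scan.
            for lon, lat in parts.get((cx, cy), []):
                if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                    hits.append((lon, lat))
    return hits
```

The global index prunes whole partitions, so most of the data is never touched; the local index then speeds up the search inside the few partitions that remain.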
23 .SELECT * FROM aispoint WHERE st_contains(st_makeBBOX(114.00000,22.00000,115.00000,23.00000), geom) AND dtg between cast('2018-09-08T01:00:00Z' as timestamp) AND cast('2018-09-13T01:00:00Z' as timestamp)
SELECT ship_id, ST_PointToTrajectory(ship_id,dtg,geom) AS traj FROM point GROUP BY point.ship_id
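For readers unfamiliar with these operations, the following Python sketch mimics what the two queries compute on in-memory rows. It rests on an assumption the slide does not confirm: that `ST_PointToTrajectory` collects each ship's points and orders them by `dtg`. The row layout `(ship_id, dtg, lon, lat)` and both function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def filter_bbox_time(rows, bbox, start, end):
    """First query: keep rows inside the bounding box and the time window."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        (ship_id, dtg, lon, lat)
        for ship_id, dtg, lon, lat in rows
        if min_lon <= lon <= max_lon
        and min_lat <= lat <= max_lat
        and start <= dtg <= end
    ]


def points_to_trajectories(rows):
    """Second query (assumed semantics): one time-ordered point list per ship."""
    grouped = defaultdict(list)
    for ship_id, dtg, lon, lat in rows:
        grouped[ship_id].append((dtg, lon, lat))
    return {ship_id: sorted(pts) for ship_id, pts in grouped.items()}


def utc(day, hour):
    """Helper for the example rows: a UTC timestamp in September 2018."""
    return datetime(2018, 9, day, hour, tzinfo=timezone.utc)


rows = [
    ("ship-1", utc(9, 3), 114.6, 22.4),
    ("ship-1", utc(9, 1), 114.5, 22.5),
    ("ship-2", utc(20, 1), 114.2, 22.1),  # outside the time window
    ("ship-3", utc(10, 1), 120.0, 30.0),  # outside the bounding box
]
selected = filter_bbox_time(rows, (114.0, 22.0, 115.0, 23.0), utc(8, 1), utc(13, 1))
trajectories = points_to_trajectories(selected)
```

In the SQL version the same work is expressed declaratively, which lets the engine push the spatial and temporal predicates down to the index instead of scanning every row.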
24 .Amount of Points: 985,800,104 • 1. Spatial Query: Time 53ms, Result count 15 • 2. Id and time query: Time 145ms, Result count 7