16/07 - SASI, Cassandra on the full text search ride by submmit1


Published by the China Cassandra Technical Community (中国Cassandra技术社区), 6 years ago


1. SASI introduction
2. SASI cluster-wide
3. SASI local read/write path
4. Query planner
5. Some benchmarks
6. Take away


1 .SASI, Cassandra on the full text search ride
DuyHai DOAN – Apache Cassandra evangelist

3 .SASI introduction

6 .Why is it better than native 2nd index?
•  follows the SSTable life-cycle (flush, compaction, rebuild …) → more optimized
•  new data structures
•  range queries (<, ≤, >, ≥) possible
•  full text search options
© DataStax, All Rights Reserved.

8 .SASI cluster-wide

9 .Distributed index
At the cluster level, SASI works exactly like the native 2nd index.
[Figure: a ring of nodes A to H, with per-node index entries such as UK → user87, user176, …, user987; UK → user1, user102, …, user493; US → user54, user483, …, user938; UK → user17, user409, …, user787.]
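As a rough illustration of the slide above, the following Python sketch models nodes that each index only their own data, with a coordinator asking the nodes and merging their answers. The `LocalNode` class, the `query_cluster` helper, and the assignment of users to nodes are all hypothetical; token ranges, replication, and consistency levels are ignored.

```python
from collections import defaultdict


class LocalNode:
    """Hypothetical node that indexes only the rows it stores locally."""

    def __init__(self, name):
        self.name = name
        self.index = defaultdict(list)  # indexed value -> local user ids

    def insert(self, user_id, country):
        self.index[country].append(user_id)

    def search(self, country):
        # .get() avoids creating empty entries for unknown values
        return list(self.index.get(country, []))


def query_cluster(nodes, country):
    """Ask every node for its local matches and merge the results."""
    matches = []
    for node in nodes:
        matches.extend(node.search(country))
    return sorted(matches)


# Assumed placement of users on two of the nodes, for illustration only
node_a, node_c = LocalNode("A"), LocalNode("C")
node_a.insert("user1", "UK")
node_a.insert("user54", "US")
node_c.insert("user87", "UK")

uk_users = query_cluster([node_a, node_c], "UK")
print(uk_users)
```

No single node in this sketch can answer the query alone, which is why a lookup on an indexed value may have to touch several nodes.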

22 .Caveat 2 solution: materialized views
For a 1-to-1 index/relationship, use materialized views instead:

    CREATE MATERIALIZED VIEW user_by_email AS
    SELECT * FROM users
    WHERE user_id IS NOT NULL AND user_email IS NOT NULL
    PRIMARY KEY (user_email, user_id);

But range queries (<, >, ≤, ≥) are not possible …

25 .SASI local read/write path

26 .Local write path
Index files are built
•  on memtable flush
•  on compaction flush
To avoid OOM, index files are split into chunks of
•  1 GB for memtable flush
•  max_compaction_flush_memory_in_mb for compaction flush
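The chunking idea can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the `build_index_segments` function is hypothetical, and the budget is counted in entries, whereas the slide describes a memory budget (1 GB or max_compaction_flush_memory_in_mb).

```python
def build_index_segments(rows, max_entries_per_segment):
    """Group (term, key) pairs into sorted segments of bounded size.

    `max_entries_per_segment` is a stand-in for a memory budget;
    this sketch counts entries instead of bytes.
    """
    segments = []
    current = {}
    count = 0
    for term, key in rows:
        current.setdefault(term, []).append(key)
        count += 1
        if count >= max_entries_per_segment:
            # Budget reached: emit the segment and start a fresh one
            segments.append(sorted(current.items()))
            current, count = {}, 0
    if current:
        segments.append(sorted(current.items()))
    return segments


rows = [("FR", "user_id4"), ("US", "user_id1"), ("FR", "user_id5"),
        ("UK", "user_id3"), ("DE", "user_id2")]
segments = build_index_segments(rows, max_entries_per_segment=3)
print(segments)
```

Because each segment is emitted as soon as the budget is reached, the builder never holds more than one segment's worth of entries in memory.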

27 .Local write path data structures

Index mode, data type | Data structure             | Usage
PREFIX, text          | Guava ConcurrentRadixTree  | name LIKE 'John%'
CONTAINS, text        | Guava ConcurrentSuffixTree | name LIKE '%John%'; name LIKE '%ny'
PREFIX, other         | JDK ConcurrentSkipListSet  | age = 20; age >= 20 AND age <= 30
SPARSE, other         | JDK ConcurrentSkipListSet  | age = 20; age >= 20 AND age <= 30; suitable for 1-to-N index with N ≤ 5
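To show why ordered structures support both equality and range predicates, here is a minimal Python sketch. It swaps in a sorted list with `bisect` for the Java structures named on the slide, so it illustrates the lookup pattern only. The `SortedTermIndex` class and the sample data are hypothetical.

```python
import bisect


class SortedTermIndex:
    """Sorted-list stand-in for an ordered in-memory index."""

    def __init__(self):
        self.terms = []      # sorted distinct terms
        self.postings = {}   # term -> list of row keys

    def add(self, term, key):
        if term not in self.postings:
            bisect.insort(self.terms, term)
            self.postings[term] = []
        self.postings[term].append(key)

    def range(self, low, high):
        """Keys whose term satisfies low <= term <= high."""
        start = bisect.bisect_left(self.terms, low)
        end = bisect.bisect_right(self.terms, high)
        return [k for t in self.terms[start:end] for k in self.postings[t]]

    def prefix(self, prefix):
        """Keys whose string term starts with `prefix`."""
        start = bisect.bisect_left(self.terms, prefix)
        keys = []
        for term in self.terms[start:]:
            if not term.startswith(prefix):
                break  # matching terms are contiguous in sorted order
            keys.extend(self.postings[term])
        return keys


ages = SortedTermIndex()
for key, age in [("u1", 18), ("u2", 20), ("u3", 25), ("u4", 31)]:
    ages.add(age, key)

names = SortedTermIndex()
for key, name in [("u1", "Johnny"), ("u2", "Jane"), ("u3", "John")]:
    names.add(name, key)

in_range = ages.range(20, 30)   # age >= 20 AND age <= 30
johns = names.prefix("John")    # name LIKE 'John%'
print(in_range, johns)
```

A suffix or substring match such as name LIKE '%ny' cannot be answered this way, which is why CONTAINS mode is listed with a different structure.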

28 .OnDiskIndex files
[Figure: each SSTable has its own OnDiskIndex file, a B+Tree-like data structure. SStable1 (user_id4 FR, user_id1 US, user_id5 FR) → OnDiskIndex1 (FR, US); SStable2 (user_id3 UK, user_id2 DE) → OnDiskIndex2 (UK, DE).]

29 .Local read path
•  first, optimize the query using the Query Planner (see later)
•  then load chunks (4k) of index files from disk into memory
•  perform binary search to find the indexed value(s)
•  retrieve the corresponding partition keys and push them into the Partition Key Cache
→ Yes, currently SASI only keeps partition key(s), so on wide partitions it's not very optimized ...
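The binary-search and key-retrieval steps can be sketched in Python as below, reusing the FR/US and UK/DE example from the OnDiskIndex slide. The dictionary layout and the `lookup` and `read_path` functions are hypothetical; query planning, 4k chunk loading, and the Partition Key Cache are left out.

```python
import bisect

# Hypothetical per-SSTable indexes: sorted terms plus their partition keys
on_disk_indexes = [
    {"terms": ["FR", "US"],
     "keys": [["user_id4", "user_id5"], ["user_id1"]]},
    {"terms": ["DE", "UK"],
     "keys": [["user_id2"], ["user_id3"]]},
]


def lookup(index, value):
    """Binary-search one index's sorted terms for an exact value."""
    pos = bisect.bisect_left(index["terms"], value)
    if pos < len(index["terms"]) and index["terms"][pos] == value:
        return index["keys"][pos]
    return []


def read_path(indexes, value):
    """Collect matching partition keys across all per-SSTable indexes."""
    partition_keys = []
    for index in indexes:
        partition_keys.extend(lookup(index, value))
    return partition_keys


fr_keys = read_path(on_disk_indexes, "FR")
print(fr_keys)
```

The result is a list of partition keys only, which matches the slide's remark: on a wide partition, the matching rows would still have to be located inside each partition.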
