18/12 - SASI, Cassandra on the full text search ride - Apache Bi


China Cassandra Technical Community


An introduction to full-text indexing with SASI.


1 .SASI, Cassandra on the full text search ride DuyHai DOAN – Apache Cassandra™ Evangelist

2 .
1. 5 minutes introduction to Apache Cassandra™
2. SASI introduction
3. SASI cluster-wide
4. SASI local read/write path
5. Query planner
6. Some benchmarks
7. Take away
@doanduyhai

3 .Trademark Policy From now on … Cassandra ⩵ Apache Cassandra™

4 .5 minutes introduction to Apache Cassandra™

5 .The tokens
• Random hash of #partition → token = hash(#p)
• Hash: ]−x, x]
• Hash range: 2^64 values, x = 2^64/2
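The mapping token = hash(#p) can be sketched in a few lines. This is a toy illustration only: it uses MD5 truncated to 64 bits as a stand-in hash (an assumption made for the example, not the partitioner Cassandra actually uses), and the name `toy_token` is hypothetical.

```python
import hashlib

# x = 2^64 / 2, as on the slide
X = 2 ** 64 // 2


def toy_token(partition_key: str) -> int:
    """Map a partition key to a token in ]-x, x].

    Stand-in hash for illustration (MD5 truncated to 64 bits),
    not the hash function Cassandra really uses.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    unsigned = int.from_bytes(digest[:8], "big")  # 0 .. 2^64 - 1
    # Shift so that 0 maps to -x + 1 and 2^64 - 1 maps to x.
    return unsigned - X + 1


if __name__ == "__main__":
    for key in ("user_id1", "user_id2", "user_id3"):
        print(key, toy_token(key))
```

The same key always yields the same token, which is what lets any node compute where a partition lives.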

6 .Token ranges (ring of 8 nodes, A to H)
A: ]−x, −3x/4]
B: ]−3x/4, −2x/4]
C: ]−2x/4, −x/4]
D: ]−x/4, 0]
E: ]0, x/4]
F: ]x/4, 2x/4]
G: ]2x/4, 3x/4]
H: ]3x/4, x]
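Given the eight equal ranges above, finding the node that owns a token is a binary search over the inclusive upper bounds. A minimal sketch, assuming exactly the eight-node layout of the slide; the names `UPPER_BOUNDS` and `owner` are hypothetical.

```python
from bisect import bisect_left

X = 2 ** 63  # x = 2^64 / 2
NODES = "ABCDEFGH"

# Inclusive upper bound of each node's range:
# A ends at -3x/4, B at -2x/4, ..., D at 0, ..., H at x.
UPPER_BOUNDS = [-X + (i + 1) * (X // 4) for i in range(len(NODES))]


def owner(token: int) -> str:
    """Return the node whose range ]lower, upper] contains the token."""
    if not -X < token <= X:
        raise ValueError("token outside ]-x, x]")
    # First range whose upper bound is >= token.
    return NODES[bisect_left(UPPER_BOUNDS, token)]


if __name__ == "__main__":
    for t in (-X + 1, 0, 1, X):
        print(t, owner(t))
```

Because the ranges are open on the left and closed on the right, token 0 belongs to D and token 1 to E.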

7 .Distributed tables CREATE TABLE users(user_id int, …, PRIMARY KEY(user_id)); Rows to distribute over the ring (nodes A to H): user_id1, user_id2, user_id3, user_id4, user_id5

8 .Distributed tables (ring diagram: user_id1 to user_id5, each stored on one of the nodes A to H)

9 .Coordinator node
• Responsible for handling requests (read/write)
• Every node can be coordinator: masterless, no SPOF, proxy role

10 .Q&A

11 .SASI introduction

12 .What is SASI?
• SSTable-Attached Secondary Index → new 2nd index implementation that follows the SSTable life-cycle
• Objective: provide a more performant & capable 2nd index

13 .Who created it? Open-source contribution by a team of engineers

14 .Why is it better than the native 2nd index?
• follows the SSTable life-cycle (flush, compaction, rebuild …) → more optimized
• new data structures
• range queries (<, ≤, >, ≥) possible
• full text search options

15 .Demo

16 .SASI cluster-wide

17 .Distributed index On cluster level, SASI works exactly like the native 2nd index: each node indexes only the data it stores locally. Example index entries held by different nodes:
• UK → user87, user176, …, user987
• UK → user1, user102, …, user493
• US → user54, user483, …, user938
• UK → user17, user409, …, user787

18 .Distributed search algorithm 1st round: concurrency factor = 1

19 .Distributed search algorithm Not enough results?

20 .Distributed search algorithm 2nd round: concurrency factor = 2

21 .Distributed search algorithm Still not enough results?

22 .Distributed search algorithm 3rd round: concurrency factor = 4
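The round-by-round behaviour on these slides can be simulated with a small sketch. It assumes the concurrency factor simply doubles each round (1, 2, 4), which mirrors the slide example only; the real formula is more involved and is described at the link on the next slide. The function name `range_scan` and the per-range match counts are hypothetical.

```python
from typing import List, Tuple


def range_scan(matches_per_range: List[int], limit: int) -> Tuple[int, int, List[int]]:
    """Simulate a coordinator scanning token ranges in rounds.

    matches_per_range[i] is the number of matching rows in the i-th range.
    Returns (rows_returned, ranges_queried, concurrency_factor_per_round).
    """
    collected = 0
    queried = 0
    concurrency = 1
    history = []
    while collected < limit and queried < len(matches_per_range):
        # Query up to `concurrency` ranges in parallel during this round.
        batch = matches_per_range[queried:queried + concurrency]
        history.append(concurrency)
        collected += sum(batch)
        queried += len(batch)
        # Assumption: doubling, as in the slide example (1, 2, 4).
        concurrency *= 2
    return min(collected, limit), queried, history


if __name__ == "__main__":
    print(range_scan([3, 2, 4, 1, 5, 0, 2, 1], limit=10))
    # A filter matching at most one row never fills the LIMIT: every range is queried.
    print(range_scan([0, 0, 0, 0, 0, 1, 0, 0], limit=1000))
```

In this model the scan stops as soon as the LIMIT is satisfied; if the LIMIT is never reached, every range ends up being queried.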

23 .Concurrency factor formula • more details at: http://www.doanduyhai.com/blog/?p=13191

24 .Caveat 1: non-restrictive filters Hit all nodes eventually :(

25 .Caveat 1 solution: always use LIMIT SELECT * FROM … WHERE ... LIMIT 1000

26 .Caveat 2: 1-to-1 index (user_email) WHERE user_email = 'xxx' Not found

27 .Caveat 2: 1-to-1 index (user_email) WHERE user_email = 'xxx' Still no result

28 .Caveat 2: 1-to-1 index (user_email) WHERE user_email = 'xxx' At best 1 user found, at worst 0 users found

29 .Caveat 2 solution: materialized views For a 1-to-1 index/relationship, use materialized views instead: CREATE MATERIALIZED VIEW user_by_email AS SELECT * FROM users WHERE user_id IS NOT NULL AND user_email IS NOT NULL PRIMARY KEY (user_email, user_id); But range queries (<, >, ≤, ≥) are not possible …
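To see why a view keyed by user_email helps, the toy model below compares the two lookups by counting nodes contacted. It is a simplified sketch under stated assumptions: placement uses an MD5-based stand-in hash rather than Cassandra's partitioner, the index lookup asks nodes one at a time rather than in rounds, and all names (`node_for`, `find_by_index`, `find_by_view`, the sample users) are hypothetical.

```python
import hashlib

NODES = list("ABCDEFGH")


def node_for(key: str) -> str:
    """Toy placement: choose a node from a hash of the partition key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


users = {1: "alice@example.com", 2: "bob@example.com", 3: "carol@example.com"}

# Base table: rows are placed by user_id; each node indexes only its own rows.
local_index = {n: {} for n in NODES}
for user_id, email in users.items():
    local_index[node_for(str(user_id))][email] = user_id

# View: the same rows, placed by user_email instead.
view = {n: {} for n in NODES}
for user_id, email in users.items():
    view[node_for(email)][email] = user_id


def find_by_index(email: str):
    """Ask nodes one by one until the email is found; count nodes contacted."""
    for contacted, node in enumerate(NODES, start=1):
        if email in local_index[node]:
            return local_index[node][email], contacted
    return None, len(NODES)


def find_by_view(email: str):
    """The email is the partition key, so exactly one node is contacted."""
    return view[node_for(email)].get(email), 1


if __name__ == "__main__":
    print(find_by_index("nobody@example.com"))  # every node is asked
    print(find_by_view("nobody@example.com"))   # a single node is asked
```

The index lookup for a missing email has to visit every node before giving up, while the view lookup goes straight to the one node that could hold it.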
