
1. CASSANDRA @ INSTAGRAM 2016
Dikang Gu, Software Engineer @ Facebook

2. ABOUT ME
• @dikanggu
• Software Engineer
• Instagram Core Infra, 2014 - present
• Facebook Data Infra, 2012 - 2014

3. AGENDA
1 Overview
2 Improvements
3 Challenges

4. OVERVIEW

5. OVERVIEW: Cluster Deployment
• Cassandra nodes: 1,000+
• Data size: hundreds of terabytes
• Ops/sec: in the millions
• Largest cluster: 100+ nodes
• Regions: multiple

6. OVERVIEW
• Clients: Python/C++/Java/PHP
• Protocol: mostly Thrift, some CQL
• Versions: 2.0.x - 2.2.x
• LCS (LeveledCompactionStrategy) for most tables

7. TEAM

8. USE CASE 1: Feed
PUSH model: when a user posts, we push the media information to each follower's feed store; when a user reads, we fetch the feed ids from the viewer's feed store.

9. USE CASE 1: Feed
• Write QPS: 1M+
• Avg/P99 read latency: 20ms/100ms
• Data model: user_id -> List(media_id)
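
A minimal sketch of this fan-out-on-write model, written as CQL plus the Python driver for clarity (the production path was mostly Thrift; the keyspace, table, and host names here are hypothetical):

    # Schema sketch (hypothetical names):
    # CREATE TABLE feed.user_feed (
    #     user_id  bigint,
    #     media_id bigint,
    #     PRIMARY KEY (user_id, media_id)
    # ) WITH CLUSTERING ORDER BY (media_id DESC);
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('feed')

    def push_media(follower_ids, media_id):
        # Fan-out on write: append the new media id to every follower's feed.
        for follower_id in follower_ids:
            session.execute(
                "INSERT INTO user_feed (user_id, media_id) VALUES (%s, %s)",
                (follower_id, media_id))

    def fetch_feed(viewer_id, limit=100):
        # Read path: a single-partition read returns the newest media ids.
        rows = session.execute(
            "SELECT media_id FROM user_feed WHERE user_id = %s LIMIT %s",
            (viewer_id, limit))
        return [row.media_id for row in rows]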

10. USE CASE 2: Metadata store
Applications use C* as a key-value store: they store a list of blobs associated with a key and issue point queries or range queries at read time.

11. USE CASE 2: Metadata store
• Read/Write QPS: 100K+
• Avg read size: 50KB
• Avg/P99 read latency: 7ms/50ms
• Data model: user_id -> List(Blob)
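
One plausible CQL shape for this key-to-list-of-blobs model (table and column names are hypothetical); both point and range queries stay inside a single partition:

    # CREATE TABLE meta.blobs (
    #     user_id bigint,
    #     blob_id bigint,
    #     payload blob,
    #     PRIMARY KEY (user_id, blob_id)
    # );
    def point_query(session, user_id, blob_id):
        # Point query: exact partition key + clustering key.
        return session.execute(
            "SELECT payload FROM blobs WHERE user_id = %s AND blob_id = %s",
            (user_id, blob_id)).one()

    def range_query(session, user_id, start, end):
        # Range query: a slice over the clustering column of one partition.
        return list(session.execute(
            "SELECT blob_id, payload FROM blobs "
            "WHERE user_id = %s AND blob_id >= %s AND blob_id < %s",
            (user_id, start, end)))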

12. USE CASE 3: Counter
Applications issue bump/get counter operations for each user request.

13. USE CASE 3: Counter
• Read/Write QPS: 50K+
• Avg/P99 read latency: 3ms/50ms
• C* 2.2
• Data model: some_id -> Counter
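
A minimal sketch using a C* counter table (names hypothetical); counter columns support only relative increments and reads:

    # CREATE TABLE counts.counters (some_id bigint PRIMARY KEY, value counter);
    def bump(session, some_id, delta=1):
        # Counters are updated with increments, never absolute writes.
        session.execute(
            "UPDATE counters SET value = value + %s WHERE some_id = %s",
            (delta, some_id))

    def get(session, some_id):
        row = session.execute(
            "SELECT value FROM counters WHERE some_id = %s",
            (some_id,)).one()
        return row.value if row else 0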

14. IMPROVEMENTS

15. IMPROVEMENT 1: PROXY NODES

16. PROXY NODE: Problem
• Thrift clients are NOT token aware
• A data node coordinates the requests
• High latency and timeouts when the coordinating data node is hot

17. PROXY NODE: Solution
• join_ring: false
• Acts as a coordinator only
• Does not store data locally
• Clients talk only to the proxy nodes
• 2X latency drop (see the client-side sketch below)
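
What the client side of this setup could look like with the Python driver: the proxy nodes are started with join_ring set to false (e.g. -Dcassandra.join_ring=false) so they coordinate requests without owning data, and clients are pinned to them. A hedged sketch; the hostnames are hypothetical:

    from cassandra.cluster import Cluster
    from cassandra.policies import WhiteListRoundRobinPolicy

    PROXIES = ['proxy1.example.com', 'proxy2.example.com']

    # Pin every request to the coordinator-only proxy tier, so hot data
    # nodes never have to coordinate client requests themselves.
    cluster = Cluster(
        contact_points=PROXIES,
        load_balancing_policy=WhiteListRoundRobinPolicy(PROXIES))
    session = cluster.connect()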

18. IMPROVEMENT 2: PENDING RANGES (CASSANDRA-9258)

19. PENDING RANGES: Problem
• CPU usage +30% when bootstrapping new nodes
• Client request latency jumps and timeouts
• Pending ranges kept in a Multimap<Range<Token>, InetAddress>
• Inefficient O(n) pendingRanges lookup per request

20. PENDING RANGES: Solution
• CASSANDRA-9258
• Use two NavigableMaps to implement the pending ranges, so each lookup is a binary search (toy sketch below)
• We can expand or shrink the cluster without affecting requests
• Thanks to Branimir Lambov for the patch review and feedback
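
A toy Python illustration of the idea behind the fix (not the actual Java patch; it ignores token-ring wraparound and multiple endpoints per range): keeping the ranges sorted turns each per-request lookup into a binary search instead of a scan over every pending range:

    import bisect

    class PendingRanges:
        def __init__(self, ranges):
            # ranges: non-overlapping (start, end, endpoint) tuples,
            # where a token t belongs to a range iff start < t <= end.
            self._ranges = sorted(ranges)
            self._starts = [start for start, _, _ in self._ranges]

        def endpoints_for(self, token):
            # O(log n) binary search vs. the old O(n) scan per request.
            i = bisect.bisect_left(self._starts, token) - 1
            if i >= 0:
                start, end, endpoint = self._ranges[i]
                if start < token <= end:
                    return [endpoint]
            return []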

21. IMPROVEMENT 3: DYNAMIC SNITCH (CASSANDRA-6908)

22. DYNAMIC SNITCH
• Problem: high read latency during peak time, with unnecessary cross-region requests
• Fix: dynamic_snitch_badness_threshold: 50, which effectively pins reads to the closest replicas; the snitch only routes elsewhere when a replica scores 50x worse
• Result: 10X P99 latency drop

23. IMPROVEMENT 4: COMPACTION

24. COMPACTION IMPROVEMENTS
• Track the write amplification (CASSANDRA-11420)
• Optimize the overlapping lookup (CASSANDRA-11571)
• Optimize the isEOF() checking (CASSANDRA-12013)
• Avoid searching for the column index (CASSANDRA-11450)
• Persist the last compacted key per level (CASSANDRA-6216)
• Compact tables before making them available in L0 (CASSANDRA-10862)

25. IMPROVEMENT 5: BIG HEAP SIZE

26. BIG HEAP SIZE
• 64G max heap size
• 16G new gen size
• -XX:MaxTenuringThreshold=6
• Young GC every 10 seconds
• Avoid full GC
• 2X P99 latency drop (flag sketch below)
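
The settings above map roughly to JVM flags like these (a sketch, assuming the CMS-era tuning typical for Cassandra 2.x; the large new gen lets short-lived request objects die in cheap young GCs rather than being promoted to the old gen):

    # 64G fixed-size heap, 16G new gen
    -Xms64G
    -Xmx64G
    -Xmn16G
    # objects must survive 6 young collections before promotion
    -XX:MaxTenuringThreshold=6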

27. IMPROVEMENT 6: NODETOOL REBUILD RANGE (CASSANDRA-10406)

28. NODETOOL REBUILD
• rebuild may fail for nodes with TBs of data
• CASSANDRA-10406: support rebuilding only the failed token ranges (example below)
• Thanks to Yuki Morishita for the review
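
With the patch, a failed rebuild can be retried for just the ranges that failed instead of restarting from scratch. A hedged example of the option added by CASSANDRA-10406 (exact syntax depends on the Cassandra version; the token values and datacenter name are placeholders):

    # Rebuild only the given token range(s) from the named source datacenter
    nodetool rebuild -ts '(12345,67890]' -- <source-dc>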

29. CHALLENGES