Deep Dive #4: Milvus Data Insertion and Persistency
Milvus.io

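Deep Dive is a code-walkthrough livestream series started by the Milvus community. It offers an open reading of the overall architecture of the open-source database Milvus and shares Milvus's core design ideas with the community.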

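If you are interested in this session and want live Q&A with the speaker, add the assistant on WeChat (Zilliz-tech) with the note "直播" (livestream) to join the discussion group.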

Outline of this session:

  1. Overview of the Milvus 2.0 data insertion process
  2. Data allocation
  3. File structure and data persistency
  4. Q&A

1.2021.09 Milvus Deep Dive #4 Data Insertion and Persistency

2.Speaker bio Bingyi Sun, Software Engineer • Milvus 2.0 Development • KV Database/Distributed System • Reading/Basketball/Computer Gaming

3.Agenda • Milvus 2.0 Data Insertion Process Overview • Data Allocation • File Structure and Data Persistency • Q&A

4.01 Data Insertion Process Overview

5.Milvus Architecture Overview

6.Data Insertion Related

7.Proxy

8.Data Flow Details

9.DataCoord & DataNode

10.Data Flow Details [Figure: a collection's channels (V1, V2, V3, V4) assigned by DataCoord to DataNodes]

11.RootCoord & Time Tick [Figure annotations: first read: 1, 2; second read: 6, 7, 8]

12.02 Data Allocation

13.Data organization

14.Segment

15.Channel Channels are assigned to DataNodes according to different strategies (e.g., consistent hashing).
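
To make the consistent-hashing strategy concrete, here is a minimal Python sketch of a hash ring that maps channel names to DataNodes. It is not Milvus's implementation: the `HashRing` class, the number of virtual nodes, and the node and channel names are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; not the Milvus data structure."""

    def __init__(self, nodes, vnodes=16):
        # Each node is placed at several ring positions (virtual nodes)
        # so that channels spread more evenly across nodes.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [pos for pos, _ in self._ring]

    def node_for(self, channel: str) -> str:
        # Walk clockwise to the first ring position at or after the
        # channel's hash, wrapping around at the end of the ring.
        idx = bisect.bisect_left(self._keys, _hash(channel)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["datanode-1", "datanode-2"])
assignment = {ch: ring.node_for(ch) for ch in ["V1", "V2", "V3", "V4"]}
```

The appeal of this strategy is that adding a DataNode moves only the channels that now hash to the new node; every other channel keeps its previous owner.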

16.When

17.How InsertRequest (CollectionID, PartitionID, Channel, NumOfRows). We check whether there is enough space to store that many rows. If so, we return a list of segments as the response; otherwise we open new segments. 1 request ↔ 1 or n segments. A segment's maximum size is determined by "segment.maxSize" in data_coord.yaml.
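
The allocation step can be sketched roughly as follows. This is a simplified model, not the DataCoord code: `Segment`, `ChannelAllocator`, and the row-based capacity are hypothetical, and the policy of filling open segments before opening new ones is an assumption the slide does not spell out.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Segment:
    segment_id: int
    max_rows: int            # stands in for a limit derived from segment.maxSize
    allocated_rows: int = 0

    @property
    def free_rows(self) -> int:
        return self.max_rows - self.allocated_rows


@dataclass
class ChannelAllocator:
    """Hypothetical allocator for one (collection, partition, channel)."""
    max_rows_per_segment: int
    segments: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def allocate(self, num_rows: int):
        """Return [(segment_id, rows)] covering num_rows rows."""
        if num_rows < 0 or self.max_rows_per_segment <= 0:
            raise ValueError("num_rows must be >= 0 and capacity must be > 0")
        result, remaining = [], num_rows
        # First use the free space of segments that are already open.
        for seg in self.segments:
            take = min(seg.free_rows, remaining)
            if take > 0:
                seg.allocated_rows += take
                result.append((seg.segment_id, take))
                remaining -= take
        # Open new segments for whatever is left.
        while remaining > 0:
            seg = Segment(next(self._ids), self.max_rows_per_segment)
            self.segments.append(seg)
            take = min(seg.free_rows, remaining)
            seg.allocated_rows = take
            result.append((seg.segment_id, take))
            remaining -= take
        return result


alloc = ChannelAllocator(max_rows_per_segment=100)
first = alloc.allocate(60)     # fits in one new segment
second = alloc.allocate(150)   # spills across three segments
```

With a capacity of 100 rows, the second request is split across segment 1 (its remaining 40 rows) and two newly opened segments, which is the "1 request ↔ n segments" case.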

18.Data Expiration Again, the time tick plays a crucial role in insertion. A time tick in a channel means that all data before this time tick has been sent to the channel. We return each segment allocation with an expiration time T. The Proxy cannot use this allocation to insert data whose time tick is greater than T. By default, a single allocation expires after 2000 ms, as defined by "segment.assignmentExpiration" in data_coord.yaml.
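
A small sketch of the expiration rule, under simplifying assumptions: timestamps are plain millisecond integers rather than Milvus's real timestamps, and `Allocation`, `grant`, and `can_insert` are hypothetical names, not Milvus APIs.

```python
from dataclasses import dataclass

# Default of segment.assignmentExpiration according to the slide.
ASSIGNMENT_EXPIRATION_MS = 2000


@dataclass(frozen=True)
class Allocation:
    segment_id: int
    num_rows: int
    expire_ts_ms: int  # the expiration time T


def grant(segment_id, num_rows, now_ms, ttl_ms=ASSIGNMENT_EXPIRATION_MS):
    """Create an allocation that expires ttl_ms after it is granted."""
    return Allocation(segment_id, num_rows, now_ms + ttl_ms)


def can_insert(alloc, data_ts_ms):
    """Proxy-side check: data stamped later than T must not use this allocation."""
    return data_ts_ms <= alloc.expire_ts_ms


a = grant(segment_id=1, num_rows=10, now_ms=1000)
ok = can_insert(a, 3000)        # at T: still allowed
too_late = can_insert(a, 3001)  # after T: rejected
```

Once the channel's time tick passes T, nothing more can arrive under this allocation, so the space it reserved can safely be accounted for.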

19.When to Seal 1. 2. Receive a Flush Collection request 3. A segment's lifetime is too long 4. Too many growing segments in a channel
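
Conditions 2 to 4 could be combined into a sealing check along these lines. The sketch leaves out condition 1, which is blank in the slide text. The thresholds `max_lifetime_s` and `max_growing` and the choice to seal the oldest segments first are assumptions, not documented Milvus behavior.

```python
from dataclasses import dataclass


@dataclass
class GrowingSegment:
    segment_id: int
    created_at_s: float


def segments_to_seal(growing, now_s, flush_requested, max_lifetime_s, max_growing):
    """Return IDs of growing segments in one channel that should be sealed."""
    if flush_requested:
        # Condition 2: a Flush Collection request seals every growing segment.
        return [s.segment_id for s in growing]
    # Condition 3: seal segments that have lived too long.
    to_seal = [s for s in growing if now_s - s.created_at_s > max_lifetime_s]
    # Condition 4: if too many growing segments remain, seal the oldest ones.
    remaining = sorted(
        (s for s in growing if s not in to_seal), key=lambda s: s.created_at_s
    )
    excess = len(remaining) - max_growing
    if excess > 0:
        to_seal.extend(remaining[:excess])
    return [s.segment_id for s in to_seal]


growing = [GrowingSegment(1, 0), GrowingSegment(2, 50),
           GrowingSegment(3, 90), GrowingSegment(4, 95)]
sealed_ids = segments_to_seal(growing, now_s=100, flush_requested=False,
                              max_lifetime_s=60, max_growing=2)
```

Here segment 1 is sealed for its age and segment 2 is sealed because three growing segments would otherwise remain in the channel.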

20.When to Flush DataNode will report the time tick of a channel to DataCoord. If the time tick received by DataCoord is larger than the time tick of a segment’s last allocation, the segment’s allocated space is released.
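
The release rule can be sketched as below. `SegmentState` and `on_channel_time_tick` are hypothetical names, timestamps are plain integers, and treating "sealed with no outstanding allocations" as the condition for being flushable is an assumption inferred from the slide title rather than something it states.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentState:
    segment_id: int
    sealed: bool = False
    # Expiration timestamps of outstanding allocations (hypothetical bookkeeping).
    allocation_expirations: list = field(default_factory=list)


def on_channel_time_tick(segments, tick):
    """Release finished allocations and return IDs of segments ready to flush."""
    flushable = []
    for seg in segments:
        last = max(seg.allocation_expirations, default=None)
        # The tick has passed the segment's last allocation, so no more data
        # can arrive under it and the allocated space is released.
        if last is not None and tick > last:
            seg.allocation_expirations.clear()
        # Assumption: a sealed segment with nothing outstanding can be flushed.
        if seg.sealed and not seg.allocation_expirations:
            flushable.append(seg.segment_id)
    return flushable


segs = [
    SegmentState(1, sealed=True, allocation_expirations=[100, 200]),
    SegmentState(2, sealed=False, allocation_expirations=[150]),
    SegmentState(3, sealed=True, allocation_expirations=[300]),
]
ready = on_channel_time_tick(segs, tick=250)
```

Segment 3 stays unflushed at tick 250 because its last allocation expires at 300, so data could still arrive for it.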

21.Some Details 1. How to ensure that a segment is flushed only after all of its data has been consumed? 2. How to ensure that no data is written to a segment after it has been flushed? 3. Is a segment strictly limited to its maximum size? 4. How to estimate a segment's maximum number of rows? 5. What happens when users call "Flush" frequently? 6. How to ensure that no data is consumed multiple times after DataNodes restart? 7. When to create an index?

22.03 File Structure and Data Persistency

23.DataNode Flush

24.File Structure Binlog: 1. Restore Data; 2. Create Index

25.Persistency

26.TODO 1. Delete By ID 2. Compaction to merge small segments and release space 3. Bulk load

27.Thanks & QA
