Milvus in Action Series #2: Bringing the Metaverse Home, Those "Smart" Design Tools
1 .Those "Smart" Design Tools
2 .Contents
1 Business Background
2 Technology Selection
3 System Design
4 Search in Action
5 Summary
3 .Business Background
4 .Business Background
Business unit: Home furnishing enterprise
Scenario: System furnishing sales
Pain point: Difficult to do MCC pre-build
5 .Real World
6 .Real World
7 .Overview
Design tool, Business logic, Data generation
• Item attributes • Rule base • User journey • Prior knowledge • Extra operations • Item description • …… • ……
Data smart offering platform
8 .Technology Selection
9 .Data flow overview
The data flow is similar to that of a traditional search engine or recommendation engine: it includes offline data preparation, online serving, and post-processing.
Diagram labels: Business logic, Data fulfillment, Extra operation, Serving, Recall, Ranking, Reorder, BFF, Data preparing, Definition, Generation, Deployment, Online, Offline
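The serving stages named on this slide (recall, ranking, reorder) can be sketched as three small functions chained together. This is only a minimal illustration, not the system described in the talk: the `Item` type, the category filter used for recall, the scoring function, and the "pinned items first" business rule are all assumptions made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Item:
    # Hypothetical item record; the real item schema is not given in the slides.
    item_id: str
    category: str
    score: float = 0.0


def recall(catalog: Iterable[Item], wanted_category: str) -> list[Item]:
    """Recall: cheaply narrow the whole catalog down to a candidate set."""
    return [item for item in catalog if item.category == wanted_category]


def rank(candidates: list[Item], score_fn: Callable[[Item], float]) -> list[Item]:
    """Ranking: score every candidate and sort best-first."""
    scored = [Item(c.item_id, c.category, score_fn(c)) for c in candidates]
    return sorted(scored, key=lambda item: item.score, reverse=True)


def reorder(ranked: list[Item], pinned_ids: set[str]) -> list[Item]:
    """Reorder: apply a business rule on top of the model order.

    Assumed rule: pinned items move to the front; otherwise order is kept.
    """
    pinned = [item for item in ranked if item.item_id in pinned_ids]
    rest = [item for item in ranked if item.item_id not in pinned_ids]
    return pinned + rest


def serve(catalog, wanted_category, score_fn, pinned_ids, top_k=3):
    """Run recall -> ranking -> reorder and return the first top_k items."""
    candidates = recall(catalog, wanted_category)
    ranked = rank(candidates, score_fn)
    return reorder(ranked, pinned_ids)[:top_k]
```

Keeping the stages as separate functions mirrors the slide's split: recall stays cheap and broad, ranking carries the model, and reorder is where business-specific rules live.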
10 .Requirements
01. Scenario (Multi-platform): H5, App, Mini-program, ……
02. Serving (High-performance): Recall, Ranking, Response, ……
03. Storage (Block-free): Stable, Threshold, Compatibility, ……
04. Support (Robustness): Evolution, Self-repair, Degradation & circuit breaking, ……
11 .Online Components Ranking • Multiple roles Recall • Model base • Role base • Vector distance • Multiple dataset • Sample sorting VS ES BE Re-order • BU support • Business insight • Diversity
12 .Vector Search
Labels: Descriptive, Evolution, Fusion
Vector search: Unlike traditional search solutions, a vector search engine provides the ability to store and search vector data.
With advances in AI technology, especially in ML/DL, we use unstructured data (in vector form) to describe items, trying to figure out their attributes, behaviors, and interests in a form machines can understand, so that we can better serve our customers.
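To make "store and search on vector data" concrete, here is a minimal sketch of nearest-neighbor search by exhaustive scan with Euclidean (L2) distance. It is an illustration only: engines such as Milvus rely on index structures and approximate search rather than scanning every vector, and the in-memory `store` dictionary and function names are assumptions for this example.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(store, query, top_k=2):
    """Return the top_k (item_id, distance) pairs closest to the query.

    `store` is a plain dict of item_id -> vector (a stand-in for a real
    vector database collection).
    """
    scored = ((item_id, l2_distance(vec, query)) for item_id, vec in store.items())
    return heapq.nsmallest(top_k, scored, key=lambda pair: pair[1])
```

A call such as `search({"a": [0, 0], "b": [3, 4]}, [0, 0], top_k=1)` returns the single closest item together with its distance.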
13 .Solutions Comparison
14 .System Design
15 .System Architecture
16 .Dataflow Offline
Definition
• Item definitions based on business insight
Data Generation
• Data calculation
• Scoring & quality assurance
Feature Engineering
• Label recognition
Recall Data Preparing
• Feature encoding
• Data conversion & searchable data generation
• Dataset management
Ranking Data Preparing
• Build / upgrade model dataset
• Dataset management
17 .Dataflow Online
Components: BE, QU, SP, RA, BE
Receive: receive & prepare query params + user profiles
Understand: figure out the key params + prepare policy & sort params
Recall: prepare query DSL + multiple datasets
Rank: do data scoring & ranking + other biz operations
Post-process: data conversion & fulfillment + other operations for downstream
18 .Search in Action
19 .Real Data
TagsGroup1: 2360 * 1000 * 58
TagsGroup2: Home-style wardrobe
TagsGroup3: Hanging clothes, folding clothes, bedding storage, trouser storage
TagsGroup4: Comfort price range
TagsGroup5: T-shaped divider
TagsGroup6: ……
20 .Index Design (ES)
21 .Real Data
Vector1: [1, 0, 0, 1, 0, 1, 1, 0 …]
Vector2: [1.134246, -0.000498, 0.176506, 0.971405, -1.875313, …]
Vector3: ……
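Vector1 above looks like a binary vector. One plausible way to obtain such a vector from tag data is multi-hot encoding, where each position stands for one known tag. The slides do not say how the vectors were actually produced, so the sketch below is an assumption for illustration; the tag names and the sorted-vocabulary convention are hypothetical.

```python
def build_vocabulary(tagged_items):
    """Assign each distinct tag a fixed vector position, in sorted order."""
    tags = sorted({tag for item_tags in tagged_items for tag in item_tags})
    return {tag: index for index, tag in enumerate(tags)}


def multi_hot(item_tags, vocabulary):
    """Encode a list of tags as a 0/1 vector over the vocabulary."""
    vector = [0] * len(vocabulary)
    for tag in item_tags:
        if tag in vocabulary:  # tags outside the vocabulary are ignored
            vector[vocabulary[tag]] = 1
    return vector


def hamming_distance(a, b):
    """Number of positions where two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x != y for x, y in zip(a, b))
```

With binary vectors like these, a distance such as Hamming counts how many tags two items disagree on, which gives a simple notion of similarity between tagged items.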
22 .Index Design (Milvus)
23 .Resource Estimation
ES v7.13: 4C8G20G M * 3, 4C8G20G IC * 3, 8C32G500G ID * 8
PostgreSQL v11: 1C2G * 1
OSS: 1.5 ~ 2 TB
Milvus V1.0: 8C16G500G * 1
24 .Recall Dsl (ES)
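The DSL shown on this slide is not reproduced in the text. As a rough illustration of what a tag-based recall query for Elasticsearch could look like, the sketch below builds a `bool` query body with `filter` and `should` clauses made of `term` queries. The field names (`style`, `function`) and the helper `build_recall_query` are hypothetical and are not taken from the talk.

```python
import json


def build_recall_query(required, optional, size=50):
    """Build an Elasticsearch query body as a plain dict.

    required: field -> value; every pair must match (bool.filter).
    optional: field -> list of values; at least one value across all
              fields must match (bool.should with minimum_should_match).
    """
    filters = [{"term": {field: value}} for field, value in required.items()]
    shoulds = [
        {"term": {field: value}}
        for field, values in optional.items()
        for value in values
    ]
    bool_query = {"filter": filters}
    if shoulds:
        bool_query["should"] = shoulds
        bool_query["minimum_should_match"] = 1
    return {"size": size, "query": {"bool": bool_query}}


if __name__ == "__main__":
    # Hypothetical tag fields, for illustration only.
    body = build_recall_query({"style": "home"}, {"function": ["hanging", "trousers"]}, size=10)
    print(json.dumps(body, indent=2))
```

Putting hard constraints in `filter` keeps them out of relevance scoring, while the `should` clauses let items matching more optional tags score higher.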
25 .Recall Dsl (Milvus)
26 .Summary
27 .Issues & Solutions
Issue | Solution 1 | Solution 2
Complex dataflow | Dataset separation | Node optimization
Too many calc nodes | Separate into sub-sets | Hyperparameter tuning
Performance | Parallel calculation | Algorithm tuning
Quality | DAG optimization | Expert system
28 .Issues & Solutions
Issue | Solution 1 | Solution 2
Performance | More candidates | Customized build
Approximate calculation | Multiple dataset recall | More algorithms involved
Vector representation | Reflow data | Manual blind test
Comprehensibility | MCC | BU base
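"Multiple dataset recall" can be pictured as querying several datasets and merging the hits into one candidate list. The sketch below is an assumption about how such a merge might work, not the talk's implementation: it keeps each item's smallest distance and remembers which dataset produced it. It also assumes distances from different datasets are comparable, which in practice usually requires normalization.

```python
def merge_recall(results_by_dataset, limit=5):
    """Union hits from several datasets, keeping each item's best distance.

    results_by_dataset: dataset name -> list of (item_id, distance) hits.
    Returns up to `limit` tuples (item_id, distance, dataset), closest first.
    """
    best = {}
    for dataset, hits in results_by_dataset.items():
        for item_id, distance in hits:
            if item_id not in best or distance < best[item_id][0]:
                best[item_id] = (distance, dataset)
    # Sort by distance, then by item_id so ties are ordered deterministically.
    merged = sorted(best.items(), key=lambda kv: (kv[1][0], kv[0]))
    return [(item_id, distance, dataset) for item_id, (distance, dataset) in merged[:limit]]
```

Deduplicating at merge time means an item found by several datasets appears once, so the ranking stage downstream does not score it repeatedly.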
29 .Metric comparison
01 Speed-up: Average design time decreased from 180+ mins to around 30 mins
02 Recall rate: Wardrobe proofing rose from 3.2 to 20
03 Serving number: Global rollout serving X BUs and Y customers in total
04 Sales growth: Helped settle over X orders; ATV rose from Y to Z