- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
陈华曦--基于Apache Flink的搜索处理平台
展开查看详情
1 .ElasticFlow चԭApache Flinkጱᔱහഝ॒ቘଘݣ Search offline platform based on Apache Flink ݪلғᴨ᯾૬૬ ᘳ֖ғṛᕆದӫਹ ᄍᦖᘏғᴯเ
2 .Search&Rec in Alicloud Elastic Open AI Rec Search Search (beta)
3 .Search/Rec Offline Data Process RDS Offline System Open DTS Search Max Compute ODPS Elastic Search Stream Compute SLS AI Rec Full build Incremental build Rebuild all the documents from data sources with specific Synchronize the real time updates from data sources to search interval(daily). engine. Ex: RDS binlog parsed by DTS
4 .Pain Points Full Inc Monitor Different Complex business Perf. and Ops datasources
5 .Elastic Flow რᛔᴨ᯾ᔱᐶᕚଘ҅ݣ׀Ӟᒊୗහഝ॒ቘӱ̵ݎۓ᮱ᗟ̵ᬩᖌᚆ҅ێჿ᪃አ ಁਖ਼ग़۸හഝ॒ቘଚفᔱവគකጱᵱ̶ • ᶎݻአಁጱݎኴᶎ҅Ӟེ҅ݎᛔۖኞ౮قᰁीᰁձۓၞᑕ • ၹᰁහഝṛᚆ҅ჿ᪃قᰁṛ҅ݺރीᰁṛਫጱᥝ • හഝრඪ೮ғ5'62'36'766/6.DINDᒵᒵ • ࢵᴬ۸ग़UHJLRQᛔۖ۸᮱ᗟ҅ग़ᖌଶፊഴಸᦄ
6 .Elastic Flow Arch Elastic Flow SAROS Catalog Swift Apache Flink Maat HBase Queue Yarn Hippo HDFS Compute Storage
7 .Business Graph Business Table Abstract table, was composed of full table and inc table. The two tables are sharing the same schema and business Full:RDS Inc:DTS Schema
8 .RDS DTS FlinkJ FlinkJ ODPS obs HBASE obs ES SLS Sync Layer Process Layer
9 .Job Generate Flink SQL Full Inc Business Join Join Graph Compute Flink SQL Job Graph Translator Full Inc Sync Sync Optimized Graph Maat DAG Publis Schedule Maat Build h Job Graph Translator Start
10 .Incremental sync SQL Full sync SQL CREATE TABLE DTSSource_test_table ( CREATE TABLE MysqlSource_test_table ( `path` VARCHAR, `path` VARCHAR, `status` VARCHAR, `status` VARCHAR, `receive_id` VARCHAR `receive_id` VARCHAR ) with ( ) with ( tableFactoryClass tableFactoryClass =‘DTSExTableFactory’ =‘MySQLScanTableFactory’ ) ) INSERT INTO HbaseSink_main_table INSERT INTO HfileSink_main_table SELECT SELECT rowKeyGen(`receive_id`, 2000, 4) AS rowkey, rowKeyGen(`receive_id`, 2000, 4) AS rowkey, `path`, `path`, `status`, `status`, `receive_id` `receive_id`, FROM DTSSource_test_table FROM MysqlSource_test_table where receive_id <> ''; where receive_id <> '';
11 .Optimization Ø Async dim join increase throughput while reading Hbase Ø Flink state on Niagara reduces Hbase IO Ø Blink 2.2.x new feature: Credit Based flow control, network buffer zero copy Ø Object reuse leads to less GC
12 .Batch Ø Batch/Stream unified SQL boost dev efficiency Ø Bulkload mass data to Hbase though Flink batch and hfilesink Ø External Shuffle Service Ø Per unit schedule uses less resource for the same job
13 .Hbase table design
14 .Inside Alibaba Ø Tens of billions of docs per day, nearly 1 PB. Lazada Youku Aliexpress Ø Almost 1 million tps in incremental build. Ding Hema Flypig Talk Ø Double 11 for three years ᘸښᓒ Xianyu Ⴃਮ
15 .ᥢښ᪠ᕚࢶ ള(6҅ඪ೮5'6 ള2SHQ6HDUFK҅ඪ ᘏ2'36ܔᤒݶ ೮ṛᕆᇇᇿՁࣳᵞᗭ ྍ̶૪ᕪلၥ ጱग़۸ᵱ̶ࢵᴬ۸ ग़܄ऒ᮱ᗟ̶ ଙ์ ଙ์ ଙ์ ଙ์ ള$,5HF҅࣋׀ว۸ ᭐)ڊOLQN )XFWLRQ҅ඪ ጱᓒဩૡᑕᚆ҅ێ۱ೡᇙ ೮ग़හഝᤒ॒॔ቘ҅ ҅ྍݶཛྷࣳᦒᕞᒵ ۱ೡ-RLQ̵)LOWHUᒵ
16 .