Zeppelin Machine Learning: Latest Features and Roadmap
1 .Zeppelin Machine Learning: Latest Features and Roadmap Xun Liu (刘勋), Apache Zeppelin Committer
2 .About Me Xun Liu (刘勋): Apache Zeppelin Committer; Apache Hadoop Submarine Project Team Member; Staff Engineer @NetEase
3 .Agenda What Is Apache Zeppelin? Zeppelin Machine Learning Zeppelin New Features
4 .WHAT IS APACHE ZEPPELIN ? Data Ingestion Data Discovery Data Analytics Data Visualization Data Collaboration
5 .Multiple Language Backend • The interpreter concept allows any language or data-processing backend to be plugged into Zeppelin. • Apache Zeppelin currently supports many interpreters, such as Apache Spark, Python, JDBC, Markdown, Shell, and more. • Adding a new language backend is simple.
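The pluggable-backend idea can be illustrated with a minimal Python sketch. This is not Zeppelin's actual interpreter API: it only assumes that a paragraph's leading `%name` directive selects a backend, and the names `register`, `run_paragraph`, and the two toy backends are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a backend name to a function that runs code.
INTERPRETERS: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that plugs a new (toy) language backend into the registry."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        INTERPRETERS[name] = fn
        return fn
    return wrap


@register("upper")
def upper_backend(body: str) -> str:
    # Toy backend: "executes" text by upper-casing it.
    return body.upper()


@register("count")
def count_backend(body: str) -> str:
    # Toy backend: reports the number of whitespace-separated words.
    return str(len(body.split()))


def run_paragraph(text: str) -> str:
    """Dispatch a paragraph whose first line is '%name' to that backend."""
    header, _, body = text.partition("\n")
    if not header.startswith("%"):
        raise ValueError("paragraph must start with a %name directive")
    name = header[1:].strip()
    if name not in INTERPRETERS:
        raise KeyError(f"no interpreter registered for {name!r}")
    return INTERPRETERS[name](body)
```

In this sketch, adding a backend takes one decorated function, which is the property the slide emphasizes.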
6 .Data visualization Some basic charts are already included in Apache Zeppelin. Visualizations are not limited to Spark SQL queries: output from any language backend can be recognized and visualized.
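One way a backend's output becomes chartable is Zeppelin's display system, where text that starts with `%table` and contains tab-separated columns and newline-separated rows is rendered as a table. The helper below is a minimal sketch of producing such text; the function name `to_zeppelin_table` is hypothetical, and the sample data is illustrative.

```python
from typing import Iterable, Sequence


def to_zeppelin_table(header: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Format rows as '%table' text: tab-separated cells, one row per line."""
    lines = ["\t".join(header)]
    for row in rows:
        lines.append("\t".join(str(cell) for cell in row))
    return "%table " + "\n".join(lines)


if __name__ == "__main__":
    # Printing this string from a notebook paragraph is what lets the
    # front end treat the output as tabular data.
    print(to_zeppelin_table(["name", "size"], [("sun", 100), ("moon", 10)]))
```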
7 .Pivot chart • Apache Zeppelin aggregates values and displays them in a pivot chart with simple drag and drop. • You can easily create charts with multiple aggregated values, including sum, count, average, min, and max.
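The aggregation behind a pivot chart can be sketched in a few lines of Python. This illustrates the grouping idea only, not Zeppelin's implementation; the `pivot` function and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Mapping

# The five aggregates named on the slide.
AGGREGATES: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "count": len,
    "average": mean,
    "min": min,
    "max": max,
}


def pivot(rows: Iterable[Mapping[str, object]], key: str, value: str, agg: str) -> Dict[object, float]:
    """Group rows by `key` and reduce each group's `value` column with `agg`."""
    groups: Dict[object, List[float]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: AGGREGATES[agg](v) for k, v in groups.items()}


if __name__ == "__main__":
    sales = [
        {"city": "A", "sales": 10},
        {"city": "A", "sales": 30},
        {"city": "B", "sales": 5},
    ]
    for name in AGGREGATES:
        print(name, pivot(sales, "city", "sales", name))
```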
8 .Dynamic forms Apache Zeppelin can dynamically create some input forms in your notebook.
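Zeppelin's text-input forms are declared inline with a `${name=default}` template in the paragraph text. The sketch below mimics only that substitution step so the mechanism is easier to picture; it is not Zeppelin's parser, it ignores the select and checkbox variants, and `form_fields` and `render` are hypothetical names.

```python
import re
from typing import Dict, Mapping

# Matches ${name} or ${name=default}.
FORM_PATTERN = re.compile(r"\$\{(\w+)(?:=([^}]*))?\}")


def form_fields(text: str) -> Dict[str, str]:
    """Return each form name found in the text with its default value."""
    return {m.group(1): m.group(2) or "" for m in FORM_PATTERN.finditer(text)}


def render(text: str, values: Mapping[str, object]) -> str:
    """Replace each form template with the user's value, else its default."""
    def replace(match: "re.Match[str]") -> str:
        return str(values.get(match.group(1), match.group(2) or ""))
    return FORM_PATTERN.sub(replace, text)


if __name__ == "__main__":
    query = "SELECT * FROM bank WHERE age < ${maxAge=30}"
    print(form_fields(query))             # fields a UI would draw as inputs
    print(render(query, {"maxAge": 45}))  # text after the user fills the form
```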
9 .Collaborate by sharing your Notebook & Paragraph Your notebook URL can be shared among collaborators. Apache Zeppelin then broadcasts any changes in real time, much like collaboration in Google Docs.
10 .Agenda What Is Apache Zeppelin? Zeppelin Machine Learning Zeppelin New Features
11 .Zeppelin Architecture Interactive Development: Zeppelin. Computing Engine: TensorFlow, PyTorch, Python, R / Scala, Hive, Spark, Flink. Resource Manager: Kubernetes, YARN, Zeppelin Cluster. Infrastructure: HDFS, AWS S3, Docker, CPU, GPU.
12 .Machine Learning in a Unified Platform
13 .Machine learning workflow Data Preprocessing: Data. Feature Engineering: Feature Selection, Feature Transform, Feature Encoding, Feature Evaluation. Model Training: Model Training, Model Evaluation, Model Validation, Model Staging. Online Service: Experiment, Model as Service, Model Database, Real-time Feature, Online Calibration.
14 .Data Preprocessing & Feature Engineering Data Preprocessing: Import data (HDFS, AWS S3, RDBMS); Join data; Data exploration; Data sample; Training / Test. Feature Engineering: Feature Selection, Feature Transform, Feature Encoding, Feature Evaluation.
15 .Model Training Traditional machine learning models: Logistic Regression, Gradient boosting tree, Recommendation/ALS, LDA. Libraries: Python Lib, Apache Spark MLlib, XGBoost. Deep learning models: DNN, CNN, RNN, LSTM. Libraries: TensorFlow, PyTorch, MXNet. Stages: Model Training, Model Evaluation, Model Validation, Model Staging.
16 .Top Python, Scala, and R Libraries in Data Science
17 .Model Serving Model manager; Model deploy; Model serving (batch, streaming); Exploration (offline, online A/B test). Online Service: Experiment, Model as Service, Model Database, Real-time Feature, Online Calibration.
18 .Zeppelin Integration Hadoop Submarine Algorithm development; Job scheduling; TensorBoard; Monitor; {Submarine} CLI / REST; User
19 .
20 .Submarine Integration Zeppelin
21 .
22 .Model Serving (ZEPPELIN-3994)
23 .Agenda What Is Apache Zeppelin? Zeppelin Machine Learning Zeppelin New Features
24 .Zeppelin Cluster Mode (ZEPPELIN-3471)
Description: to explain the process more clearly, content that is not related to service fault tolerance has been removed from the diagrams.

Distributed Zeppelin cluster server architecture diagram
[Diagram: users reach zepl.netease.com through an nginx master/backup pair (switch), which proxies to zeppelinServer1 through zeppelinServerN. The servers form the Zeppelin Cluster via Raft and share Cluster MetaData, which lists interpreter-process1 through interpreter-processM with their thrift ip&port. Notebooks are stored in NotebookRepo (S3, HDFS, …).]
• Multiple Zeppelin Servers (Zepl-Server1, Zepl-Server2, through Zepl-ServerN) are joined into a Zeppelin Cluster by the Raft algorithm. Raft ensures that all Zeppelin Servers have consistent access to the cluster metadata (Cluster Meta).
• Nginx proxies the Zeppelin Servers, and a domain name maps to the Nginx server, so a user who visits the domain name reaches one of the Zeppelin Servers behind Nginx.
• When users access Zeppelin through the Internet, Nginx routes each user to a Zeppelin Server according to its distribution policy. In the figure, different users are served by different servers, such as Zepl-Server1 and Zepl-ServerN.
• Nginx checks whether each Zeppelin Server in the Zeppelin Cluster is accessible, so that users are always connected to a valid Zeppelin Server.
• The user creates the interpreter process they need through Zeppelin. The Interpreter Process saves its metadata, the Thrift IP & Port that a Zeppelin Server connects to, in Cluster Meta.

Distributed Zeppelin Server fault tolerance diagram
[Diagram: zepl-Server1 exception; another server calls get Intp Meta on Cluster MetaData and reconnects to the interpreter process over Thrift.]
1. When zepl-Server1 has an exception but the Interpreter Process is still available, Nginx detects the exception and takes zepl-Server1 offline.
2. When a user (such as User1) who has been accessing zepl-Server1 executes a note again, Nginx redirects the user's HTTP request to another healthy Zeppelin Server, shown in the figure as Zepl-Server2.
3. Zepl-Server2 has no Interpreter Process or Session information for User1. It first looks up User1's Interpreter Process metadata in Cluster Meta. If the metadata is found, the required Interpreter Process still exists in the Zeppelin Cluster. Zepl-Server2 then re-creates a RemoteInterpreterRunningProcess from the Thrift IP & Port in that metadata and connects to the machine where Zepl-Server1 ran, reusing the previously created Interpreter Process.

Distributed Zeppelin Server & Interpreter Process fault tolerance diagram
[Diagram: zepl-Server1 exception; delete intp meta; new intp meta; renew intp.]
1. When both zepl-Server1 and the Interpreter Process fail, Nginx detects the exception and takes zepl-Server1 offline. When the Interpreter Process exits, it deletes its own metadata from Cluster Meta. If it exits without deleting the metadata successfully, the other Zeppelin Servers remove the unavailable metadata through their health checks.
2. When a user (such as User1) who has been accessing zepl-Server1 executes a note again, Nginx redirects the user's HTTP request to another healthy Zeppelin Server, shown in the figure as Zepl-Server2.
3. Zepl-Server2 has no Interpreter Process or Session information for User1. It first looks up User1's Interpreter Process metadata in Cluster Meta. If the metadata is not found, Zepl-Server2 creates a new RemoteInterpreterRunningProcess locally.
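The lookup in step 3 of the two fault-tolerance scenarios can be sketched as follows. This is a simplified illustration under stated assumptions, not Zeppelin's code: `ClusterMeta` is an in-memory stand-in for the Raft-replicated metadata, and `InterpreterMeta`, `attach_interpreter`, the `launch` callback, and the sample addresses are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class InterpreterMeta:
    """Hypothetical record an interpreter process publishes to Cluster Meta."""
    thrift_host: str
    thrift_port: int


class ClusterMeta:
    """In-memory stand-in for the Raft-replicated cluster metadata."""

    def __init__(self) -> None:
        self._entries: Dict[str, InterpreterMeta] = {}

    def put(self, process_id: str, meta: InterpreterMeta) -> None:
        self._entries[process_id] = meta

    def get(self, process_id: str) -> Optional[InterpreterMeta]:
        return self._entries.get(process_id)

    def delete(self, process_id: str) -> None:
        # Mirrors an interpreter removing its own entry when it exits.
        self._entries.pop(process_id, None)


def attach_interpreter(
    cluster_meta: ClusterMeta,
    process_id: str,
    launch: Callable[[], InterpreterMeta],
) -> Tuple[str, InterpreterMeta]:
    """Reuse a live interpreter if its metadata exists, else start a new one."""
    meta = cluster_meta.get(process_id)
    if meta is not None:
        # Metadata found: the process still exists, so reconnect over Thrift.
        return "reconnected", meta
    # Metadata missing: the process is gone, so create one and publish it
    # (in the slide, the new interpreter process publishes its own entry).
    meta = launch()
    cluster_meta.put(process_id, meta)
    return "created", meta
```

The design point the slide relies on is that the metadata, not any single server, is the source of truth about which interpreter processes are alive, so any surviving server can take over a user's session.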
25 .Zeppelin Cluster Mode (ZEPPELIN-3471)
26 .Zeppelin Cluster Mode (ZEPPELIN-3471)
27 .Zeppelin Cluster Mode (ZEPPELIN-3471)
28 .Zeppelin Cluster Mode (ZEPPELIN-3471)
29 .Zeppelin Cluster + Docker (ZEPPELIN-4104) [Diagram: users reach zepl.netease.com through an nginx master/backup pair (switch), which proxies to zeppelinServer1 through zeppelinServerN in the Zeppelin Cluster. The servers share Cluster MetaData via Raft, listing interpreter-process1 through interpreter-processM with their thrift ip&port. The interpreter processes run in Docker containers, and notebooks are stored in NotebookRepo (S3, HDFS, …).]