In this talk, we highlight the lessons we learned as we migrated ~100 analysts and engineers, most with little to no big data experience, from our Hive ecosystem to the Databricks unified analytics platform. Oh, and we did it all with a small support team of one.

Published by the Spark open source community on 2019/01/27

1. 99 problems but Databricks and Spark ain't one (Wesley Kerr, Riot Games) #DSSAIS20

2.
● Principal Data Scientist on League of Legends
● Used to work at Google, Riot Games, and Adknowledge
● Leona support main

3.

4.

5. Choose, Compete, Win

6. Players & Data: 100+ million, 500+ billion, 32 petabytes

7. Current Ingest Architecture

8. Lossy

9. Schemaless

10. Schemaless

11.

12. Head-to-head tests between EMR and SparkSQL.

13. Personalized Offers

14. Year In Review

15. Future Ingest Architecture

16.

17. Analysts’ Interactions

18. Migration

19. Standardize Access

Player Stats
+ game_id: long
+ id: long
+ player_level: long
+ player_mmr: int

Game Details
+ gameId: long
+ playerId: long
+ kills: int
+ deaths: int
+ assists: int

Store Account
+ acct_id: long
+ id: long
+ store_id: long

Store Transactions
+ acct_id: long
+ amount: long
+ item: string

Player Game Omnibus
+ game_id: long
+ player_id: long
+ kills: int
+ deaths: int
+ assists: int
+ player_level: long
+ player_mmr: int
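The "Standardize Access" slide suggests that source tables with inconsistent key names (`game_id` vs. `gameId`, `id` vs. `playerId`) are combined into a single Player Game Omnibus table with one naming convention. The following is a minimal sketch of that idea in plain Python rather than Spark. It assumes that `id` in Player Stats and `playerId` in Game Details both identify the player, and that the two tables join on the game and player keys; the sample rows, the `RENAMES` mapping, and the helper names are hypothetical and not taken from the talk.

```python
# Hypothetical sample rows: field names follow the slide, values are invented.
player_stats = [
    {"game_id": 1, "id": 10, "player_level": 30, "player_mmr": 1500},
    {"game_id": 1, "id": 11, "player_level": 12, "player_mmr": 1100},
]
game_details = [
    {"gameId": 1, "playerId": 10, "kills": 5, "deaths": 2, "assists": 7},
    {"gameId": 1, "playerId": 11, "kills": 1, "deaths": 6, "assists": 3},
]

# Assumed mapping from each source's legacy column names to standard names.
RENAMES = {
    "player_stats": {"id": "player_id"},
    "game_details": {"gameId": "game_id", "playerId": "player_id"},
}


def standardize(row, renames):
    """Return a copy of row with legacy column names replaced."""
    return {renames.get(name, name): value for name, value in row.items()}


def build_omnibus(stats_rows, detail_rows):
    """Inner-join the two sources on (game_id, player_id) after renaming."""
    stats_by_key = {}
    for row in stats_rows:
        clean = standardize(row, RENAMES["player_stats"])
        stats_by_key[(clean["game_id"], clean["player_id"])] = clean

    omnibus = []
    for row in detail_rows:
        clean = standardize(row, RENAMES["game_details"])
        key = (clean["game_id"], clean["player_id"])
        if key in stats_by_key:  # keep only rows present in both sources
            omnibus.append({**stats_by_key[key], **clean})
    return omnibus


omnibus = build_omnibus(player_stats, game_details)
for record in omnibus:
    print(record)
```

With a table like this in place, analysts would only need to learn one set of column names instead of one per source, which is presumably the point of the slide.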

20. Be willing to experiment

21. Find a Champion

22. Find a Champion

23. Fix Root Causes

24. Be Unblockable

25. Foster a Community

26. Thank you!
   1. Standardize access
   2. Be willing to experiment
   3. Find a champion
   4. Fix root causes
   5. Be unblockable
   6. Foster a community
