How Apache Spark Changed the Way We Hire People

As big data technology matures, you might expect it to be easy to hire more talent. Yet even though the number of people interested and involved in the big data world has risen sharply, hiring demand still far outstrips supply.

1.HOW APACHE SPARK CHANGED THE WAY WE HIRE PEOPLE Tomasz Magdanski, iPass #EntSAIS17

2. What if the war for talent ended and your company lost? • War for talent – A late-'90s warning from McKinsey about a talent shortage – Urged companies to prioritize strategies around recruiting, retaining, and developing key employees • The one-percent problem

3. Hiring is tough Source: edureka

4. And it's going to get worse Source: Hour of Code

5. Since the war for talent started, we have come full circle • Apart from hiring skilled engineers, companies look inward to fill the gap • Create a path to grow within your organization • But wait a minute: didn't we just say there is a big skills gap?

6. What are we building?

7. Goals • Scalable platform • Cost-effective • No data loss • Code portability • Easy R&D • Extendable • Supports many languages • Supports batch and streaming • ML enabled • Collaborative

8. Who were we looking for? • MapReduce • Hadoop / HDFS • Hive / Pig • Storm • Caching • Avro / Parquet • Distributed computing • Managing clusters and infrastructure • Integrating tools • Data warehousing and modeling

9. Who were we looking for? • CAP theorem • Data transformation • Data collection • SQL • Cassandra / HBase / MongoDB / MySQL • Kafka • AWS • Scala / Java / Python • Understanding of data structures and algorithms • Visualization and data analysis • Team player • SPECIFIC INDUSTRY KNOWLEDGE

10. Spark changed what we are looking for

11. Spark • Simple • Easy to learn • High-level abstraction API • Built-in connectors to major data sources • Supports batch and streaming • Highly optimized and extensible • ML library that runs at scale • Spark provides transactional writes and exactly-once semantics

12. Spark and Databricks • A single platform that unifies data engineering and data science • Automated cluster management / zero-management infrastructure • Intuitive notebooks supporting multiple programming languages • Makes collaboration easy • Blends data engineering and data science workloads • APIs to integrate with other tools

13. Spark and Databricks • Integrated metastore • Integrated managed and unmanaged tables • Workspace API • Engineering and customer support, including solution architects • Easy dashboards • DbUtils • SBT tools for easy deployment

14. Who are we looking for now? • Experience programming against APIs • Understanding of data structures and algorithms • Scala / Java / Python / R • Visualization and data analysis • Team player • SPECIFIC INDUSTRY KNOWLEDGE

15. Business needs • Wi-Fi connectivity patterns • Our business and customers • Existing system architecture • Skip the lengthy onboarding process • One stack to learn

16. How did that change the way we hire? • We have internally hired: – QA engineer – App developers – Ex-developer / product manager – Backend engineer • Externally hired: – One very experienced senior data engineer – A few college grads – Junior data engineers

17. Summary • Thanks to Databricks we didn't have to build a platform • Hired a mix of internal and external candidates • Focused on business needs • Created 6 data products • New seven-digit revenue stream for our company • Continue to innovate