Introduction to Apache Spark Development

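A Databricks engineer and Apache Spark committer introduces the history of Databricks and Spark, including the major features and improvements in Spark 1.4, and covers the main functionality and usage of earlier Spark versions. The talk also surveys how the big data field has developed in recent years and the ideas Spark has borrowed from other research and technologies. Above all, it explains how to use Spark's core components, which makes it an excellent Spark tutorial.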

1.Intro to Spark Development. June 2015: Spark Summit West, San Francisco. http://training.databricks.com/intro.pdf https://www.linkedin.com/in/bclapper

2.Making big data simple. Databricks Cloud: “A unified platform for building Big Data pipelines – from ETL to Exploration and Dashboards, to Advanced Analytics and Data Products.” Founded in late 2013 by the creators of Apache Spark. Original team from UC Berkeley AMPLab. Raised $47 million in 2 rounds. ~55 employees. We’re hiring! (http://databricks.workable.com) Level 2/3 support partnerships with Hortonworks, MapR, and DataStax.

3.The Databricks team contributed more than 75% of the code added to Spark in the past year

4.Agenda. Before lunch: History of Big Data & Spark; RDD fundamentals; Databricks UI demo; Lab: DevOps 101; Transformations & Actions. After lunch: Transformations & Actions (continued); Lab: Transformations & Actions; DataFrames; Lab: DataFrames; Spark UIs; Resource Managers: Local & Standalone; Memory and Persistence; Spark Streaming; Lab: MISC labs.
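The "Transformations & Actions" topic rests on one idea: transformations such as `map` and `filter` are lazy and only record what to do, while actions such as `collect` and `count` actually run the computation over every partition. The following is a minimal pure-Python sketch of that idea, assuming no Spark installation; `ToyRDD` and its methods are hypothetical names chosen to mirror the RDD API, not Spark's actual implementation.

```python
import math
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Hypothetical toy model of an RDD: a list of partitions plus a lazily
    recorded chain of per-partition functions (its "lineage")."""

    def __init__(self, partitions: List[list], ops: Optional[List[Callable]] = None):
        self.partitions = partitions
        self.ops = ops or []  # pending transformations, not yet executed

    @classmethod
    def parallelize(cls, data: Iterable, num_partitions: int = 2) -> "ToyRDD":
        items = list(data)
        n = max(1, num_partitions)
        size = math.ceil(len(items) / n)
        # Split the input into n contiguous chunks so collect() preserves order.
        parts = [items[i * size:(i + 1) * size] for i in range(n)]
        return cls(parts)

    # Transformations: return a new ToyRDD that extends the lineage; compute nothing.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(self.partitions, self.ops + [lambda part: [f(x) for x in part]])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(self.partitions, self.ops + [lambda part: [x for x in part if pred(x)]])

    def _compute(self) -> List[list]:
        # Replay the whole lineage on each partition independently.
        results = []
        for part in self.partitions:
            for op in self.ops:
                part = op(part)
            results.append(part)
        return results

    # Actions: trigger computation and return a result to the caller.
    def collect(self) -> list:
        return [x for part in self._compute() for x in part]

    def count(self) -> int:
        return sum(len(part) for part in self._compute())
```

In PySpark the same pipeline would be written `sc.parallelize(range(1, 11), 2).filter(lambda x: x % 2 == 0).map(lambda x: x * x).collect()`. The toy recomputes the full lineage on every action; Spark does the same for an RDD that has not been cached, which is what `cache()` and `persist()` (the agenda's "Memory and Persistence" topic) address.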

5.Some slides will be skipped. Please keep Q&A to a minimum during class (Q&A with the instructor runs from 5pm to 5:30pm). 2 anonymous surveys: pre-class and post-class. Lunch: noon to 1pm. 2 breaks (one sometime before lunch and one after lunch).

6.Instructor: Brian Clapper. Homepage: http://www.ardentex.com/ LinkedIn: https://www.linkedin.com/in/bclapper @brianclapper. 30 years of experience building and maintaining software systems. Scala, Python, Ruby, Java, C, C#. Founder of the Philadelphia area Scala user group (PHASE). Spark instructor for Databricks.

7.Survey completed by 58 out of 115 students Your job?

8.Survey completed by 58 out of 115 students Traveled from?

9.Survey completed by 58 out of 115 students Which Industry?

10.Survey completed by 58 out of 115 students Prior Spark training?

11.Survey completed by 58 out of 115 students Hands on Experience with Spark?

12.Survey completed by 58 out of 115 students Spark usage lifecycle?

13.Survey completed by 58 out of 115 students Programming Experience

14.Survey completed by 58 out of 115 students Programming Experience

15.Survey completed by 58 out of 115 students Programming Experience

16.Survey completed by 58 out of 115 students Big Data Experience

17.Survey completed by 58 out of 115 students Focus of class?

18.Storage vs. processing wars. NoSQL battles (then): HBase vs Cassandra; Relational vs NoSQL; Redis vs Memcached vs Riak; MongoDB vs CouchDB vs Couchbase; Neo4j vs Titan vs Giraph vs OrientDB; Solr vs Elasticsearch. Compute battles (now): MapReduce vs Spark; Spark Streaming vs Storm; Hive vs Spark SQL vs Impala; Mahout vs MLlib vs H2O.


20.NoSQL Popularity Winners
Key -> Value: Redis 95, Memcached 33, DynamoDB 16, Riak 13
Key -> Doc: MongoDB 279, CouchDB 28, Couchbase 24, DynamoDB 15, MarkLogic 11
Column Family: Cassandra 109, HBase 62
Graph: Neo4j 30, OrientDB 4, Titan 3, Giraph 1
Search: Solr 81, Elasticsearch 70, Splunk 41

21.General Batch Processing (2004–2013). Specialized Systems for iterative, interactive, ML, streaming, graph, SQL, etc. workloads (2007–2015?): Pregel, Dremel, Impala, GraphLab, Giraph, Drill, Tez, S4, Storm, Mahout. General Unified Engine (2014–?).

22.Scheduling, Monitoring, Distributing

23.RDBMS, Streaming, SQL, GraphX, Hadoop Input Format, Apps, Distributions (CDH, HDP, MapR, DSE), Tachyon, MLlib, DataFrames API


25.400+ developers from 50+ companies; Apache Committers from 16+ organizations

26.vs YARN SQL MLlib Streaming Mesos Tachyon

27.10x – 100x

28.Aug 2009 Source: openhub.net ...in June 2013

29.Distributors, Applications