An End-to-End Spark-Based Data Stack in the Hybrid Cloud

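We have deployed a hybrid cloud storage solution that combines compute in the public cloud with dedicated storage hardware. We will discuss the trade-offs of hybrid cloud storage, which workloads fit this model best, the pipeline we deployed, and the challenges and best practices we learned along the way. Spark provides a flexible compute environment that works with today's cloud providers.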
1. An end-to-end Spark-based data stack in the hybrid cloud
Farhan Abrol, Product Lead, Pure Storage
fabrol92@gmail.com | @F_Abrol | www.linkedin.com/in/fabrol
#HWCSAIS12

2. Outline
• Environment overview & problems
• Solutions - Hint: Spark
• More Spark More Problems
• Hybrid Cloud
  – Options & performance comparison
  – Should you do it?
  – Basics of datacenter

3. Pure1
● Fleet dashboard for IoT devices
  ○ Storage arrays
  ○ VMs
● Real-time log/metric streaming
● 16 TB logs/metrics ingested daily
● Intelligence
  ○ Proactive scanning for issues
  ○ Predictive alerting
  ○ Machine-learned forecasting

4. Logs are king
[Diagram labels: S3; S3 Infrequent Access; FUSE filesystem; Historical grep; Machine learning; Continuous or ad-hoc analysis by Engineering; Daily ETL]

5. Problems
- Speed of running historical greps
  - Bottlenecked on single machine throughput
- Resource wastage for ETL machines
- Code/maintenance for new ETL jobs
  - Becoming a monolith
- ML training time
  - As data grows, taking 8-12 hours

6. [Spark] all the things!
- Faster*
- Better resource utilization
- Uniform language and tooling
- Streaming / batch jobs
- One infra to maintain

8. Grep -> Distributed grep on Spark
rgrep "xyz" --obj-id 100 --start-date=5/13/18 --end-date=5/18/18
[Diagram: a Spark driver fans the query out to Spark executors, each handling one daily window: 05/13/2018 - 5/14/2018, 05/14/2018 - 5/15/2018, 05/15/2018 - 5/16/2018, 05/16/2018 - 5/17/2018]
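The fan-out on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind rgrep: a ThreadPoolExecutor stands in for the Spark executors, the function names and the in-memory logs_by_day dictionary are hypothetical, the end date is assumed to be exclusive, and the --obj-id filter is not modeled.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def daily_windows(start, end):
    """Split [start, end) into consecutive one-day (day, next_day) windows."""
    windows = []
    day = start
    while day < end:
        windows.append((day, day + timedelta(days=1)))
        day += timedelta(days=1)
    return windows


def grep_window(pattern, lines):
    """Return the lines of one daily window that match the pattern."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


def distributed_grep(pattern, logs_by_day, start, end, workers=4):
    """Grep every daily window in parallel; results come back in date order.

    logs_by_day is a hypothetical {date: [log lines]} mapping that stands in
    for the real log store. The thread pool stands in for Spark executors.
    """
    windows = daily_windows(start, end)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(
            lambda window: grep_window(pattern, logs_by_day.get(window[0], [])),
            windows,
        )
        return [line for chunk in chunks for line in chunk]


if __name__ == "__main__":
    sample_logs = {
        date(2018, 5, 13): ["array-100 xyz fault", "array-100 ok"],
        date(2018, 5, 14): ["array-100 xyz retry"],
    }
    for hit in distributed_grep("xyz", sample_logs, date(2018, 5, 13), date(2018, 5, 18)):
        print(hit)
```

Because each daily window is independent, throughput is no longer capped by a single machine; on a real cluster the same split would map naturally onto one Spark partition per day.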

9. Done!

10. Problem - AWS cost trend

12. Hybrid Cloud
[Diagram labels: EC2 VMs; Direct-Connect (dedicated 10G private fiber link); switches; data center with HW: Pure LUN, Pure FS, 500 TB]

13. Hybrid Cloud - Pricing
Data in = $0/month

Utility                    Price       Usage    Total per month
10G port                   $2.25/hr    720 hr   $1620
Data transfer out of AWS   $0.020/GB   500 TB   $10000
AWS Cost                                        $11620
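The arithmetic behind the pricing slide can be checked with a short sketch. The rates and usage figures are the ones on the slide; the function name is hypothetical, and decimal units (1 TB = 1000 GB) are an assumption, chosen because they are what makes the slide's totals add up.

```python
GB_PER_TB = 1000  # assumption: decimal units, consistent with the slide's $10000


def monthly_aws_cost(port_rate_per_hr, port_hours, egress_rate_per_gb, egress_tb):
    """Return (port cost, egress cost, total) in dollars for one month.

    Data transferred into AWS is free on the slide, so only the port rental
    and the data transferred out of AWS contribute.
    """
    port_cost = port_rate_per_hr * port_hours
    egress_cost = egress_rate_per_gb * egress_tb * GB_PER_TB
    return port_cost, egress_cost, port_cost + egress_cost


if __name__ == "__main__":
    port, egress, total = monthly_aws_cost(
        port_rate_per_hr=2.25,     # 10G port, $/hr
        port_hours=720,            # 30 days x 24 hours
        egress_rate_per_gb=0.020,  # data transfer out of AWS, $/GB
        egress_tb=500,
    )
    print(f"port=${port:.0f} egress=${egress:.0f} total=${total:.0f}")
```

The port rental is a fixed monthly charge, while the transfer charge scales with the volume leaving AWS, so the second term dominates at this scale.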

14. Log analysis pipeline - Smoke test
[Diagram labels: Phonehome servers; DirectConnect; 30 days logs, 500 TB; EMR; Historical grep + ML; S3 Infrequent Access]

15. Aside
[Diagram labels: Storage system; Storage protocols; Generic; Optimized; Flashblade]

16. AWS Only | Hybrid with EC2 | Hybrid with Local Compute
[Diagram labels: AWS Only: EMR, Amazon S3. Hybrid with EC2: EMR, 5ms-20ms link, switch, 500 TB. Hybrid with Local Compute: switch, 500 TB]

17. 144-node Spark cluster
Workload - Distributed grep
~3x-10x better throughput

18. Hybrid with EC2
[Diagram labels: EMR; 5ms-20ms; switch; 500 TB]
Performance
- Link latency
- Cloud networking stack
Costs
Good for
- Read heavy workloads
- Latency insensitive workloads
- Low bandwidth workloads

19. Hybrid with Local Compute
[Diagram labels: switch; 500 TB]
Performance
Costs
Good for
- Read heavy workloads
- Latency sensitive workloads
- High bandwidth workloads

20. 144-node Spark cluster
Workload - Distributed grep
~3x-10x better throughput

21. Datacenter setup
Software
Compute servers (32 vCPUs)   ~$10-20k
Networking switch            ~$10k
Storage                      Varies

22. Conclusion
⎯ Best use cases: workloads with higher read, lower write requirements
⎯ When the write portion of the read/write ratio increases, be cognizant of cumulative AWS transfer costs
⎯ High performance cloud services can be expensive; on-prem can alleviate this cost
⎯ Unique capabilities of on-prem storage & compute:
  ⎯ Instant snapshots
  ⎯ All kinds of workloads on one platform
  ⎯ Resilience
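The warning about the write portion can be made concrete with a rough sketch. It assumes, as the pricing slide suggests, that data read into AWS from the on-prem storage is free while every byte written back from EC2 to on-prem storage is billed as transfer out of AWS at $0.020/GB. The function name, the 500 TB monthly I/O volume, and the decimal units are illustrative assumptions, not figures from the talk.

```python
def monthly_transfer_cost(total_io_tb, write_fraction, egress_rate_per_gb=0.020):
    """Estimate the monthly AWS transfer charge for a hybrid workload.

    Assumes reads (data into AWS) cost nothing and writes (data out of AWS)
    are billed per GB, with 1 TB = 1000 GB.
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    written_gb = total_io_tb * write_fraction * 1000
    return written_gb * egress_rate_per_gb


if __name__ == "__main__":
    # Hypothetical workload: 500 TB of total I/O per month.
    for fraction in (0.0, 0.1, 0.5, 1.0):
        cost = monthly_transfer_cost(500, fraction)
        print(f"write fraction {fraction:.0%}: ${cost:,.0f}/month")
```

Under these assumptions the transfer charge grows linearly with the write fraction, which is why read-heavy workloads are the best fit for the hybrid model.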
