Hybrid collaborative tiered storage with Alluxio

Applications reading data from AWS S3 or Alibaba Cloud OSS usually suffer serious performance problems, since every read goes over the remote network. Alluxio can provide a transparent caching layer that automatically caches the remote OSS/S3 data being read. But when does Alluxio itself pull the remote data? Does it cache everything by default, or only on demand? This deck introduces Alluxio's tiered-storage concept and combines it with the ZFS file system to maximize performance while keeping application development simple.

1.Hybrid collaborative tiered storage with Alluxio (Thai Bui, Data Engineer @ Bazaarvoice)

2.Bazaarvoice ● Founded in 2005 in Austin, TX ● Digital marketing SaaS platforms for ratings and reviews ○ Display & syndicate reviews from brands to retailer websites ○ Reporting & analytics on consumers, reviews, products, etc. ● 2,600 client websites ● 5.4 billion product page views each month ● 900 million unique shoppers each month

3.Reporting & analytics on S3 When you have 100s of TB of data on S3 ● Just listing the files is slow ● Download speed in EC2 is limited (50-150Mb/s per node) ● No concept of cache ● No concept of data locality

4.AWS S3 : The Need For Speed ● Add tiered storage to S3 ○ Hot, warm, cold storage (fastest, fast, and not so fast) ○ Metadata cache ○ Data cache ● Keep data local ○ In the same machine, not via the Ethernet cable ● Compatible with existing services ○ Hadoop, Spark, Hive, Presto, etc. ● Adaptive & highly configurable ○ Symlink for S3

5.Overview ● Alluxio ○ Distributed data storage ○ Hadoop compatible ○ By AMPLab ● ZFS ○ OS-level file system ○ Volume manager ○ By Sun Microsystems ● Both are open-source (Diagram: App1, App2 and Spark read through the Hive metastore; cold data stays in S3, hot & warm data sits in Alluxio backed by ZFS)

6.Alluxio : The tiered-storage layer ● Supports the native filesystem and Hadoop filesystem APIs ● Distributed and can be installed on every node ○ Provides data locality ● Mount S3, HDFS, etc. into Alluxio ○ Think symlink: no data movement ● Use the Hive metastore to partition data into hot/warm and cold regions ○ Acts as a remote tiered-storage layer
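The "think symlink" point above corresponds to Alluxio's mount command: the S3 bucket appears in the Alluxio namespace, and no bytes move until a client reads a path under it. A minimal sketch, assuming an Alluxio 1.x deployment; the bucket name, master hostname, and credential placeholders are all hypothetical:

```shell
# Mount an S3 bucket into the Alluxio namespace. Nothing is copied at
# mount time; blocks are cached lazily as clients read them.
alluxio fs mount \
  --option s3a.accessKeyId=<ACCESS_KEY> \
  --option s3a.secretKey=<SECRET_KEY> \
  /s3 s3a://my-reviews-bucket/warehouse

# Hadoop-compatible clients (Hive, Spark, Presto, ...) then read through
# the cache layer via the alluxio:// scheme instead of s3a://.
hadoop fs -ls alluxio://alluxio-master:19998/s3
```

This is a configuration sketch, not a runnable script: it assumes a live Alluxio master and valid AWS credentials.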

7.ZFS : The acceleration layer ● Both a filesystem & a volume manager ○ Mirrored writes to 2 SSDs -> 2x read speed ● Works in Linux kernel space ○ Works with RAM to accelerate reads/writes ○ Auto-promotes/demotes blocks between RAM and other storage ○ Uses local NVMe SSD if data is not in RAM ○ Acts as a local tiered-storage layer ● Extremely reliable ○ Automatic block checksum & repair
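The mirrored-SSD setup above maps onto standard zpool/zfs commands. A sketch under stated assumptions: the device names are illustrative (check `lsblk` for the real ones), the pool name `tank` and the 16 GiB ARC cap are arbitrary choices, not values from the talk:

```shell
# Create a mirrored pool from the two local NVMe devices. A mirror
# survives one device failure and roughly doubles streaming read
# throughput, since reads are balanced across both sides.
zpool create -o ashift=12 tank mirror /dev/nvme0n1 /dev/nvme1n1

# Cap the ARC (ZFS's in-RAM block cache) so it coexists with the JVM
# heaps of the Alluxio/Hive workers on the same node. 16 GiB example:
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max

# Create a dataset for the Alluxio worker's top storage tier.
zfs create -o mountpoint=/alluxio/fs tank/alluxio
```

With this in place, blocks the ARC evicts from RAM are still served from the NVMe mirror, which is the local promote/demote behavior described on the slide.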

8.ZFS + NVMe: Micro benchmark i3.4xlarge, up to 10 Gbit network, 2 x 1.9 TB NVMe SSD ● Baseline w/ EBS ○ 135 MB/s write (dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync) ○ 157 MB/s read (dd if=/tmp/test1.img of=/dev/null bs=8k) ● ZFS + 2 mirrored NVMe SSDs ○ 820 MB/s write (dd if=/dev/zero of=/alluxio/fs/test1.img bs=1G count=1) ○ 1.7 GB/s read (dd if=/alluxio/fs/test1.img of=/dev/null bs=1G count=1) ● 4x write, 10x read compared to EBS ● 10-15x compared to S3
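The dd invocations on this slide can be wrapped in a small script to repeat the comparison on any mount point. A sketch, not the exact benchmark setup from the talk: the target path and the smaller block sizes are placeholders chosen so it runs anywhere (bump them back to bs=1G count=1 for a realistic run on GNU/Linux):

```shell
#!/bin/sh
# Sequential write/read micro-benchmark sketch for a given mount point.
TARGET=${1:-/tmp}          # pass the ZFS/Alluxio mount point to test
FILE="$TARGET/ddtest.img"

# Write test: oflag=dsync forces each block to stable storage before dd
# returns, matching the EBS baseline measurement on the slide.
dd if=/dev/zero of="$FILE" bs=1M count=64 oflag=dsync 2>&1 | tail -n 1

# Read test. Note this measures page-cache-warm reads unless caches are
# dropped first (echo 3 > /proc/sys/vm/drop_caches, requires root).
dd if="$FILE" of=/dev/null bs=1M 2>&1 | tail -n 1

rm -f "$FILE"
```

dd prints its throughput summary on stderr, hence the `2>&1 | tail -n 1` to keep just the rate line from each run.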

9.With ZFS (Diagram: applications use the native/Hadoop filesystem API against Alluxio in user space; in kernel space, ZFS promotes hot blocks into RAM and demotes warm blocks to NVMe SSD)

10.With Hive (Diagram: the Hive metastore routes the last 30 days of data to the hot & warm Alluxio tier and anything older than 30 days to cold storage on S3)
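The routing in this slide is driven entirely by partition locations in the Hive metastore, so it can be sketched with plain ALTER TABLE statements. The table name, partition keys, master hostname, and bucket below are all hypothetical:

```shell
# Recent partition (within the last 30 days): its location points into
# the Alluxio namespace, so scans hit the ZFS-backed hot/warm tiers.
hive -e "ALTER TABLE reviews PARTITION (ds='2017-06-01')
         SET LOCATION 'alluxio://alluxio-master:19998/s3/reviews/ds=2017-06-01';"

# Older partition (> 30 days): its location points straight at S3 (cold).
hive -e "ALTER TABLE reviews PARTITION (ds='2017-01-01')
         SET LOCATION 's3a://my-reviews-bucket/warehouse/reviews/ds=2017-01-01';"
```

A scheduled job can demote partitions as they age past the 30-day window; queries that span both ranges transparently mix cached and remote reads. This sketch assumes a running Hive metastore, so it is a configuration fragment rather than a standalone script.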

11.CPU/IO Monitoring

12.Tiered storage Monitoring

13.Alluxio Monitoring

14.Hive Monitoring & Performance ● Scanning 5 GB of data in tiered storage, 350M rows, fewer projections ● Scanning 200 GB of data in tiered storage, 500M rows, select *

15.Scanning 35 GB of data in S3, 1.6B rows, count distinct: 60s, with the majority of the time spent on scanning S3 (chart: metadata/split calculation ops)

16.Results ● 5-10x read improvement in Hive ○ Workers can short-circuit and read directly from ZFS instead of S3 ○ Moves compute to the data ● Easy to debug, with a feedback loop, collaborative ○ Data publishers + data analysts/scientists ● Good for iterating over the same data set multiple times ○ Machine learning ○ Exploratory analysis ● Gives us control over S3 ○ More recent data is faster to access