20190622 Best Practices of Alluxio at JD.COM
1 . The Practice of Alluxio in JD.COM
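2 .About me
Mao Baolong, head of big data storage at JD.COM, mainly responsible for components such as JDHDFS, JDAlluxio, JDCeph, JDJDK, and JDKernel. Led the construction of JD's ten-thousand-node big data distributed file storage. Passionate about open source and actively involved in the open source community. Alluxio PMC member, Hadoop contributor.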
3 .Contents
01 Alluxio introduction: a short introduction
02 Build your Alluxio on your Hadoop: how to build when you modify your Alluxio or Hadoop
03 Using Alluxio to accelerate JobHistory: cache the job container logs
04 Using Alluxio to accelerate JDPresto: 10x performance improvement
05 JD Contribution: some of the features contributed by JD
06 Alluxio Future: expectations of Alluxio and future plans
4 .Alluxio introduction 1
5 .What is Alluxio
It is the world's first virtual distributed storage system. Alluxio unifies data at memory speed.
Virtual Data Lake:
Apps only talk to Alluxio
Simple Add/Remove
No App Changes
Highest performance in Memory
6 .Alluxio is a bridge
Application interface: Apache Spark, Presto, TensorFlow, Apache HBase, Apache Hive, or Apache Flink
Storage interface: Amazon S3, Google Cloud Storage, OpenStack Swift, GlusterFS, HDFS (various versions), IBM Cleversafe, EMC ECS, Ceph, NFS, and Alibaba OSS
7 .Powered by Alluxio
https://www.alluxio.io/powered-by-alluxio/
Today, Alluxio is deployed in production by hundreds of organizations, with the largest deployment exceeding 1,500 nodes.
8 .Active Open Source Community
Alluxio is one of the fastest growing open source projects. It has attracted more than 1000 contributors from over 300 institutions, including Alibaba, Alluxio, Baidu, JD.COM, CMU, Google, IBM, Intel, NJU, Red Hat, Tencent, UC Berkeley, and Yahoo.
9 .Build your Alluxio on your Hadoop 2
10 .Why build? How to build? XX Alluxio or XX Hadoop
Hadoop:
mvn install -Pdist,native -DskipTests=true -Dmaven.javadoc.skip=true -Drequire.snappy -Dsnappy.prefix=/data0/snappy/ -Dcontainer-executor.conf.dir=/etc/yarn-executor/ -Dtar
Alluxio:
mvn -T 4C clean install -Phadoop-2 -Dhadoop.version=2.7.1 -DskipTests -Dlicense.skip=true -Dfindbugs.skip -Dmaven.javadoc.skip -Dcheckstyle.skip; dev/scripts/generate-tarballs -ufs-modules=all release
11 .Using Alluxio to accelerate JobHistory 3
12 .How to let JobHistory use Alluxio
Mount the HDFS URL of the job container logs into the Alluxio namespace:
$ alluxio fs mount --option alluxio.underfs.hdfs.configuration=/xxx/servers/alluxio-2.0.0/hadoop-conf/cluster1/hadoop/hdfs-site.xml --option alluxio.underfs.version=2.7 /tmp/app-logs/ hdfs://ns19/tmp/app-logs/
Put the Alluxio client package on the JobHistory classpath:
cp alluxio-core-client-hdfs-2.0.0-SNAPSHOT.jar hadoop-2.7.1/share/hadoop/hdfs/
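The mount above maps an Alluxio path onto an HDFS URI. A minimal sketch of that mapping follows, assuming a simple longest-prefix match. The function name `resolve_ufs_path` and the dictionary-based mount table are hypothetical and are not Alluxio's implementation.

```python
from typing import Dict, Optional


def resolve_ufs_path(alluxio_path: str, mount_table: Dict[str, str]) -> Optional[str]:
    """Map an Alluxio path to an under-storage URI (hypothetical helper).

    Uses the longest mount point that is a path prefix of alluxio_path.
    Returns None when no mount point covers the path.
    """
    path = alluxio_path.rstrip("/") or "/"
    best = None  # (normalized mount point, ufs root)
    for mount_point, ufs_root in mount_table.items():
        mp = mount_point.rstrip("/") or "/"
        # Match only on whole path components, so /tmp/app-logs
        # does not cover /tmp/app-logs-other.
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            if best is None or len(mp) > len(best[0]):
                best = (mp, ufs_root)
    if best is None:
        return None
    mp, ufs_root = best
    suffix = path[len(mp):].lstrip("/")
    ufs_root = ufs_root.rstrip("/")
    return f"{ufs_root}/{suffix}" if suffix else ufs_root


# Mount table mirroring the mount command on this slide.
MOUNTS = {"/tmp/app-logs/": "hdfs://ns19/tmp/app-logs/"}
```

For example, `resolve_ufs_path("/tmp/app-logs/application_1/c.log", MOUNTS)` yields the corresponding path under `hdfs://ns19/tmp/app-logs`.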
13 .How to let JobHistory use Alluxio
Configure JobHistory.
hdfs-site.xml:
<property>
  <name>fs.alluxio.impl</name>
  <value>alluxio.hadoop.FileSystem</value>
</property>
<property>
  <name>fs.alluxio-ft.impl</name>
  <value>alluxio.hadoop.FaultTolerantFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.alluxio.impl</name>
  <value>alluxio.hadoop.AlluxioFileSystem</value>
</property>
yarn-site.xml:
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>alluxio://hostname:19998/tmp/app-logs</value>
</property>
14 .JobHistory using Alluxio show
15 .JobHistory using Alluxio show
16 .Using Alluxio to accelerate JDPresto 4
17 .Presto
18 .Presto + Alluxio = better together
Higher query throughput
Consistent low query latency
Eliminates network traffic
19 .Presto on Alluxio
Alluxio led to a 10x performance improvement. 100+ nodes. In production for more than 2.5 years.
JDPresto on Alluxio advantages: when we adopted Alluxio for JDPresto, we made some changes that bring some good features.
Pluggable: Alluxio can be brought online or updated at any time.
Fault-tolerant: when Alluxio cannot be accessed, JDPresto can access HDFS directly.
Locality: reduces remote reads.
20 .Presto on Alluxio
Locality. Isolation. Load once, use every time. (Figure: before and after.)
21 .Presto on Alluxio
(Diagram: Presto, Alluxio, HDFS.) Presto reads from Alluxio. Alluxio reads HDFS data and caches it. On an Alluxio access exception, Presto accesses HDFS directly.
22 .Presto on Alluxio
23 .Presto on Alluxio
24 .Presto on Alluxio
Speed comparison
25 .JD Contribution 5
26 .Review Alluxio Architecture
27 .Watermark Evict Strategy
Sync Evict Strategy
Async Evict Strategy
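The slide only names the strategies, so the following is a sketch of the general watermark idea under stated assumptions: eviction starts when usage exceeds a high watermark and continues until usage drops to a low watermark, with blocks evicted in least-recently-used order. The class `WatermarkCache`, its parameters, and the LRU ordering are all hypothetical and do not describe JD's contributed implementation.

```python
from collections import OrderedDict
from typing import List


class WatermarkCache:
    """Toy block cache with high/low watermark eviction (hypothetical)."""

    def __init__(self, capacity: int, high: float = 0.9, low: float = 0.7):
        assert 0 < low < high <= 1
        self.capacity = capacity
        self.high = high
        self.low = low
        self.used = 0
        self._blocks: "OrderedDict[str, int]" = OrderedDict()  # id -> size, LRU first

    def access(self, block_id: str) -> None:
        """Mark a block as most recently used."""
        self._blocks.move_to_end(block_id)

    def put(self, block_id: str, size: int) -> List[str]:
        """Add a block; evict synchronously if the high watermark is crossed."""
        if block_id in self._blocks:
            self.used -= self._blocks.pop(block_id)
        self._blocks[block_id] = size
        self.used += size
        if self.used > self.high * self.capacity:
            return self.evict()
        return []

    def evict(self) -> List[str]:
        """Evict least-recently-used blocks until usage is at or below the low watermark."""
        evicted = []
        target = self.low * self.capacity
        while self.used > target and self._blocks:
            block_id, size = self._blocks.popitem(last=False)
            self.used -= size
            evicted.append(block_id)
        return evicted
```

Here `put` evicts inline, which corresponds to a synchronous strategy. An asynchronous variant would instead have a background thread call `evict()` periodically so that writers do not wait for space to be freed.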
28 .Alluxio Cache Consistency(1)
29 .Alluxio Cache Consistency(2)
Keep Alluxio and HDFS consistent, to ensure that dirty data is not read.
There are three ways to trigger a file consistency check:
RPC API: when a client requests metadata through getFileId, getFileInfo, listStatus, etc., the Alluxio master checks file cache consistency.
RESTful API: calling reloadMetaData triggers Alluxio to reload all metadata.
Alluxio Master startup: file cache consistency is checked while the master starts up.