1.Project Harbor (Open Source): Implementing Remote Replication of Container Images Henry Zhang (张海宁) Chief Architect VMware China

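2.About Me • Chief Architect, VMware China R&D • Founder of Harbor, the open source enterprise-class container registry project • One of the earliest technical evangelists in the Cloud Foundry China community • Full-stack engineer for many years • Co-author of 《区块链技术指南》 (Blockchain Technical Guide) and 《软件定义存储》 (Software-Defined Storage) • 亨利笔记 (Henry's Notes)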

3.Introducing Project Harbor • An open source enterprise-class registry server (launched Mar 2016) • Initiated by VMware China • Apache 2 license • https://github.com/vmware/harbor/

4.Project Harbor and Golang • Harbor has used and grown with the Go language since Day 1 • Go v1.3-1.7 • Beego: v1.3-1.6 • A member project of Golang Foundation

5.Harbor Users and Partners • 2000+, 10K+, 200+ (Downloads, Stars, Users) • 500+ Forks • 46 Contributors • 6 Partners

6.Harbor Contributors Worldwide

7.Harbor Adoption

8.Key Users and Partners • Users • Partners

9.Harbor Used in Production and Dev [Bar chart: In what environment is Harbor used? (%) Categories: Dev and Production, Dev, Production, Still evaluating] Survey based on Chinese user community, 53 responses

10.Do you recommend Harbor? [Bar chart: Do you recommend Harbor to others? (%) Categories: Yes, Unsure, No] Survey based on Chinese user community, 53 responses

11.Docker Container Lifecycle: Build-Ship-Run

12.Build-Ship-Run through Registry • Registry is a key component of DevOps

13.Harbor: Enterprise-Class Private Registry Why does one need a private registry? • Efficiency • LAN vs WAN • Security • Intellectual property stays in the organization • Access control

14.Enterprise-Oriented Features • User management & access control • RBAC: admin, developer, guest • AD/LDAP integration • Policy-based image replication • Web UI (Chinese and English) • Audit and logs • RESTful API for integration • Lightweight and easy deployment

15.Project Harbor - Microservices Architecture [Architecture diagram. Components: Docker Client, Browser, Reverse Proxy (Nginx), UI, API, Auth, Admin Server, Basic Registry (Docker Distribution), Replication Service, Log Collector (rsyslog), DB (MySQL), AD / LDAP, Remote Harbor instance]

16.Image Replication between Registry Instances • Initial replication • Incremental replication (including image deletion) [Diagram: a policy connects a project and its images on one instance to a project and its images on another]

17.Image Replication Use Case (1) • Image distribution for large clusters • Load balancing [Diagram: Master-Slave topology; the Docker client pushes, the Docker hosts pull]

18.Image Replication Use Case (2) • Remote image synchronization • Geographically distributed teams • On-prem to public cloud • Backup [Diagram: Master-Master topology; Docker clients and Docker hosts on each side]

19.Shipping (Publishing) Images via Replication [Diagram: Git, CI, Dev Registry, Test Registry, Staging Registry, Production Registry; images are passed along from one registry to the next]

20.Requirements of Image Replication • Asynchronous replication (background job) • Little impact on the registry service (throttling) • Reliable, with automatic retry of failed operations (recovery) • Manual intervention (admin interaction)

21.Producer and Consumer Pattern • Front end (UI) or registry generates replication jobs (producer) • Backend workers handle replication (consumer) • Potential issues • Producers need to sleep or wait when the buffer is full • Sleeping or waiting is not suitable for the front end / registry [Diagram: Front End and Registry (producers), Job queue, Worker 1, Worker 2, Worker 3 (consumers)]
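The blocking issue named above can be illustrated with a bounded Go channel. This is only a minimal sketch: the helper `tryEnqueue` and the buffer size of two are assumptions made for illustration, not part of Harbor.

```go
package main

import "fmt"

// tryEnqueue (hypothetical helper) attempts to put a job ID on the bounded
// queue without blocking. It reports false when the buffer is full, which is
// exactly the situation in which a plain send (queue <- id) would make the
// producer wait.
func tryEnqueue(queue chan int64, id int64) bool {
	select {
	case queue <- id:
		return true
	default:
		return false
	}
}

func main() {
	// Bounded buffer holding two jobs; no consumer is running yet.
	queue := make(chan int64, 2)

	fmt.Println(tryEnqueue(queue, 1)) // true
	fmt.Println(tryEnqueue(queue, 2)) // true
	fmt.Println(tryEnqueue(queue, 3)) // false: buffer is full

	// A consumer takes one job, which frees a slot for the producer.
	<-queue
	fmt.Println(tryEnqueue(queue, 3)) // true
}
```

A front end that used a plain send on the third job would stall until a worker drained the queue, which is the behavior the slide flags as unsuitable.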

22.Modified Producer and Consumer Pattern • Non-blocking for producers • Dispatcher queues jobs • Dispatcher distributes jobs to available workers • Workers are added back to the available worker queue after their jobs are completed [Diagram: Front End and Registry (producers), Job queue, Dispatcher, Available worker queue, Workers (consumers)]

23.Goroutine as Lightweight Thread • Simple syntax • go f(x,y,z) • Concurrency (asynchrony) • Shares the same address space • Non-blocking for the main flow • Ideal for background replication
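As a small illustration of `go f(x,y,z)`, the sketch below starts one goroutine per image and lets the caller continue until it chooses to wait. The functions `replicate` and `replicateAll` are hypothetical stand-ins for real replication work, not Harbor code.

```go
package main

import (
	"fmt"
	"sync"
)

// replicate is a hypothetical stand-in for a slow background task. It writes
// into its own slot of the shared results slice, which works because
// goroutines share the address space of the caller.
func replicate(results []string, i int, image, target string, wg *sync.WaitGroup) {
	defer wg.Done()
	results[i] = image + " -> " + target
}

// replicateAll starts one goroutine per image. Each "go" statement returns
// immediately, so the loop never waits for a replication to finish.
func replicateAll(images []string, target string) []string {
	results := make([]string, len(images))
	var wg sync.WaitGroup
	for i, image := range images {
		wg.Add(1)
		go replicate(results, i, image, target, &wg)
	}
	// The main flow could do other work here before waiting.
	wg.Wait()
	return results
}

func main() {
	for _, r := range replicateAll([]string{"library/nginx", "library/redis"}, "remote-harbor") {
		fmt.Println(r)
	}
}
```

Each goroutine writes to a distinct index, so no lock is needed for the slice; the `sync.WaitGroup` is only there so the example can collect the results before exiting.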

24.Channel for Communication between Threads • Syntax • No buffering: make(chan Type) • With buffering: make(chan Type, capacity) • Send: ch <- v • Receive: v := <-ch • Used to block or unblock threads • Dispatcher thread (producer) • Worker thread (consumer) • Also used for stopping a job
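The channel operations listed above can be seen together in one short sketch. The names `worker` and `run` are hypothetical; the example uses an unbuffered channel for jobs, a buffered channel for the result, and a third channel whose closing tells the worker to stop.

```go
package main

import "fmt"

// worker adds up the values it receives until the stop channel is closed,
// then reports the sum and returns.
func worker(jobs <-chan int, stop <-chan struct{}, result chan<- int) {
	sum := 0
	for {
		select {
		case v := <-jobs: // receive: blocks until a value is sent
			sum += v
		case <-stop: // a closed channel is always ready to receive from
			result <- sum
			return
		}
	}
}

// run sends every value to a worker goroutine, stops it, and returns the sum.
func run(values []int) int {
	jobs := make(chan int)      // no buffering: each send waits for the receiver
	stop := make(chan struct{}) // used only to signal "stop"
	result := make(chan int, 1) // with buffering: the worker can send and exit

	go worker(jobs, stop, result)
	for _, v := range values {
		jobs <- v // send: unblocks the worker waiting in select
	}
	close(stop)
	return <-result
}

func main() {
	fmt.Println(run([]int{1, 2, 3})) // 6
}
```

Because `jobs` is unbuffered, every send completes only after the worker has taken the value, so all values are counted before `stop` is closed.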

25.Worker Pool • Predefined pool of available workers (default: 3, so as not to overwhelm front-end tasks) • A list of workers and a channel for dispatching jobs • harbor/src/jobservice/job/workerpool.go

26.Worker • A channel to receive replication jobs • Another channel to receive special instructions, such as quitting • harbor/src/jobservice/job/workerpool.go

27.Workers Wait for Replication Job • A receive on channel w.RepJobs blocks until a job is dispatched

28.Dispatcher • Receives jobs and distributes them to available workers • A receive on channel WorkerPool.workerChan blocks if no worker is available • harbor/src/jobservice/job/workerpool.go
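The worker pool, worker, and dispatcher described on the last few slides might fit together roughly as follows. This is a sketch under stated assumptions, not the contents of workerpool.go: only the names `RepJobs` and `workerChan` come from the slides, while `newWorkerPool`, `start`, `stop`, `dispatch`, `runJobs`, and the `handle` callback are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Worker has one channel for replication job IDs and one for the quit signal.
type Worker struct {
	ID      int
	RepJobs chan int64
	quit    chan struct{}
}

// workerPool keeps a list of all workers and a channel of available ones.
type workerPool struct {
	workerChan chan *Worker
	workerList []*Worker
}

func newWorkerPool(size int, handle func(workerID int, jobID int64)) *workerPool {
	p := &workerPool{workerChan: make(chan *Worker, size)}
	for i := 0; i < size; i++ {
		w := &Worker{ID: i, RepJobs: make(chan int64), quit: make(chan struct{})}
		p.workerList = append(p.workerList, w)
		go w.start(p, handle)
	}
	return p
}

// start announces the worker as available, then waits for a job or for quit.
func (w *Worker) start(p *workerPool, handle func(workerID int, jobID int64)) {
	for {
		p.workerChan <- w // back on the available worker queue
		select {
		case id := <-w.RepJobs: // blocks until a job is dispatched
			handle(w.ID, id)
		case <-w.quit:
			return
		}
	}
}

// stop must only be called once no more jobs will be dispatched.
func (p *workerPool) stop() {
	for _, w := range p.workerList {
		close(w.quit)
	}
}

// dispatch hands each job to its own goroutine, so taking a job off the queue
// never waits; that goroutine blocks until a worker is available.
func dispatch(p *workerPool, jobQueue <-chan int64) {
	for id := range jobQueue {
		go func(id int64) {
			w := <-p.workerChan // blocks while no worker is available
			w.RepJobs <- id
		}(id)
	}
}

// runJobs pushes the job IDs through a pool and records which worker ran each.
func runJobs(ids []int64, workers int) map[int64]int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	handled := make(map[int64]int)
	wg.Add(len(ids))
	pool := newWorkerPool(workers, func(workerID int, jobID int64) {
		mu.Lock()
		handled[jobID] = workerID
		mu.Unlock()
		wg.Done()
	})
	jobQueue := make(chan int64)
	go dispatch(pool, jobQueue)
	for _, id := range ids {
		jobQueue <- id // producer side: returns as soon as the dispatcher takes it
	}
	close(jobQueue)
	wg.Wait()
	pool.stop()
	return handled
}

func main() {
	handled := runJobs([]int64{101, 102, 103, 104, 105}, 3)
	fmt.Println("jobs handled:", len(handled))
}
```

With three workers and five jobs, two of the dispatching goroutines wait on `workerChan` until a worker finishes and announces itself again, while the producer loop is never held up by a busy pool.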

29.Replication Job • Replicating an image itself does not seem THAT hard • However …. [Flow diagram with states: initializing, checking, pulling manifest, transferring blobs, pushing manifest, finished; decision: has tags (Yes / No)]
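One way to read the flow diagram is as a small state machine. The sketch below is an interpretation, not Harbor's implementation: it assumes the states run in the order initializing, checking, pulling manifest, transferring blobs, pushing manifest, and that "has tags: Yes" loops back to pulling manifest for the next tag while "No" leads to finished. The names `State`, `next`, and `trace` are hypothetical.

```go
package main

import "fmt"

// State is one step of a replication job (labels taken from the diagram).
type State string

const (
	Initializing      State = "initializing"
	Checking          State = "checking"
	PullingManifest   State = "pulling manifest"
	TransferringBlobs State = "transferring blobs"
	PushingManifest   State = "pushing manifest"
	Finished          State = "finished"
)

// next returns the state that follows s. The ordering is an assumption:
// after pushing a manifest, remaining tags send the job back to pulling the
// next manifest; otherwise the job is finished.
func next(s State, tagsLeft int) State {
	switch s {
	case Initializing:
		return Checking
	case Checking:
		return PullingManifest
	case PullingManifest:
		return TransferringBlobs
	case TransferringBlobs:
		return PushingManifest
	case PushingManifest:
		if tagsLeft > 0 {
			return PullingManifest
		}
		return Finished
	default:
		return Finished
	}
}

// trace lists the states visited when replicating an image with the given
// number of tags (assumed to be at least one).
func trace(tags int) []State {
	s := Initializing
	path := []State{s}
	left := tags
	for s != Finished {
		if s == PullingManifest {
			left-- // this pass handles one tag
		}
		s = next(s, left)
		path = append(path, s)
	}
	return path
}

func main() {
	for _, s := range trace(2) {
		fmt.Println(s)
	}
}
```

Even in this simplified form, each state is a point where a network call can fail or an admin can stop the job, which is where the retry and intervention requirements come in.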