NVIDIA: Latest Updates and an Introduction to Its Open Source Projects
openLooKeng

Julien Lai (赖俊杰), Senior Director of Engineering and Solutions, NVIDIA China


1 .

2 . NVIDIA BRIEF AND OPEN SOURCE AT NVIDIA. Julien Lai

3 .

4 . NVIDIA - A COMPUTING PLATFORM COMPANY NVIDIA pioneered accelerated computing to tackle challenges ordinary computers cannot. We make computers for the da Vincis and Einsteins of our time so that they can see and create the future.

5 . HOW GPU ACCELERATION WORKS [Diagram: the application code is split in two. The compute-intensive functions run on the GPU; the rest of the sequential code runs on the CPU.]

6 . NVIDIA OMNIVERSE IS THE PLATFORM OF REAL-TIME COLLABORATION & SIMULATION [Panel labels: AI, Path-Tracing, USD, Materials, Physics]

7 . DIGITAL JENSEN

8 . WORKFLOW [Diagram of the Digital Jensen pipeline. Recoverable labels: Audio; Audio2Face; Audio2Gestures; Retargeting; Skeleton Animation; 3D scan; Jensen Head 3D Model; Jensen Body 3D Model; Merge; OV Rendering; Vid2Vid; Talking Jensen Head Video; Talking Jensen Body Video; Dataset; 3D; 2D]

9 .

10 . OPEN SOURCE AT NVIDIA

11 .

12 . A few Open Source Projects: NCCL, Triton, NeMo

13 . NCCL

14 . COLLECTIVE COMMUNICATION: Multiple senders and/or receivers

15 . BROADCAST with unidirectional ring: GPU0 → GPU1 → GPU2 → GPU3

16 . BROADCAST with unidirectional ring: GPU0 → GPU1 → GPU2 → GPU3. Step 1: $\Delta t = N/B$. $N$: bytes to broadcast; $B$: bandwidth of each link.

17 . BROADCAST with unidirectional ring: GPU0 → GPU1 → GPU2 → GPU3. Step 1: $\Delta t = N/B$. Step 2: $\Delta t = N/B$. $N$: bytes to broadcast; $B$: bandwidth of each link.

18 . BROADCAST with unidirectional ring: GPU0 → GPU1 → GPU2 → GPU3. Step 1: $\Delta t = N/B$. Step 2: $\Delta t = N/B$. Step 3: $\Delta t = N/B$. $N$: bytes to broadcast; $B$: bandwidth of each link.

19 . BROADCAST with unidirectional ring: GPU0 → GPU1 → GPU2 → GPU3. Step 1: $\Delta t = N/B$. Step 2: $\Delta t = N/B$. Step 3: $\Delta t = N/B$. Total time: $(k - 1)\,N/B$. $N$: bytes to broadcast; $B$: bandwidth of each link; $k$: number of GPUs.
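
The step count on the slide above can be checked with a small simulation. This is a minimal sketch in plain Python, not NCCL code; the function name `ring_broadcast_unpipelined` and the example values for $N$ and $B$ are assumptions made for illustration.

```python
def ring_broadcast_unpipelined(k, n_bytes, bandwidth):
    """Simulate broadcasting one n_bytes message from GPU0 around a k-GPU ring.

    Returns (steps, total_time). Each hop forwards the whole message,
    so every step costs n_bytes / bandwidth.
    """
    has_data = [False] * k
    has_data[0] = True  # GPU0 is the root and starts with the data
    steps = 0
    while not all(has_data):
        # The furthest GPU that holds the data forwards it to its neighbour.
        sender = max(i for i in range(k) if has_data[i])
        has_data[sender + 1] = True
        steps += 1
    return steps, steps * n_bytes / bandwidth


# Illustrative values: 4 GPUs, 1 GB payload, 10 GB/s per link.
steps, total = ring_broadcast_unpipelined(k=4, n_bytes=1e9, bandwidth=1e10)
print(steps, total)  # 3 steps, i.e. (k - 1) * N / B
```

Only one link is busy at any time in this scheme, which is why the total grows linearly with the number of GPUs.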

20 . BROADCAST with unidirectional ring: GPU0 → GPU1 → GPU2 → GPU3

21 . BROADCAST with unidirectional ring: GPU0 → GPU1 → GPU2 → GPU3. Split data into $S$ messages. Step 1: $\Delta t = N/(SB)$.

22 . BROADCAST with unidirectional ring: GPU0 → GPU1 → GPU2 → GPU3. Split data into $S$ messages. Step 1: $\Delta t = N/(SB)$. Step 2: $\Delta t = N/(SB)$.

23 . BROADCAST with unidirectional ring: GPU0 → GPU1 → GPU2 → GPU3. Split data into $S$ messages. Step 1: $\Delta t = N/(SB)$. Step 2: $\Delta t = N/(SB)$. Step 3: $\Delta t = N/(SB)$.

24 . BROADCAST with unidirectional ring: GPU0 → GPU1 → GPU2 → GPU3. Split data into $S$ messages. Step 1: $\Delta t = N/(SB)$. Step 2: $\Delta t = N/(SB)$. Step 3: $\Delta t = N/(SB)$. Step 4: $\Delta t = N/(SB)$.

25 . BROADCAST with unidirectional ring: GPU0 → GPU1 → GPU2 → GPU3. Split data into $S$ messages. Step 1: $\Delta t = N/(SB)$. Step 2: $\Delta t = N/(SB)$. Step 3: $\Delta t = N/(SB)$. Step 4: $\Delta t = N/(SB)$. ... Total time: $S \cdot N/(SB) + (k - 2)\,N/(SB) = N(S + k - 2)/(SB) \to N/B$ as $S$ grows large relative to $k$.
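
The pipelined total can be reproduced the same way. The sketch below is a plain-Python model under the slide's assumptions (equal-size messages, every link transferring concurrently, no latency term); `ring_broadcast_pipelined` is a hypothetical name, not an NCCL function.

```python
def ring_broadcast_pipelined(k, n_bytes, bandwidth, s):
    """Simulate a pipelined ring broadcast with the data split into s messages.

    received[i] counts the messages GPU i holds. In each step, every GPU
    that holds a message its successor lacks forwards one message, and all
    links transfer at the same time. A step costs n_bytes / (s * bandwidth).
    """
    received = [s] + [0] * (k - 1)  # GPU0 (root) starts with all s messages
    steps = 0
    while received[-1] < s:
        # Decide all transfers from the state at the start of the step.
        sends = [received[i] > received[i + 1] for i in range(k - 1)]
        for i, send in enumerate(sends):
            if send:
                received[i + 1] += 1
        steps += 1
    return steps, steps * n_bytes / (s * bandwidth)


# Illustrative values: 4 GPUs, 1 GB payload, 10 GB/s per link.
for s in (1, 4, 1000):
    steps, total = ring_broadcast_pipelined(k=4, n_bytes=1e9, bandwidth=1e10, s=s)
    print(s, steps, total)  # steps equals s + k - 2 for k >= 2
```

With $S = 1$ this reduces to the unpipelined case, $(k - 1)\,N/B$; with $S = 1000$ the total is within about 0.2% of $N/B$, nearly independent of the number of GPUs.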

26 . ALL-REDUCE with unidirectional ring: GPU0 → GPU1 → GPU2 → GPU3. Chunk: 1, Step: 0.

27 . ALL-REDUCE with unidirectional ring: GPU0 → GPU1 → GPU2 → GPU3. Chunk: 1, Step: 1.

28 . ALL-REDUCE with unidirectional ring: GPU0 → GPU1 → GPU2 → GPU3. Chunk: 1, Step: 2.

29 . ALL-REDUCE with unidirectional ring: GPU0 → GPU1 → GPU2 → GPU3. Chunk: 1, Step: 3.
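
The transcript does not preserve how the slides schedule chunks across steps, so the following is only a sketch of one common ring all-reduce formulation (a reduce-scatter phase followed by an all-gather phase), simulated on Python lists. It is an assumption for illustration and may differ from the schedule animated on the slides or used inside NCCL; `ring_all_reduce` is a hypothetical name.

```python
def ring_all_reduce(buffers):
    """Sum-all-reduce over a unidirectional ring, simulated on Python lists.

    buffers[g] is GPU g's input, pre-split into exactly k chunks (one number
    per chunk here for brevity). GPU g always sends to GPU (g + 1) % k.
    """
    k = len(buffers)
    data = [list(b) for b in buffers]  # work on copies

    # Phase 1, reduce-scatter: after k - 1 steps, GPU g holds the fully
    # reduced chunk (g + 1) % k.
    for step in range(k - 1):
        sends = [((g - step) % k, data[g][(g - step) % k]) for g in range(k)]
        for g, (chunk, value) in enumerate(sends):
            data[(g + 1) % k][chunk] += value

    # Phase 2, all-gather: circulate the reduced chunks for k - 1 more steps.
    for step in range(k - 1):
        sends = [((g + 1 - step) % k, data[g][(g + 1 - step) % k]) for g in range(k)]
        for g, (chunk, value) in enumerate(sends):
            data[(g + 1) % k][chunk] = value

    return data


inputs = [
    [1, 2, 3, 4],              # GPU0
    [10, 20, 30, 40],          # GPU1
    [100, 200, 300, 400],      # GPU2
    [1000, 2000, 3000, 4000],  # GPU3
]
for gpu, result in enumerate(ring_all_reduce(inputs)):
    print(gpu, result)  # every GPU ends with [1111, 2222, 3333, 4444]
```

In this formulation each GPU sends $2(k - 1)$ chunks of size $N/k$, so the per-link traffic stays close to $2N$ regardless of the number of GPUs.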
