Overview of NVIDIA's Latest Updates and Open Source Projects
Julien Lai (赖俊杰), Senior Director of Engineering and Solutions, NVIDIA China
1 .
2 .NVIDIA BRIEF AND OPEN SOURCE AT NVIDIA JULIEN LAI
3 .
4 . NVIDIA - A COMPUTING PLATFORM COMPANY NVIDIA pioneered accelerated computing to tackle challenges ordinary computers cannot. We make computers for the da Vincis and Einsteins of our time so that they can see and create the future.
5 . HOW GPU ACCELERATION WORKS [Diagram: application code is split in two. Compute-intensive functions run on the GPU; the rest of the sequential code runs on the CPU.]
6 .NVIDIA OMNIVERSE IS THE PLATFORM OF REAL-TIME COLLABORATION & SIMULATION: AI, Path-Tracing, USD, Materials, Physics
7 .DIGITAL JENSEN
8 . WORKFLOW [Diagram: Digital Jensen pipeline across Dataset, 3D, and 2D stages. Recoverable labels: 3D scan, Jensen head and body 3D models, audio, Audio2Face (OV), Audio2Gestures (OV), retargeting, skeleton animation, rendering, Vid2Vid, merge, talking Jensen head video, talking Jensen body video.]
9 .
10 .OPEN SOURCE AT NVIDIA
11 .
12 . A few Open Source Projects: NCCL, Triton, NeMo
13 .NCCL
14 .COLLECTIVE COMMUNICATION: multiple senders and/or receivers
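As a toy illustration of "multiple senders and/or receivers", the sketch below models the two collectives that the following slides walk through, broadcast and all-reduce, on plain Python lists. Each inner list stands in for one GPU's buffer. The function names and the `root` parameter are assumptions made for this example; this is not the NCCL API, and it only shows the end result, not how data moves.

```python
def broadcast(buffers, root=0):
    """One sender, many receivers: every rank ends up with the root's buffer."""
    return [list(buffers[root]) for _ in buffers]


def all_reduce(buffers):
    """Many senders, many receivers: every rank ends up with the element-wise sum."""
    total = [sum(values) for values in zip(*buffers)]
    return [list(total) for _ in buffers]


if __name__ == "__main__":
    gpus = [[1, 2], [10, 20], [100, 200], [1000, 2000]]
    print(broadcast(gpus))   # every rank holds [1, 2]
    print(all_reduce(gpus))  # every rank holds [1111, 2222]
```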
15 .BROADCAST with unidirectional ring (GPU0, GPU1, GPU2, GPU3)
16 .BROADCAST with unidirectional ring (GPU0, GPU1, GPU2, GPU3). Step 1: $\Delta t = N/B$. $N$: bytes to broadcast; $B$: bandwidth of each link.
17 .BROADCAST with unidirectional ring (GPU0, GPU1, GPU2, GPU3). Step 1: $\Delta t = N/B$. Step 2: $\Delta t = N/B$. $N$: bytes to broadcast; $B$: bandwidth of each link.
18 .BROADCAST with unidirectional ring (GPU0, GPU1, GPU2, GPU3). Step 1: $\Delta t = N/B$. Step 2: $\Delta t = N/B$. Step 3: $\Delta t = N/B$. $N$: bytes to broadcast; $B$: bandwidth of each link.
19 .BROADCAST with unidirectional ring (GPU0, GPU1, GPU2, GPU3). Step 1: $\Delta t = N/B$. Step 2: $\Delta t = N/B$. Step 3: $\Delta t = N/B$. Total time: $(k-1)N/B$. $N$: bytes to broadcast; $B$: bandwidth of each link; $k$: number of GPUs.
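The total on slide 19 can be checked by adding up the hops. The sketch below is a cost model only, assuming every link has the same bandwidth and ignoring latency; the function name is hypothetical and no GPU or NCCL call is involved.

```python
def ring_broadcast_time(n_bytes: float, bandwidth: float, k: int) -> float:
    """Time for an unpipelined broadcast around a unidirectional ring of k GPUs."""
    if k < 1 or bandwidth <= 0:
        raise ValueError("need k >= 1 and bandwidth > 0")
    total = 0.0
    # The root sends the whole buffer to its neighbour, which forwards it, and so on.
    # Reaching the other k - 1 GPUs takes k - 1 hops of N / B each.
    for _hop in range(k - 1):
        total += n_bytes / bandwidth
    return total


if __name__ == "__main__":
    # 4 GPUs, N = 4 bytes, B = 2 bytes/s: three hops of 2 s each.
    print(ring_broadcast_time(4.0, 2.0, 4))  # 6.0, i.e. (k - 1) * N / B
```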
20 .BROADCAST with unidirectional ring (GPU0, GPU1, GPU2, GPU3)
21 .BROADCAST with unidirectional ring (GPU0, GPU1, GPU2, GPU3). Split data into $S$ messages. Step 1: $\Delta t = N/(SB)$.
22 .BROADCAST with unidirectional ring (GPU0, GPU1, GPU2, GPU3). Split data into $S$ messages. Step 1: $\Delta t = N/(SB)$. Step 2: $\Delta t = N/(SB)$.
23 .BROADCAST with unidirectional ring (GPU0, GPU1, GPU2, GPU3). Split data into $S$ messages. Step 1: $\Delta t = N/(SB)$. Step 2: $\Delta t = N/(SB)$. Step 3: $\Delta t = N/(SB)$.
24 .BROADCAST with unidirectional ring (GPU0, GPU1, GPU2, GPU3). Split data into $S$ messages. Step 1: $\Delta t = N/(SB)$. Step 2: $\Delta t = N/(SB)$. Step 3: $\Delta t = N/(SB)$. Step 4: $\Delta t = N/(SB)$.
25 .BROADCAST with unidirectional ring (GPU0, GPU1, GPU2, GPU3). Split data into $S$ messages. Step 1: $\Delta t = N/(SB)$. Step 2: $\Delta t = N/(SB)$. Step 3: $\Delta t = N/(SB)$. Step 4: $\Delta t = N/(SB)$. ... Total time: $S \cdot N/(SB) + (k-2) \cdot N/(SB) = N(S+k-2)/(SB) \to N/B$ when $S \gg k$.
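The step count $S + k - 2$ on slide 25 can be reproduced with a small simulation in which every link forwards at most one message per step. This is a sketch under the same idealised assumptions as the slides (equal link bandwidth, no latency); the function names are hypothetical.

```python
def pipelined_ring_broadcast_steps(s: int, k: int) -> int:
    """Count steps until the last GPU in the ring holds all s messages."""
    if s < 1 or k < 2:
        raise ValueError("need s >= 1 and k >= 2")
    received = [0] * k   # received[i] = number of messages GPU i holds
    received[0] = s      # the root starts with everything
    steps = 0
    while received[k - 1] < s:
        # All links work at once. Snapshot the state first so that a
        # message advances by only one hop per step.
        before = list(received)
        for i in range(k - 1):
            if before[i] > before[i + 1]:
                received[i + 1] += 1
        steps += 1
    return steps


def pipelined_ring_broadcast_time(n_bytes: float, bandwidth: float, s: int, k: int) -> float:
    """Each step moves one message of N / S bytes over a link of bandwidth B."""
    return pipelined_ring_broadcast_steps(s, k) * n_bytes / (s * bandwidth)


if __name__ == "__main__":
    # N = 4 bytes, B = 2 bytes/s, k = 4, so N / B = 2 s.
    for s in (1, 4, 1024):
        print(s, pipelined_ring_broadcast_time(4.0, 2.0, s, 4))
```

With these numbers the time falls from 6.0 s at S = 1 (the unpipelined case) to 3.0 s at S = 4 and approaches N/B = 2.0 s as S grows.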
26 .ALL-REDUCE with unidirectional ring (GPU0, GPU1, GPU2, GPU3). Chunk: 1, Step: 0
27 .ALL-REDUCE with unidirectional ring (GPU0, GPU1, GPU2, GPU3). Chunk: 1, Step: 1
28 .ALL-REDUCE with unidirectional ring (GPU0, GPU1, GPU2, GPU3). Chunk: 1, Step: 2
29 .ALL-REDUCE with unidirectional ring (GPU0, GPU1, GPU2, GPU3). Chunk: 1, Step: 3
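The slide animation does not survive in text form, so the sketch below shows one common way to run an all-reduce on a unidirectional ring: a reduce-scatter pass followed by an all-gather pass. It is an assumption that this matches the chunk and step numbering on the slides, and it is not NCCL's implementation. For simplicity each GPU's buffer is a Python list with exactly one element per chunk, and the function name is hypothetical.

```python
def ring_all_reduce(buffers):
    """Sum-all-reduce over a unidirectional ring; buffers[i] is GPU i's data."""
    k = len(buffers)
    if k == 0 or any(len(b) != k for b in buffers):
        raise ValueError("this sketch needs k GPUs with k chunks each")
    data = [list(b) for b in buffers]

    # Reduce-scatter: in each step GPU i sends chunk (i - step) % k to GPU i + 1,
    # which adds it to its own copy. Sends are collected first so that all
    # GPUs act on the state from the start of the step.
    for step in range(k - 1):
        sends = [(i, (i - step) % k, data[i][(i - step) % k]) for i in range(k)]
        for src, chunk, value in sends:
            data[(src + 1) % k][chunk] += value
    # GPU i now holds the complete sum for chunk (i + 1) % k.

    # All-gather: completed chunks travel around the ring and overwrite stale copies.
    for step in range(k - 1):
        sends = [(i, (i + 1 - step) % k, data[i][(i + 1 - step) % k]) for i in range(k)]
        for src, chunk, value in sends:
            data[(src + 1) % k][chunk] = value
    return data


if __name__ == "__main__":
    gpus = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
    print(ring_all_reduce(gpus))  # every GPU holds [1111, 2222, 3333, 4444]
```

Each pass takes k - 1 steps, and in every step each GPU sends one chunk to its neighbour, so all links stay busy for the whole operation.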