Backpropagation


1. Backpropagation, by Hung-yi Lee (李宏毅)

2. Gradient Descent

Network parameters: $\theta = \{w_1, w_2, \ldots, b_1, b_2, \ldots\}$. Starting from initial parameters $\theta^0$, gradient descent iterates

$\theta^1 = \theta^0 - \eta \nabla L(\theta^0)$
$\theta^2 = \theta^1 - \eta \nabla L(\theta^1)$
$\ldots$

where the gradient stacks the partial derivatives with respect to every parameter:

$\nabla L(\theta) = \begin{bmatrix} \partial L / \partial w_1 \\ \partial L / \partial w_2 \\ \vdots \\ \partial L / \partial b_1 \\ \partial L / \partial b_2 \\ \vdots \end{bmatrix}$

With millions of parameters, we use backpropagation to compute the gradients efficiently.
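The update rule is easy to see in code; a minimal sketch, assuming a toy two-parameter quadratic loss and a hand-picked learning rate (both illustrative, not from the slides):

```python
import numpy as np

TARGET = np.array([1.0, -2.0])   # minimum of the toy loss (made up)

def grad_L(theta):
    # Gradient of the toy loss L(theta) = ||theta - TARGET||^2.
    return 2.0 * (theta - TARGET)

theta = np.zeros(2)              # theta^0: starting parameters
eta = 0.1                        # learning rate
for _ in range(100):             # theta^{k+1} = theta^k - eta * grad L(theta^k)
    theta = theta - eta * grad_L(theta)

print(theta)                     # converges toward TARGET
```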

3. Chain Rule

Case 1: $y = g(x)$, $z = h(y)$. Then $x \to y \to z$ and

$\dfrac{dz}{dx} = \dfrac{dz}{dy} \dfrac{dy}{dx}$

Case 2: $x = g(s)$, $y = h(s)$, $z = k(x, y)$. Then $s$ influences $z$ along both paths, and the contributions add:

$\dfrac{dz}{ds} = \dfrac{\partial z}{\partial x} \dfrac{dx}{ds} + \dfrac{\partial z}{\partial y} \dfrac{dy}{ds}$
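Both cases can be sanity-checked numerically; a small sketch with arbitrary choices of $g$, $h$, and $k$ (illustrative, not from the slides):

```python
import math

eps = 1e-6

# Case 1: y = g(x) = x^2, z = h(y) = sin(y); dz/dx = cos(y) * 2x
x = 0.7
analytic = math.cos(x ** 2) * 2 * x
numeric = (math.sin((x + eps) ** 2) - math.sin((x - eps) ** 2)) / (2 * eps)
print(analytic, numeric)   # the two values agree closely

# Case 2: x = g(s) = 3s, y = h(s) = s^3, z = k(x, y) = x*y
# dz/ds = (dz/dx)(dx/ds) + (dz/dy)(dy/ds) = y*3 + x*3s^2
s = 0.5
analytic = (s ** 3) * 3 + (3 * s) * 3 * s ** 2
numeric = ((3 * (s + eps)) * (s + eps) ** 3
           - (3 * (s - eps)) * (s - eps) ** 3) / (2 * eps)
print(analytic, numeric)
```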

4. Backpropagation

For each training example, the network maps the input $x^n$ through parameters $\theta$ to an output $y^n$, which is compared with the target $\hat{y}^n$ by a per-example loss $l^n(\theta)$. The total loss and its gradient are sums over the $N$ examples:

$L(\theta) = \sum_{n=1}^{N} l^n(\theta) \qquad \Rightarrow \qquad \frac{\partial L(\theta)}{\partial w} = \sum_{n=1}^{N} \frac{\partial l^n(\theta)}{\partial w}$
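Since the total gradient is a sum of per-example gradients, it can be accumulated example by example; a minimal sketch, assuming a made-up one-parameter model with squared-error loss:

```python
import numpy as np

w = 0.3                                   # single parameter, for simplicity
xs = np.array([1.0, 2.0, -0.5])           # N training inputs (made up)
ys = np.array([2.0, 4.1, -0.9])           # N training targets (made up)

def dln_dw(x, y, w):
    # Per-example loss l^n = (w*x - y)^2, so dl^n/dw = 2*(w*x - y)*x.
    return 2.0 * (w * x - y) * x

# dL/dw = sum over n of dl^n/dw
dL_dw = sum(dln_dw(x, y, w) for x, y in zip(xs, ys))
print(dL_dw)
```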

5. Backpropagation

Consider a single neuron: $z = x_1 w_1 + x_2 w_2 + b$, where $z$ is the input to the activation function. By the chain rule,

$\frac{\partial l}{\partial w} = \frac{\partial z}{\partial w} \frac{\partial l}{\partial z}$

Forward pass: compute $\partial z / \partial w$ for all parameters.
Backward pass: compute $\partial l / \partial z$ for all activation function inputs $z$.

6. Backpropagation – Forward pass

Compute $\partial z / \partial w$ for all parameters. With $z = x_1 w_1 + x_2 w_2 + b$:

$\partial z / \partial w_1 = x_1 \qquad \partial z / \partial w_2 = x_2$

That is, $\partial z / \partial w$ is simply the value of the input connected by the weight, which the forward pass has already computed.
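In code, the forward pass yields these derivatives for free, since they are just the stored inputs; a minimal sketch with arbitrary values:

```python
x1, x2 = 1.0, -1.0            # inputs to the neuron
w1, w2, b = 0.5, -0.3, 0.1    # parameters (illustrative values)

z = x1 * w1 + x2 * w2 + b     # forward pass

dz_dw1 = x1                   # the input connected by w1
dz_dw2 = x2                   # the input connected by w2
dz_db = 1.0                   # the bias term's "input" is the constant 1
```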

7. Backpropagation – Forward pass

[Figure: a worked example. Inputs $1$ and $-1$ pass through two hidden layers, producing activations $0.98$ and $0.12$, then $0.86$ and $0.11$. For each weight, $\partial z / \partial w$ equals the activation entering it, e.g. $\partial z / \partial w = -1$, $0.12$, and $0.11$ for weights fed by those values.]

8. Backpropagation – Backward pass

Compute $\partial l / \partial z$ for all activation function inputs $z$. With $a = \sigma(z)$,

$\frac{\partial l}{\partial z} = \frac{\partial a}{\partial z} \frac{\partial l}{\partial a} = \sigma'(z) \frac{\partial l}{\partial a}$
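For the sigmoid activation drawn on the slide, $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$; a minimal sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def dl_dz(z, dl_da):
    # Given dl/da from downstream, dl/dz costs one extra multiplication.
    return sigmoid_prime(z) * dl_da
```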

9. Backpropagation – Backward pass

Suppose $a$ feeds two neurons in the next layer, $z' = a w_3 + \cdots$ and $z'' = a w_4 + \cdots$. By the chain rule,

$\frac{\partial l}{\partial a} = \frac{\partial z'}{\partial a} \frac{\partial l}{\partial z'} + \frac{\partial z''}{\partial a} \frac{\partial l}{\partial z''} = w_3 \frac{\partial l}{\partial z'} + w_4 \frac{\partial l}{\partial z''}$

where $\partial l / \partial z'$ and $\partial l / \partial z''$ are assumed known for now.

10. Backpropagation – Backward pass

Combining the two steps gives

$\frac{\partial l}{\partial z} = \sigma'(z) \left[ w_3 \frac{\partial l}{\partial z'} + w_4 \frac{\partial l}{\partial z''} \right]$

11. Backpropagation – Backward pass

This formula can be read as a "reverse" neuron: it takes $\partial l / \partial z'$ and $\partial l / \partial z''$ as inputs, weights them by $w_3$ and $w_4$, and scales the sum by $\sigma'(z)$. Here $\sigma'(z)$ is a constant because $z$ is already determined in the forward pass.

$\frac{\partial l}{\partial z} = \sigma'(z) \left[ w_3 \frac{\partial l}{\partial z'} + w_4 \frac{\partial l}{\partial z''} \right]$
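A single "reverse" neuron is one line of code; a sketch, assuming the downstream gradients have already been computed:

```python
def reverse_neuron(sigma_prime_z, w3, w4, dl_dz1, dl_dz2):
    # dl_dz1, dl_dz2 stand for dl/dz' and dl/dz'' from the next layer;
    # sigma_prime_z is the constant sigma'(z) fixed by the forward pass.
    return sigma_prime_z * (w3 * dl_dz1 + w4 * dl_dz2)
```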

12. Backpropagation – Backward pass

Case 1: $z'$ and $z''$ feed the output layer, which produces $y_1$ and $y_2$. Then

$\frac{\partial l}{\partial z'} = \frac{\partial y_1}{\partial z'} \frac{\partial l}{\partial y_1} \qquad \frac{\partial l}{\partial z''} = \frac{\partial y_2}{\partial z''} \frac{\partial l}{\partial y_2}$

Both factors are directly computable from the output activation and the loss, so we are done.
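At the output layer both factors are available in closed form once a loss and output activation are chosen; a sketch, assuming a sigmoid output and squared-error loss $l = \frac{1}{2}(y_1 - \hat{y}_1)^2$ (illustrative choices, not fixed by the slides):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dl_dz_output(z_prime, y_hat):
    y1 = sigmoid(z_prime)
    dl_dy1 = y1 - y_hat          # derivative of the squared-error loss
    dy1_dz = y1 * (1.0 - y1)     # derivative of the output activation
    return dy1_dz * dl_dy1       # dl/dz' -- no recursion needed here
```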

13. Backpropagation – Backward pass

Case 2: $z'$ and $z''$ are not in the output layer. Their gradients $\partial l / \partial z'$ and $\partial l / \partial z''$ depend on the layers that come after them.

14. Backpropagation – Backward pass

Case 2 (continued): suppose $a' = \sigma(z')$ feeds the next layer through weights $w_5$ and $w_6$ into $z_a$ and $z_b$. If $\partial l / \partial z_a$ and $\partial l / \partial z_b$ were known, $\partial l / \partial z'$ could be computed by the same rule as before.

15. Backpropagation – Backward pass

Compute $\partial l / \partial z$ recursively:

$\frac{\partial l}{\partial z'} = \sigma'(z') \left[ w_5 \frac{\partial l}{\partial z_a} + w_6 \frac{\partial l}{\partial z_b} \right]$

and keep applying the same rule until we reach the output layer.
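The recursion can be written down literally; a sketch over a tiny hand-built network (the weights, layer sizes, list encoding, and sigmoid output are all illustrative assumptions):

```python
import math

sigma = lambda z: 1.0 / (1.0 + math.exp(-z))
sigma_prime = lambda z: sigma(z) * (1.0 - sigma(z))

# w[L][k][j] connects neuron j in layer L to neuron k in layer L+1.
w = [[[0.5, -0.3], [0.8, 0.2]],   # layer 0 -> layer 1
     [[0.1, -0.4], [0.6, 0.9]]]   # layer 1 -> output layer

def dl_dz(layer, j, z, dl_dy):
    """Naive recursion following the slide's rule, per activation input z."""
    if layer == len(z) - 1:                       # Case 1: output layer
        return sigma_prime(z[layer][j]) * dl_dy[j]
    total = sum(w[layer][k][j] * dl_dz(layer + 1, k, z, dl_dy)
                for k in range(len(z[layer + 1])))
    return sigma_prime(z[layer][j]) * total       # Case 2: recurse forward

# Example use: z values from a forward pass, dl/dy at the outputs (made up).
z_vals = [[0.4, -0.2], [0.3, 0.7]]
dl_dy = [0.1, -0.5]
print(dl_dz(0, 0, z_vals, dl_dy))
```

Note the duplicated work: every call re-derives the downstream gradients from scratch, which is exactly why the next slides compute them once, from the output layer backward.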

16. Backpropagation – Backward Pass

In practice we do not recurse from the front; we compute $\partial l / \partial z$ starting from the output layer. In a network with activation inputs $z_1, z_2$ (first layer), $z_3, z_4$ (second layer), and $z_5, z_6$ (output layer), we compute $\partial l / \partial z_5$ and $\partial l / \partial z_6$ first, then $\partial l / \partial z_3$ and $\partial l / \partial z_4$, then $\partial l / \partial z_1$ and $\partial l / \partial z_2$.

17. Backpropagation – Backward Pass

Viewed this way, the backward pass is itself a network with the same topology but every connection reversed: each "reverse" neuron multiplies its summed input by the constant $\sigma'(z_i)$ (here $\sigma'(z_1)$, $\sigma'(z_2)$, $\sigma'(z_3)$, $\sigma'(z_4)$) instead of applying $\sigma$. Its cost is comparable to one forward pass.

18. Backpropagation – Summary

Forward pass: for every weight $w$, compute $\partial z / \partial w = a$, the activation (or input) feeding the weight.
Backward pass: compute $\partial l / \partial z$ for every activation function input $z$.
Multiply the two for every parameter:

$\frac{\partial l}{\partial w} = \frac{\partial z}{\partial w} \times \frac{\partial l}{\partial z} = a \times \frac{\partial l}{\partial z}$
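The whole recipe fits in a page of code; a minimal sketch for a 2-2-2 sigmoid network with squared-error loss (all concrete values are illustrative), checked against a finite-difference approximation:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=2)
W2, b2 = rng.normal(size=(2, 2)), rng.normal(size=2)
x, y_hat = np.array([1.0, -1.0]), np.array([0.0, 1.0])

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(W1, b1, W2, b2):
    z1 = W1 @ x + b1; a1 = sigmoid(z1)        # forward pass stores z and a
    z2 = W2 @ a1 + b2; y = sigmoid(z2)
    return z1, a1, z2, y

def loss(W1, b1, W2, b2):
    *_, y = forward(W1, b1, W2, b2)
    return 0.5 * np.sum((y - y_hat) ** 2)

# Backward pass: dl/dz from the output layer back to the first layer.
z1, a1, z2, y = forward(W1, b1, W2, b2)
dl_dz2 = y * (1 - y) * (y - y_hat)            # Case 1: output layer
dl_dz1 = a1 * (1 - a1) * (W2.T @ dl_dz2)      # Case 2: reversed connections

# Combine the passes: dl/dw = a * dl/dz for every weight.
dl_dW2 = np.outer(dl_dz2, a1)
dl_dW1 = np.outer(dl_dz1, x)

# Finite-difference check on one weight.
eps = 1e-6
Wp, Wm = W1.copy(), W1.copy()
Wp[0, 0] += eps; Wm[0, 0] -= eps
numeric = (loss(Wp, b1, W2, b2) - loss(Wm, b1, W2, b2)) / (2 * eps)
print(dl_dW1[0, 0], numeric)                  # the two should agree closely
```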