Convolutional Neural Networks (CNN) #4: Backpropagation for Kernels

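The parameters of a convolutional neural network are ultimately optimized with backpropagation. This article works from the underlying principle up to a method that is easy to program, showing how the partial derivatives of the kernel are computed. The running example is a convolutional layer with padding, which is closer to what you meet in practice and should make the idea fully clear.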

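The convolutional neural network code for this article is also available on GitHub: https://github.com/PoLun-Wang/DL_practice_convolutional_neural_networks
For the basic operations of a CNN, see these three articles: CNN #1 (Kernel, Stride, Padding), CNN #2 (Pooling layer), CNN #3 (Counting parameters).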

This article uses a single convolutional layer to explain and derive backpropagation, as shown in Figure (1).

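Table (1): Symbols

Symbol  Description
$x$  A feature map that is not layer 0 (layer 0 is the input image). In this example its shape is $(3\times 3)$.
$w$  The kernel, with shape $(2\times 2)$.
$z$  The result of convolving $x$ with $w$, with shape $(4\times 4)$.
$L(\cdot)$  The loss function. Backpropagation computes how much each parameter influences the loss function.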

Figure (1): Simplified architecture used for the derivation

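The backward pass of backpropagation uses the outputs of each layer to compute partial derivatives, and from these works back to the partial derivative of the loss function with respect to that layer's parameters. If you are not familiar with how backpropagation works, see this article: Backpropagation
We first write out every $z$ term by term, which makes clear how to differentiate with respect to $x$ and $w$.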

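Writing out the composition of $z$ below is an important step: the derivations of both $\frac{\partial L}{\partial w}$ and $\frac{\partial L}{\partial x}$ rely on it. A coefficient of $0$ marks a position where the kernel overlaps the zero padding.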
$z_{11}=0w_{11}+0w_{12}+0w_{21}+x_{11}w_{22}$
$z_{12}=0w_{11}+0w_{12}+x_{11}w_{21}+x_{12}w_{22}$
$z_{13}=0w_{11}+0w_{12}+x_{12}w_{21}+x_{13}w_{22}$
$z_{14}=0w_{11}+0w_{12}+x_{13}w_{21}+0w_{22}$
– – – – –
$z_{21}=0w_{11}+x_{11}w_{12}+0w_{21}+x_{21}w_{22}$
$z_{22}=x_{11}w_{11}+x_{12}w_{12}+x_{21}w_{21}+x_{22}w_{22}$
$z_{23}=x_{12}w_{11}+x_{13}w_{12}+x_{22}w_{21}+x_{23}w_{22}$
$z_{24}=x_{13}w_{11}+0w_{12}+x_{23}w_{21}+0w_{22}$
– – – – –
$z_{31}=0w_{11}+x_{21}w_{12}+0w_{21}+x_{31}w_{22}$
$z_{32}=x_{21}w_{11}+x_{22}w_{12}+x_{31}w_{21}+x_{32}w_{22}$
$z_{33}=x_{22}w_{11}+x_{23}w_{12}+x_{32}w_{21}+x_{33}w_{22}$
$z_{34}=x_{23}w_{11}+0w_{12}+x_{33}w_{21}+0w_{22}$
– – – – –
$z_{41}=0w_{11}+x_{31}w_{12}+0w_{21}+0w_{22}$
$z_{42}=x_{31}w_{11}+x_{32}w_{12}+0w_{21}+0w_{22}$
$z_{43}=x_{32}w_{11}+x_{33}w_{12}+0w_{21}+0w_{22}$
$z_{44}=x_{33}w_{11}+0w_{12}+0w_{21}+0w_{22}$
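
The sixteen expressions above can be reproduced with a few lines of NumPy. This is a minimal sketch, assuming a single channel, one row or column of zero padding on each side, and a stride of 1 (the settings implied by the $3\times 3$ input and $4\times 4$ output). The function name `conv2d_forward` and the sample values are illustrative and are not taken from the author's repository.

```python
import numpy as np

def conv2d_forward(x, w, pad=1, stride=1):
    """Slide w over the zero-padded x and sum the element-wise products."""
    x_pad = np.pad(x, pad, mode="constant")  # zeros around the feature map
    kh, kw = w.shape
    out_h = (x_pad.shape[0] - kh) // stride + 1
    out_w = (x_pad.shape[1] - kw) // stride + 1
    z = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = x_pad[r:r + kh, c:c + kw]
            z[i, j] = np.sum(window * w)
    return z

x = np.arange(1.0, 10.0).reshape(3, 3)   # x11..x33 = 1..9 (illustrative values)
w = np.array([[1.0, 2.0], [3.0, 4.0]])   # w11, w12, w21, w22 (illustrative values)
z = conv2d_forward(x, w)

print(z.shape)                            # (4, 4)
print(z[0, 0] == x[0, 0] * w[1, 1])       # z11 = x11 * w22
print(z[3, 3] == x[2, 2] * w[0, 0])       # z44 = x33 * w11
```

Indices in the code are 0-based, so `z[0, 0]` corresponds to $z_{11}$ and `w[1, 1]` to $w_{22}$.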

Figure (2): Backward pass from $\frac{\partial L}{\partial z}$ to the kernel ($w$) and to the previous layer's feature map ($x$)

 

How the kernel derivative $\frac{\partial L}{\partial w}$ is computed

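This relies on the chain rule. If you are not familiar with it, this article is a good place to start: Backpropagation
Every $z$ has already been written out above, so we can now derive $\frac{\partial L}{\partial w_{11}}$ the traditional way.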

Step-1

First, from the expressions for $z$ above, pick out every $z$ in which the coefficient of $w_{11}$ is not $0$: $z_{22},z_{23},z_{24},z_{32},z_{33},z_{34},z_{42},z_{43},z_{44}$. The corresponding terms are underlined below.

$z_{11}=0w_{11}+0w_{12}+0w_{21}+x_{11}w_{22}$
$z_{12}=0w_{11}+0w_{12}+x_{11}w_{21}+x_{12}w_{22}$
$z_{13}=0w_{11}+0w_{12}+x_{12}w_{21}+x_{13}w_{22}$
$z_{14}=0w_{11}+0w_{12}+x_{13}w_{21}+0w_{22}$
– – – – –
$z_{21}=0w_{11}+x_{11}w_{12}+0w_{21}+x_{21}w_{22}$
$z_{22}=\underline{x_{11}w_{11}}+x_{12}w_{12}+x_{21}w_{21}+x_{22}w_{22}$
$z_{23}=\underline{x_{12}w_{11}}+x_{13}w_{12}+x_{22}w_{21}+x_{23}w_{22}$
$z_{24}=\underline{x_{13}w_{11}}+0w_{12}+x_{23}w_{21}+0w_{22}$
– – – – –
$z_{31}=0w_{11}+x_{21}w_{12}+0w_{21}+x_{31}w_{22}$
$z_{32}=\underline{x_{21}w_{11}}+x_{22}w_{12}+x_{31}w_{21}+x_{32}w_{22}$
$z_{33}=\underline{x_{22}w_{11}}+x_{23}w_{12}+x_{32}w_{21}+x_{33}w_{22}$
$z_{34}=\underline{x_{23}w_{11}}+0w_{12}+x_{33}w_{21}+0w_{22}$
– – – – –
$z_{41}=0w_{11}+x_{31}w_{12}+0w_{21}+0w_{22}$
$z_{42}=\underline{x_{31}w_{11}}+x_{32}w_{12}+0w_{21}+0w_{22}$
$z_{43}=\underline{x_{32}w_{11}}+x_{33}w_{12}+0w_{21}+0w_{22}$
$z_{44}=\underline{x_{33}w_{11}}+0w_{12}+0w_{21}+0w_{22}$
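
Step-1 can also be done mechanically. The sketch below marks the real entries of $x$ with 1 and the padding with 0, then lists every output position whose $w_{ab}$ term multiplies a real entry of $x$ rather than padding. The helper `outputs_using` is a hypothetical name introduced here for illustration, and the code assumes padding 1 and stride 1 as in the example.

```python
import numpy as np

# 1 marks a real entry of the 3x3 feature map, 0 marks zero padding.
mask = np.pad(np.ones((3, 3), dtype=int), 1)   # shape (5, 5)

def outputs_using(a, b, kh=2, kw=2):
    """Return the 1-indexed (i, j) of every z_ij whose w_ab term hits a real x."""
    out_h = mask.shape[0] - kh + 1
    out_w = mask.shape[1] - kw + 1
    return [(i + 1, j + 1)
            for i in range(out_h)
            for j in range(out_w)
            if mask[i + a - 1, j + b - 1]]

print(outputs_using(1, 1))
# [(2, 2), (2, 3), (2, 4), (3, 2), (3, 3), (3, 4), (4, 2), (4, 3), (4, 4)]
```

Calling `outputs_using(1, 2)`, `outputs_using(2, 1)`, and `outputs_using(2, 2)` gives the corresponding lists for the other three kernel entries.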

Step-2

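Use the chain rule to compute each product $\frac{\partial L}{\partial z}\frac{\partial z}{\partial w}$, then add these partial derivatives together to obtain $\frac{\partial L}{\partial w}$. Applying Step-1 and Step-2 to $w_{12}$, $w_{21}$, and $w_{22}$ in the same way gives the influence of the rest of the kernel on $L$.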

$\begin{align*}
\frac{\partial L}{\partial w_{11}}=\frac{\partial L}{\partial z_{22}}\frac{\partial z_{22}}{\partial w_{11}}+\frac{\partial L}{\partial z_{23}}\frac{\partial z_{23}}{\partial w_{11}}+\frac{\partial L}{\partial z_{24}}\frac{\partial z_{24}}{\partial w_{11}} \\
+\frac{\partial L}{\partial z_{32}}\frac{\partial z_{32}}{\partial w_{11}}+\frac{\partial L}{\partial z_{33}}\frac{\partial z_{33}}{\partial w_{11}}+\frac{\partial L}{\partial z_{34}}\frac{\partial z_{34}}{\partial w_{11}}\\
+\frac{\partial L}{\partial z_{42}}\frac{\partial z_{42}}{\partial w_{11}}+\frac{\partial L}{\partial z_{43}}\frac{\partial z_{43}}{\partial w_{11}}+\frac{\partial L}{\partial z_{44}}\frac{\partial z_{44}}{\partial w_{11}}\\
=\frac{\partial L}{\partial z_{22}}{x_{11}}+\frac{\partial L}{\partial z_{23}}x_{12}+\frac{\partial L}{\partial z_{24}}x_{13}\\
+\frac{\partial L}{\partial z_{32}}x_{21}+\frac{\partial L}{\partial z_{33}}x_{22}+\frac{\partial L}{\partial z_{34}}x_{23}\\
+\frac{\partial L}{\partial z_{42}}x_{31}+\frac{\partial L}{\partial z_{43}}x_{32}+\frac{\partial L}{\partial z_{44}}x_{33}
\end{align*}$

$\begin{align*}
\frac{\partial L}{\partial w_{12}}=\frac{\partial L}{\partial z_{21}}x_{11}+\frac{\partial L}{\partial z_{22}}x_{12}+\frac{\partial L}{\partial z_{23}}x_{13}\\
+\frac{\partial L}{\partial z_{31}}x_{21}+\frac{\partial L}{\partial z_{32}}x_{22}+\frac{\partial L}{\partial z_{33}}x_{23}\\
+\frac{\partial L}{\partial z_{41}}x_{31}+\frac{\partial L}{\partial z_{42}}x_{32}+\frac{\partial L}{\partial z_{43}}x_{33}
\end{align*}$

$\begin{align*}
\frac{\partial L}{\partial w_{21}}=\frac{\partial L}{\partial z_{12}}x_{11}+\frac{\partial L}{\partial z_{13}}x_{12}+\frac{\partial L}{\partial z_{14}}x_{13}\\
+\frac{\partial L}{\partial z_{22}}x_{21}+\frac{\partial L}{\partial z_{23}}x_{22}+\frac{\partial L}{\partial z_{24}}x_{23}\\
+\frac{\partial L}{\partial z_{32}}x_{31}+\frac{\partial L}{\partial z_{33}}x_{32}+\frac{\partial L}{\partial z_{34}}x_{33}
\end{align*}$

$\begin{align*}
\frac{\partial L}{\partial w_{22}}=\frac{\partial L}{\partial z_{11}}x_{11}+\frac{\partial L}{\partial z_{12}}x_{12}+\frac{\partial L}{\partial z_{13}}x_{13}\\
+\frac{\partial L}{\partial z_{21}}x_{21}+\frac{\partial L}{\partial z_{22}}x_{22}+\frac{\partial L}{\partial z_{23}}x_{23}\\
+\frac{\partial L}{\partial z_{31}}x_{31}+\frac{\partial L}{\partial z_{32}}x_{32}+\frac{\partial L}{\partial z_{33}}x_{33}
\end{align*}$
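
The four sums above share one pattern, which can be written compactly. The notation $\tilde{x}$ is introduced here only for this summary: it denotes the $5\times 5$ zero-padded feature map, with $\tilde{x}_{p,q}=x_{p-1,q-1}$ for $2\le p,q\le 4$ and $\tilde{x}_{p,q}=0$ otherwise. Assuming padding 1 and stride 1 as in this example:

$\begin{align*}
\frac{\partial L}{\partial w_{ab}}=\sum_{i=1}^{4}\sum_{j=1}^{4}\frac{\partial L}{\partial z_{ij}}\,\tilde{x}_{i+a-1,\,j+b-1},\qquad a,b\in\{1,2\}
\end{align*}$

For $a=b=1$ the only nonzero terms are those with $2\le i,j\le 4$, which reproduces the nine terms of $\frac{\partial L}{\partial w_{11}}$.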

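At this point the calculation probably looks complicated and hard to turn into code. It is the underlying principle, though, so it is worth understanding.
Next comes a faster approach that is much easier to program.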

A better way to compute it

I learned this method from a Coursera course. It also applies to convolutional layers with padding, and it makes the kernel's backpropagation straightforward to program.

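We first describe the method in words, then use the figures below to illustrate it:
Take Figure (3). First add padding to the previous layer's feature map. Then form a sliding window the size of the kernel and use it to select a sub-matrix of the padded feature map (e.g. $\begin{bmatrix} 0 & 0\\ 0 & x_{11} \end{bmatrix}$). Multiply the selected matrix by $\frac{\partial L}{\partial z_{11}}$ and store the result. Next, move the sliding window according to the stride, repeat the operation, and keep adding the products together, as shown in Figures (4) to (6).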

Because each step multiplies the sliding-window matrix by a scalar, each product is a matrix with the same shape as the kernel, and the accumulated sum is $\begin{bmatrix} \frac{\partial L}{\partial w_{11}} & \frac{\partial L}{\partial w_{12}}\\ \frac{\partial L}{\partial w_{21}} & \frac{\partial L}{\partial w_{22}} \end{bmatrix}$.
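
The sliding-window procedure described above can be sketched in NumPy as follows. This is an illustrative sketch, not the code from the Coursera course or the author's repository: the names `conv2d_kernel_grad` and `conv2d_forward` are hypothetical, a single channel is assumed, and the upstream gradient `dz` is random stand-in data. The loss is taken to be $L=\sum_{ij} z_{ij}\,dz_{ij}$ so that $\frac{\partial L}{\partial z}$ equals `dz` and the result can be checked against finite differences.

```python
import numpy as np

def conv2d_forward(x, w, pad=1, stride=1):
    """Forward pass, used here only for the numerical check."""
    x_pad = np.pad(x, pad, mode="constant")
    kh, kw = w.shape
    out_h = (x_pad.shape[0] - kh) // stride + 1
    out_w = (x_pad.shape[1] - kw) // stride + 1
    z = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            z[i, j] = np.sum(x_pad[r:r + kh, c:c + kw] * w)
    return z

def conv2d_kernel_grad(x, dz, kernel_shape, pad=1, stride=1):
    """Accumulate dL/dw: window of padded x times the scalar dL/dz_ij."""
    x_pad = np.pad(x, pad, mode="constant")
    kh, kw = kernel_shape
    dw = np.zeros((kh, kw))
    for i in range(dz.shape[0]):
        for j in range(dz.shape[1]):
            r, c = i * stride, j * stride
            window = x_pad[r:r + kh, c:c + kw]   # same shape as the kernel
            dw += window * dz[i, j]              # matrix times scalar
    return dw

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 3))
w = rng.normal(size=(2, 2))
dz = rng.normal(size=(4, 4))    # stand-in for dL/dz from the next layer

dw = conv2d_kernel_grad(x, dz, w.shape)

# Central finite differences on L = sum(z * dz).
eps = 1e-6
dw_num = np.zeros_like(w)
for a in range(w.shape[0]):
    for b in range(w.shape[1]):
        w_hi, w_lo = w.copy(), w.copy()
        w_hi[a, b] += eps
        w_lo[a, b] -= eps
        loss_hi = np.sum(conv2d_forward(x, w_hi) * dz)
        loss_lo = np.sum(conv2d_forward(x, w_lo) * dz)
        dw_num[a, b] = (loss_hi - loss_lo) / (2 * eps)

print(np.allclose(dw, dw_num, atol=1e-5))
```

The loop needs no special case for the border: the zeros introduced by the padding contribute nothing to the sum, which is why the same code handles padded layers.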

Figure (3): Computing the kernel partial derivatives, 1

 

Figure (4): Computing the kernel partial derivatives, 2

 

Figure (5): Computing the kernel partial derivatives, 3

 

Figure (6): Computing the kernel partial derivatives, 4

In the end, $\frac{\partial L}{\partial w}$ should be:

$\begin{align*}
\frac{\partial L}{\partial w_{11}}=\frac{\partial L}{\partial z_{22}}x_{11}+\frac{\partial L}{\partial z_{23}}x_{12}+\frac{\partial L}{\partial z_{24}}x_{13}\\
+\frac{\partial L}{\partial z_{32}}x_{21}+\frac{\partial L}{\partial z_{33}}x_{22}+\frac{\partial L}{\partial z_{34}}x_{23}\\
+\frac{\partial L}{\partial z_{42}}x_{31}+\frac{\partial L}{\partial z_{43}}x_{32}+\frac{\partial L}{\partial z_{44}}x_{33}
\end{align*}$

$\begin{align*}
\frac{\partial L}{\partial w_{12}}=\frac{\partial L}{\partial z_{21}}x_{11}+\frac{\partial L}{\partial z_{22}}x_{12}+\frac{\partial L}{\partial z_{23}}x_{13}\\
+\frac{\partial L}{\partial z_{31}}x_{21}+\frac{\partial L}{\partial z_{32}}x_{22}+\frac{\partial L}{\partial z_{33}}x_{23}\\
+\frac{\partial L}{\partial z_{41}}x_{31}+\frac{\partial L}{\partial z_{42}}x_{32}+\frac{\partial L}{\partial z_{43}}x_{33}
\end{align*}$

$\begin{align*}
\frac{\partial L}{\partial w_{21}}=\frac{\partial L}{\partial z_{12}}x_{11}+\frac{\partial L}{\partial z_{13}}x_{12}+\frac{\partial L}{\partial z_{14}}x_{13}\\
+\frac{\partial L}{\partial z_{22}}x_{21}+\frac{\partial L}{\partial z_{23}}x_{22}+\frac{\partial L}{\partial z_{24}}x_{23}\\
+\frac{\partial L}{\partial z_{32}}x_{31}+\frac{\partial L}{\partial z_{33}}x_{32}+\frac{\partial L}{\partial z_{34}}x_{33}
\end{align*}$

$\begin{align*}
\frac{\partial L}{\partial w_{22}}=\frac{\partial L}{\partial z_{11}}x_{11}+\frac{\partial L}{\partial z_{12}}x_{12}+\frac{\partial L}{\partial z_{13}}x_{13}\\
+\frac{\partial L}{\partial z_{21}}x_{21}+\frac{\partial L}{\partial z_{22}}x_{22}+\frac{\partial L}{\partial z_{23}}x_{23}\\
+\frac{\partial L}{\partial z_{31}}x_{31}+\frac{\partial L}{\partial z_{32}}x_{32}+\frac{\partial L}{\partial z_{33}}x_{33}
\end{align*}$
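
Each of the four sums pairs the whole of $x$ with one $3\times 3$ block of $\frac{\partial L}{\partial z}$: rows and columns 2 to 4 for $w_{11}$, rows 2 to 4 and columns 1 to 3 for $w_{12}$, and so on. The sketch below checks this against the sliding-window accumulation, using random stand-in values for $x$ and $\frac{\partial L}{\partial z}$ and assuming padding 1 and stride 1.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(3, 3))
dz = rng.normal(size=(4, 4))    # stand-in for dL/dz

# Sliding-window accumulation over the zero-padded feature map.
x_pad = np.pad(x, 1)
dw = np.zeros((2, 2))
for i in range(4):
    for j in range(4):
        dw += x_pad[i:i + 2, j:j + 2] * dz[i, j]

# The four nine-term sums, written as 3x3 blocks of dL/dz times x.
expected = np.array([
    [np.sum(dz[1:, 1:] * x), np.sum(dz[1:, :3] * x)],   # w11, w12
    [np.sum(dz[:3, 1:] * x), np.sum(dz[:3, :3] * x)],   # w21, w22
])

print(np.allclose(dw, expected))
```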

Reference
  1. Andrew Ng – Convolutional Neural Networks in Coursera
  2. (paper) A guide to convolution arithmetic for deep learning
  3. Back Propagation in Convolutional Neural Networks — Intuition and Code
  4. A Comprehensive Introduction to Different Types of Convolutions in Deep Learning

Andy Wang
