Typical steps for training an image classifier with logistic regression in Deep Learning.
Notes adapted from: https://xienaoban.github.io/posts/59595.html
Data Preprocessing
Vectorization
Flatten each image's height, width, and RGB channels into a single column vector; the final shape of X is (height*width*3, m), where m is the number of examples.
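A minimal sketch of this flattening, assuming the images arrive as a NumPy array of shape (m, height, width, 3) — that layout and the example sizes are assumptions, not stated in the notes:

```python
import numpy as np

# Assumed input: m images of shape (height, width, 3), pixel values 0..255
images = np.random.randint(0, 256, size=(209, 64, 64, 3))  # hypothetical example data

m = images.shape[0]
X = images.reshape(m, -1).T   # one column per image, shape (height*width*3, m)
print(X.shape)                # (12288, 209)
```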
Feature Normalization
For general data, use standardization: \(z_i = \frac{x_i - \mu}{\sigma}\), where \(\mu\) and \(\sigma\) are the mean and standard deviation of X, computed per feature (e.g. `(X - X.mean(axis=0)) / X.std(axis=0)` in NumPy). The resulting features have zero mean and unit variance, so most values fall roughly in the [-1, 1] range.
For images, Min-Max Scaling can be used directly:
- Divide every feature by 255 (each pixel has R, G, B channels with values in 0~255), so all values lie in [0, 1].
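Continuing the sketch above (reusing the X from the reshape step, which is an assumption), the image case is a single division:

```python
# Min-Max Scaling for images: each R/G/B value is in 0..255, so divide by 255
X = X / 255.0   # now every entry of X lies in [0, 1]
```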
Initialize Parameters
The weights w and bias b are usually initialized randomly (for logistic regression, initializing them to zero also works).
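A minimal initialization sketch; the feature count n_x comes from the vectorization step, while the 0.01 scale for the random weights is an assumption:

```python
n_x = X.shape[0]                      # number of features, e.g. 64*64*3
w = np.random.randn(n_x, 1) * 0.01    # small random weights, shape (n_x, 1)
b = 0.0                               # bias as a scalar
```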
Gradient Descent
Using w, b, and the training set, fit the parameters.
- The number of iterations and the learning rate must be set in advance.
The main loop (run once per iteration) consists of the following steps:
Compute the Cost Function
For \(x^{(i)} \in X\): \[ z^{(i)} = w^Tx^{(i)} + b \]
\[ a^{(i)} = \hat{y}^{(i)} = \mathrm{sigmoid}(z^{(i)}) = \sigma(z^{(i)}) = \frac{1}{1 + e^{-z^{(i)}}} \]
\[ \text{loss: } \mathcal{L}(a^{(i)}, y^{(i)}) = \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = - y^{(i)} \log(a^{(i)}) - (1-y^{(i)} ) \log(1-a^{(i)}) \]
\[ A = (a^{(1)}, a^{(2)}, ... , a^{(m-1)}, a^{(m)}) = \sigma(w^TX+b) = \frac{1}{1+e^{-(w^TX+b)}} \]
\[ \text{cost: } J(w,b) = \frac{1}{m} \sum^{m}_{i=1} \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m} \sum^{m}_{i=1} \left( y^{(i)} \log(a^{(i)}) + (1-y^{(i)}) \log(1-a^{(i)}) \right) \]
```python
def sigmoid(z):
    # Activation function
    return 1 / (1 + np.exp(-z))
```
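A vectorized sketch of the forward pass and the cost above; names follow the formulas, and a label matrix Y of shape (1, m) is assumed:

```python
# Forward propagation over the whole training set (X: (n_x, m), Y: (1, m))
A = sigmoid(np.dot(w.T, X) + b)                                  # predicted probabilities, shape (1, m)
cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m      # cost J(w, b)
```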
Compute the Backpropagation Gradients
That is, differentiate \(J = \dfrac{1}{m} \sum L(a, y)\), which reduces to differentiating \(L(a, y)\); the superscript \((i)\) is omitted in the derivations below.
We want \(\dfrac{\partial J}{\partial w}\) and \(\dfrac{\partial J}{\partial b}\) (dw and db).
\[ \dfrac{\partial L}{\partial a} = \dfrac{\partial L(a, y)}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a} \]
\[ \dfrac{da}{dz} = (\frac{1}{1 + e^{-z}})' = \dfrac{e^{-z}}{(1+e^{-z})^2} = \dfrac{1}{1+e^{-z}} - \dfrac{1}{(1+e^{-z})^2} = a-a^2 = a · (1-a) \]
\[ \dfrac{\partial L}{\partial z} = \dfrac{\partial L}{\partial a} \dfrac{da}{dz} = (-\dfrac{y}{a} + \dfrac{1-y}{1-a}) · a · (1-a) = a - y \]
\[ \dfrac{\partial L}{\partial w} = \dfrac{\partial L}{\partial z} \dfrac{\partial z}{\partial w} = (a-y) · x \]
\[ \dfrac{\partial L}{\partial b} = \dfrac{\partial L}{\partial z} \dfrac{\partial z}{\partial b} = a-y \]
From \(J = \dfrac{1}{m} \sum L(a, y)\) we finally obtain, in vectorized form: \[ \dfrac{\partial J}{\partial w} = \dfrac{1}{m} \sum^{m}_{i=1} \dfrac{\partial L}{\partial w} = \dfrac{1}{m} X(A-Y)^T \]
\[ \dfrac{\partial J}{\partial b} = \dfrac{1}{m} \sum^{m}_{i=1} (a^{(i)} - y^{(i)}) \]
```python
dw = X.dot((A - Y).T) / m   # dJ/dw, shape (n_x, 1)
db = np.sum(A - Y) / m      # dJ/db, a scalar
```
Update w and b
```python
w = w - learning_rate * dw
b = b - learning_rate * db
```
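Putting the loop together, a minimal sketch of the whole gradient-descent procedure; the function name `train` and defaults such as `num_iterations=2000` and `learning_rate=0.005` are illustrative choices, not from the original notes:

```python
def train(X, Y, num_iterations=2000, learning_rate=0.005):
    # X: (n_x, m) preprocessed features, Y: (1, m) labels in {0, 1}
    m = X.shape[1]
    w = np.random.randn(X.shape[0], 1) * 0.01   # small random init (zeros also work here)
    b = 0.0
    for i in range(num_iterations):
        A = sigmoid(np.dot(w.T, X) + b)                              # forward pass
        cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m  # cost J(w, b)
        dw = np.dot(X, (A - Y).T) / m                                # gradients
        db = np.sum(A - Y) / m
        w = w - learning_rate * dw                                   # gradient-descent update
        b = b - learning_rate * db
        if i % 100 == 0:
            print(f"iteration {i}: cost = {cost:.4f}")
    return w, b
```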
Predict on the Test Set
Using the trained w and b, compute y_pred = sigmoid(wx + b) for the test set, then apply a threshold to the predicted probability: e.g. if it is greater than 0.7, predict 'yes', otherwise 'no' (0.5 is the usual default threshold).
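A matching prediction sketch; the 0.5 default threshold is the common convention, and you can pass 0.7 to reproduce the stricter example above:

```python
def predict(w, b, X, threshold=0.5):
    # Returns 0/1 predictions for each column (example) of X
    y_pred = sigmoid(np.dot(w.T, X) + b)       # predicted probabilities, shape (1, m)
    return (y_pred > threshold).astype(int)    # 1 = 'yes', 0 = 'no'
```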