Why does my cost function return twice the expected value?

data-mining  logistic-regression
2022-03-02 19:22:05

I am trying to implement a propagate() function that computes the cost function and its gradient. I know that:

Forward propagation:

  • You are given X
  • You compute $A = \sigma(w^T X + b) = (a^{(0)}, a^{(1)}, \ldots, a^{(m-1)}, a^{(m)})$
  • You compute the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)}) + (1-y^{(i)})\log(1-a^{(i)})\right]$ (see the shape sketch after this list)
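
For concreteness, here is a minimal shape sketch of that forward pass (my own illustration, assuming n features, m examples, and the standard logistic sigmoid; the variable names are only placeholders):

import numpy as np

def sigmoid(z):
    # standard logistic sigmoid, applied element-wise
    return 1 / (1 + np.exp(-z))

n, m = 2, 3                       # n features, m training examples
w = np.zeros((n, 1))              # weights, shape (n, 1)
b = 0.                            # bias, a scalar
X = np.zeros((n, m))              # data, shape (n, m)

A = sigmoid(np.dot(w.T, X) + b)   # w.T is (1, n), X is (n, m), so A is (1, m)
print(A.shape)                    # (1, 3): one activation a^(i) per example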

Here are the two formulas I am using:

$$\frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T \tag{7}$$
$$\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(a^{(i)} - y^{(i)}\right) \tag{8}$$

which I coded as:

def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """

    m = X.shape[1]

    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    A = sigmoid(np.dot(w.T,X)+b)                                    # compute activation
    cost = -1/m * np.sum(np.dot(Y.T,np.log(A)) + np.dot((1-Y).T,np.log(1-A)))                                # compute cost
    ### END CODE HERE ###

    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw = 1/m*np.dot(X,(A-Y).T)
    db = 1/m*np.sum(A-Y)
    ### END CODE HERE ###

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost

With the following example:

w, b, X, Y = np.array([[1.],[2.]]), 2., np.array([[1.,2.,-1.],[3.,4.,-3.2]]), np.array([[1,0,1]])
grads, cost = propagate(w, b, X, Y)

my output is:

dw = [[ 0.99845601]
 [ 2.39507239]]
db = 0.00145557813678
cost = 10.6046359582

whereas the expected output is

Expected output: cost = 5.801545...

1 Answer

You should compute the cost function element-wise. In your cost line, Y.T has shape (m, 1) and np.log(A) has shape (1, m), so np.dot(Y.T, np.log(A)) is an (m, m) outer product; np.sum then adds up all m² cross terms y^(i)·log(a^(j)) instead of only the m matching products, which inflates the cost. Try using

np.multiply

instead of

np.dot

which computes a matrix product rather than an element-wise one.
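
As a sketch of the fix (keeping the rest of the question's propagate() unchanged and assuming the usual sigmoid helper), only the cost line needs to change:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)            # activations, shape (1, m)
    # element-wise products instead of np.dot, summed over the m examples
    cost = -1/m * np.sum(np.multiply(Y, np.log(A)) + np.multiply(1 - Y, np.log(1 - A)))
    dw = 1/m * np.dot(X, (A - Y).T)
    db = 1/m * np.sum(A - Y)
    return {"dw": dw, "db": db}, np.squeeze(cost)

w, b = np.array([[1.], [2.]]), 2.
X = np.array([[1., 2., -1.], [3., 4., -3.2]])
Y = np.array([[1, 0, 1]])
grads, cost = propagate(w, b, X, Y)
print(cost)                                    # ≈ 5.801545..., the expected value

Writing Y * np.log(A) would work just as well as np.multiply; the point is simply to avoid forming the (m, m) matrix product before summing.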