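I ran into this error a long time ago but never found a clear explanation of it online, so I am writing one up here.
For reference, here is the docstring of autograd.grad():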
grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)
Computes and returns the sum of gradients of outputs w.r.t. the inputs.
``grad_outputs`` should be a sequence of length matching ``output``
containing the pre-computed gradients w.r.t. each of the outputs. If an
output doesn't require_grad, then the gradient can be ``None``.
If ``only_inputs`` is ``True``, the function will only return a list of gradients
w.r.t the specified inputs. If it's ``False``, then gradient w.r.t. all remaining
leaves will still be computed, and will be accumulated into their ``.grad``
attribute.
Arguments:
    outputs (sequence of Tensor): outputs of the differentiated function.
    inputs (sequence of Tensor): Inputs w.r.t. which the gradient will be
        returned (and not accumulated into ``.grad``).
    grad_outputs (sequence of Tensor): Gradients w.r.t. each output.
        None values can be specified for scalar Tensors or ones that don't require
        grad. If a None value would be acceptable for all grad_tensors, then this
        argument is optional. Default: None.
    retain_graph (bool, optional): If ``False``, the graph used to compute the grad
        will be freed. Note that in nearly all cases setting this option to ``True``
        is not needed and often can be worked around in a much more efficient
        way. Defaults to the value of ``create_graph``.
    create_graph (bool, optional): If ``True``, graph of the derivative will
        be constructed, allowing to compute higher order derivative products.
        Default: ``False``.
    allow_unused (bool, optional): If ``False``, specifying inputs that were not
        used when computing outputs (and therefore their grad is always zero)
        is an error. Defaults to ``False``.
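To make two of the less obvious arguments concrete, here is a minimal sketch of create_graph and allow_unused. It assumes a recent PyTorch release where torch.tensor(..., requires_grad=True) replaces the older Variable wrapper; the names x, y, y2 and unused are chosen only for illustration.

```python
import torch

# x is the leaf tensor we differentiate with respect to.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3  # scalar output, so grad_outputs can be omitted

# First derivative dy/dx = 3 * x**2. create_graph=True keeps the result
# differentiable so that it can be differentiated again.
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)

# Second derivative d2y/dx2 = 6 * x, obtained by differentiating dy_dx.
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)

# `unused` takes no part in computing y2, so allow_unused=True is required;
# the gradient returned for it is None.
unused = torch.tensor(1.0, requires_grad=True)
y2 = x * 2
grads = torch.autograd.grad(y2, [x, unused], allow_unused=True)
```

Without allow_unused=True the last call would raise an error, which matches the description of that argument in the docstring.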
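Consider the following code:
>>> import torch
>>> from torch import autograd
>>> from torch.autograd import Variable
>>> a=Variable(torch.FloatTensor([1,2,3]),requires_grad=True)
>>> b=3*a
>>> autograd.grad(outputs=b,inputs=a) # here b is a vector
RuntimeError: grad can be implicitly created only for scalar outputs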
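When grad_outputs is omitted or None, outputs must be a scalar for the gradient to be computed, so the code above raises an error. The following code, by contrast, runs without problems: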
>>> a=Variable(torch.FloatTensor([1,2,3]),requires_grad=True)
>>> b=3*a
>>> z=b.sum()
>>> autograd.grad(outputs=z,inputs=a) # here z is a scalar
(tensor([ 3., 3., 3.]),)
Alternatively, you can pass grad_outputs explicitly, in which case outputs no longer needs to be a scalar:
>>> a=Variable(torch.FloatTensor([1,2,3]),requires_grad=True)
>>> b=3*a
>>> autograd.grad(outputs=b,inputs=a,grad_outputs=torch.ones_like(a))
(tensor([ 3., 3., 3.]),)
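One way to read grad_outputs is as a set of weights, one per element of outputs: the call returns the gradient of the weighted sum of the outputs. The sketch below illustrates this with a non-uniform weight vector v (a name introduced here, not taken from the original) and checks it against the equivalent scalar formulation. It assumes a recent PyTorch release.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = 3 * a

# v plays the role of grad_outputs: one weight per element of b.
v = torch.tensor([1.0, 10.0, 100.0])

# retain_graph=True so the same graph can be differentiated again below.
(g_weighted,) = torch.autograd.grad(b, a, grad_outputs=v, retain_graph=True)

# Equivalent scalar formulation: differentiate the weighted sum of b.
(g_scalar,) = torch.autograd.grad((b * v).sum(), a)

print(g_weighted)
print(g_scalar)
```

With v set to all ones this reduces to differentiating b.sum(), which is why the b.sum() version and the grad_outputs=torch.ones_like(a) version give the same result.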
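On the GPU, grad_outputs has to live on the same device as outputs. torch.ones_like creates its result with the same device and dtype as its argument, and the result does not require grad by default, so no extra wrapping is needed:
grad_outputs = torch.ones_like(b)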
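As a device-agnostic sketch of the same idea, the snippet below picks the GPU when one is available and falls back to the CPU otherwise. The device variable and the fallback logic are assumptions added for illustration; the snippet assumes a recent PyTorch release.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.tensor([1.0, 2.0, 3.0], device=device, requires_grad=True)
b = 3 * a

# ones_like inherits device and dtype from b and does not require grad.
grad_outputs = torch.ones_like(b)
(g,) = torch.autograd.grad(b, a, grad_outputs=grad_outputs)

print(g.device, g)
```

Deriving grad_outputs from b rather than building it separately keeps its shape, dtype and device in step with outputs.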