PyTorch: grad can be implicitly created only for scalar outputs

2023-08-01 06:37:43

I ran into this error a long time ago but never found a clear explanation of it online, so I am writing one down here.

For reference, here is the docstring of autograd.grad():

grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)
    Computes and returns the sum of gradients of outputs w.r.t. the inputs.
    ``grad_outputs`` should be a sequence of length matching ``output``
    containing the pre-computed gradients w.r.t. each of the outputs. If an
    output doesn't require_grad, then the gradient can be ``None``).
    If ``only_inputs`` is ``True``, the function will only return a list of gradients
    w.r.t the specified inputs. If it's ``False``, then gradient w.r.t. all remaining
    leaves will still be computed, and will be accumulated into their ``.grad``
    attribute.
    
    Arguments:
        outputs (sequence of Tensor): outputs of the differentiated function.
        inputs (sequence of Tensor): Inputs w.r.t. which the gradient will be
            returned (and not accumulated into ``.grad``).
        grad_outputs (sequence of Tensor): Gradients w.r.t. each output.
            None values can be specified for scalar Tensors or ones that don't require
            grad. If a None value would be acceptable for all grad_tensors, then this
            argument is optional. Default: None.
        retain_graph (bool, optional): If ``False``, the graph used to compute the grad
            will be freed. Note that in nearly all cases setting this option to ``True``
            is not needed and often can be worked around in a much more efficient
            way. Defaults to the value of ``create_graph``.
        create_graph (bool, optional): If ``True``, graph of the derivative will
            be constructed, allowing to compute higher order derivative products.
            Default: ``False``.
        allow_unused (bool, optional): If ``False``, specifying inputs that were not
            used when computing outputs (and therefore their grad is always zero)
            is an error. Defaults to ``False``.

Consider the following code:

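>>> import torch
>>> from torch import autograd
>>> from torch.autograd import Variable
>>> a=Variable(torch.FloatTensor([1,2,3]),requires_grad=True)
>>> b=3*a
>>> autograd.grad(outputs=b,inputs=a)  # b is a vector here
RuntimeError: grad can be implicitly created only for scalar outputs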

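When grad_outputs is omitted or is None, outputs must be a scalar for the gradient to be computed, which is why the code above raises an error. The following code, by contrast, runs without problems: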

>>> a=Variable(torch.FloatTensor([1,2,3]),requires_grad=True)
>>> b=3*a
>>> z=b.sum()
>>> autograd.grad(outputs=z,inputs=a) # z is a scalar here
(tensor([ 3.,  3.,  3.]),)
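
On a recent PyTorch release the Variable wrapper is no longer needed, so the two cases above can also be reproduced as a standalone script. This is only a minimal sketch, assuming torch.tensor(..., requires_grad=True) as the replacement for Variable; the names mirror the session above, and `raised` and `message` are illustrative helpers that do not appear in the original.

```python
import torch
from torch import autograd

# Assumption: a recent PyTorch where plain tensors track gradients directly.
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = 3 * a  # vector output, shape (3,)

# Non-scalar outputs with no grad_outputs: autograd cannot create the
# implicit gradient, so it raises a RuntimeError.
raised = False
message = ""
try:
    autograd.grad(outputs=b, inputs=a)
except RuntimeError as err:
    raised = True
    message = str(err)

# Reducing to a scalar first makes the implicit gradient well defined.
z = b.sum()
(grad_a,) = autograd.grad(outputs=z, inputs=a)
```

Catching the exception is only there to make the failure visible in a script; in real code the fix is to reduce to a scalar or to pass grad_outputs.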

Alternatively, you can pass grad_outputs explicitly, in which case outputs no longer needs to be a scalar:

>>> a=Variable(torch.FloatTensor([1,2,3]),requires_grad=True)
>>> b=3*a
>>> autograd.grad(outputs=b,inputs=a,grad_outputs=torch.ones_like(a))
(tensor([ 3.,  3.,  3.]),)
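
To make the role of grad_outputs more concrete: it acts as a per-element weight on outputs, so the result is the weighted sum of the gradients of each output element. The sketch below is an illustration under assumptions, not part of the original post: it uses b = a ** 2 instead of 3 * a so that the weights are visible in the result, and the weight vector `v` is a made-up example. It compares the grad_outputs call against the equivalent scalar formulation (b * v).sum().

```python
import torch
from torch import autograd

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = a ** 2  # b_i = a_i^2, so db_i/da_i = 2 * a_i

# Hypothetical weights, one per element of b.
v = torch.tensor([1.0, 10.0, 100.0])

# With grad_outputs=v, autograd returns sum_i v_i * db_i/da_j.
# retain_graph=True keeps the graph alive for the second call below.
(g_weighted,) = autograd.grad(
    outputs=b, inputs=a, grad_outputs=v, retain_graph=True
)

# The same quantity, obtained by building the weighted scalar explicitly.
(g_scalar,) = autograd.grad(outputs=(b * v).sum(), inputs=a)
```

With v set to all ones this reduces to the b.sum() case, which is why torch.ones_like(a) and z=b.sum() give the same gradient in the sessions above.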

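When running on the GPU, grad_outputs can be written in the following form (note that torch.ones_like(a) already has the same device and dtype as a, so on recent PyTorch versions it can usually be passed directly):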

grad_outputs = Variable(torch.Tensor(torch.ones_like(a)),requires_grad=False)
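
A device-agnostic variant might look like the sketch below. It assumes a recent PyTorch release and picks the device with torch.cuda.is_available(), which the original post does not do; the point is only that torch.ones_like inherits shape, dtype, and device from its argument and does not require grad by default.

```python
import torch
from torch import autograd

# Assumption: fall back to the CPU when no CUDA device is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.tensor([1.0, 2.0, 3.0], device=device, requires_grad=True)
b = 3 * a

# ones_like copies shape, dtype, and device from b; the result does not
# require grad, so no extra wrapping is needed.
grad_outputs = torch.ones_like(b)

(grad_a,) = autograd.grad(outputs=b, inputs=a, grad_outputs=grad_outputs)
```

Building grad_outputs from b rather than a keeps the shapes aligned even when outputs and inputs differ in shape.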
