
How to perform in-place operations on gradient-tracking tensors in PyTorch

Don't forget: a gradient-tracking tensor must have a floating-point (or complex) dtype.

Only Tensors of floating point and complex dtype can require gradients
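
A quick sanity check of this dtype requirement, as a minimal sketch (the tensor values are arbitrary):

import torch

# an integer tensor cannot require gradients:
# X = torch.tensor([1, 2, 3], requires_grad=True)
# RuntimeError: Only Tensors of floating point and complex dtype can require gradients

# specifying a floating-point dtype works:
X = torch.tensor([1, 2, 3], dtype=torch.float, requires_grad=True)
print(X.requires_grad)   # True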

Where the problem comes from

When an in-place operation is applied to a tensor created with requires_grad=True, PyTorch raises a runtime error:

>> X = torch.rand((3,4),requires_grad = True)
>> X.fill_(0)
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

Situations where an in-place operation is needed (the ones encountered so far; see the sketch after this list):

  • Initializing model parameters
  • Implementing gradient descent by hand
  • Zeroing the gradients before each training iteration to avoid gradient accumulation
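
A minimal sketch of how each situation interacts with the error above (net, lr, and the shapes are illustrative):

import torch
from torch import nn

net = nn.Linear(4, 1)
lr = 0.01

# 1) parameter initialization: an in-place fill on a leaf that requires grad raises the error
# net.weight.fill_(0.0)   # RuntimeError: a leaf Variable that requires grad ...

# 2) hand-written gradient descent: updating the parameters in place hits the same error
# net.weight.sub_(net.weight.grad, alpha=lr)   # same RuntimeError (once .grad is populated)

# 3) zeroing accumulated gradients is safe, because .grad itself does not require grad
for param in net.parameters():
    if param.grad is not None:
        param.grad.zero_()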

How to solve it

  1. Use .data or .detach() to obtain a tensor that shares the original tensor's memory but is not tracked by autograd

    >> print(X)
    >> print(X.data.requires_grad)
    >> print(X.detach().requires_grad)
    tensor([[0.2407, 0.3222, 0.4246, 0.3125],
            [0.1386, 0.7018, 0.1751, 0.0617],
            [0.3467, 0.5178, 0.2557, 0.9855]], requires_grad=True)
    False
    False
    • Perform the in-place operation on this non-tracking tensor instead (a fuller usage sketch follows the snippet below)

      # hand-written gradient descent update
      net.weight.detach().sub_(net.weight.grad, alpha=lr)
      net.bias.detach().sub_(net.bias.grad, alpha=lr)
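
      For context, a minimal sketch of a full hand-written SGD step built around these two lines; the data, loss, and lr below are illustrative placeholders:

      import torch
      from torch import nn

      net = nn.Linear(4, 1)
      lr = 0.01
      X_batch, y_batch = torch.rand(8, 4), torch.rand(8, 1)

      loss = ((net(X_batch) - y_batch) ** 2).mean()   # forward pass
      loss.backward()                                  # populates .grad on the parameters

      # update through the detached views, then clear the gradients for the next step
      net.weight.detach().sub_(net.weight.grad, alpha=lr)
      net.bias.detach().sub_(net.bias.grad, alpha=lr)
      net.weight.grad.zero_()
      net.bias.grad.zero_()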
      
    • The difference between .data and .detach():

      • In-place modifications made through .data are invisible to autograd; if the value of a tracked node is changed this way, the computed gradients can be silently wrong

      • In-place modifications made through .detach() are still monitored by autograd; if the value of a tracked node is changed and backward() is then called, an error is raised

      The three snippets below show the normal case, the .data case, and the .detach() case respectively.

 
# normal usage
>> X = torch.tensor([[5, 1, 9],[6, 5, 7]],requires_grad = True,dtype = torch.float)
>> print(X)
>> Y = X ** 2
>> Y.sum().backward()
# the gradient is 2X
>> print(X.grad)
tensor([[5., 1., 9.],
        [6., 5., 7.]], requires_grad=True)
tensor([[10., 2., 18.],
        [12., 10., 14.]])
# using .data
>> X = torch.tensor([[5, 1, 9],[6, 5, 7]],requires_grad = True,dtype = torch.float)
>> print(X)
>> Y = X ** 2
>> X.data *= 2
>> Y.sum().backward()
>> print(Y)
# the gradient is silently doubled compared with the normal case
>> print(X.grad)
tensor([[5., 1., 9.],
        [6., 5., 7.]], requires_grad=True)
tensor([[25., 1., 81.],
        [36., 25., 49.]], grad_fn=<PowBackward0>)
tensor([[20., 4., 36.],
        [24., 20., 28.]])
# using .detach()
>> X = torch.tensor([[5, 1, 9],[6, 5, 7]],requires_grad = True,dtype = torch.float)
>> print(X)
>> Y = X ** 2
>> X.detach().add_(100)
>> Y.sum().backward()
>> print(Y)
# instead of a wrong gradient, backward() fails immediately
>> print(X.grad)
This raises an error right away:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2, 3]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
  • The difference between detach() and detach_()
    • detach_() not only returns a non-tracking tensor sharing the same memory, it also detaches the tensor in place and turns it into a leaf node (gradient computation stops at this node and does not propagate any further back, i.e. the backward graph is cut at this point; see the sketch below)
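
    A minimal sketch of detach_() cutting the graph in place (the values are arbitrary):

    import torch

    x = torch.tensor([2.0, 3.0], requires_grad=True)
    y = x * 2            # non-leaf node with grad_fn=<MulBackward0>
    y.detach_()          # detached in place: y becomes a leaf and no longer requires grad
    print(y.is_leaf, y.requires_grad)   # True False

    # anything computed from y afterwards is no longer connected to x,
    # so there is nothing to backpropagate through:
    # (y ** 2).sum().backward()
    # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn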
  2. Use PyTorch's init module to initialize model parameters

    torch.nn.init plus the name of an initialization function (e.g. nn.init.normal_)

    # initialize the parameters with a normal distribution
    from torch import nn
    feature_num = 4  # example input dimension
    net = nn.Linear(feature_num, 1)
    for param in net.parameters():
        nn.init.normal_(param, mean=0, std=0.01)

Autograd-related attributes

  • requires_grad: whether the tensor is tracked for gradient computation

    Is True if gradients need to be computed for this Tensor, False otherwise.

  • grad: the tensor that stores the gradients; it is None before any backward pass, and repeated gradient computations accumulate (add) into it

    This attribute is None by default and becomes a Tensor the first time a call to backward() computes gradients for self. The attribute will then contain the gradients computed and future calls to backward() will accumulate (add) gradients into it.

  • is_leaf: every gradient-tracking tensor created directly by the user is a leaf node; only leaf nodes have their grad populated during backward(), and retain_grad() can be used to obtain the gradients of non-leaf nodes (see the sketch at the end of this list)

    All Tensors that have requires_grad which is False will be leaf Tensors by convention.

    For Tensors that have requires_grad which is True, they will be leaf Tensors if they were created by the user. This means that they are not the result of an operation and so grad_fn is None.

    Only leaf Tensors will have their grad populated during a call to backward(). To get grad populated for non-leaf Tensors, you can use retain_grad().
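
Putting these attributes together, a minimal sketch (the values are arbitrary):

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)   # created by the user -> leaf
y = x * 3                                           # result of an operation -> non-leaf
y.retain_grad()                                     # ask autograd to keep y's grad

(y ** 2).sum().backward()
print(x.is_leaf, y.is_leaf)   # True False
print(x.grad)                 # tensor([18., 36.])   d/dx sum((3x)^2) = 18x
print(y.grad)                 # tensor([ 6., 12.])   d/dy sum(y^2)    = 2y

# gradients accumulate across backward() calls; zero them between iterations
(x ** 2).sum().backward()
print(x.grad)                 # tensor([20., 40.])   previous grad + 2x
x.grad.zero_()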