How to perform in-place operations on gradient-tracking tensors
Don't forget that a gradient-tracking tensor must have a floating-point (or complex) dtype, otherwise PyTorch complains:
Only Tensors of floating point and complex dtype can require gradients
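A quick way to see the dtype restriction (a minimal sketch; the values are arbitrary):

```python
import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)  # fine: floating-point dtype
b = torch.tensor([1, 2], requires_grad=True)      # fails: integer dtype
# RuntimeError: Only Tensors of floating point and complex dtype can require gradients
```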
Where the problem comes from
When you perform an in-place operation on a tensor created with requires_grad=True, PyTorch throws a runtime error:

```python
>> X = torch.rand((3, 4), requires_grad=True)
```
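For example (a sketch; the exact wording of the error can vary between PyTorch versions), any in-place operation on such a leaf tensor fails:

```python
import torch

X = torch.rand((3, 4), requires_grad=True)
X += 1  # in-place addition on a leaf tensor that requires grad
# RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
```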
Situations where in-place operations come up (the ones encountered so far):
- model parameter initialization
- a hand-written implementation of gradient descent
- zeroing the gradients before each training step to avoid gradient accumulation (a zeroing sketch follows this list)
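For the gradient-zeroing case, a minimal sketch (the nn.Linear layer and random data are assumptions for illustration). Note that the .grad tensor itself does not require grad, so the in-place zero_() works directly:

```python
import torch
from torch import nn

net = nn.Linear(4, 1)                 # hypothetical model
loss = net(torch.rand(8, 4)).sum()
loss.backward()

# Clear the accumulated gradients before the next training step
for param in net.parameters():
    if param.grad is not None:
        param.grad.zero_()
```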
How to solve it
Use .data or .detach() to get a tensor that shares the same memory as the original but is not tracked by autograd:
```python
>> print(X)
>> print(X.data.requires_grad)
>> print(X.detach().requires_grad)
tensor([[0.2407, 0.3222, 0.4246, 0.3125],
        [0.1386, 0.7018, 0.1751, 0.0617],
        [0.3467, 0.5178, 0.2557, 0.9855]], requires_grad=True)
False
False
```

Then perform the in-place operation on this untracked tensor instead of on the original:
```python
# Hand-written gradient descent step
net.weight.detach().sub_(net.weight.grad, alpha=lr)
net.bias.detach().sub_(net.bias.grad, alpha=lr)
```
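In context, a minimal training-step sketch built around those two update lines (net, lr, and the squared-error loss here are assumptions for illustration):

```python
import torch
from torch import nn

lr = 0.03
net = nn.Linear(2, 1)                        # hypothetical model
X, y = torch.rand(10, 2), torch.rand(10, 1)  # hypothetical data

loss = ((net(X) - y) ** 2).mean()
loss.backward()

# Update the parameters in place through detached views, then clear the
# gradients so the next backward() does not accumulate on top of them.
net.weight.detach().sub_(net.weight.grad, alpha=lr)
net.bias.detach().sub_(net.bias.grad, alpha=lr)
net.weight.grad.zero_()
net.bias.grad.zero_()
```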
The difference between .data and .detach():
- In-place changes made through .data are invisible to autograd: if you modify the value of a tracked node this way, backward() can silently return a wrong gradient.
- In-place changes made through .detach() are still seen by autograd (its version counter is bumped): if you modify the value of a tracked node and then run backward(), an error is raised.
```python
# Normal usage
>> X = torch.tensor([[5, 1, 9], [6, 5, 7]], requires_grad=True, dtype=torch.float)
>> print(X)
>> Y = X ** 2
>> Y.sum().backward()
# The gradient is 2X
>> print(X.grad)
tensor([[5., 1., 9.],
        [6., 5., 7.]], requires_grad=True)
tensor([[10.,  2., 18.],
        [12., 10., 14.]])
```
```python
# Using .data
>> X = torch.tensor([[5, 1, 9], [6, 5, 7]], requires_grad=True, dtype=torch.float)
>> print(X)
>> Y = X ** 2
>> X.data *= 2
>> Y.sum().backward()
>> print(Y)
# The gradient is silently wrong: backward() uses the modified (doubled) X,
# so the result is twice the correct 2X
>> print(X.grad)
tensor([[5., 1., 9.],
        [6., 5., 7.]], requires_grad=True)
tensor([[25.,  1., 81.],
        [36., 25., 49.]], grad_fn=<PowBackward0>)
tensor([[20.,  4., 36.],
        [24., 20., 28.]])
```
```python
# Using .detach()
X = torch.tensor([[5, 1, 9], [6, 5, 7]], requires_grad=True, dtype=torch.float)
print(X)
Y = X ** 2
X.detach().add_(100)
# backward() fails: autograd detects that X was modified in place after being
# saved for the backward pass
Y.sum().backward()
print(Y)
print(X.grad)
```
This fails immediately with:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2, 3]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
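The "version" in this message refers to autograd's internal version counter, which every in-place modification increments. A sketch of how to observe it (._version is an internal attribute, so treat this as illustrative only):

```python
import torch

X = torch.tensor([[5., 1., 9.], [6., 5., 7.]], requires_grad=True)
Y = X ** 2
print(X._version)     # 0 -- the version autograd saved for the backward pass
X.detach().add_(100)  # in-place change through the detached view bumps the counter
print(X._version)     # 1 -- no longer matches, so backward() raises the error above
```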
- The difference between detach() and detach_()
- detach_() not only gives you the untracked, same-memory tensor, it also turns the current node itself into a leaf node: gradient computation stops at this node instead of continuing further back, i.e. the backpropagation graph is cut there (see the sketch below).
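A minimal sketch of the contrast (variable names are purely illustrative):

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = x * 3

y2 = y.detach()          # y itself is untouched: y2 is a new leaf view of the same memory
print(y.requires_grad)   # True  -> the graph x -> y is still intact
print(y2.requires_grad)  # False

y.detach_()              # modifies y in place: its grad_fn is dropped and it becomes a leaf
print(y.requires_grad)   # False -> the graph is cut at y
print(y.grad_fn)         # None
```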
Initializing module parameters with PyTorch's init module
torch.nn.init + the name of an initialization function
```python
# Initialize with a normal distribution
net = nn.Linear(feature_num, 1)
for param in net.parameters():
    nn.init.normal_(param, mean=0, std=0.01)
```
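The init functions (note the trailing underscore) modify the parameters in place under no-grad mode, so no .data/.detach() workaround is needed. A small sketch with a couple of other initializers (the choice of functions is just illustrative):

```python
import torch
from torch import nn

net = nn.Linear(4, 2)                 # hypothetical layer
nn.init.xavier_uniform_(net.weight)   # Xavier/Glorot uniform for the weight matrix
nn.init.constant_(net.bias, val=0.0)  # constant value for the bias
print(net.weight)
print(net.bias)
```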
Autograd-related attributes
- requires_grad — whether this tensor is tracked by autograd.
  "Is True if gradients need to be computed for this Tensor, False otherwise."
- grad — the tensor holding the gradient; it is None until backward() has been called, and later backward() calls accumulate (add) into it (see the sketch below).
  "This attribute is None by default and becomes a Tensor the first time a call to backward() computes gradients for self. The attribute will then contain the gradients computed and future calls to backward() will accumulate (add) gradients into it."
- is_leaf — every gradient-tracking tensor created directly by the user is a leaf node; only leaf nodes have their gradients populated, and retain_grad() lets you keep gradients for non-leaf nodes as well.
  "All Tensors that have requires_grad which is False will be leaf Tensors by convention. For Tensors that have requires_grad which is True, they will be leaf Tensors if they were created by the user. This means that they are not the result of an operation and so grad_fn is None. Only leaf Tensors will have their grad populated during a call to backward(). To get grad populated for non-leaf Tensors, you can use retain_grad()."
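A minimal sketch showing both the accumulation behaviour of .grad and retain_grad() on a non-leaf tensor (all names are illustrative):

```python
import torch

x = torch.tensor([3.0], requires_grad=True)  # created by the user -> leaf
y = x * x                                    # result of an operation -> non-leaf
y.retain_grad()                              # ask autograd to keep y's gradient too

y.backward()
print(x.is_leaf, y.is_leaf)  # True False
print(x.grad)                # tensor([6.])  (dy/dx = 2x)
print(y.grad)                # tensor([1.])  kept because of retain_grad()

# A second backward pass accumulates into .grad instead of overwriting it
(x * x).backward()
print(x.grad)                # tensor([12.]) = 6 + 6
```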