How to perform in-place operations on gradient-tracking tensors
Don't forget that a gradient-tracking tensor must have a floating-point (or complex) dtype, otherwise PyTorch complains:
Only Tensors of floating point and complex dtype can require gradients
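A quick way to see the dtype restriction (a minimal sketch; the values are arbitrary):

```python
import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)  # fine: floating-point dtype
b = torch.tensor([1, 2], requires_grad=True)      # fails: integer dtype
# RuntimeError: Only Tensors of floating point and complex dtype can require gradients
```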
Where the problem comes from
When you perform an in-place operation on a tensor created with requires_grad=True, PyTorch throws a runtime error:

```python
>> X = torch.rand((3, 4), requires_grad=True)
```
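For example (a sketch; the exact wording of the error can vary between PyTorch versions), any in-place operation on such a leaf tensor fails:

```python
import torch

X = torch.rand((3, 4), requires_grad=True)
X += 1  # in-place addition on a leaf tensor that requires grad
# RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
```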
Situations where in-place operations come up (the ones encountered so far):
- model parameter initialization
- a hand-written implementation of gradient descent
- zeroing the gradients before each training step to avoid gradient accumulation (a zeroing sketch follows this list)
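For the gradient-zeroing case, a minimal sketch (the nn.Linear layer and random data are assumptions for illustration). Note that the .grad tensor itself does not require grad, so the in-place zero_() works directly:

```python
import torch
from torch import nn

net = nn.Linear(4, 1)                 # hypothetical model
loss = net(torch.rand(8, 4)).sum()
loss.backward()

# Clear the accumulated gradients before the next training step
for param in net.parameters():
    if param.grad is not None:
        param.grad.zero_()
```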
How to solve it
Use .data or .detach() to get a tensor that shares the same memory as the original but is not tracked by autograd:
```python
>> print(X)
>> print(X.data.requires_grad)
>> print(X.detach().requires_grad)
tensor([[0.2407, 0.3222, 0.4246, 0.3125],
        [0.1386, 0.7018, 0.1751, 0.0617],
        [0.3467, 0.5178, 0.2557, 0.9855]], requires_grad=True)
False
False
```

Then perform the in-place operation on this untracked tensor instead of on the original:
```python
# Hand-written gradient descent step
net.weight.detach().sub_(net.weight.grad, alpha=lr)
net.bias.detach().sub_(net.bias.grad, alpha=lr)
```
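In context, a minimal training-step sketch built around those two update lines (net, lr, and the squared-error loss here are assumptions for illustration):

```python
import torch
from torch import nn

lr = 0.03
net = nn.Linear(2, 1)                        # hypothetical model
X, y = torch.rand(10, 2), torch.rand(10, 1)  # hypothetical data

loss = ((net(X) - y) ** 2).mean()
loss.backward()

# Update the parameters in place through detached views, then clear the
# gradients so the next backward() does not accumulate on top of them.
net.weight.detach().sub_(net.weight.grad, alpha=lr)
net.bias.detach().sub_(net.bias.grad, alpha=lr)
net.weight.grad.zero_()
net.bias.grad.zero_()
```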
The difference between .data and .detach():
- In-place changes made through .data are invisible to autograd: if you modify the value of a tracked node this way, backward() can silently return a wrong gradient.
- In-place changes made through .detach() are still seen by autograd (its version counter is bumped): if you modify the value of a tracked node and then run backward(), an error is raised.
```python
# Normal usage
>> X = torch.tensor([[5, 1, 9], [6, 5, 7]], requires_grad=True, dtype=torch.float)
>> print(X)
>> Y = X ** 2
>> Y.sum().backward()
# The gradient is 2X
>> print(X.grad)
tensor([[5., 1., 9.],
        [6., 5., 7.]], requires_grad=True)
tensor([[10.,  2., 18.],
        [12., 10., 14.]])
```
```python
# Using .data
>> X = torch.tensor([[5, 1, 9], [6, 5, 7]], requires_grad=True, dtype=torch.float)
>> print(X)
>> Y = X ** 2
>> X.data *= 2
>> Y.sum().backward()
>> print(Y)
# The gradient is silently wrong: backward() uses the modified (doubled) X,
# so the result is twice the correct 2X
>> print(X.grad)
tensor([[5., 1., 9.],
        [6., 5., 7.]], requires_grad=True)
tensor([[25.,  1., 81.],
        [36., 25., 49.]], grad_fn=<PowBackward0>)
tensor([[20.,  4., 36.],
        [24., 20., 28.]])
```
```python
# Using .detach()
X = torch.tensor([[5, 1, 9], [6, 5, 7]], requires_grad=True, dtype=torch.float)
print(X)
Y = X ** 2
X.detach().add_(100)
# backward() fails: autograd detects that X was modified in place after being
# saved for the backward pass
Y.sum().backward()
print(Y)
print(X.grad)
```
This fails immediately with:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2, 3]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
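The "version" in this message refers to autograd's internal version counter, which every in-place modification increments. A sketch of how to observe it (._version is an internal attribute, so treat this as illustrative only):

```python
import torch

X = torch.tensor([[5., 1., 9.], [6., 5., 7.]], requires_grad=True)
Y = X ** 2
print(X._version)     # 0 -- the version autograd saved for the backward pass
X.detach().add_(100)  # in-place change through the detached view bumps the counter
print(X._version)     # 1 -- no longer matches, so backward() raises the error above
```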
- The difference between detach() and detach_()
- detach_() not only gives you the untracked, same-memory tensor, it also turns the current node itself into a leaf node: gradient computation stops at this node instead of continuing further back, i.e. the backpropagation graph is cut there (see the sketch below).
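A minimal sketch of the contrast (variable names are purely illustrative):

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = x * 3

y2 = y.detach()          # y itself is untouched: y2 is a new leaf view of the same memory
print(y.requires_grad)   # True  -> the graph x -> y is still intact
print(y2.requires_grad)  # False

y.detach_()              # modifies y in place: its grad_fn is dropped and it becomes a leaf
print(y.requires_grad)   # False -> the graph is cut at y
print(y.grad_fn)         # None
```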
Initializing module parameters with PyTorch's init module
torch.nn.init + the name of an initialization function
```python
# Initialize with a normal distribution
net = nn.Linear(feature_num, 1)
for param in net.parameters():
    nn.init.normal_(param, mean=0, std=0.01)
```
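The init functions (note the trailing underscore) modify the parameters in place under no-grad mode, so no .data/.detach() workaround is needed. A small sketch with a couple of other initializers (the choice of functions is just illustrative):

```python
import torch
from torch import nn

net = nn.Linear(4, 2)                 # hypothetical layer
nn.init.xavier_uniform_(net.weight)   # Xavier/Glorot uniform for the weight matrix
nn.init.constant_(net.bias, val=0.0)  # constant value for the bias
print(net.weight)
print(net.bias)
```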
Autograd-related attributes
- requires_grad — whether this tensor is tracked by autograd.
  "Is True if gradients need to be computed for this Tensor, False otherwise."
- grad — the tensor holding the gradient; it is None until backward() has been called, and later backward() calls accumulate (add) into it (see the sketch below).
  "This attribute is None by default and becomes a Tensor the first time a call to backward() computes gradients for self. The attribute will then contain the gradients computed and future calls to backward() will accumulate (add) gradients into it."
- is_leaf — every gradient-tracking tensor created directly by the user is a leaf node; only leaf nodes have their gradients populated, and retain_grad() lets you keep gradients for non-leaf nodes as well.
  "All Tensors that have requires_grad which is False will be leaf Tensors by convention. For Tensors that have requires_grad which is True, they will be leaf Tensors if they were created by the user. This means that they are not the result of an operation and so grad_fn is None. Only leaf Tensors will have their grad populated during a call to backward(). To get grad populated for non-leaf Tensors, you can use retain_grad()."
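A minimal sketch showing both the accumulation behaviour of .grad and retain_grad() on a non-leaf tensor (all names are illustrative):

```python
import torch

x = torch.tensor([3.0], requires_grad=True)  # created by the user -> leaf
y = x * x                                    # result of an operation -> non-leaf
y.retain_grad()                              # ask autograd to keep y's gradient too

y.backward()
print(x.is_leaf, y.is_leaf)  # True False
print(x.grad)                # tensor([6.])  (dy/dx = 2x)
print(y.grad)                # tensor([1.])  kept because of retain_grad()

# A second backward pass accumulates into .grad instead of overwriting it
(x * x).backward()
print(x.grad)                # tensor([12.]) = 6 + 6
```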