Pytorch Tutorial-Autograd

These are some notes I took while learning PyTorch; I hope they are helpful to you 😊

Autograd

This is where autograd comes in: it tracks the history of every computation. Every computed tensor in a PyTorch model carries a record of its input tensors and of the function that created it. Combined with the fact that every PyTorch function that acts on tensors has a built-in implementation for computing its own derivative, this greatly speeds up the computation of the local derivatives needed for learning.

Example

%matplotlib inline
import torch

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import math

a = torch.linspace(0., 2. * math.pi, steps=25, requires_grad=True)
print(a)
tensor([0.0000, 0.2618, 0.5236, 0.7854, 1.0472, 1.3090, 1.5708, 1.8326, 2.0944,
2.3562, 2.6180, 2.8798, 3.1416, 3.4034, 3.6652, 3.9270, 4.1888, 4.4506,
4.7124, 4.9742, 5.2360, 5.4978, 5.7596, 6.0214, 6.2832],
requires_grad=True)

b = torch.sin(a)
c = 2 * b
d = c + 1
out = d.sum()
out.backward()
print(a.grad)
tensor([ 2.0000e+00,  1.9319e+00,  1.7321e+00,  1.4142e+00,  1.0000e+00,
         5.1764e-01, -8.7423e-08, -5.1764e-01, -1.0000e+00, -1.4142e+00,
        -1.7321e+00, -1.9319e+00, -2.0000e+00, -1.9319e+00, -1.7321e+00,
        -1.4142e+00, -1.0000e+00, -5.1764e-01,  2.3850e-08,  5.1764e-01,
         1.0000e+00,  1.4142e+00,  1.7321e+00,  1.9319e+00,  2.0000e+00])

plt.plot(a.detach(), a.grad.detach())

Be aware that only leaf nodes of the computation graph have their gradients computed. If you tried, for example, print(c.grad) you'd get back None. In this simple example, only the input is a leaf node, so only it has gradients computed.
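A minimal sketch of this behaviour, continuing the tensors defined above:

# Only the leaf tensor `a` gets a populated .grad; intermediates such as `c`
# only carry the grad_fn that records how they were created.
print(a.is_leaf, c.is_leaf)  # True False
print(c.grad)                # None (PyTorch may warn about reading .grad of a non-leaf)
print(c.grad_fn)             # something like <MulBackward0 object at ...>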

Jacobian

If you have a function with an n-dimensional input and m-dimensional output, $\vec{y}=f\left(\vec{x}\right)$, the complete gradient is a matrix of the derivative of every output with respect to every input, called the Jacobian:
$$
J=\left(\begin{array}{ccc}
\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\
\vdots & \ddots & \vdots\\
\frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{array}\right)
$$
If you have a second function, $l=g\left(\vec{y}\right)$ that takes m-dimensional input (that is, the same dimensionality as the output above), and returns a scalar output, you can express its gradients with respect to $\vec{y}$ as a column vector, $v=\left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T}$ - which is really just a one-column Jacobian.

More concretely, imagine the first function as your PyTorch model (with potentially many inputs and many outputs) and the second function as a loss function (with the model’s output as input, and the loss value as the scalar output).

If we multiply the first function’s Jacobian by the gradient of the second function, and apply the chain rule, we get:

$$
J^{T}\cdot v=\left(\begin{array}{ccc}
\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
\vdots & \ddots & \vdots\\
\frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{array}\right)\left(\begin{array}{c}
\frac{\partial l}{\partial y_{1}}\\
\vdots\\
\frac{\partial l}{\partial y_{m}}
\end{array}\right)=\left(\begin{array}{c}
\frac{\partial l}{\partial x_{1}}\\
\vdots\\
\frac{\partial l}{\partial x_{n}}
\end{array}\right)
$$

Note: You could also use the equivalent operation $v^{T}\cdot J$, and get back a row vector.

The resulting column vector is the gradient of the second function with respect to the inputs of the first - or in the case of our model and loss function, the gradient of the loss with respect to the model inputs.
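As a small concrete check (an illustration, not from the tutorial): take $\vec{y}=f\left(\vec{x}\right)=\left(x_{1}x_{2},\;x_{1}+x_{2}\right)$ and $l=g\left(\vec{y}\right)=y_{1}+2y_{2}$. Then

$$
J^{T}\cdot v=\left(\begin{array}{cc}
x_{2} & 1\\
x_{1} & 1
\end{array}\right)\left(\begin{array}{c}
1\\
2
\end{array}\right)=\left(\begin{array}{c}
x_{2}+2\\
x_{1}+2
\end{array}\right)
$$

which matches differentiating $l=x_{1}x_{2}+2x_{1}+2x_{2}$ with respect to $x_{1}$ and $x_{2}$ directly.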

torch.autograd is an engine for computing these products. This is how we accumulate the gradients over the learning weights during the backward pass.

For this reason, the backward() call can also take an optional vector input. This vector represents a set of gradients over the tensor, which are multiplied by the Jacobian of the autograd-traced tensor that precedes it. Let’s try a specific example with a small vector:

This is exactly what Example 2 below demonstrates; jump there directly.
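A minimal sketch of such a call (the weighting vector here is only illustrative):

x = torch.randn(3, requires_grad=True)
y = x * 2
v = torch.tensor([0.1, 1.0, 0.0001])  # gradient weights for the non-scalar output y
y.backward(v)                          # autograd computes J^T · v
print(x.grad)                          # equals 2 * v, since dy_i/dx_i = 2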

High-Level API

There is an API on autograd that gives you direct access to important differential matrix and vector operations. In particular, it allows you to calculate the Jacobian and the Hessian matrices of a particular function for particular inputs. (The Hessian is like the Jacobian, but expresses all partial second derivatives.) It also provides methods for taking vector products with these matrices.

def exp_adder(x, y):
    return 2 * x.exp() + 3 * y

inputs = (torch.rand(1), torch.rand(1))  # arguments for the function
print(inputs)
torch.autograd.functional.jacobian(exp_adder, inputs)
(tensor([0.7327]), tensor([0.8072]))
(tensor([[4.1616]]), tensor([[3.]]))

inputs = (torch.rand(3), torch.rand(3))  # arguments for the function
print(inputs)
torch.autograd.functional.jacobian(exp_adder, inputs)
(tensor([0.3723, 0.7282, 0.0756]), tensor([0.9602, 0.6986, 0.2260]))
(tensor([[2.9022, 0.0000, 0.0000],
         [0.0000, 4.1428, 0.0000],
         [0.0000, 0.0000, 2.1570]]),
 tensor([[3., 0., 0.],
         [0., 3., 0.],
         [0., 0., 3.]]))
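The same module also offers torch.autograd.functional.hessian() for scalar-valued functions; a minimal sketch (the cubic function is just an illustration):

def cube_sum(x):
    return (x ** 3).sum()  # scalar output, so the Hessian is well defined

x = torch.rand(3)
print(torch.autograd.functional.hessian(cube_sum, x))  # diagonal matrix with entries 6 * x_i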
  • There are also functions that compute vector-Jacobian products directly, such as torch.autograd.functional.vjp(); the torch.autograd.functional.jvp() method performs the same matrix multiplication as vjp() with the operands reversed (see the sketch after this list).

  • The key points of the whole article didn't feel prominent enough, so I asked an AI to produce a set of summary notes.
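A minimal vjp() sketch, reusing exp_adder from above (the all-ones vector is only an illustration):

v = torch.ones(3)                        # the vector v to multiply with the Jacobian
inputs = (torch.rand(3), torch.rand(3))
outputs, vjp_result = torch.autograd.functional.vjp(exp_adder, inputs, v)
print(vjp_result)  # per-input products: (2 * exp(x) elementwise, tensor([3., 3., 3.]))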

AI Summary

Summary of PyTorch Autograd features

PyTorch's autograd (automatic differentiation) is one of its core features. It automatically computes tensor gradients, supports dynamic computation graphs, and is the foundation of neural-network training. The key points and examples are summarized below:


1. Core mechanisms of autograd

  • Dynamic computation graph: built at runtime, so control flow (loops, conditionals) is supported flexibly.
  • Automatic gradient computation: backward() computes gradients automatically and stores them in the .grad attribute.
  • Leaf tensors: tensors created directly (e.g. with torch.tensor()) are leaf nodes and can accumulate gradients.
  • Non-leaf tensors: tensors produced by operations do not retain their gradients by default (unless retain_grad() is set explicitly).
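A minimal sketch of the leaf vs. non-leaf distinction described above (retain_grad() keeps the intermediate's gradient):

x = torch.tensor(1.0, requires_grad=True)  # leaf tensor, created directly
y = x * 2                                  # non-leaf tensor, produced by an op
y.retain_grad()                            # explicitly keep y's gradient
z = y ** 2
z.backward()
print(x.is_leaf, y.is_leaf)  # True False
print(x.grad, y.grad)        # tensor(8.) tensor(4.)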

2. Key functions and methods

  • requires_grad=True: enables gradient tracking. Example: x = torch.tensor(1.0, requires_grad=True)
  • backward(): runs backpropagation to compute gradients. Example: y.backward()
  • .grad attribute: stores the gradient values. Example: x.grad
  • torch.no_grad(): temporarily disables gradient computation. Example: with torch.no_grad():
  • detach(): detaches a tensor so gradients do not flow through it. Example: y_detached = y.detach()
  • torch.autograd.grad(): computes gradients directly (without modifying .grad). Example: grad = torch.autograd.grad(y, x)

3. Worked examples

Example 1: Basic gradient computation

import torch

# Define a leaf tensor (with gradient tracking enabled)
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x  # y = x² + 3x

# Compute the gradient
y.backward()  # dy/dx = 2x + 3

print(x.grad)  # Output: tensor(7.0) (at x=2, dy/dx = 2*2+3 = 7)

Example 2: Gradients of a non-scalar output (requires the gradient argument)

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x * 2 # y = [2.0, 4.0]

# Gradient weights (modelling how sensitive the loss is to each component)
gradient = torch.tensor([0.1, 0.01])
y.backward(gradient)  # equivalent to backpropagating the weighted sum of y

print(x.grad)  # Output: tensor([0.2000, 0.0200]) (2*0.1=0.2, 2*0.01=0.02)

Example 3: Freezing some parameters (detach() or requires_grad=False)

# Scenario: freeze part of a model's parameters
model = torch.nn.Linear(3, 2)
for param in model.parameters():
    param.requires_grad = False  # freeze all parameters

# Or detach an intermediate tensor
x = torch.tensor(1.0, requires_grad=True)
y = x * 2
z = y.detach()  # z takes no part in gradient computation

Example 4: Higher-order derivatives (create_graph=True)

x = torch.tensor(3.0, requires_grad=True)
y = x ** 3  # y = x³

# First derivative (create_graph=True keeps the graph so it can be differentiated again)
grad1 = torch.autograd.grad(y, x, create_graph=True)  # dy/dx = 3x² → 27
# Second derivative (grad1 is a tuple, so take its first element)
grad2 = torch.autograd.grad(grad1[0], x)  # d²y/dx² = 6x → 18
print(grad2[0])  # Output: tensor(18.0)

Example 5: Zeroing and accumulating gradients

x = torch.tensor(1.0, requires_grad=True)
y = x ** 2

# First backward pass (retain the graph so backward() can be called again)
y.backward(retain_graph=True)
print(x.grad)  # Output: tensor(2.0) (dy/dx = 2x)

# Gradients accumulate!
y.backward()
print(x.grad)  # Output: tensor(4.0) (2.0 + 2.0)

# Zero the gradient manually
x.grad.zero_()

4. Autograd caveats

  1. Memory management

    • By default, backward() frees the computation graph. If you need multiple backward passes, set retain_graph=True.
    • Example: loss.backward(retain_graph=True)
  2. Performance optimization

    • During inference, disable gradient tracking with torch.no_grad() to save memory:
      with torch.no_grad():
          output = model(input)
  3. Gradient clipping

    • To prevent exploding gradients:
      torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
  4. Gradients of non-leaf tensors

    • Not retained by default; call retain_grad() if you need them:
      y = x * 2
      y.retain_grad()  # keep y's gradient

5. Advantages of the dynamic computation graph

  • Control flow is supported: the graph is built at runtime, so it can contain statements such as if and for.

    x = torch.tensor(1.0, requires_grad=True)
    if x > 0:
        y = x ** 2
    else:
        y = x ** 3
    y.backward()  # the branch that was actually taken is handled automatically
  • Flexibility: suits variable-length inputs (e.g. RNNs) and complex architectures (e.g. dynamic networks); see the sketch after this list.
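A minimal sketch of a data-dependent loop, where the number of recorded operations changes from run to run (the threshold 1000 is arbitrary):

x = torch.randn(3, requires_grad=True)
y = x * 2
while y.norm() < 1000:   # how many iterations run depends on the data
    y = y * 2
y.sum().backward()
print(x.grad)            # every entry equals the total doubling factor applied to x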


Summary

  • Basic gradient computation: backward(); e.g. y.backward()
  • Non-scalar output: backward(gradient=...); e.g. y.backward(torch.tensor([0.1, 0.01]))
  • Freezing parameters: detach() or requires_grad=False; e.g. param.detach()
  • Higher-order derivatives: create_graph=True; e.g. torch.autograd.grad(..., create_graph=True)
  • Avoiding gradient accumulation: zero_grad(); e.g. optimizer.zero_grad()

Through its dynamic computation graph and automatic differentiation, PyTorch's autograd greatly simplifies gradient computation and is a core tool for training deep-learning models.

