PyTorch Tutorial - Autograd
These are some notes I took while learning PyTorch; I hope they're helpful to you 😊
Autograd
This is where autograd comes in: it tracks the history of every computation. Every computed tensor in a PyTorch model carries the history of its input tensors and of the function used to create it. Combined with the fact that every PyTorch function meant to act on tensors has a built-in implementation for computing its own derivative, this greatly speeds up the computation of the local derivatives needed for learning.
Example
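A minimal sketch of such an example, assuming an illustrative computation (a sine followed by scaling and a sum); the point is that `a` is the only leaf tensor, while `b`, `c`, and `d` are intermediates:

```python
import math
import torch

# a is created directly with requires_grad=True, so it is a leaf tensor
a = torch.linspace(0., 2. * math.pi, steps=25, requires_grad=True)

# every operation below is recorded in the computation history
b = torch.sin(a)   # non-leaf: produced by an operation
c = 2 * b          # non-leaf
d = c + 1          # non-leaf
out = d.sum()      # scalar output

out.backward()     # autograd replays the recorded history backwards
print(a.grad)      # d(out)/da = 2 * cos(a), computed automatically
```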
Be aware that only leaf nodes of the computation have their gradients computed. If you tried, for example, `print(c.grad)`, you'd get back `None`. In this simple example, only the input is a leaf node, so only it has gradients computed.
Jacobian
If you have a function with an n-dimensional input and m-dimensional output, $\vec{y}=f\left(\vec{x}\right)$, the complete gradient is a matrix of the derivative of every output with respect to every input, called the Jacobian:
$$
\begin{align}J=\left(\begin{array}{ccc}
\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\
\vdots & \ddots & \vdots\\
\frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{array}\right)\end{align}
$$
If you have a second function, $l=g\left(\vec{y}\right)$, that takes m-dimensional input (that is, the same dimensionality as the output above) and returns a scalar output, you can express its gradients with respect to $\vec{y}$ as a column vector, $v=\left(\begin{array}{ccc}\frac{\partial l}{\partial y_{1}} & \cdots & \frac{\partial l}{\partial y_{m}}\end{array}\right)^{T}$ - which is really just a one-column Jacobian.
More concretely, imagine the first function as your PyTorch model
(with potentially many inputs and many outputs) and the second function as a loss function
(with the model’s output as input, and the loss value as the scalar output).
If we multiply the first function’s Jacobian by the gradient of the second function, and apply the chain rule, we get:
$$
\begin{align}J^{T}\cdot v=\left(\begin{array}{ccc}
\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
\vdots & \ddots & \vdots\\
\frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{array}\right)\left(\begin{array}{c}
\frac{\partial l}{\partial y_{1}}\\
\vdots\\
\frac{\partial l}{\partial y_{m}}
\end{array}\right)=\left(\begin{array}{c}
\frac{\partial l}{\partial x_{1}}\\
\vdots\\
\frac{\partial l}{\partial x_{n}}
\end{array}\right)\end{align}
$$
Note: You could also use the equivalent operation $v^{T}\cdot J$, and get back a row vector.
The resulting column vector is the gradient of the second function with respect to the inputs of the first - or in the case of our model and loss function, the gradient of the loss with respect to the model inputs.
`torch.autograd` is an engine for computing these products. This is how we accumulate the gradients over the learning weights during the backward pass.

For this reason, the `backward()` call can also take an optional vector input. This vector represents a set of gradients over the tensor, which are multiplied by the Jacobian of the autograd-traced tensor that precedes it. Let's try a specific example with a small vector:
Hmm, this is also exactly what Example 2 further down demonstrates; you can jump ahead to it.
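Here is a minimal sketch of such an example (the tensor values and the doubling loop are illustrative assumptions, not from the original post):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:   # keep doubling; y stays a 3-element vector
    y = y * 2

# y is non-scalar, so backward() needs a gradient vector v;
# autograd then computes the vector-Jacobian product J^T . v
v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)

print(x.grad)   # gradient of v . y with respect to x
```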
High-Level API
There is an API on autograd that gives you direct access to important differential matrix and vector operations. In particular, it allows you to calculate the Jacobian and the Hessian matrices of a particular function for particular inputs. (The Hessian is like the Jacobian, but expresses all partial second derivatives.) It also provides methods for taking vector products with these matrices.
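A minimal sketch of these calls (the functions `exp_adder` and `cubic_sum` and their random inputs are illustrative assumptions):

```python
import torch
from torch.autograd import functional

def exp_adder(x, y):
    # simple two-input function whose Jacobian is easy to verify by hand
    return 2 * x.exp() + 3 * y

inputs = (torch.rand(3), torch.rand(3))
# Jacobian of exp_adder at `inputs`: one block per input argument
print(functional.jacobian(exp_adder, inputs))

def cubic_sum(x):
    # hessian() expects a function with a single scalar output
    return (x ** 3).sum()

print(functional.hessian(cubic_sum, torch.rand(3)))
```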
- There are also functions for computing vector-Jacobian products directly, such as `torch.autograd.functional.vjp()`; the `torch.autograd.functional.jvp()` method performs the same matrix multiplication as `vjp()` with the operands reversed (see the sketch after this list).
- I felt the key points weren't emphasized clearly enough throughout the post, so I asked an AI to produce a set of notes.
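A minimal sketch contrasting `vjp()` and `jvp()` (the elementwise-square function and the vectors below are illustrative assumptions):

```python
import torch
from torch.autograd import functional

def square(x):
    return x ** 2            # elementwise; its Jacobian is diag(2x)

x = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([1.0, 1.0, 1.0])

# vjp: v^T . J  (vector multiplies the Jacobian from the left)
_, vjp_out = functional.vjp(square, x, v)

# jvp: J . v  (vector multiplies the Jacobian from the right)
_, jvp_out = functional.jvp(square, x, v)

print(vjp_out, jvp_out)      # both tensor([2., 4., 6.]) here, since J is diagonal
```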
AI Summary
Summary of PyTorch Autograd Features
PyTorch's autograd (automatic differentiation) is one of its core features: it automatically computes tensor gradients, supports dynamic computation graphs, and is the foundation of neural-network training. The key points and examples are summarized below:
1. Core Mechanisms of Autograd

| Feature | Description |
|---|---|
| Dynamic computation graph | Built at runtime, so control flow (loops, conditionals) is supported flexibly. |
| Automatic gradient computation | Gradients are computed via `backward()` and stored in the `.grad` attribute. |
| Leaf tensors | Tensors created directly (e.g. via `torch.tensor()`) are leaf nodes and can accumulate gradients. |
| Non-leaf tensors | Tensors produced by operations; gradients are not retained by default (unless `retain_grad()` is set explicitly). |
2. Key Functions and Methods

| Method / Class | Purpose | Example |
|---|---|---|
| `requires_grad=True` | Enable gradient tracking | `x = torch.tensor(1.0, requires_grad=True)` |
| `backward()` | Backpropagate to compute gradients | `y.backward()` |
| `grad` attribute | Stores the gradient value | `x.grad` |
| `torch.no_grad()` | Temporarily disable gradient computation | `with torch.no_grad():` |
| `detach()` | Detach a tensor to block gradient flow | `y_detached = y.detach()` |
| `torch.autograd.grad()` | Compute gradients directly (without touching `.grad`) | `grad = torch.autograd.grad(y, x)` |
3. Examples in Detail
Example 1: Basic Gradient Computation
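A minimal sketch (the function y = x² + 3x and the input value are illustrative assumptions):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)   # leaf tensor with gradient tracking
y = x ** 2 + 3 * x

y.backward()        # compute dy/dx and store it in x.grad
print(x.grad)       # dy/dx = 2x + 3 = 7 at x = 2 -> tensor(7.)
```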
Example 2: Gradients of a Non-Scalar Output (the `gradient` argument is required)
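A minimal sketch, reusing the gradient vector from the summary table at the end of this post (the input values are illustrative assumptions):

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x ** 2                                   # non-scalar output: tensor([1., 4.])

# a non-scalar output needs a gradient vector v; autograd computes J^T . v
y.backward(gradient=torch.tensor([0.1, 0.01]))
print(x.grad)                                # tensor([0.2000, 0.0400])
```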
Example 3: Freezing Some Parameters (`detach()` or `requires_grad=False`)
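A minimal sketch (the tiny two-layer model is an illustrative assumption):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# freeze the first layer: its parameters will receive no gradients
for param in model[0].parameters():
    param.requires_grad = False

loss = model(torch.randn(2, 4)).sum()
loss.backward()

print(model[0].weight.grad)              # None: the layer is frozen
print(model[2].weight.grad is not None)  # True: still trainable

# alternatively, detach() an activation to cut the graph at that point
h = model[0](torch.randn(2, 4)).detach()  # gradients will not flow back through h
```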
Example 4: Higher-Order Derivatives (`create_graph=True`)
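A minimal sketch (y = x³ at x = 2 is an illustrative assumption):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# first derivative; create_graph=True keeps a graph of the gradient itself
(grad1,) = torch.autograd.grad(y, x, create_graph=True)
print(grad1)    # dy/dx = 3x^2 = 12

# second derivative, obtained by differentiating grad1
(grad2,) = torch.autograd.grad(grad1, x)
print(grad2)    # d2y/dx2 = 6x = 12
```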
Example 5: Zeroing and Accumulating Gradients
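A minimal sketch showing that gradients accumulate across `backward()` calls until they are cleared (in a real training loop this is what `optimizer.zero_grad()` does):

```python
import torch

x = torch.tensor(1.0, requires_grad=True)

for step in range(3):
    y = x * 2
    y.backward()
    print(x.grad)    # accumulates: 2., 4., 6.

x.grad.zero_()       # clear the accumulated gradient in place
(x * 2).backward()
print(x.grad)        # tensor(2.)
```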
4. Things to Watch Out For with Autograd

Memory management:

- By default, `backward()` frees the computation graph. If you need to backpropagate more than once through the same graph, set `retain_graph=True`. Example: `loss.backward(retain_graph=True)`.

Performance:

- During inference, use `torch.no_grad()` to disable gradient tracking and save memory:

```python
with torch.no_grad():
    output = model(input)
```

Gradient clipping:

- To guard against exploding gradients:

```python
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```

Gradients of non-leaf tensors:

- Not retained by default; call `retain_grad()` if you need them:

```python
y = x * 2
y.retain_grad()  # retain y's gradient
```
5. Advantages of the Dynamic Computation Graph

Control-flow support: the computation graph is built dynamically at runtime, so it can contain `if`, `for`, and other statements.

```python
x = torch.tensor(1.0, requires_grad=True)
if x > 0:
    y = x ** 2
else:
    y = x ** 3
y.backward()  # the branch that was taken is handled automatically
```

Flexibility: well suited to variable-length inputs (e.g. RNNs) and complex architectures (e.g. dynamic networks).
Summary

| Scenario | Operation | Code Example |
|---|---|---|
| Basic gradient computation | `backward()` | `y.backward()` |
| Non-scalar output | `backward(gradient=...)` | `y.backward(torch.tensor([0.1, 0.01]))` |
| Freezing parameters | `detach()` or `requires_grad=False` | `param.detach()` |
| Higher-order derivatives | `create_graph=True` | `torch.autograd.grad(..., create_graph=True)` |
| Avoiding gradient accumulation | `zero_grad()` | `optimizer.zero_grad()` |

Through its dynamic computation graph and automatic differentiation, PyTorch's `autograd` greatly simplifies gradient computation and is a core tool for training deep learning models.