1. Introduction
1.1 Preface
本系列博文是和鲸社区的活动《20天吃掉那只PyTorch》学习的笔记,本篇为系列笔记的第三篇—— Pytorch
的层次结构。该专栏是 Github
上 2.8K
星的项目,在学习该书的过程中可以参考阅读《Python深度学习》一书的第一部分"深度学习基础"内容。
《Python深度学习》这本书是 Keras
之父 Francois Chollet
所著,该书假定读者无任何机器学习知识,以Keras
为工具,使用丰富的范例示范深度学习的最佳实践,该书通俗易懂,全书没有一个数学公式,注重培养读者的深度学习直觉。
《Python深度学习》一书的第一部分的 4
个章节内容如下,预计读者可以在 20
小时之内学完。
- 什么是深度学习
- 神经网络的数学基础
- 神经网络入门
- 机器学习基础
本系列博文的大纲如下:
- 一、PyTorch的建模流程
- 二、PyTorch的核心概念
- 三、PyTorch的层次结构
- 四、PyTorch的低阶API
- 五、PyTorch的中阶API
- 六、PyTorch的高阶API
最后,本博文提供所使用的全部数据,读者可以从下述连接中下载数据:
1.2 Pytorch 的层次结构
本章我们介绍 Pytorch
中 5
个不同的层次结构:
- 硬件层
- 内核层
- 低阶
API
- 中阶 `API ``
- 高阶
API
【torchkeras
】
并以线性回归和 DNN
二分类模型为例,直观对比展示在不同层级实现模型的特点。
Pytorch
的层次结构从低到高可以分成如下五层:
-
最底层为硬件层,
Pytorch
支持CPU
、GPU
加入计算资源池; -
第二层为
C++
实现的内核; -
第三层为
Python
实现的操作符,提供了封装C++
内核的低级API指令,主要包括各种张量操作算子、自动微分、变量管理;如
torch.tensor
,torch.cat
,torch.autograd.grad
,nn.Module
。如果把模型比作一个房子,那么第三层API
就是【模型之砖】。 -
第四层为
Python
实现的模型组件,对低级API
进行了函数封装,主要包括各种模型层,损失函数,优化器,数据管道等等。如
torch.nn.Linear
,torch.nn.BCE
,torch.optim.Adam
,torch.utils.data.DataLoader
。如果把模型比作一个房子,那么第四层API就是【模型之墙】。 -
第五层为
Python
实现的模型接口。Pytorch
没有官方的高阶API。为了便于训练模型,作者仿照keras
中的模型接口,使用了不到300
行代码,封装了Pytorch的高阶模型接口torchkeras.Model
。如果把模型比作一个房子,那么第五层API就是模型本身,即【模型之屋】。
2. 低阶 API
示范
下面的范例使用 Pytorch
的低阶 API
实现线性回归模型和 DNN
二分类模型。低阶 API
主要包括张量操作,计算图和自动微分。
2.1 Linear regression
2.1.1 Prepare data
1 | import numpy as np |
-
数据可视化
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18# 数据可视化
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
plt.figure(figsize = (12,5))
ax1 = plt.subplot(121)
ax1.scatter(X[:,0].numpy(),Y[:,0].numpy(), c = "b",label = "samples")
ax1.legend()
plt.xlabel("x1")
plt.ylabel("y",rotation = 0)
ax2 = plt.subplot(122)
ax2.scatter(X[:,1].numpy(),Y[:,0].numpy(), c = "g",label = "samples")
ax2.legend()
plt.xlabel("x2")
plt.ylabel("y",rotation = 0)
plt.show()Results:
-
构建数据管道迭代器
1
2
3
4
5
6
7
8
9
10
11
12
13
14# 构建数据管道迭代器
def data_iter(features, labels, batch_size=8):
num_examples = len(features)
indices = list(range(num_examples))
np.random.shuffle(indices) #样本的读取顺序是随机的
for i in range(0, num_examples, batch_size):
indexs = torch.LongTensor(indices[i: min(i + batch_size, num_examples)])
yield features.index_select(0, indexs), labels.index_select(0, indexs)
# 测试数据管道效果
batch_size = 8
(features,labels) = next(data_iter(X,Y,batch_size))
print(features)
print(labels)Result:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16tensor([[ 1.9428, -0.7624],
[-2.5625, 0.4411],
[-0.7651, -3.8922],
[ 3.1022, -2.6201],
[-1.0578, -2.6963],
[-1.9720, 3.8035],
[-3.4711, -2.4106],
[-0.6102, 2.6127]])
tensor([[11.7814],
[ 6.0209],
[23.4428],
[22.5369],
[17.8275],
[-8.7643],
[ 7.5050],
[ 0.5841]])
2.2 Model
2.2.1 Define model
1 | # 定义模型 |
2.2.2 Training model
1 | def train_step(model, features, labels): |
-
测试
train_step
效果1
2
3batch_size = 10
(features,labels) = next(data_iter(X,Y,batch_size))
train_step(model,features,labels)Results:
1
tensor(68.6391, grad_fn=<MeanBackward0>)
-
Train model
1
2
3
4
5
6
7
8
9
10
11def train_model(model,epochs):
for epoch in range(1,epochs+1):
for features, labels in data_iter(X,Y,10):
loss = train_step(model,features,labels)
if epoch%200==0:
print("epoch =",epoch,"loss = ",loss.item())
print("model.w =",model.w.data)
print("model.b =",model.b.data)
train_model(model,epochs = 1000)Results:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20epoch = 200 loss = 3.2508397102355957
model.w = tensor([[ 2.0401],
[-2.9877]])
model.b = tensor([[9.9169]])
epoch = 400 loss = 3.0016872882843018
model.w = tensor([[ 2.0435],
[-2.9855]])
model.b = tensor([[9.9173]])
epoch = 600 loss = 2.7006335258483887
model.w = tensor([[ 2.0418],
[-2.9843]])
model.b = tensor([[9.9174]])
epoch = 800 loss = 1.280609369277954
model.w = tensor([[ 2.0416],
[-2.9869]])
model.b = tensor([[9.9169]])
epoch = 1000 loss = 2.169107675552368
model.w = tensor([[ 2.0420],
[-2.9852]])
model.b = tensor([[9.9170]])
2.2.3 Visualization
1 | # 结果可视化 |
Results:

2.3 DNN二分类模型
2.3.1 Prepare data
1 | import numpy as np |
Results:

-
构建数据管道迭代器
1
2
3
4
5
6
7
8
9
10
11
12
13
14# 构建数据管道迭代器
def data_iter(features, labels, batch_size=8):
num_examples = len(features)
indices = list(range(num_examples))
np.random.shuffle(indices) #样本的读取顺序是随机的
for i in range(0, num_examples, batch_size):
indexs = torch.LongTensor(indices[i: min(i + batch_size, num_examples)])
yield features.index_select(0, indexs), labels.index_select(0, indexs)
# 测试数据管道效果
batch_size = 8
(features,labels) = next(data_iter(X,Y,batch_size))
print(features)
print(labels)Results:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16tensor([[ 6.3216, -2.6834],
[ 2.4433, 4.4928],
[ 8.5585, 3.0958],
[-1.0328, 3.3381],
[-4.6885, -0.1144],
[ 8.7589, -3.4486],
[ 0.4830, 3.6482],
[ 4.9465, 0.3443]])
tensor([[0.],
[1.],
[0.],
[1.],
[1.],
[0.],
[1.],
[1.]])
2.3.2 Define model
此处范例我们利用 nn.Module
来组织模型变量。
1 | class DNNModel(nn.Module): |
-
测试模型结构
1
2
3
4
5
6
7
8
9
10
11# 测试模型结构
batch_size = 10
(features,labels) = next(data_iter(X,Y,batch_size))
predictions = model(features)
loss = model.loss_func(labels,predictions)
metric = model.metric_func(labels,predictions)
print("init loss:", loss.item())
print("init metric:", metric.item())Results:
1
2init loss: 7.446216583251953
init metric: 0.53620088100433351
len(list(model.parameters()))
Results:
1
6
2.3.3 Trianing model
1 | def train_step(model, features, labels): |
Results:
epoch = 100 loss = 0.1934373697731644 metric = 0.9207499933242798
epoch = 200 loss = 0.18901969484053552 metric = 0.918999993801117
epoch = 300 loss = 0.18451461097225547 metric = 0.9247499924898147
epoch = 400 loss = 0.18301934767514466 metric = 0.9247499933838844
epoch = 500 loss = 0.18300161071121693 metric = 0.9274999922513962
epoch = 600 loss = 0.18265636594966053 metric = 0.9219999933242797
epoch = 700 loss = 0.18221229410730302 metric = 0.9239999923110008
epoch = 800 loss = 0.1817048901133239 metric = 0.922749992609024
epoch = 900 loss = 0.18160937033127994 metric = 0.9259999924898148
epoch = 1000 loss = 0.1799963693227619 metric = 0.9282499927282334
2.3.4 Visualization
1 | # 结果可视化 |
Results:

3. 中阶API示范
下面的范例使用 Pytorch
的中阶 API
实现线性回归模型和和 DNN
二分类模型。
Pytorch
的中阶 API
主要包括:
- 各种模型层;
- 损失函数;
- 优化器;
- 数据管道等。
3.1 Linear regression
3.1.1 Prepare data
1 | import numpy as np |
-
Visualization
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18# 数据可视化
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
plt.figure(figsize = (12,5))
ax1 = plt.subplot(121)
ax1.scatter(X[:,0],Y[:,0], c = "b",label = "samples")
ax1.legend()
plt.xlabel("x1")
plt.ylabel("y",rotation = 0)
ax2 = plt.subplot(122)
ax2.scatter(X[:,1],Y[:,0], c = "g",label = "samples")
ax2.legend()
plt.xlabel("x2")
plt.ylabel("y",rotation = 0)
plt.show()Results:
-
Data pipeline
1
2
3#构建输入数据管道
ds = TensorDataset(X,Y)
dl = DataLoader(ds,batch_size = 10,shuffle=True,num_workers=2)
3.2 Model
3.2.1 Define model
1 | model = nn.Linear(2,1) #线性层 |
3.2.2 Training model
-
Train step
1
2
3
4
5
6
7
8
9
10
11
12def train_step(model, features, labels):
predictions = model(features)
loss = model.loss_func(predictions,labels)
loss.backward()
model.optimizer.step()
model.optimizer.zero_grad()
return loss.item()
# 测试train_step效果
features,labels = next(iter(dl))
train_step(model,features,labels)Results:
1
415.08831787109375
-
Train model
1
2
3
4
5
6
7
8
9
10
11
12def train_model(model,epochs):
for epoch in range(1,epochs+1):
for features, labels in dl:
loss = train_step(model,features,labels)
if epoch%50==0:
w = model.state_dict()["weight"]
b = model.state_dict()["bias"]
print("epoch =",epoch,"loss = ",loss)
print("w =",w)
print("b =",b)
train_model(model,epochs = 200)Results:
1
2
3
4
5
6
7
8
9
10
11
12epoch = 50 loss = 4.598311901092529
w = tensor([[ 1.9602, -2.9793]])
b = tensor([10.1778])
epoch = 100 loss = 3.397813320159912
w = tensor([[ 2.0284, -2.9681]])
b = tensor([10.2230])
epoch = 150 loss = 1.588686227798462
w = tensor([[ 1.9387, -2.9690]])
b = tensor([10.1770])
epoch = 200 loss = 4.254576206207275
w = tensor([[ 1.8670, -3.1228]])
b = tensor([10.2100])
3.2.3 Visualization
1 | w,b = model.state_dict()["weight"],model.state_dict()["bias"] |
Results:

3.3 DNN二分类模型
3.3.1 Prepare data
1 | import numpy as np |
Results:

- Pipeline
1 | #构建输入数据管道 |
3.3.2 Define model
1 | class DNNModel(nn.Module): |
-
Test pipeline
1
2
3
4
5
6
7
8
9# 测试模型结构
(features,labels) = next(iter(dl))
predictions = model(features)
loss = model.loss_func(predictions,labels)
metric = model.metric_func(predictions,labels)
print("init loss:",loss.item())
print("init metric:",metric.item())Results:
1
2init loss: 0.8217536807060242
init metric: 0.6000000238418579
3.3.3 Training model
-
Train step
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19def train_step(model, features, labels):
# 正向传播求损失
predictions = model(features)
loss = model.loss_func(predictions,labels)
metric = model.metric_func(predictions,labels)
# 反向传播求梯度
loss.backward()
# 更新模型参数
model.optimizer.step()
model.optimizer.zero_grad()
return loss.item(),metric.item()
# 测试train_step效果
features,labels = next(iter(dl))
train_step(model,features,labels)Results:
1
(1.027471899986267, 0.4000000059604645)
-
Train model
1
2
3
4
5
6
7
8
9
10
11
12
13
14def train_model(model,epochs):
for epoch in range(1,epochs+1):
loss_list,metric_list = [],[]
for features, labels in dl:
lossi,metrici = train_step(model,features,labels)
loss_list.append(lossi)
metric_list.append(metrici)
loss = np.mean(loss_list)
metric = np.mean(metric_list)
if epoch%100==0:
print("epoch =",epoch,"loss = ",loss,"metric = ",metric)
train_model(model,epochs = 300)Results:
1
2
3epoch = 100 loss = 0.2738241909684248 metric = 0.9302499929070472
epoch = 200 loss = 0.27702247152624065 metric = 0.9312499925494194
epoch = 300 loss = 0.27914922587944946 metric = 0.9309999929368495
3.3.4 Visualization
1 | # 结果可视化 |
Results:

4. 高阶API示范
Pytorch
没有官方的高阶 API
,一般需要用户自己实现训练循环、验证循环、和预测循环。
torchkeras.Model
类是仿照 tf.keras.Model
的功能对 Pytorch
的 nn.Module
进行了封装设计而成的,它实现了 fit
, validate
,predict
, summary
方法,相当于用户自定义高阶 API
。本章后面的内容借助它来实现线性回归模型。
此外,torchkeras.LightModel
类是借用 pytorch_lightning
的功能,封装了类Keras
接口的另外一种实现。本章后面的内容用它实现DNN二分类模型。
4.1 Linear regression
4.1.1 Prepare data
1 | import numpy as np |
-
Visualization
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18# 数据可视化
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
plt.figure(figsize = (12,5))
ax1 = plt.subplot(121)
ax1.scatter(X[:,0],Y[:,0], c = "b",label = "samples")
ax1.legend()
plt.xlabel("x1")
plt.ylabel("y",rotation = 0)
ax2 = plt.subplot(122)
ax2.scatter(X[:,1],Y[:,0], c = "g",label = "samples")
ax2.legend()
plt.xlabel("x2")
plt.ylabel("y",rotation = 0)
plt.show()Results:
-
Data pipeline
1
2
3
4
5#构建输入数据管道
ds = TensorDataset(X,Y)
ds_train,ds_valid = torch.utils.data.random_split(ds,[int(400*0.7),400-int(400*0.7)])
dl_train = DataLoader(ds_train,batch_size = 10,shuffle=True,num_workers=2)
dl_valid = DataLoader(ds_valid,batch_size = 10,num_workers=2)
4.2 Model
4.2.1 Define model
1 | # 继承用户自定义模型 |
4.2.2 Training model
1 | # 使用fit方法进行训练 |
Results:
Start Training ...
================================================================================2022-02-06 22:48:10
{'step': 20, 'loss': 208.126, 'mae': 11.994, 'mape': 1.195}
+-------+---------+--------+-------+----------+---------+----------+
| epoch | loss | mae | mape | val_loss | val_mae | val_mape |
+-------+---------+--------+-------+----------+---------+----------+
| 1 | 201.175 | 11.695 | 1.269 | 195.057 | 11.834 | 1.065 |
+-------+---------+--------+-------+----------+---------+----------+
...
+-------+-------+-------+-------+----------+---------+----------+
| epoch | loss | mae | mape | val_loss | val_mae | val_mape |
+-------+-------+-------+-------+----------+---------+----------+
| 20 | 39.91 | 5.993 | 1.649 | 42.392 | 6.193 | 1.032 |
+-------+-------+-------+-------+----------+---------+----------+
================================================================================2022-02-06 22:49:56
Finished Training...
4.2.3 Visualization
1 | w,b = model.state_dict()["fc.weight"],model.state_dict()["fc.bias"] |
Results:

4.2.4 Evaluation
1 | dfhistory.tail() |
Results:
loss | mae | mape | val_loss | val_mae | val_mape | |
---|---|---|---|---|---|---|
15 | 51.618867 | 6.840317 | 1.773152 | 54.423827 | 7.038455 | 1.124349 |
16 | 48.355738 | 6.618555 | 1.744567 | 51.134396 | 6.821975 | 1.102371 |
17 | 45.444238 | 6.420669 | 1.726280 | 47.896852 | 6.605719 | 1.086570 |
18 | 42.519069 | 6.199411 | 1.682794 | 45.115399 | 6.398358 | 1.055073 |
19 | 39.909953 | 5.992503 | 1.649152 | 42.391730 | 6.192853 | 1.031992 |
1 | import matplotlib.pyplot as plt |
Results:

1 | plot_metric(dfhistory,"mape") |
Results:

1 | # 评估 |
Results:
{'val_loss': 42.391730308532715,
'val_mae': 6.19285261631012,
'val_mape': 1.0319924702246983}
4.2.5 Predict
1 | # 预测 |
Results:
tensor([[ 8.9128],
[ 9.5116],
[ 12.2481],
[ 0.1308],
[ 16.1116],
[-17.9351],
[-14.6407],
[ 2.9675],
[ 10.9686],
[ 14.8227]])
-
Predict validate data
1
2# 预测
model.predict(dl_valid)[0:10]Results:
tensor([[ -4.9393], [-12.2253], [ 3.5050], [ 6.6128], [ 2.7707], [ 0.7076], [ -6.2700], [ -8.4491], [ -7.4038], [ 10.0306]])
4.3 DNN二分类模型
4.3.1 Prepare data
1 | import numpy as np |
Results:

-
Dataloader
1
2
3
4
5ds = TensorDataset(X,Y)
ds_train,ds_valid = torch.utils.data.random_split(ds,[int(len(ds)*0.7),len(ds)-int(len(ds)*0.7)])
dl_train = DataLoader(ds_train,batch_size = 100,shuffle=True,num_workers=2)
dl_valid = DataLoader(ds_valid,batch_size = 100,num_workers=2)
4.3.2 Define model
1 | import torchmetrics as metrics |
Results:
Global seed set to 1234
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Linear-1 [-1, 4] 12
Linear-2 [-1, 8] 40
Linear-3 [-1, 1] 9
================================================================
Total params: 61
Trainable params: 61
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.000008
Forward/backward pass size (MB): 0.000099
Params size (MB): 0.000233
Estimated Total Size (MB): 0.000340
----------------------------------------------------------------
4.3.3 Training model
Note:下述代码,如果本机没有 gpu
会报 Runerror
错误:
1 | RuntimeError: DataLoader worker (pid(s) 6088, 19424) exited unexpectedl |
将 gpu=0
删掉能避免此错误。
1 | ckpt_cb = pl.callbacks.ModelCheckpoint(monitor='val_loss') |
Results:
1 | GPU available: False, used: False |
4.3.4 Visualization
1 | # 结果可视化 |
Results:

4.3.5 Evaluation
1 | import pandas as pd |
Results:
val_loss | val_acc | loss | acc | epoch | |
---|---|---|---|---|---|
0 | 0.659258 | 0.548333 | 0.679371 | 0.532500 | 0 |
1 | 0.633105 | 0.712500 | 0.653128 | 0.617500 | 1 |
2 | 0.560715 | 0.705833 | 0.603827 | 0.702857 | 2 |
3 | 0.468437 | 0.794167 | 0.533967 | 0.737143 | 3 |
4 | 0.345662 | 0.820000 | 0.427476 | 0.795357 | 4 |
... | ... | ... | ... | ... | ... |
95 | 0.202806 | 0.918333 | 0.202421 | 0.921071 | 95 |
96 | 0.202806 | 0.918333 | 0.202421 | 0.921071 | 96 |
97 | 0.202806 | 0.918333 | 0.202421 | 0.921071 | 97 |
98 | 0.202806 | 0.918333 | 0.202421 | 0.921071 | 98 |
99 | 0.202806 | 0.918333 | 0.202421 | 0.921071 | 99 |
100 rows × 5 columns
1 | import matplotlib.pyplot as plt |
Results:

1 | plot_metric(dfhistory,"acc") |
Results:

1 | results = trainer.test(model, test_dataloaders=dl_valid, verbose = False) |
Results:
1 | Testing: 0it [00:00, ?it/s] |
4.3.6 Predict
1 | def predict(model,dl): |
Results:
tensor([[1.],
[1.],
[0.],
...,
[0.],
[0.],
[1.]])