1. Introduction

1.1 Preface

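This series of posts is my set of study notes for the Heywhale community event "Eat That PyTorch in 20 Days" (《20天吃掉那只PyTorch》). This post is the fourth in the series and covers PyTorch's low-level API. The course is a GitHub project with 2.8K stars. While working through it, you can consult Part 1, "Fundamentals of Deep Learning", of the book Deep Learning with Python (《Python深度学习》).

Deep Learning with Python was written by François Chollet, the creator of Keras. It assumes no prior machine-learning knowledge, uses Keras as its tool, and demonstrates deep-learning best practices through many examples. The book is easy to follow, contains not a single mathematical formula, and focuses on building the reader's intuition for deep learning.

Part 1 of the book consists of the following four chapters, which readers can expect to finish in under 20 hours.

What is deep learning
The mathematical building blocks of neural networks
Getting started with neural networks
Fundamentals of machine learning

The outline of this series is as follows:

Part 1. The PyTorch modeling workflow
Part 2. Core concepts of PyTorch
Part 3. The layered structure of PyTorch
Part 4. PyTorch's low-level API
Part 5. PyTorch's mid-level API
Part 6. PyTorch's high-level API

Finally, all the data used in this post is provided; readers can download it from the link below: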

Download Now

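1.2 PyTorch's low-level API

PyTorch's low-level API mainly comprises tensor operations, the dynamic computational graph, and automatic differentiation.

If a model is likened to a house, the low-level API is the bricks of the model. At this level, PyTorch can be used as an enhanced NumPy: it offers a broader set of methods, generally runs faster, and can use a GPU for acceleration when needed.

The previous chapters gave an overall picture of the low-level API; this chapter looks in detail at tensor operations and the dynamic computational graph.

Tensor operations fall into two groups: structural operations and mathematical operations.

Structural operations include:

tensor creation;
indexing and slicing;
dimension transformation;
merging and splitting.

Mathematical operations mainly include:

scalar operations;
vector operations;
matrix operations.

We will also cover the broadcasting mechanism for tensor operations. For the dynamic computational graph, we will mainly cover its characteristics, the Function objects in the graph, and the relationship between the graph and backpropagation.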

2. Operation of tensor

2.1 Create tensor

Many of the tensor-creation functions closely resemble the NumPy functions for creating arrays.

From list

import numpy as np
import torch

a = torch.tensor([1,2,3],dtype = torch.float)
print(a)

Results：

  tensor([1., 2., 3.])

arange()

b = torch.arange(1,10,step = 2)
print(b)

Results:

  tensor([1, 3, 5, 7, 9])

linspace() (evenly spaced values)

c = torch.linspace(0.0,2*3.14,10)
print(c)

Results:

  tensor([0.0000, 0.6978, 1.3956, 2.0933, 2.7911, 3.4889, 4.1867, 4.8844, 5.5822, 6.2800])

zeros()

d = torch.zeros((3,3))
print(d)

Results:

  tensor([[0., 0., 0.],
          [0., 0., 0.],
          [0., 0., 0.]])

ones()

a = torch.ones((3,3),dtype = torch.int)
b = torch.zeros_like(a,dtype = torch.float)
print(a)
print(b)

Results:

  tensor([[1, 1, 1],
          [1, 1, 1],
          [1, 1, 1]], dtype=torch.int32)
  tensor([[0., 0., 0.],
          [0., 0., 0.],
          [0., 0., 0.]])

fill_()

torch.fill_(b,5)
print(b)

Results:

  tensor([[5., 5., 5.],
          [5., 5., 5.],
          [5., 5., 5.]])

rand()

# Uniformly distributed random values
torch.manual_seed(0)
minval,maxval = 0,10
a = minval + (maxval-minval)*torch.rand([5])
print(a)

Results:

  tensor([4.9626, 7.6822, 0.8848, 1.3203, 3.0742])

normal()

# Normally distributed random values
b = torch.normal(mean = torch.zeros(3,3), std = torch.ones(3,3))
print(b)

Results:

  tensor([[ 0.5507,  0.2704,  0.6472],
          [ 0.2490, -0.3354,  0.4564],
          [-0.6255,  0.4539, -1.3740]])

randn()

# Standard normal samples, scaled and shifted to the given std and mean
mean,std = 2,5
c = std*torch.randn((3,3))+mean
print(c)

Results:

  tensor([[-4.9181, -0.6526, -0.1565],
          [-9.4288,  2.3498,  5.3368],
          [-3.2017, -0.4098,  7.6465]])

randperm()

# Random permutation of integers
d = torch.randperm(20)
print(d)

Results:

  tensor([16,  1, 10, 11, 19,  4,  9,  6,  5,  2, 13, 18,  3,  7, 15, 17, 12,  0, 14,  8])

Special matrices

# Identity matrix
I = torch.eye(3,3)
print(I)

# Diagonal matrix
t = torch.diag(torch.tensor([1,2,3]))
print(t)

Results:

  tensor([[1., 0., 0.],
          [0., 1., 0.],
          [0., 0., 1.]])
  tensor([[1, 0, 0],
          [0, 2, 0],
          [0, 0, 3]])

2.2 Index slice

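Tensor indexing and slicing work almost exactly as in NumPy. Slices support default arguments and the ellipsis, and indexing or slicing can also be used to modify a subset of elements.

For irregular selections, use torch.index_select, torch.masked_select, or torch.take.

To obtain a new tensor by modifying some elements of an existing one, use torch.where, torch.masked_fill, or torch.index_fill.

2.2.1 Regular slicing

# Uniformly distributed random integers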
torch.manual_seed(0)
minval,maxval = 0,10
t = torch.floor(minval + (maxval-minval)*torch.rand([5,5])).int()
print(t)


Results:

    tensor([[4, 7, 0, 1, 3],
            [6, 4, 8, 4, 6],
            [3, 4, 0, 1, 2],
            [5, 6, 8, 1, 2],
            [6, 9, 3, 8, 4]], dtype=torch.int32)

Row 0

# Row 0
print(t[0])

Results:

  tensor([4, 7, 0, 1, 3], dtype=torch.int32)

Last row

# Last row
print(t[-1])

Results:

  tensor([6, 9, 3, 8, 4], dtype=torch.int32)

A specific row and column

# Row 1, column 3
print(t[1,3])
print(t[1][3])

Results:

  tensor(4, dtype=torch.int32)
  tensor(4, dtype=torch.int32)

A range of rows

# Rows 1 through 3
print(t[1:4,:])

Results:

  tensor([[6, 4, 8, 4, 6],
          [3, 4, 0, 1, 2],
          [5, 6, 8, 1, 2]], dtype=torch.int32)

# Rows 1 through 3; every second column from column 0 up to (not including) column 4
print(t[1:4,:4:2])

Results:

  tensor([[6, 8],
          [3, 0],
          [5, 8]], dtype=torch.int32)

Ellipsis slicing on multi-dimensional tensors

a = torch.arange(27).view(3,3,3)
print(a)

Results:

  tensor([[[ 0,  1,  2],
           [ 3,  4,  5],
           [ 6,  7,  8]],

          [[ 9, 10, 11],
           [12, 13, 14],
           [15, 16, 17]],

          [[18, 19, 20],
           [21, 22, 23],
           [24, 25, 26]]])

# An ellipsis can stand for several colons
print(a[...,1])

Results:

  tensor([[ 1,  4,  7],
          [10, 13, 16],
          [19, 22, 25]])

Modifying values

# Indexing and slicing can be used to modify a subset of elements
x = torch.tensor([[1,2],[3,4]],dtype = torch.float32,requires_grad=True)
x.data[1,:] = torch.tensor([0.0,0.0])
x

Results:

  tensor([[1., 2.],
          [0., 0.]], requires_grad=True)

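Note: the slices above are all fairly regular. For irregular selections, use torch.index_select, torch.take, torch.gather, or torch.masked_select.

2.2.2 Irregular slicing

Case: consider a grade book with 4 classes, 10 students per class, and scores in 7 subjects per student. It can be represented as a 4×10×7 tensor.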

minval=0
maxval=100
scores = torch.floor(minval + (maxval-minval)*torch.rand([4,10,7])).int()
print(scores)

Results:

tensor([[[55, 95,  3, 18, 37, 30, 93],
         [17, 26, 15,  3, 20, 92, 72],
         [74, 52, 24, 58,  3, 13, 24],
         [81, 79, 27, 48, 81, 99, 69],
         [56, 83, 20, 59, 11, 15, 24],
         [72, 70, 20, 65, 77, 43, 51],
         [61, 81, 98, 11, 31, 69, 91],
         [93, 94, 59,  6, 54, 18,  3],
         [94, 88,  0, 59, 41, 41, 27],
         [69, 20, 68, 75, 85, 68,  0]],

        [[17, 74, 60, 10, 21, 97, 83],
         [28, 37,  2, 49, 12, 11, 47],
         [57, 29, 79, 19, 95, 84,  7],
         [37, 52, 57, 61, 69, 52, 25],
         [73,  2, 20, 37, 25, 32,  9],
         [39, 60, 17, 47, 85, 44, 51],
         [45, 60, 81, 97, 81, 97, 46],
         [ 5, 26, 84, 49, 25, 11,  3],
         [ 7, 39, 77, 77,  1, 81, 10],
         [39, 29, 40, 40,  5,  6, 42]],

        [[50, 27, 68,  4, 46, 93, 29],
         [95, 68,  4, 81, 44, 27, 89],
         [ 9, 55, 39, 85, 63, 74, 67],
         [37, 39,  8, 77, 89, 84, 14],
         [52, 14, 22, 20, 67, 20, 48],
         [52, 82, 12, 15, 20, 84, 32],
         [92, 68, 56, 49, 40, 56, 38],
         [49, 56, 10, 23, 90,  9, 46],
         [99, 68, 51,  6, 74, 14, 35],
         [33, 42, 50, 91, 56, 94, 80]],

        [[18, 72, 14, 28, 64, 66, 87],
         [33, 50, 75,  1, 86,  8, 50],
         [41, 23, 56, 91, 35, 20, 31],
         [ 0, 72, 25, 16, 21, 78, 76],
         [88, 68, 33, 36, 64, 91, 63],
         [26, 26,  2, 60, 21,  5, 93],
         [17, 44, 64, 51, 16,  9, 89],
         [58, 91, 33, 64, 38, 47, 19],
         [66, 65, 48, 38, 19, 84, 12],
         [70, 33, 25, 58, 24, 61, 59]]], dtype=torch.int32)

Extract all scores of students 0, 5, and 9 in every class

# Extract all scores of students 0, 5, and 9 in every class
torch.index_select(scores,dim = 1,index = torch.tensor([0,5,9]))

Results：

 tensor([[[55, 95,  3, 18, 37, 30, 93],
          [72, 70, 20, 65, 77, 43, 51],
          [69, 20, 68, 75, 85, 68,  0]],

         [[17, 74, 60, 10, 21, 97, 83],
          [39, 60, 17, 47, 85, 44, 51],
          [39, 29, 40, 40,  5,  6, 42]],

         [[50, 27, 68,  4, 46, 93, 29],
          [52, 82, 12, 15, 20, 84, 32],
          [33, 42, 50, 91, 56, 94, 80]],

         [[18, 72, 14, 28, 64, 66, 87],
          [26, 26,  2, 60, 21,  5, 93],
          [70, 33, 25, 58, 24, 61, 59]]], dtype=torch.int32)

Extract the scores in subjects 1, 3, and 6 for students 0, 5, and 9 in every class

# Extract the scores in subjects 1, 3, and 6 for students 0, 5, and 9 in every class
q = torch.index_select(torch.index_select(scores,dim = 1,index = torch.tensor([0,5,9]))
                   ,dim=2,index = torch.tensor([1,3,6]))
print(q)

Results:

 tensor([[[95, 18, 93],
          [70, 65, 51],
          [20, 75,  0]],

         [[74, 10, 83],
          [60, 47, 51],
          [29, 40, 42]],

         [[27,  4, 29],
          [82, 15, 32],
          [42, 91, 80]],

         [[72, 28, 87],
          [26, 60, 93],
          [33, 58, 59]]], dtype=torch.int32)

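Extract the subject-0 score of student 0 in class 0, the subject-1 score of student 4 in class 2, and the subject-6 score of student 9 in class 3 (take treats the input as a one-dimensional array, and the output has the same shape as index)

# Extract the subject-0 score of student 0 in class 0, the subject-1 score of student 4 in class 2,
# and the subject-6 score of student 9 in class 3
# take treats the input as a one-dimensional array; the output has the same shape as index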
s = torch.take(scores,torch.tensor([0*10*7+0,2*10*7+4*7+1,3*10*7+9*7+6]))
s

Results:

 tensor([55, 14, 59], dtype=torch.int32)
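
The flat indices passed to torch.take above follow row-major order. As a rough sketch of that arithmetic in plain Python (the helper name flat_index is hypothetical and not part of PyTorch), the offset of element (class, student, subject) in a 4×10×7 tensor can be computed as follows:

```python
def flat_index(idx, shape):
    """Row-major (C-order) offset of a multi-dimensional index."""
    offset = 0
    for i, n in zip(idx, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension of size {n}")
        # Shift the running offset by the size of this dimension, then add the index
        offset = offset * n + i
    return offset

shape = (4, 10, 7)  # classes, students, subjects
print(flat_index((0, 0, 0), shape))  # 0
print(flat_index((2, 4, 1), shape))  # 2*10*7 + 4*7 + 1 = 169
print(flat_index((3, 9, 6), shape))  # 3*10*7 + 9*7 + 6 = 279
```

These are the same three numbers that the hand-written expressions in the torch.take call evaluate to.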

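Extract the scores that are greater than or equal to 80 (boolean indexing)

# Extract the scores that are greater than or equal to 80 (boolean indexing)
# The result is a one-dimensional tensor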
g = torch.masked_select(scores,scores>=80)
print(g)

Results:

 tensor([95, 93, 92, 81, 81, 99, 83, 81, 98, 91, 93, 94, 94, 88, 85, 97, 83, 95,
         84, 85, 81, 97, 81, 97, 84, 81, 93, 95, 81, 89, 85, 89, 84, 82, 84, 92,
         90, 99, 91, 94, 80, 87, 86, 91, 88, 91, 93, 89, 91, 84],
        dtype=torch.int32)

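The methods above can only extract elements from a tensor; they cannot produce a new tensor in which some elements have been changed. For that, use torch.where, torch.index_fill, and torch.masked_fill.

torch.where can be thought of as the tensor version of if.
torch.index_fill selects elements with the same logic as torch.index_select.
torch.masked_fill selects elements with the same logic as torch.masked_select.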
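
torch.where is mentioned here but not demonstrated, so the following is a minimal sketch on a small made-up score tensor (the values are illustrative only). It applies an element-wise "if" and checks that masked_fill expresses the same rule from the opposite side:

```python
import torch

# Illustrative scores, not the tensor used elsewhere in this post
scores = torch.tensor([[55, 95, 3], [81, 79, 27]], dtype=torch.int32)

# Element-wise "if": keep a score where the condition holds, otherwise use 60
floor_60 = torch.where(scores >= 60, scores, torch.full_like(scores, 60))
print(floor_60)

# masked_fill writes 60 wherever the mask is True
same = scores.masked_fill(scores < 60, 60)
print(torch.equal(floor_60, same))
```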

Set all scores of students 0, 5, and 9 in every class to full marks

# Set all scores of students 0, 5, and 9 in every class to full marks
torch.index_fill(scores,dim = 1,index = torch.tensor([0,5,9]),value = 100)
# Equivalent to scores.index_fill(dim = 1,index = torch.tensor([0,5,9]),value = 100)

Results:

 tensor([[[100, 100, 100, 100, 100, 100, 100],
          [ 17,  26,  15,   3,  20,  92,  72],
          [ 74,  52,  24,  58,   3,  13,  24],
          [ 81,  79,  27,  48,  81,  99,  69],
          [ 56,  83,  20,  59,  11,  15,  24],
          [100, 100, 100, 100, 100, 100, 100],
          [ 61,  81,  98,  11,  31,  69,  91],
          [ 93,  94,  59,   6,  54,  18,   3],
          [ 94,  88,   0,  59,  41,  41,  27],
          [100, 100, 100, 100, 100, 100, 100]],

         [[100, 100, 100, 100, 100, 100, 100],
          [ 28,  37,   2,  49,  12,  11,  47],
          [ 57,  29,  79,  19,  95,  84,   7],
          [ 37,  52,  57,  61,  69,  52,  25],
          [ 73,   2,  20,  37,  25,  32,   9],
          [100, 100, 100, 100, 100, 100, 100],
          [ 45,  60,  81,  97,  81,  97,  46],
          [  5,  26,  84,  49,  25,  11,   3],
          [  7,  39,  77,  77,   1,  81,  10],
          [100, 100, 100, 100, 100, 100, 100]],

         [[100, 100, 100, 100, 100, 100, 100],
          [ 95,  68,   4,  81,  44,  27,  89],
          [  9,  55,  39,  85,  63,  74,  67],
          [ 37,  39,   8,  77,  89,  84,  14],
          [ 52,  14,  22,  20,  67,  20,  48],
          [100, 100, 100, 100, 100, 100, 100],
          [ 92,  68,  56,  49,  40,  56,  38],
          [ 49,  56,  10,  23,  90,   9,  46],
          [ 99,  68,  51,   6,  74,  14,  35],
          [100, 100, 100, 100, 100, 100, 100]],

         [[100, 100, 100, 100, 100, 100, 100],
          [ 33,  50,  75,   1,  86,   8,  50],
          [ 41,  23,  56,  91,  35,  20,  31],
          [  0,  72,  25,  16,  21,  78,  76],
          [ 88,  68,  33,  36,  64,  91,  63],
          [100, 100, 100, 100, 100, 100, 100],
          [ 17,  44,  64,  51,  16,   9,  89],
          [ 58,  91,  33,  64,  38,  47,  19],
          [ 66,  65,  48,  38,  19,  84,  12],
          [100, 100, 100, 100, 100, 100, 100]]], dtype=torch.int32)

Raise every score below 60 to 60

# Raise every score below 60 to 60
b = torch.masked_fill(scores,scores<60,60)
# Equivalent to b = scores.masked_fill(scores<60,60)
b

Results:

 tensor([[[60, 95, 60, 60, 60, 60, 93],
          [60, 60, 60, 60, 60, 92, 72],
          [74, 60, 60, 60, 60, 60, 60],
          [81, 79, 60, 60, 81, 99, 69],
          [60, 83, 60, 60, 60, 60, 60],
          [72, 70, 60, 65, 77, 60, 60],
          [61, 81, 98, 60, 60, 69, 91],
          [93, 94, 60, 60, 60, 60, 60],
          [94, 88, 60, 60, 60, 60, 60],
          [69, 60, 68, 75, 85, 68, 60]],
 
         [[60, 74, 60, 60, 60, 97, 83],
          [60, 60, 60, 60, 60, 60, 60],
          [60, 60, 79, 60, 95, 84, 60],
          [60, 60, 60, 61, 69, 60, 60],
          [73, 60, 60, 60, 60, 60, 60],
          [60, 60, 60, 60, 85, 60, 60],
          [60, 60, 81, 97, 81, 97, 60],
          [60, 60, 84, 60, 60, 60, 60],
          [60, 60, 77, 77, 60, 81, 60],
          [60, 60, 60, 60, 60, 60, 60]],
 
         [[60, 60, 68, 60, 60, 93, 60],
          [95, 68, 60, 81, 60, 60, 89],
          [60, 60, 60, 85, 63, 74, 67],
          [60, 60, 60, 77, 89, 84, 60],
          [60, 60, 60, 60, 67, 60, 60],
          [60, 82, 60, 60, 60, 84, 60],
          [92, 68, 60, 60, 60, 60, 60],
          [60, 60, 60, 60, 90, 60, 60],
          [99, 68, 60, 60, 74, 60, 60],
          [60, 60, 60, 91, 60, 94, 80]],
 
         [[60, 72, 60, 60, 64, 66, 87],
          [60, 60, 75, 60, 86, 60, 60],
          [60, 60, 60, 91, 60, 60, 60],
          [60, 72, 60, 60, 60, 78, 76],
          [88, 68, 60, 60, 64, 91, 63],
          [60, 60, 60, 60, 60, 60, 93],
          [60, 60, 64, 60, 60, 60, 89],
          [60, 91, 60, 64, 60, 60, 60],
          [66, 65, 60, 60, 60, 84, 60],
          [70, 60, 60, 60, 60, 61, 60]]], dtype=torch.int32)

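2.3 Dimension transformation

The main functions for dimension transformation are torch.reshape (or the tensor's view method), torch.squeeze, torch.unsqueeze, and torch.transpose:

torch.reshape changes the shape of a tensor.
torch.squeeze removes dimensions of size 1.
torch.unsqueeze adds a dimension of size 1.
torch.transpose swaps two dimensions.

# The tensor's view method can fail (for example on non-contiguous tensors); reshape works in those cases.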

torch.manual_seed(0)
minval,maxval = 0,255
a = (minval + (maxval-minval)*torch.rand([1,3,3,2])).int()
print(a.shape)
print(a)

Results:

torch.Size([1, 3, 3, 2])
tensor([[[[126, 195],
          [ 22,  33],
          [ 78, 161]],

         [[124, 228],
          [116, 161],
          [ 88, 102]],

         [[  5,  43],
          [ 74, 132],
          [177, 204]]]], dtype=torch.int32)

reshape()

# Reshape into a tensor of shape (3,6)
b = a.view([3,6]) #torch.reshape(a,[3,6])
print(b.shape)
print(b)

Results:

  torch.Size([3, 6])
  tensor([[126, 195,  22,  33,  78, 161],
          [124, 228, 116, 161,  88, 102],
          [  5,  43,  74, 132, 177, 204]], dtype=torch.int32)

# Reshape back into a tensor of shape [1,3,3,2]
c = torch.reshape(b,[1,3,3,2]) # b.view([1,3,3,2])
print(c)

Results:

  tensor([[[[126, 195],
            [ 22,  33],
            [ 78, 161]],

           [[124, 228],
            [116, 161],
            [ 88, 102]],

           [[  5,  43],
            [ 74, 132],
            [177, 204]]]], dtype=torch.int32)
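
The remark that view sometimes fails can be made concrete. The sketch below uses a small illustrative tensor: transposing it yields a non-contiguous view, on which view raises an error while reshape falls back to copying the data.

```python
import torch

a = torch.arange(6).reshape(2, 3)
t = a.t()                      # transposed view; its memory layout is not contiguous
print(t.is_contiguous())

try:
    t.view(6)                  # view requires a compatible memory layout
except RuntimeError as err:
    print("view failed:", type(err).__name__)

flat = t.reshape(6)            # reshape copies when a view is impossible
print(flat)
```

Calling t.contiguous().view(6) would also work; reshape simply makes that decision for you.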

Removing and adding dimensions

If a tensor has only one element along some dimension, torch.squeeze removes that dimension. torch.unsqueeze does the opposite.

a = torch.tensor([[1.0,2.0]])
s = torch.squeeze(a)
print(a)
print(s)
print(a.shape)
print(s.shape)

Results:

  tensor([[1., 2.]])
  tensor([1., 2.])
  torch.Size([1, 2])
  torch.Size([2])

Insert a dimension of length 1 at position 0

# Insert a dimension of length 1 at position 0

d = torch.unsqueeze(s,axis=0)  
print(s)
print(d)

print(s.shape)
print(d.shape)

Results:

  tensor([1., 2.])
  tensor([[1., 2.]])
  torch.Size([2])
  torch.Size([1, 2])

transpose()

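torch.transpose swaps two dimensions of a tensor and is often used to convert between image storage layouts. For a two-dimensional matrix, the usual choice is the transpose method matrix.t(), which is equivalent to torch.transpose(matrix,0,1).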

minval=0
maxval=255
# Batch,Height,Width,Channel
data = torch.floor(minval + (maxval-minval)*torch.rand([100,256,256,4])).int()
print(data.shape)

# Convert to PyTorch's default image layout: Batch,Channel,Height,Width
# This takes two swaps
data_t = torch.transpose(torch.transpose(data,1,2),1,3)
print(data_t.shape)

Results:

  torch.Size([100, 256, 256, 4])
  torch.Size([100, 4, 256, 256])
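
As an alternative to chaining two swaps, the same reordering can be written as a single Tensor.permute call. This is a sketch on a small stand-in tensor (the sizes are chosen only so that every dimension is distinct) showing that both forms agree:

```python
import torch

# Small stand-in for the image batch: Batch, Height, Width, Channel
data = torch.arange(2 * 3 * 5 * 4).reshape(2, 3, 5, 4)

two_swaps = torch.transpose(torch.transpose(data, 1, 2), 1, 3)
one_permute = data.permute(0, 3, 1, 2)  # B,H,W,C -> B,C,H,W

print(one_permute.shape)
print(torch.equal(two_swaps, one_permute))
```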

matrix = torch.tensor([[1,2,3],[4,5,6]])
print(matrix)
print(matrix.t()) # equivalent to torch.transpose(matrix,0,1)

Results:

  tensor([[1, 2, 3],
          [4, 5, 6]])
  tensor([[1, 4],
          [2, 5],
          [3, 6]])

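2.4 Merging and splitting

torch.cat and torch.stack merge several tensors into one, and torch.split divides one tensor into several.

2.4.1 Merging

torch.cat and torch.stack differ slightly: torch.cat concatenates along an existing dimension and does not add one, whereas torch.stack stacks along a new dimension.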

cat()

a = torch.tensor([[1.0,2.0],[3.0,4.0]])
b = torch.tensor([[5.0,6.0],[7.0,8.0]])
c = torch.tensor([[9.0,10.0],[11.0,12.0]])

abc_cat = torch.cat([a,b,c],dim = 0)
print(abc_cat.shape)
print(abc_cat)

Results:

  torch.Size([6, 2])
  tensor([[ 1.,  2.],
          [ 3.,  4.],
          [ 5.,  6.],
          [ 7.,  8.],
          [ 9., 10.],
          [11., 12.]])

stack()

abc_stack = torch.stack([a,b,c],axis = 0) # in torch, the argument names dim and axis are interchangeable
print(abc_stack.shape)
print(abc_stack)

Results:

  torch.Size([3, 2, 2])
  tensor([[[ 1.,  2.],
           [ 3.,  4.]],

          [[ 5.,  6.],
           [ 7.,  8.]],

          [[ 9., 10.],
           [11., 12.]]])

Comparing cat and stack

torch.cat([a,b,c],axis = 1)

Results:

  tensor([[ 1.,  2.,  5.,  6.,  9., 10.],
          [ 3.,  4.,  7.,  8., 11., 12.]])

torch.stack([a,b,c],axis = 1)

Results:

  tensor([[[ 1.,  2.],
           [ 5.,  6.],
           [ 9., 10.]],
  
          [[ 3.,  4.],
           [ 7.,  8.],
           [11., 12.]]])

2.4.2 Splitting

torch.split is the inverse of torch.cat. It can split into equal chunks of a given size, or into parts whose sizes are given as a list.

print(abc_cat)
a,b,c = torch.split(abc_cat,split_size_or_sections = 2,dim = 0) # chunks of 2 rows each
print(a)
print(b)
print(c)

Results:

tensor([[ 1.,  2.],
        [ 3.,  4.],
        [ 5.,  6.],
        [ 7.,  8.],
        [ 9., 10.],
        [11., 12.]])
tensor([[1., 2.],
        [3., 4.]])
tensor([[5., 6.],
        [7., 8.]])
tensor([[ 9., 10.],
        [11., 12.]])

print(abc_cat)
p,q,r = torch.split(abc_cat,split_size_or_sections =[4,1,1],dim = 0) # parts of sizes [4,1,1]
print(p)
print(q)
print(r)

Results:

tensor([[ 1.,  2.],
        [ 3.,  4.],
        [ 5.,  6.],
        [ 7.,  8.],
        [ 9., 10.],
        [11., 12.]])
tensor([[1., 2.],
        [3., 4.],
        [5., 6.],
        [7., 8.]])
tensor([[ 9., 10.]])
tensor([[11., 12.]])
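
Note that the integer passed to torch.split is the size of each chunk, not the number of chunks. When it is more natural to state how many parts you want, torch.chunk takes that count instead. A small sketch on an illustrative tensor:

```python
import torch

x = torch.arange(12).reshape(6, 2)

by_size = torch.split(x, 2, dim=0)    # chunk size 2  -> 3 parts
by_count = torch.chunk(x, 3, dim=0)   # 3 parts       -> chunk size 2

print(len(by_size), len(by_count))
print(all(torch.equal(p, q) for p, q in zip(by_size, by_count)))

# torch.cat undoes the split
print(torch.equal(torch.cat(by_size, dim=0), x))
```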

3. Mathematical operation of tensor

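Tensor mathematical operations mainly include:

scalar operations;
vector operations;
matrix operations.

This part covers the mathematical operations on tensors.

Parts of this section draw on the following blog post: https://blog.csdn.net/duan_zhihua/article/details/82526505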

3.1 Scalar operation

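Tensor mathematical operators fall into three groups:

scalar operators;
vector operators;
matrix operators.

Addition, subtraction, multiplication, division, and exponentiation, common functions such as trigonometric, exponential, and logarithmic functions, and logical comparison operators are all scalar operators: they are applied to a tensor element by element.

Some scalar operators overload the usual mathematical operator symbols, and they support NumPy-style broadcasting.

3.1.1 Basic operators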

Plus

import torch
import numpy as np

a = torch.tensor([[1.0,2],[-3,4.0]])
b = torch.tensor([[5.0,6],[7.0,8.0]])
a+b  # operator overloading

Results:

  tensor([[ 6.,  8.],
          [ 4., 12.]])

minus

a-b

Results:

  tensor([[ -4.,  -4.],
          [-10.,  -4.]])

multiply

a*b

Results:

  tensor([[  5.,  12.],
          [-21.,  32.]])

division

a/b

Results:

  tensor([[ 0.2000,  0.3333],
          [-0.4286,  0.5000]])

power

a**2

Results:

  tensor([[ 1.,  4.],
          [ 9., 16.]])

sqrt

a**(0.5) # equivalent to torch.sqrt(a)

Results:

  tensor([[1.0000, 1.4142],
          [   nan, 2.0000]])

modulo

a%3 # modulo

Results:

  tensor([[1., 2.],
          [-0., 1.]])

floor division

a//3 # floor division

Results:

  tensor([[ 0.,  0.],
          [-1.,  1.]])

greater_equal

a>=2 # torch.ge(a,2); ge is short for greater_equal

Results:

  tensor([[False,  True],
          [False,  True]])

Logical and

(a>=2)&(a<=3)

Results:

  tensor([[False,  True],
          [False, False]])

Logical or

(a>=2)|(a<=3)

Results：

  tensor([[True, True],
          [True, True]])

Equal

a==5 # torch.eq(a,5)

Results：

  tensor([[False, False],
          [False, False]])

3.1.2 Functional operator

a = torch.tensor([1.0,8.0])
b = torch.tensor([5.0,6.0])
c = torch.tensor([6.0,7.0])

print(torch.max(a,b))
print(torch.min(a,b))


Results：

    tensor([5., 8.])
    tensor([1., 6.])

Rounding

x = torch.tensor([2.6,-2.7])

print(torch.round(x)) # round to the nearest integer (ties go to the even value)
print(torch.floor(x)) # round down
print(torch.ceil(x))  # round up
print(torch.trunc(x)) # round toward zero

Results:

  tensor([ 3., -3.])
  tensor([ 2., -3.])
  tensor([ 3., -2.])
  tensor([ 2., -2.])

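Remainder and modulo

x = torch.tensor([2.6,-2.7])
print(torch.fmod(x,2)) # remainder of truncated division; the sign follows the dividend
print(torch.remainder(x,2)) # remainder of floor division; the sign follows the divisor (non-negative here because the divisor is 2)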

Results:

  tensor([ 0.6000, -0.7000])
  tensor([0.6000, 1.3000])
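
The difference between the two remainders is only about sign conventions, and Python's standard library follows the same two conventions: math.fmod matches torch.fmod, and the % operator matches torch.remainder. The following plain-Python sketch reproduces the numbers above without PyTorch:

```python
import math

for x in (2.6, -2.7):
    trunc_rem = math.fmod(x, 2)   # sign follows the dividend, like torch.fmod
    floor_rem = x % 2             # sign follows the divisor, like torch.remainder
    print(x, round(trunc_rem, 4), round(floor_rem, 4))

# With a negative divisor the floor-style remainder becomes negative as well
print(round(2.6 % -2, 4))
```

So the result of torch.remainder is non-negative only when the divisor is positive.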

Clamping with clamp

# Clamp values to a range
x = torch.tensor([0.9,-0.8,100.0,-20.0,0.7])
y = torch.clamp(x,min=-1,max = 1)
z = torch.clamp(x,max = 1)
print(y)
print(z)

Results：

  tensor([ 0.9000, -0.8000,  1.0000, -1.0000,  0.7000])
  tensor([  0.9000,  -0.8000,   1.0000, -20.0000,   0.7000])

3.2 Vectorize operation

Vector operators act along one specific axis, mapping a vector to a scalar or to another vector.

Statistics

a = torch.arange(1,10).float()
print(torch.sum(a))
print(torch.mean(a))
print(torch.max(a))
print(torch.min(a))
print(torch.prod(a)) # product of all elements
print(torch.std(a))  # standard deviation
print(torch.var(a))  # variance
print(torch.median(a)) # median

Results:

  tensor(45.)
  tensor(5.)
  tensor(9.)
  tensor(1.)
  tensor(362880.)
  tensor(2.7386)
  tensor(7.5000)
  tensor(5.)
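
The variance of 7.5 and standard deviation of 2.7386 above are the sample (unbiased) estimates, which divide by n - 1 rather than n; that is the default for torch.var and torch.std. A quick cross-check with Python's statistics module, as a sketch that needs no PyTorch:

```python
import math
import statistics

values = list(range(1, 10))                    # 1, 2, ..., 9

sample_var = statistics.variance(values)       # divides by n - 1
population_var = statistics.pvariance(values)  # divides by n

print(sample_var)                        # 7.5, matching torch.var above
print(round(math.sqrt(sample_var), 4))   # 2.7386, matching torch.std above
print(population_var)                    # smaller, since it divides by n
```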

Statistics along a given dimension

# Compute statistics along a given dimension
b = a.view(3,3)
print(b)
print(torch.max(b,dim = 0))
print(torch.max(b,dim = 1))

Results:

  tensor([[1., 2., 3.],
          [4., 5., 6.],
          [7., 8., 9.]])
  torch.return_types.max(
  values=tensor([7., 8., 9.]),
  indices=tensor([2, 2, 2]))
  torch.return_types.max(
  values=tensor([3., 6., 9.]),
  indices=tensor([2, 2, 2]))

Cumulative scans

# Cumulative scans
a = torch.arange(1,10)

print(a)
print(torch.cumsum(a,0))
print(torch.cumprod(a,0))
print(torch.cummax(a,0).values) # running maximum of the tensor
print(torch.cummax(a,0).indices)
print(torch.cummin(a,0))

Results:

  tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
  tensor([ 1,  3,  6, 10, 15, 21, 28, 36, 45])
  tensor([     1,      2,      6,     24,    120,    720,   5040,  40320, 362880])
  tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
  tensor([0, 1, 2, 3, 4, 5, 6, 7, 8])
  torch.return_types.cummin(
  values=tensor([1, 1, 1, 1, 1, 1, 1, 1, 1]),
  indices=tensor([0, 0, 0, 0, 0, 0, 0, 0, 0]))

sort()

torch.sort and torch.topk order the elements of a tensor.

# torch.sort and torch.topk order the elements of a tensor
a = torch.tensor([[9,7,8],[1,3,2],[5,6,4]]).float()
print(torch.topk(a,2,dim = 0),"\n")
print(torch.topk(a,2,dim = 1),"\n")
print(torch.sort(a,dim = 1),"\n")

# torch.topk can be used to implement the KNN algorithm in PyTorch

Results:

  torch.return_types.topk(
  values=tensor([[9., 7., 8.],
          [5., 6., 4.]]),
  indices=tensor([[0, 0, 0],
          [2, 2, 2]]))
  
  torch.return_types.topk(
  values=tensor([[9., 8.],
          [3., 2.],
          [6., 5.]]),
  indices=tensor([[0, 2],
          [1, 2],
          [1, 0]]))
  
  torch.return_types.sort(
  values=tensor([[7., 8., 9.],
          [1., 2., 3.],
          [4., 5., 6.]]),
  indices=tensor([[1, 2, 0],
          [0, 2, 1],
          [2, 0, 1]]))

torch.topk can be used to implement the KNN algorithm in PyTorch.
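
To illustrate the KNN remark, here is a minimal sketch under stated assumptions: the toy points, labels, and k = 3 are made up for illustration, distances are Euclidean via torch.cdist, and the vote is a simple majority via torch.mode. It is one possible way to use torch.topk for nearest-neighbour lookup, not a tuned implementation.

```python
import torch

# Hypothetical toy data: 6 labelled 2-D points and 2 query points
train_x = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                        [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
train_y = torch.tensor([0, 0, 0, 1, 1, 1])
query = torch.tensor([[0.5, 0.5], [5.5, 5.5]])

k = 3
# Pairwise Euclidean distances, shape [num_query, num_train]
dist = torch.cdist(query, train_x)
# largest=False returns the k smallest distances and their indices
_, nearest = torch.topk(dist, k, dim=1, largest=False)
# Majority vote over the labels of the k nearest neighbours
pred = torch.mode(train_y[nearest], dim=1).values
print(pred)
```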

3.3 Matrix operation

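A matrix must be two-dimensional; something like torch.tensor([1,2,3]) is not a matrix.

Matrix operations include:

matrix multiplication
matrix transpose
matrix inverse
trace
norm
determinant
eigenvalues
decompositions

Matrix multiplication

# Matrix multiplication
a = torch.tensor([[1,2],[3,4]])
b = torch.tensor([[2,0],[0,2]])
print(a@b)  # equivalent to torch.matmul(a,b) or torch.mm(a,b)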

Results:

 tensor([[2, 4],
         [6, 8]])

Matrix transpose

# Matrix transpose
a = torch.tensor([[1.0,2],[3,4]])
print(a.t())

Results:

 tensor([[1., 3.],
         [2., 4.]])

Matrix inverse

# Matrix inverse; the input must be a floating-point tensor
a = torch.tensor([[1.0,2],[3,4]])
print(torch.inverse(a))

Results:

 tensor([[-2.0000,  1.0000],
         [ 1.5000, -0.5000]])

Trace

# Trace of a matrix
a = torch.tensor([[1.0,2],[3,4]])
print(torch.trace(a))

Results:

 tensor(5.)

Matrix norm

# Matrix norm (the Frobenius norm by default)
a = torch.tensor([[1.0,2],[3,4]])
print(torch.norm(a))

Results:

 tensor(5.4772)

Determinant

# Determinant of a matrix
a = torch.tensor([[1.0,2],[3,4]])
print(torch.det(a))

Results:

 tensor(-2.0000)

Eigenvalues and eigenvectors

# Eigenvalues and eigenvectors of a matrix
a = torch.tensor([[1.0,2],[-5,4]],dtype = torch.float)
print(torch.eig(a,eigenvectors=True))

# The two eigenvalues are 2.5+2.7839j and 2.5-2.7839j

Results:

 torch.return_types.eig(
 eigenvalues=tensor([[ 2.5000,  2.7839],
         [ 2.5000, -2.7839]]),
 eigenvectors=tensor([[ 0.2535, -0.4706],
         [ 0.8452,  0.0000]]))
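
torch.eig has been deprecated in favour of torch.linalg.eig and may be missing from recent PyTorch releases. Assuming a version that provides torch.linalg.eig, the same eigenvalues can be obtained as complex numbers directly, as in this sketch:

```python
import torch

a = torch.tensor([[1.0, 2.0], [-5.0, 4.0]])

# torch.linalg.eig returns complex eigenvalues and eigenvectors
eigenvalues, eigenvectors = torch.linalg.eig(a)
print(eigenvalues)

# Check the defining relation A v = lambda v for every eigenpair;
# multiplying by eigenvalues scales column j of eigenvectors by eigenvalue j
a_c = a.to(eigenvectors.dtype)
print(torch.allclose(a_c @ eigenvectors, eigenvectors * eigenvalues, atol=1e-5))
```

The deprecated torch.qr and torch.svd used in the decomposition examples have counterparts torch.linalg.qr and torch.linalg.svd; note that torch.linalg.svd returns the transposed (conjugated) V rather than V itself.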

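Matrix decompositions

QR decomposition factors a square matrix into an orthogonal matrix q and an upper-triangular matrix r. In effect, q is what you get by applying Gram-Schmidt orthogonalization to the columns of a.

# QR decomposition: factor a square matrix into an orthogonal matrix q and an upper-triangular matrix r
# In effect, q is obtained by applying Gram-Schmidt orthogonalization to a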

a  = torch.tensor([[1.0,2.0],[3.0,4.0]])
q,r = torch.qr(a)
print(q,"\n")
print(r,"\n")
print(q@r)

Results:

 tensor([[-0.3162, -0.9487],
         [-0.9487,  0.3162]])
 
 tensor([[-3.1623, -4.4272],
         [ 0.0000, -0.6325]])
 
 tensor([[1.0000, 2.0000],
         [3.0000, 4.0000]])

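SVD decomposition factors any matrix into the product of an orthogonal matrix u, a diagonal matrix s, and an orthogonal matrix v.t(). SVD is commonly used for matrix compression and dimensionality reduction.

# SVD decomposition
# Factors any matrix into the product of an orthogonal matrix u, a diagonal matrix s, and an orthogonal matrix v.t()
# SVD is commonly used for matrix compression and dimensionality reduction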
a=torch.tensor([[1.0,2.0],[3.0,4.0],[5.0,6.0]])

u,s,v = torch.svd(a)

print(u,"\n")
print(s,"\n")
print(v,"\n")

print(u@torch.diag(s)@v.t())
# SVD can be used to implement PCA dimensionality reduction in PyTorch

Results:

 tensor([[-0.2298,  0.8835],
         [-0.5247,  0.2408],
         [-0.8196, -0.4019]])
 
 tensor([9.5255, 0.5143])
 
 tensor([[-0.6196, -0.7849],
         [-0.7849,  0.6196]])
 
 tensor([[1.0000, 2.0000],
         [3.0000, 4.0000],
         [5.0000, 6.0000]])

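3.4 Broadcasting

PyTorch's broadcasting rules are the same as NumPy's:

If the tensors have different numbers of dimensions, the one with fewer dimensions is expanded until both have the same number.
If two tensors have the same length along a dimension, or one of them has length 1 along it, the tensors are said to be compatible in that dimension.
If two tensors are compatible in every dimension, they can be broadcast.
After broadcasting, the length of each dimension is the larger of the two tensors' lengths in that dimension.
In any dimension where one tensor has length 1 and the other has a length greater than 1, the first tensor behaves as if it had been copied along that dimension.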
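
The rules above can be written down as a short shape calculation. The following is a sketch in plain Python (the helper name broadcast_shape is hypothetical here); it aligns shapes from the trailing dimension, treats missing leading dimensions as length 1, and raises an error when two lengths are incompatible:

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Align from the trailing dimension; missing leading dimensions count as 1
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or a == 1 or b == 1:
            # A length-1 dimension stretches to match the other tensor
            result.append(b if a == 1 else a)
        else:
            raise ValueError(f"incompatible sizes {a} and {b}")
    return tuple(reversed(result))

print(broadcast_shape((3,), (3, 3)))        # (3, 3)
print(broadcast_shape((4, 1, 7), (10, 1)))  # (4, 10, 7)
```

The first call corresponds to adding a length-3 vector to a 3×3 matrix.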

torch.broadcast_tensors converts several tensors to a common shape according to the broadcasting rules.

a = torch.tensor([1,2,3])
b = torch.tensor([[0,0,0],[1,1,1],[2,2,2]])
print(b + a)

Results:

tensor([[1, 2, 3],
        [2, 3, 4],
        [3, 4, 5]])

a_broad,b_broad = torch.broadcast_tensors(a,b)
print(a_broad,"\n")
print(b_broad,"\n")
print(a_broad + b_broad)

Results：

tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])

tensor([[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]])

tensor([[1, 2, 3],
        [2, 3, 4],
        [3, 4, 5]])

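4. nn.functional and nn.Module

4.1 nn.functional and nn.ModuleLinear

So far we have covered some commonly used APIs for structural and mathematical operations on PyTorch tensors.

With these tensor APIs we can build the components of a neural network, such as activation functions, model layers, and loss functions.

Most of PyTorch's neural-network components are packaged in the torch.nn module. The great majority of them are available both as functions and as classes.

nn.functional (usually imported under the name F) holds the function implementations of these components. For example:

1. Activation functions

F.relu
F.sigmoid
F.tanh
F.softmax

2. Model layers

F.linear
F.conv2d
F.max_pool2d
F.dropout2d
F.embedding

3. Loss functions

F.binary_cross_entropy
F.mse_loss
F.cross_entropy

To make parameters easier to manage, these components are usually turned into classes by inheriting from nn.Module, and the classes are placed directly in the nn module. For example:

1. Activation functions

nn.ReLU
nn.Sigmoid
nn.Tanh
nn.Softmax

2. Model layers

nn.Linear
nn.Conv2d
nn.MaxPool2d
nn.Dropout2d
nn.Embedding

3. Loss functions

nn.BCELoss
nn.MSELoss
nn.CrossEntropyLoss

In fact, nn.Module can manage not only the parameters it references but also the submodules it references, which makes it very powerful.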
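
To make the relationship between the two forms concrete, here is a small sketch (the input sizes are arbitrary). For a stateless activation the function and the class give identical results; for a layer with parameters, the class stores the weight and bias that the function form expects to receive explicitly.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 3)

# Stateless activation: the function and the class give the same result
print(torch.equal(F.relu(x), nn.ReLU()(x)))

# A layer with parameters: nn.Linear stores weight and bias,
# while F.linear expects them to be passed in explicitly
layer = nn.Linear(3, 2)
print(torch.allclose(layer(x), F.linear(x, layer.weight, layer.bias)))
```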

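4.2 Managing parameters with nn.Module

In PyTorch, a model's parameters are trained by an optimizer, so they are normally tensors with requires_grad = True.

A model usually has many parameters, and managing them all by hand is not easy.

PyTorch generally represents parameters with nn.Parameter and uses nn.Module to manage all the parameters within its structure.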

import torch
from torch import nn
import torch.nn.functional  as F
from matplotlib import pyplot as plt

nn.Parameter

# An nn.Parameter has requires_grad = True
w = nn.Parameter(torch.randn(2,2))
print(w)
print(w.requires_grad)

Results:

  Parameter containing:
  tensor([[-1.6644, -0.2276],
          [-0.7033, -0.5630]], requires_grad=True)
  True

nn.ParameterList

# nn.ParameterList groups several nn.Parameter objects into a list
params_list = nn.ParameterList([nn.Parameter(torch.rand(8,i)) for i in range(1,3)])
print(params_list)
print(params_list[0].requires_grad)

Results:

  ParameterList(
      (0): Parameter containing: [torch.FloatTensor of size 8x1]
      (1): Parameter containing: [torch.FloatTensor of size 8x2]
  )
  True

nn.ParameterDict

# nn.ParameterDict groups several nn.Parameter objects into a dictionary

params_dict = nn.ParameterDict({"a":nn.Parameter(torch.rand(2,2)),
                               "b":nn.Parameter(torch.zeros(2))})
print(params_dict)
print(params_dict["a"].requires_grad)

Results:

  ParameterDict(
      (a): Parameter containing: [torch.FloatTensor of size 2x2]
      (b): Parameter containing: [torch.FloatTensor of size 2]
  )
  True

Managing them with a Module

# A Module can manage all of them together
# module.parameters() returns a generator over every parameter within its structure

module = nn.Module()
module.w = w
module.params_list = params_list
module.params_dict = params_dict

num_param = 0
for param in module.parameters():
    print(param,"\n")
    num_param = num_param + 1
print("number of Parameters =",num_param)

Results:

Parameter containing:
tensor([[-1.6644, -0.2276],
        [-0.7033, -0.5630]], requires_grad=True)

Parameter containing:
tensor([[0.6626],
        [0.1290],
        [0.4561],
        [0.9929],
        [0.0149],
        [0.6461],
        [0.7913],
        [0.8206]], requires_grad=True)

Parameter containing:
tensor([[0.6147, 0.9396],
        [0.9375, 0.1093],
        [0.9189, 0.3000],
        [0.4675, 0.0879],
        [0.9418, 0.8860],
        [0.4496, 0.6833],
        [0.9350, 0.0427],
        [0.7650, 0.3009]], requires_grad=True)

Parameter containing:
tensor([[0.0098, 0.0561],
        [0.2361, 0.6943]], requires_grad=True)

Parameter containing:
tensor([0., 0.], requires_grad=True)

number of Parameters = 5

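In practice, a module class is usually built by inheriting from nn.Module, with every part that holds learnable parameters created in the constructor.

The following example is a simplified version of the source of nn.Linear in PyTorch. It creates the learnable parameters in the __init__ constructor and calls F.linear in forward to carry out the computation.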

class Linear(nn.Module):
    __constants__ = ['in_features', 'out_features']

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = nn.Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)

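4.3 Managing submodules with nn.Module

In most cases we rarely define parameters directly with nn.Parameter to build a model; instead we assemble the model from commonly used layers. These layers are themselves objects that inherit from nn.Module and hold their own parameters; they are submodules of the module we are defining.

nn.Module provides several methods for managing these submodules.

children(): returns a generator over the immediate submodules of the module.
named_children(): returns a generator over the immediate submodules of the module together with their names.
modules(): returns a generator over the modules at every level, including the module itself.
named_modules(): returns a generator over the modules at every level together with their names, including the module itself.

children() and named_children() are used most often. modules() and named_modules() are used less often; their functionality can be reproduced by nesting calls to named_children().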
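
The claim that named_modules() can be reproduced by nesting named_children() can be sketched as a short recursive generator. The function name walk and the small model below are hypothetical and serve only as an illustration:

```python
from torch import nn

def walk(module, prefix=""):
    """Yield (name, module) for a module and all of its descendants."""
    yield prefix, module
    for name, child in module.named_children():
        child_prefix = f"{prefix}.{name}" if prefix else name
        yield from walk(child, child_prefix)

# Hypothetical small model used only for this illustration
model = nn.Sequential()
model.add_module("block", nn.Sequential(nn.Linear(4, 8), nn.ReLU()))
model.add_module("head", nn.Linear(8, 1))

names = [name for name, _ in walk(model)]
print(names)
print(names == [name for name, _ in model.named_modules()])
```

Unlike this sketch, named_modules() also skips modules it has already visited, which matters when the same submodule is registered in more than one place.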

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()

        self.embedding = nn.Embedding(num_embeddings = 10000,embedding_dim = 3,padding_idx = 1)
        self.conv = nn.Sequential()
        self.conv.add_module("conv_1",nn.Conv1d(in_channels = 3,out_channels = 16,kernel_size = 5))
        self.conv.add_module("pool_1",nn.MaxPool1d(kernel_size = 2))
        self.conv.add_module("relu_1",nn.ReLU())
        self.conv.add_module("conv_2",nn.Conv1d(in_channels = 16,out_channels = 128,kernel_size = 2))
        self.conv.add_module("pool_2",nn.MaxPool1d(kernel_size = 2))
        self.conv.add_module("relu_2",nn.ReLU())

        self.dense = nn.Sequential()
        self.dense.add_module("flatten",nn.Flatten())
        self.dense.add_module("linear",nn.Linear(6144,1))
        self.dense.add_module("sigmoid",nn.Sigmoid())

    def forward(self,x):
        x = self.embedding(x).transpose(1,2)
        x = self.conv(x)
        y = self.dense(x)
        return y

net = Net()

net.children()

i = 0
for child in net.children():
    i+=1
    print(child,"\n")
print("child number",i)

Results:

Embedding(10000, 3, padding_idx=1)

Sequential(
  (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_1): ReLU()
  (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_2): ReLU()
)

Sequential(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear): Linear(in_features=6144, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)

child number 3

net.modules()

i = 0
for module in net.modules():
    i+=1
    print(module)
print("module number:",i)

Results:

Net(
  (embedding): Embedding(10000, 3, padding_idx=1)
  (conv): Sequential(
    (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
    (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (relu_1): ReLU()
    (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
    (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (relu_2): ReLU()
  )
  (dense): Sequential(
    (flatten): Flatten(start_dim=1, end_dim=-1)
    (linear): Linear(in_features=6144, out_features=1, bias=True)
    (sigmoid): Sigmoid()
  )
)
Embedding(10000, 3, padding_idx=1)
Sequential(
  (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_1): ReLU()
  (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_2): ReLU()
)
Conv1d(3, 16, kernel_size=(5,), stride=(1,))
MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
ReLU()
Conv1d(16, 128, kernel_size=(2,), stride=(1,))
MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
ReLU()
Sequential(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear): Linear(in_features=6144, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)
Flatten(start_dim=1, end_dim=-1)
Linear(in_features=6144, out_features=1, bias=True)
Sigmoid()
module number: 13

Next, we use the named_children method to find the embedding layer and set its parameters to be non-trainable (in other words, we freeze the embedding layer).

children_dict = {name:module for name,module in net.named_children()}

print(children_dict)
embedding = children_dict["embedding"]
embedding.requires_grad_(False) # freeze its parameters

Results:

{'embedding': Embedding(10000, 3, padding_idx=1), 'conv': Sequential(
  (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_1): ReLU()
  (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_2): ReLU()
), 'dense': Sequential(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear): Linear(in_features=6144, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)}

Embedding(10000, 3, padding_idx=1)


for param in embedding.parameters():
    print(param.requires_grad)
    print(param.numel())

Results:

False
30000

As shown, the parameters of the first layer can no longer be trained.

from torchkeras import summary
summary(net,input_shape = (200,),input_dtype = torch.LongTensor)

Results:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
         Embedding-1               [-1, 200, 3]          30,000
            Conv1d-2              [-1, 16, 196]             256
         MaxPool1d-3               [-1, 16, 98]               0
              ReLU-4               [-1, 16, 98]               0
            Conv1d-5              [-1, 128, 97]           4,224
         MaxPool1d-6              [-1, 128, 48]               0
              ReLU-7              [-1, 128, 48]               0
           Flatten-8                 [-1, 6144]               0
            Linear-9                    [-1, 1]           6,145
          Sigmoid-10                    [-1, 1]               0
================================================================
Total params: 40,625
Trainable params: 10,625
Non-trainable params: 30,000
----------------------------------------------------------------
Input size (MB): 0.000763
Forward/backward pass size (MB): 0.287796
Params size (MB): 0.154972
Estimated Total Size (MB): 0.443531
----------------------------------------------------------------

The number of non-trainable parameters has increased accordingly.
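
A common follow-up to freezing a layer is to give the optimizer only the parameters that are still trainable. The sketch below uses a hypothetical small model rather than the Net defined above, and SGD with an arbitrary learning rate, purely for illustration:

```python
import torch
from torch import nn

# Hypothetical small model standing in for the Net defined above
model = nn.Sequential(nn.Embedding(100, 3), nn.Flatten(), nn.Linear(3 * 5, 1))
model[0].requires_grad_(False)  # freeze the embedding layer

trainable = [p for p in model.parameters() if p.requires_grad]
frozen = [p for p in model.parameters() if not p.requires_grad]
print(sum(p.numel() for p in trainable), sum(p.numel() for p in frozen))

# Only the trainable parameters are handed to the optimizer
optimizer = torch.optim.SGD(trainable, lr=0.01)

x = torch.randint(0, 100, (2, 5))   # batch of 2 sequences of 5 token ids
loss = model(x).sum()
loss.backward()
optimizer.step()

# The frozen embedding never receives a gradient
print(model[0].weight.grad is None)
```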

独孤诗人的学习驿站

Eat-pytorch-4-lowAPI
