Environment Setup

import paddle
import paddle.nn.functional as F
from paddle.nn import Layer
from paddle.vision.datasets import MNIST
from paddle.metric import Accuracy
from paddle.nn import Conv2D,MaxPool2D,Linear
from paddle.static import InputSpec
from paddle.vision.transforms import ToTensor

print(paddle.__version__)

Reference link for environment setup:

paddle练习(一)使用线性回归预测波士顿房价_Vertira的博客-CSDN博客

Dataset

The MNIST dataset of handwritten digits contains 60,000 training examples and 10,000 test examples. The digits have been size-normalized and centered; the images have a fixed size of 28x28 pixels, with values from 0 to 1. The official dataset page is http://yann.lecun.com/exdb/mnist/. This example uses the MNIST dataset that ships with Paddle, imported with from paddle.vision.datasets import MNIST.

Converting the Dataset to Tensors

train_dataset = MNIST(mode='train', transform=ToTensor())
test_dataset = MNIST(mode='test', transform=ToTensor())
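
For a quick sanity check (my own sketch, not part of the original post), you can inspect a single sample: with ToTensor each image becomes a [1, 28, 28] float32 tensor, and the label is the digit class.

# Inspect one training sample: image shape, dtype, and label
img, label = train_dataset[0]
print(img.shape, img.dtype, label)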

Building the Model

This is the custom network. Note the layers and how the input x is passed through them. The custom network is written as a class; you simply instantiate the class and call the object.

class MyModel(Layer):
    def __init__(self):
        super(MyModel, self).__init__()
        # Input images are [N, 1, 28, 28]; padding=2 keeps the 28x28 spatial size
        self.conv1 = Conv2D(in_channels=1, out_channels=6, kernel_size=5, stride=1, padding=2)
        self.max_pool1 = MaxPool2D(kernel_size=2, stride=2)
        self.conv2 = Conv2D(in_channels=6, out_channels=16, kernel_size=5, stride=1)
        self.max_pool2 = MaxPool2D(kernel_size=2, stride=2)
        # Two 5x5 convs and two 2x2 poolings leave a [N, 16, 5, 5] feature map, i.e. 16*5*5 inputs
        self.linear1 = Linear(in_features=16*5*5, out_features=120)
        self.linear2 = Linear(in_features=120, out_features=84)
        self.linear3 = Linear(in_features=84, out_features=10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.max_pool1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = self.max_pool2(x)
        # Flatten [N, 16, 5, 5] into [N, 400] before the fully connected layers
        x = paddle.flatten(x, start_axis=1, stop_axis=-1)
        x = self.linear1(x)
        x = F.relu(x)
        x = self.linear2(x)
        x = F.relu(x)
        x = self.linear3(x)
        return x
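
To double-check that the flattened feature size really is 16*5*5, you can print a layer-by-layer summary (a small sketch added here, not part of the original post):

# Print per-layer output shapes for a single-channel 28x28 input;
# the output of the last pooling layer should be [N, 16, 5, 5].
paddle.summary(MyModel(), (1, 1, 28, 28))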

Model Training

By wrapping the network in paddle.Model, training can be completed quickly with the high-level API:

inputs = InputSpec([None, 1, 28, 28], 'float32', 'x')
labels = InputSpec([None, 1], 'int64', 'label')
model = paddle.Model(MyModel(), inputs, labels)

optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())

model.prepare(
    optim,
    paddle.nn.CrossEntropyLoss(),
    Accuracy()
    )
model.fit(train_dataset,
        test_dataset,
        epochs=3,
        batch_size=64,
        save_dir='mnist_checkpoint',
        verbose=1
        )

Then just run it. On the first run the dataset is downloaded automatically; my output is shown below:

2.2.1
Cache file C:\Users\YANG\.cache\paddle\dataset\mnist\train-images-idx3-ubyte.gz not found, downloading https://dataset.bj.bcebos.com/mnist/train-images-idx3-ubyte.gz
Begin to download
item 2421/2421 [============================>.] - ETA: 0s - 2ms/item
Download finished
Cache file C:\Users\YANG\.cache\paddle\dataset\mnist\train-labels-idx1-ubyte.gz not found, downloading https://dataset.bj.bcebos.com/mnist/train-labels-idx1-ubyte.gz
Begin to download
item 8/8 [============================>.] - ETA: 0s - 7ms/item
Download finished
Cache file C:\Users\YANG\.cache\paddle\dataset\mnist\t10k-images-idx3-ubyte.gz not found, downloading https://dataset.bj.bcebos.com/mnist/t10k-images-idx3-ubyte.gz
Begin to download
item 403/403 [============================>.] - ETA: 0s - 4ms/item
Download finished
Cache file C:\Users\YANG\.cache\paddle\dataset\mnist\t10k-labels-idx1-ubyte.gz not found, downloading https://dataset.bj.bcebos.com/mnist/t10k-labels-idx1-ubyte.gz
Begin to download

Download finished
item 2/2 [===========================>..] - ETA: 0s - 1ms/item
The loss value printed in the log is the current step, and the metric is the average value of previous steps.
Epoch 1/3
C:\Users\YANG\.conda\envs\python38\lib\site-packages\paddle\fluid\layers\utils.py:77: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.10 it will stop working
  return (isinstance(seq, collections.Sequence) and
step 938/938 [==============================] - loss: 0.0559 - acc: 0.9412 - 17ms/step          
save checkpoint at C:\pycharm_project_files\pythonProject\paddle\mnist_checkpoint\0
Eval begin...
step 157/157 [==============================] - loss: 0.0034 - acc: 0.9747 - 14ms/step          
Eval samples: 10000
Epoch 2/3
step 938/938 [==============================] - loss: 0.0212 - acc: 0.9801 - 18ms/step          
save checkpoint at C:\pycharm_project_files\pythonProject\paddle\mnist_checkpoint\1
Eval begin...
step 157/157 [==============================] - loss: 2.5646e-04 - acc: 0.9851 - 14ms/step      
Eval samples: 10000
Epoch 3/3
step 938/938 [==============================] - loss: 0.0014 - acc: 0.9858 - 17ms/step          
save checkpoint at C:\pycharm_project_files\pythonProject\paddle\mnist_checkpoint\2
Eval begin...
step 157/157 [==============================] - loss: 0.0053 - acc: 0.9861 - 14ms/step          
Eval samples: 10000
save checkpoint at C:\pycharm_project_files\pythonProject\paddle\mnist_checkpoint\final

Process finished with exit code 0

The dataset is downloaded to: C:\Users\YANG\.cache\paddle\dataset\mnist.

If you build your own dataset, it is recommended to place it in the same location.

The training results are saved to the location given by save_dir='mnist_checkpoint':

a mnist_checkpoint folder is created automatically under the directory the script runs in.

Here the model is simply trained for 3 epochs.
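
To confirm what was written, a minimal sketch (assuming the run above has completed) is to list the checkpoint folder:

import os
# One numbered checkpoint per epoch (0, 1, 2) plus a 'final' checkpoint
print(sorted(os.listdir('mnist_checkpoint')))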

Saving Model Parameters

At present, the Paddle framework provides three families of APIs for saving model parameters:

Paddle high-level API - saving model parameters

* paddle.Model.fit
* paddle.Model.save

Paddle base framework - dynamic graph - saving model parameters (a short sketch of paddle.save follows this list)

* paddle.save

Paddle base framework - static graph - saving model parameters

* paddle.static.save
* paddle.static.save_inference_model
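
For reference, a minimal sketch of the base dynamic-graph saving API from the list above (the file name mynet.pdparams is just illustrative):

# Base dynamic-graph API: save a Layer's parameter state_dict to a .pdparams file
net = MyModel()
paddle.save(net.state_dict(), 'mynet.pdparams')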

Below, the methods for saving and loading models are explained based on the high-level API.

Method 1:

  • paddle.Model.fit(train_data, epochs, batch_size, save_dir, log_freq)

    When training with model.fit, specify the directory for saving the model via the save_dir argument and the write frequency via save_freq, and training and saving then happen together. model.fit() only saves the model parameters, not the optimizer parameters: a single .pdparams file is generated in real time at the end of each epoch, so the model is saved as training runs (see the sketch below).
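
For example, a quick sketch of passing save_freq explicitly (assuming the model built above); save_freq=1, which is also the default, writes a checkpoint after every epoch:

# Write a checkpoint under mnist_checkpoint/ at the end of every epoch
model.fit(train_dataset,
        test_dataset,
        epochs=3,
        batch_size=64,
        save_dir='mnist_checkpoint',
        save_freq=1,
        verbose=1
        )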

Method 2:

  • paddle.Model.save(self, path, training=True)

    model.save(path) can save the model structure, the network parameters, and the optimizer parameters. Passing training=True is meant for the training scenario: both the network parameters and the optimizer parameters are saved, producing two kinds of files, e.g. 0.pdparams and 0.pdopt, which store the model parameters and the optimizer parameters respectively; unlike method 1, these files are only written when the whole training run has finished and model.save() is called. The path format is 'dirname/file_prefix' or 'file_prefix', where dirname is the directory and file_prefix is the prefix of the parameter files. With training=False, training is assumed to be finished, and what is saved is the inference model structure together with the network parameters (see the export sketch below).
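
For example, a hedged sketch of exporting an inference model once training is done (the prefix 'mnist_infer/mnist' is just illustrative):

# training=False saves the inference model structure and the network parameters only
model.save('mnist_infer/mnist', training=False)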

# Method 1: save the model parameters of each epoch in real time during training
model.fit(train_dataset,
        test_dataset,
        epochs=2,
        batch_size=64,
        save_dir='mnist_checkpoint',
        verbose=1
        )
The loss value printed in the log is the current step, and the metric is the average value of previous steps.
Epoch 1/2
step 938/938 [==============================] - loss: 9.4700e-04 - acc: 0.9880 - 24ms/step    
save checkpoint at /home/aistudio/mnist_checkpoint/0
Eval begin...
step 157/157 [==============================] - loss: 5.0011e-04 - acc: 0.9817 - 24ms/step    
Eval samples: 10000
Epoch 2/2
step 938/938 [==============================] - loss: 0.0071 - acc: 0.9906 - 24ms/step         
save checkpoint at /home/aistudio/mnist_checkpoint/1
Eval begin...
step 157/157 [==============================] - loss: 1.8442e-04 - acc: 0.9867 - 22ms/step      
Eval samples: 10000
save checkpoint at /home/aistudio/mnist_checkpoint/final
# Method 2: model.save() saves the model and optimizer parameters
model.save('mnist_checkpoint/test')

Loading Model Parameters

When resuming a training run, the model state needs to be restored. Load functions read the model parameters and optimizer parameters from the files that store the model state and optimizer state; if the optimizer does not need to be restored, the optimizer state file can be skipped.

High-level API - loading model parameters

* paddle.Model.load

Paddle base framework - dynamic graph - loading model parameters (a short sketch of paddle.load follows this list)

* paddle.load

Paddle base framework - static graph - loading model parameters

* paddle.static.load 
* paddle.static.load_inference_model
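
Mirroring the saving sketch above, a minimal sketch of the base dynamic-graph loading API (assuming the mynet.pdparams file from the earlier sketch):

# Base dynamic-graph API: read a state_dict from disk and set it on the Layer
net = MyModel()
state_dict = paddle.load('mynet.pdparams')
net.set_state_dict(state_dict)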

Below, loading model parameters with the high-level API is explained.

  • model.load(self, path, skip_mismatch=False, reset_optimizer=False)

    model.load can load both the model parameters and the optimizer parameters. The reset_optimizer argument specifies whether the optimizer state should be restored: if reset_optimizer is True, the optimizer parameters are re-initialized; if it is False, the optimizer parameters are restored from the given path (a variant is sketched after the code below).

# Load the model with the high-level API
model.load('mnist_checkpoint/test')
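
If you only want the network weights and prefer a fresh optimizer (for example when fine-tuning with a different learning rate), a hedged variant:

# reset_optimizer=True restores only the network parameters; the optimizer is re-initialized
model.load('mnist_checkpoint/test', reset_optimizer=True)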

Resuming Training

Ideally, resuming training brings the model back to the exact state at which training was interrupted, so that the gradient updates after resuming follow the same trajectory they would have followed without the interruption. Based on this, we can check whether the method above restores training accurately by looking at the loss after resuming: restore from the model parameters and optimizer state saved at the end of epoch 0, and verify that the loss of the subsequent training (epoch 1) matches that of an uninterrupted run.

Note:

Resuming training has two key points:

  • When saving the model, save both the model parameters and the optimizer parameters.

  • When restoring, restore both the model parameters and the optimizer parameters.

import paddle
from paddle.vision.datasets import MNIST
from paddle.metric import Accuracy
from paddle.static import InputSpec
from paddle.vision.transforms import ToTensor

train_dataset = MNIST(mode='train', transform=ToTensor())
test_dataset = MNIST(mode='test', transform=ToTensor())

inputs = InputSpec([None, 1, 28, 28], 'float32', 'inputs')
labels = InputSpec([None, 1], 'int64', 'labels')
model = paddle.Model(MyModel(), inputs, labels)  # MyModel is the network defined earlier
optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())
model.load("./mnist_checkpoint/final")
model.prepare( 
      optim,
      paddle.nn.CrossEntropyLoss(),
      Accuracy()
      )
model.fit(train_data=train_dataset,
        eval_data=test_dataset,
        batch_size=64,
        epochs=2,
        verbose=1
        )

The loss value printed in the log is the current step, and the metric is the average value of previous steps.
Epoch 1/2
step  90/938 [=>............................] - loss: 0.0624 - acc: 0.9929 - ETA: 21s - 25ms/st

Summary

The above uses the MNIST handwritten digit recognition example to explain saving a model, loading a model, and resuming training. Paddle provides many APIs for saving and loading, which you can choose from according to your needs.

If this helped, likes, favorites, and follows are welcome.
