Introduction to model pruning

2024-07-12

Ref: https://www.cnblogs.com/the-art-of-ai/p/17500399.html

1. Background

Deep learning models have achieved remarkable results in image recognition, natural language processing, speech recognition and other fields, but these models often require a lot of computing resources and storage space. Especially in resource-constrained environments such as mobile devices and embedded systems, the size and computational complexity of these models often become bottlenecks that limit their application. Therefore, how to reduce the size and computational complexity of the model as much as possible while maintaining the accuracy of the model has become an important research direction.

Model pruning is an effective way to address this problem. It simplifies the structure of a deep learning model and reduces its parameter count, making the model smaller and faster while largely preserving accuracy, so that it can be deployed across a wider range of tasks and environments.

2. Basic principles

Model pruning refers to techniques that optimize the structure and reduce the parameters of deep learning models. Pruning techniques can be divided into two forms: structural pruning and parameter pruning.

Structural pruning removes unnecessary structural units, such as neurons, convolution kernels, or layers, to reduce the computational complexity and storage footprint of the model. Common structural pruning methods include channel pruning, layer pruning, node pruning, and filter pruning.
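
As a rough illustration of filter pruning, the sketch below uses PyTorch's torch.nn.utils.prune.ln_structured to zero out the output filters with the smallest L1 norm. The layer shape, the 50% pruning amount, and the variable names are assumptions chosen for the example, not values from the referenced article.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Assumed toy layer: 1 input channel, 4 output filters.
conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, padding=1)

# Zero the 50% of output filters (dim=0) with the smallest L1 norm (n=1).
prune.ln_structured(conv, name="weight", amount=0.5, n=1, dim=0)

# A filter counts as pruned when all of its weights are zero.
filter_norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))
num_zero_filters = int((filter_norms == 0).sum())

print("L1 norm per filter:", filter_norms.tolist())
print("Pruned filters:", num_zero_filters, "of", conv.out_channels)
```

Note that this utility only masks the selected filters: the weight tensor keeps its original shape, so storage and compute savings require physically rebuilding the layer with fewer channels.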

Parameter pruning deletes unnecessary weight parameters to reduce the storage space and computational complexity of the model while maintaining its accuracy. Common parameter pruning methods include L1 regularization, L2 regularization, ranking-based pruning (sorting weights by importance), and locality-sensitive hashing pruning.
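
A minimal sketch of ranking-based parameter pruning, assuming weight magnitude is used as the importance score: PyTorch's torch.nn.utils.prune.l1_unstructured zeros the individual weights with the smallest absolute value. The layer size and the 25% amount are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Assumed toy layer with 8 * 4 = 32 weights.
fc = nn.Linear(8, 4)

# Zero the 25% of weights with the smallest absolute value.
prune.l1_unstructured(fc, name="weight", amount=0.25)
sparsity = float((fc.weight == 0).sum()) / fc.weight.numel()
print(f"Sparsity after pruning: {sparsity:.2%}")

# Fold the mask into the weight tensor so the pruning becomes permanent.
prune.remove(fc, "weight")
```

The zeroed weights still occupy memory in a dense tensor; the storage benefit only appears once the weights are saved in a sparse or compressed format.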

3. Technical Principle

The core idea of model pruning is to reduce the storage space and computational complexity of a model as much as possible while maintaining its accuracy. Structural units and parameters in deep learning models, such as neurons, convolution kernels, and individual weights, are often partly redundant. Pruning removes these redundant parts, thereby reducing model size and computational cost.

Specifically, the implementation of model pruning technology can be divided into the following steps:

(1) Initialize the model: build a deep learning model and train it to obtain a baseline model.

(2) Select a pruning method and strategy: choose them based on the specific application scenario and requirements. Common methods are structural pruning and parameter pruning; common strategies are global pruning and iterative pruning.

(3) Prune the model: apply the selected method and strategy to the model, either by deleting unnecessary structural units and weight parameters or by setting them to 0 or to very small values.

(4) Retrain the model: pruning may reduce accuracy, so the pruned model needs to be retrained to recover it.

(5) Fine-tune the model: after retraining, fine-tune the model to further improve its accuracy.
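
The two strategies named in step (2) can be combined, as in the hedged sketch below. It assumes magnitude is the importance criterion and uses PyTorch's torch.nn.utils.prune.global_unstructured, which ranks the weights of all listed layers together. Each round removes 20% of the weights that are still unpruned. The toy model, the round count, and the helper global_sparsity are hypothetical, and the retraining step is only marked by a comment.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Assumed toy model; any set of (module, parameter name) pairs would do.
model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 4))
params = [(m, "weight") for m in model if isinstance(m, nn.Linear)]


def global_sparsity(params):
    """Fraction of zero weights across all listed parameters."""
    zeros = sum(int((getattr(m, n) == 0).sum()) for m, n in params)
    total = sum(getattr(m, n).numel() for m, n in params)
    return zeros / total


history = []
for step in range(3):  # iterative pruning: several small rounds
    # Global pruning: rank the weights of all layers together and
    # remove 20% of those that are still unpruned.
    prune.global_unstructured(
        params, pruning_method=prune.L1Unstructured, amount=0.2
    )
    # Retraining of the pruned model (steps 4 and 5) would go here.
    history.append(global_sparsity(params))
    print(f"Round {step + 1}: sparsity {history[-1]:.3f}")
```

Because each round prunes a fraction of the remaining weights, the sparsity after k rounds is approximately 1 - 0.8^k rather than 0.2 * k.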

Code:


import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import datasets, transforms
 
# Define a simple convolutional neural network
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 4, kernel_size=3, padding=1)  # 4 output channels
        self.conv2 = nn.Conv2d(4, 8, kernel_size=3, padding=1)  # 8 output channels
        self.fc1 = nn.Linear(8 * 7 * 7, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))  # convolution layer 1 + ReLU activation
        x = F.max_pool2d(x, 2)  # max pooling with a 2x2 window
        x = F.relu(self.conv2(x))  # convolution layer 2 + ReLU activation
        x = F.max_pool2d(x, 2)  # max pooling with a 2x2 window
        x = x.view(x.size(0), -1)  # flatten each sample into a 1-D vector
        x = F.relu(self.fc1(x))  # fully connected layer 1 + ReLU activation
        x = self.fc2(x)  # fully connected layer 2, outputs 10 class scores
        return x
 
# Instantiate the model
model = SimpleCNN()

# Print the model structure before pruning
print("Model before pruning:")
print(model)

# Load the data
transform = transforms.Compose([
    transforms.ToTensor(),  # convert images to tensors
    transforms.Normalize((0.1307,), (0.3081,))  # normalize with the MNIST mean and std
])
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)  # training set
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)  # data loader

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()  # cross-entropy loss
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer
 
# Train the baseline model
model.train()  # switch the model to training mode
for epoch in range(1):  # train for one epoch
    running_loss = 0.0
    for data, target in train_loader:
        optimizer.zero_grad()  # reset the gradients
        outputs = model(data)  # forward pass
        loss = criterion(outputs, target)  # compute the loss
        loss.backward()  # backward pass
        optimizer.step()  # update the parameters
        running_loss += loss.item() * data.size(0)  # accumulate the loss, weighted by batch size

    epoch_loss = running_loss / len(train_loader.dataset)  # average loss per sample
    print(f'Epoch {epoch + 1}, Loss: {epoch_loss:.4f}')
 
# Channel pruning
# Compute the L1 norm of each conv1 output channel from its weights
conv1_weights = model.conv1.weight.data.abs().sum(dim=[1, 2, 3])

# Sort the channels by L1 norm, smallest first
sorted_channels = torch.argsort(conv1_weights)

# Select the channels to remove
num_prune = 2  # assume we remove the 2 channels with the smallest L1 norm
channels_to_prune = sorted_channels[:num_prune]

print("Channels to prune:", channels_to_prune)

# Drop the weights and biases of the selected channels
pruned_weights = torch.index_select(model.conv1.weight.data, 0, sorted_channels[num_prune:])  # weights of the retained channels
pruned_bias = torch.index_select(model.conv1.bias.data, 0, sorted_channels[num_prune:])  # biases of the retained channels

# Create a new convolution layer and assign the pruned weights and biases to it
model.conv1 = nn.Conv2d(in_channels=1, out_channels=4 - num_prune, kernel_size=3, padding=1)
model.conv1.weight.data = pruned_weights
model.conv1.bias.data = pruned_bias
 
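# conv2's input channels must also be adjusted to match conv1's new output channels
# Keep only the conv2 weights that read from the retained conv1 channels
conv2_weights = model.conv2.weight.data[:, sorted_channels[num_prune:], :, :]
conv2_bias = model.conv2.bias.data.clone()  # conv2's output channels are unchanged, so keep its trained bias

# Create a new convolution layer and assign the adjusted weights and the original bias to it
model.conv2 = nn.Conv2d(in_channels=4 - num_prune, out_channels=8, kernel_size=3, padding=1)
model.conv2.weight.data = conv2_weights
model.conv2.bias.data = conv2_bias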
 
# Print the model structure after pruning
print("Model after pruning:")
print(model)

# Define a new optimizer, since the pruned layers have new parameters
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Retrain the pruned model
model.train()  # switch the model to training mode
for epoch in range(1):  # train for one epoch
    running_loss = 0.0
    for data, target in train_loader:
        optimizer.zero_grad()  # reset the gradients
        outputs = model(data)  # forward pass
        loss = criterion(outputs, target)  # compute the loss
        loss.backward()  # backward pass
        optimizer.step()  # update the parameters
        running_loss += loss.item() * data.size(0)  # accumulate the loss, weighted by batch size

    epoch_loss = running_loss / len(train_loader.dataset)  # average loss per sample
    print(f'Epoch {epoch + 1}, Loss: {epoch_loss:.4f}')
 
# Load the test data
test_dataset = datasets.MNIST('./data', train=False, transform=transform)  # test set
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=1000, shuffle=False)  # data loader

# Evaluate the model
model.eval()  # switch the model to evaluation mode
correct = 0
total = 0
with torch.no_grad():  # disable gradient computation
    for data, target in test_loader:
        outputs = model(data)  # forward pass
        _, predicted = torch.max(outputs.data, 1)  # predicted class for each sample
        total += target.size(0)  # total number of samples
        correct += (predicted == target).sum().item()  # number of correct predictions

print(f'Accuracy: {100 * correct / total}%')  # print the accuracy
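
For a rough sense of the savings in the example above, the parameter counts of the two convolution layers can be worked out by hand. The sketch below does the arithmetic for removing 2 of conv1's 4 output channels; the helper conv2d_params is a hypothetical name introduced only for this illustration.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a Conv2d layer with a square kernel."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + out_channels


# conv1: 1 -> 4 channels, conv2: 4 -> 8 channels, both with 3x3 kernels.
before = conv2d_params(1, 4, 3) + conv2d_params(4, 8, 3)

# After pruning 2 channels: conv1 is 1 -> 2, conv2 is 2 -> 8.
after = conv2d_params(1, 2, 3) + conv2d_params(2, 8, 3)

print("Conv parameters before pruning:", before)
print("Conv parameters after pruning:", after)
```

In this small network most parameters sit in fc1 (8 * 7 * 7 * 64 + 64 = 25,152), so pruning conv1 alone shrinks the total only slightly; pruning conv2's output channels as well would also shrink fc1's input.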

In order to improve the performance and efficiency of pruning technology, the following optimizations can be considered:

Select appropriate pruning strategies and pruning algorithms to improve the effect and accuracy of pruning.
Fine-tune or perform incremental learning on the pruned model to further improve the accuracy and performance of the model.
Use parallel computing and distributed computing techniques to speed up the pruning and training process.
