Classic CNN Architectures: The Milestone Evolution of LeNet, AlexNet, and ResNet

📂 Stage: Phase 2 — Deep Learning Vision Foundations (CNN series)
🔗 Related chapters: Convolution Kernels, Stride, and Pooling · Handwritten Digit Recognition (MNIST) in Practice


1. LeNet (1998)

"""
LeNet:最早的 CNN,用于手写数字识别

架构:
  Input(32×32) → Conv(6) → Pool → Conv(16) → Pool → FC(120) → FC(84) → Output(10)

特点:
  - 简单有效
  - 参数少
  - 为现代 CNN 奠定基础
"""

import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # (N, 16, 5, 5) -> (N, 400); assumes 32x32 input
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x
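The `16 * 5 * 5` in `fc1` comes from tracing the spatial size through each layer. A quick sanity check with the standard output-size formula (plain Python, no PyTorch needed):

```python
def out_size(s, kernel, stride=1, padding=0):
    # standard conv/pool output-size formula: floor((s + 2p - k) / stride) + 1
    return (s + 2 * padding - kernel) // stride + 1

s = 32                  # LeNet's expected input is 1x32x32
s = out_size(s, 5)      # conv1 (5x5): 32 -> 28
s = out_size(s, 2, 2)   # pool (2x2, stride 2): 28 -> 14
s = out_size(s, 5)      # conv2 (5x5): 14 -> 10
s = out_size(s, 2, 2)   # pool: 10 -> 5
flat = 16 * s * s       # 16 channels * 5 * 5 = 400, matching nn.Linear(16 * 5 * 5, 120)
print(s, flat)          # 5 400
```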

2. AlexNet (2012)

"""
AlexNet:深度学习复兴的标志

创新:
  - 8 层深度网络
  - ReLU 激活函数
  - Dropout 正则化
  - GPU 加速
  - 数据增强

效果:ImageNet 2012 冠军,准确率 85.2%
"""

import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
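The same size-tracing trick explains the `256 * 6 * 6` input to the classifier: for a standard 224×224 image, the `features` stack already produces a 6×6 map, and `AdaptiveAvgPool2d((6, 6))` simply guarantees that shape for other input sizes too. A sketch:

```python
def out_size(s, kernel, stride=1, padding=0):
    # floor((s + 2p - k) / stride) + 1
    return (s + 2 * padding - kernel) // stride + 1

s = 224
s = out_size(s, 11, 4, 2)  # conv1 (11x11, stride 4, pad 2): 224 -> 55
s = out_size(s, 3, 2)      # maxpool: 55 -> 27
s = out_size(s, 5, 1, 2)   # conv2 ("same" padding): 27 -> 27
s = out_size(s, 3, 2)      # maxpool: 27 -> 13
# the three 3x3 convs use padding=1, so they keep 13x13
s = out_size(s, 3, 2)      # final maxpool: 13 -> 6
print(s)                   # 6
```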

3. ResNet (2015)

"""
ResNet:残差网络,解决深度网络的梯度消失问题

创新:
  - 残差连接(Skip Connection)
  - 可以训练 152 层深度网络
  - 准确率 96.4%(超越人类)

残差块:
  y = F(x) + x
  而不是 y = F(x)
"""

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # residual (skip) connection
        out = torch.relu(out)
        return out

4. Architecture Comparison

| Architecture | Year | Depth   | Params | ImageNet Top-5 Acc. | Key Innovation        |
|--------------|------|---------|--------|---------------------|-----------------------|
| LeNet        | 1998 | 5       | 60K    | -                   | CNN foundations       |
| AlexNet      | 2012 | 8       | 60M    | 84.7%               | ReLU, Dropout         |
| VGG          | 2014 | 16-19   | 138M   | 92.7%               | Stacked small kernels |
| ResNet       | 2015 | 50-152  | 25M    | 96.4%               | Residual connections  |
| DenseNet     | 2016 | 121-169 | 28M    | 96.5%               | Dense connections     |
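The parameter counts in the table can be reproduced (to the stated rounding) directly from the layer shapes defined above; a quick check for LeNet and the AlexNet variant shown:

```python
def conv_params(c_in, c_out, k):
    # weights (c_in * c_out * k * k) plus one bias per output channel
    return c_in * c_out * k * k + c_out

def fc_params(n_in, n_out):
    # weights (n_in * n_out) plus biases
    return n_in * n_out + n_out

lenet_params = (conv_params(1, 6, 5) + conv_params(6, 16, 5)
                + fc_params(16 * 5 * 5, 120) + fc_params(120, 84)
                + fc_params(84, 10))

alexnet_params = (conv_params(3, 64, 11) + conv_params(64, 192, 5)
                  + conv_params(192, 384, 3) + conv_params(384, 256, 3)
                  + conv_params(256, 256, 3)
                  + fc_params(256 * 6 * 6, 4096) + fc_params(4096, 4096)
                  + fc_params(4096, 1000))

print(lenet_params)    # 61706, i.e. ~60K
print(alexnet_params)  # ~61M; the table rounds to 60M
```

Note how the fully connected layers dominate AlexNet's budget: the first FC layer alone holds more than half of all parameters, which is one reason later architectures moved toward global pooling instead of large FC heads.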

5. Summary

How CNN architectures evolved:

1. LeNet: proved that CNNs work
2. AlexNet: depth + GPUs = the deep learning revival
3. VGG: stacking small kernels beats one large kernel
4. ResNet: residual connections tame vanishing gradients

Practice in 2026:
- Classification tasks: start from a pretrained ResNet
- Real-time applications: MobileNet/ShuffleNet
- Cutting-edge research: Vision Transformers

💡 Remember: residual connections are the backbone of modern deep networks. Without them, training networks much deeper than about 50 layers becomes very difficult.


🔗 Further Reading