Original paper: "Very Deep Convolutional Networks for Large-Scale Image Recognition".
from mxnet import gluon, init, nd
from mxnet.gluon import nn

def vgg_block(num_convs, num_channels):
    # num_convs 3x3 convolutions (padding=1 keeps height and width),
    # followed by one 2x2 max pooling layer with stride 2 that halves them.
    blk = nn.Sequential()
    for _ in range(num_convs):
        blk.add(nn.Conv2D(num_channels, kernel_size=3, padding=1, activation='relu'))
    blk.add(nn.MaxPool2D(pool_size=2, strides=2))
    return blk
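The network implemented here is VGG-11. It has 5 convolutional blocks: the first 2 blocks use a single convolutional layer each, and the last 3 blocks use two convolutional layers each. The first block has 64 output channels, and the number of output channels doubles with each block until it reaches 512.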
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

def vgg(conv_arch):
    net = nn.Sequential()
    # Convolutional part: one vgg_block per (num_convs, num_channels) pair.
    for (num_convs, num_channels) in conv_arch:
        net.add(vgg_block(num_convs, num_channels))
    # Fully connected part: 10 outputs for the 10 Fashion-MNIST classes.
    net.add(nn.Dense(4096, activation='relu'), nn.Dropout(0.5),
            nn.Dense(4096, activation='relu'), nn.Dropout(0.5),
            nn.Dense(10))
    return net

net = vgg(conv_arch)
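As a quick sanity check on the name, the weight layers implied by conv_arch can be counted in plain Python. This is only a small sketch that needs no MXNet; the variable names num_conv_layers, num_dense_layers and num_weight_layers are my own and are not part of the code above.

```python
# Same architecture tuple as in the post: (num_convs, num_channels) per block.
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

# Each block contributes num_convs convolutional layers.
num_conv_layers = sum(num_convs for num_convs, _ in conv_arch)

# vgg() appends Dense(4096), Dense(4096) and Dense(10).
num_dense_layers = 3

# Pooling and dropout layers have no weights, so they are not counted.
num_weight_layers = num_conv_layers + num_dense_layers

print('conv layers:  ', num_conv_layers)
print('weight layers:', num_weight_layers)
```

The 8 convolutional layers plus the 3 fully connected layers give the 11 in VGG-11.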
Print the output shape of each layer to take a look.
net.initialize()
# A single-channel 224x224 dummy input, matching Fashion-MNIST resized to 224.
X = nd.random.uniform(shape=(1, 1, 224, 224))
for blk in net:
    X = blk(X)
    print(blk.name, 'output shape:\t', X.shape)
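The shapes printed by that loop can also be worked out by hand with the usual output-size formula, floor((n + 2p - k) / s) + 1. The following is a rough sketch in plain Python, without MXNet; conv_out and expected_shapes are hypothetical helper names of mine, and the layer settings are copied from vgg_block and vgg above.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output height/width of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

def expected_shapes(conv_arch, in_shape=(1, 1, 224, 224),
                    dense_units=(4096, 4096, 10)):
    batch, channels, height, width = in_shape
    shapes = []
    for num_convs, num_channels in conv_arch:
        for _ in range(num_convs):
            # 3x3 convolution with padding 1 keeps height and width.
            height = conv_out(height, kernel=3, padding=1)
            width = conv_out(width, kernel=3, padding=1)
        # 2x2 max pooling with stride 2 halves height and width.
        height = conv_out(height, kernel=2, stride=2)
        width = conv_out(width, kernel=2, stride=2)
        channels = num_channels
        shapes.append((batch, channels, height, width))
    for units in dense_units:
        # A Dense layer flattens its input; the Dropout layers in between
        # leave the shape unchanged, so they are not listed separately.
        shapes.append((batch, units))
    return shapes

shapes = expected_shapes(conv_arch)
for shape in shapes:
    print(shape)
```

Each block halves the height and width, so the 224x224 input shrinks to 7x7 with 512 channels after the fifth block, and the first Dense layer sees 512 * 7 * 7 = 25088 input features.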
For simplicity, the dataset is still Fashion-MNIST.
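# Assumes d2l is already imported (for example `import d2lzh as d2l`) and that
# try_gpu() and train() are defined as in the AlexNet post referenced below.
lr, num_epochs, batch_size, ctx = 0.05, 5, 128, try_gpu()
net.initialize(force_reinit=True, ctx=ctx, init=init.Xavier())
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': lr})
# Resize the 28x28 Fashion-MNIST images to 224x224 to match the network input.
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)
train(net, train_iter, test_iter, batch_size, trainer, ctx, num_epochs)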
For the implementations of try_gpu() and train(), see this post: [MXNet] (20): Implementing AlexNet.
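VGG is a fairly deep network with a large number of parameters, so it also needs a lot of GPU memory. My GPU is an NVIDIA GeForce GTX 1050 Ti with 4 GB of memory, and I had to lower batch_size to 16 before it would train.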
It is painfully slow!
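To put a number on "a large number of parameters", here is a back-of-the-envelope count in plain Python. It is only a sketch under the settings used in this post (single-channel 224x224 input, 3x3 convolutions with bias, 10 output classes); count_vgg_params is a hypothetical helper of mine and does not inspect the actual MXNet network.

```python
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

def count_vgg_params(conv_arch, in_channels=1, in_size=224,
                     dense_units=(4096, 4096, 10)):
    conv_params = 0
    channels, size = in_channels, in_size
    for num_convs, num_channels in conv_arch:
        for _ in range(num_convs):
            # 3x3 kernel weights plus one bias per output channel.
            conv_params += 3 * 3 * channels * num_channels + num_channels
            channels = num_channels
        # Each block ends with a 2x2 max pool that halves height and width.
        size //= 2
    dense_params = 0
    features = channels * size * size  # flattened input to the first Dense
    for units in dense_units:
        # Weight matrix plus one bias per output unit.
        dense_params += features * units + units
        features = units
    return conv_params, dense_params

conv_params, dense_params = count_vgg_params(conv_arch)
total_params = conv_params + dense_params
print('conv params: ', conv_params)
print('dense params:', dense_params)
print('total params:', total_params)
# float32 uses 4 bytes per parameter.
print('float32 weights: %.0f MiB' % (total_params * 4 / 2**20))
```

That comes to roughly 129 million parameters, about half a gigabyte in float32, and most of them sit in the first fully connected layer. The weights are only part of the story: gradients take the same amount again, and the intermediate activations grow with batch_size, which is why lowering batch_size is what makes the model fit on a 4 GB card.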