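Building a ConvNet architecture with nolearn and using it to train a feature extractor

Abstract: This article shows how to build a simple ConvNet architecture from a few convolutional and pooling layers using nolearn, and how to use the ConvNet to train a feature extractor whose output can then be fed to other models such as an SVM or logistic regression.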

Convolutional neural networks (ConvNets) are biologically inspired MLPs (multilayer perceptrons). They have different kinds of layers, and each layer works differently from a regular MLP layer. If you are interested in ConvNets, a good tutorial is CS231n: Convolutional Neural Networks for Visual Recognition. The architecture of CNNs is shown below:

A regular neural network

ConvNet architecture

As you can see, ConvNets work with 3D volumes and transformations of these 3D volumes. I won't repeat the whole CS231n tutorial in this article, so if you are really interested, please take some time to study it before reading on.

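Lasagne and nolearn

Lasagne and nolearn are my favorite Python packages for deep learning. Lasagne is built on Theano, so GPU speedups make a big difference, and its declarative approach to building neural networks is very helpful. nolearn is a collection of utilities around neural network packages (Lasagne included) that helps a lot when creating a network architecture, inspecting its layers, and so on.

In this article I show how to build a simple ConvNet architecture with a few convolutional and pooling layers. I also show how to use a ConvNet to train a feature extractor and then use it to extract features before feeding them to other models such as an SVM or logistic regression. Most people use a pretrained ConvNet model, remove the last output layer, and extract features from a network trained on the ImageNet dataset. This is usually called transfer learning, because you can reuse layers from other ConvNets for a different problem: since the first-layer filters of a ConvNet act as edge detectors, they can serve as generic feature detectors for other problems.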

Loading the MNIST dataset

The MNIST dataset is one of the most traditional datasets for digit recognition. We use a version of it prepared for Python, but first let's import the packages we need:
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
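try:
    # Python 3
    from urllib.request import urlretrieve
    import pickle
except ImportError:
    # Python 2
    from urllib import urlretrieve
    import cPickle as pickle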
import os
import gzip
import numpy as np
import theano
import lasagne
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
from nolearn.lasagne import visualize
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

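As you can see, we import matplotlib for plotting, some native Python modules for downloading the MNIST dataset, numpy, theano, lasagne, nolearn, and a few functions from scikit-learn for model evaluation.

Next, we define a function that loads the MNIST dataset (it is very similar to the one used in the Lasagne tutorial):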
def load_dataset():
    url = 'http://deeplearning.net/data/mnist/mnist.pkl.gz'
    filename = 'mnist.pkl.gz'
    # Download the archive only if it is not already present locally
    if not os.path.exists(filename):
        print("Downloading MNIST dataset...")
        urlretrieve(url, filename)
    with gzip.open(filename, 'rb') as f:
        # On Python 3, use pickle.load(f, encoding='latin1') instead
        data = pickle.load(f)
    # The pickle holds (train, validation, test), each an (images, labels) pair
    X_train, y_train = data[0]
    X_val, y_val = data[1]
    X_test, y_test = data[2]
    # Reshape flat images to (samples, channels, rows, columns)
    X_train = X_train.reshape((-1, 1, 28, 28))
    X_val = X_val.reshape((-1, 1, 28, 28))
    X_test = X_test.reshape((-1, 1, 28, 28))
    # Cast the labels to uint8
    y_train = y_train.astype(np.uint8)
    y_val = y_val.astype(np.uint8)
    y_test = y_test.astype(np.uint8)
    return X_train, y_train, X_val, y_val, X_test, y_test

As you can see, we download the preprocessed MNIST dataset and split it into three sets: training, validation, and test. We then reshape the image data to match the Lasagne input layer used later, and we convert the labels to uint8 because of GPU/Theano datatype restrictions.
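The reshape and cast steps can be tried in isolation. The sketch below is only an illustration: the arrays are zero-filled stand-ins (not the real MNIST data), and it assumes the pickled images arrive as flat 784-element rows.

```python
import numpy as np

# Stand-ins for the pickled arrays (assumption: flat 784-pixel rows, int64 labels)
flat_images = np.zeros((5, 784), dtype=np.float32)
labels = np.array([5, 0, 4, 1, 9], dtype=np.int64)

# -1 lets NumPy infer the number of samples; the remaining axes are
# (channels, rows, columns), matching input_shape=(None, 1, 28, 28)
images = flat_images.reshape((-1, 1, 28, 28))
labels = labels.astype(np.uint8)

print(images.shape)  # (5, 1, 28, 28)
print(labels.dtype)  # uint8
```

The single channel axis is there because MNIST images are grayscale; an RGB dataset would use 3 in that position.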

Now we can load the MNIST dataset and inspect it:
X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()
plt.imshow(X_train[0][0], cmap=cm.binary)

This code produces the image below (I'm using an IPython Notebook):

An example digit from the MNIST dataset (this one is a 5)

ConvNet architecture and training

Now we define our ConvNet architecture and train it on a single GPU/CPU (I have a very cheap GPU, but it helps a lot):
net1 = NeuralNet(
    layers=[('input', layers.InputLayer),
            ('conv2d1', layers.Conv2DLayer),
            ('maxpool1', layers.MaxPool2DLayer),
            ('conv2d2', layers.Conv2DLayer),
            ('maxpool2', layers.MaxPool2DLayer),
            ('dropout1', layers.DropoutLayer),
            ('dense', layers.DenseLayer),
            ('dropout2', layers.DropoutLayer),
            ('output', layers.DenseLayer),
            ],
    # input layer
    input_shape=(None, 1, 28, 28),
    # layer conv2d1
    conv2d1_num_filters=32,
    conv2d1_filter_size=(5, 5),
    conv2d1_nonlinearity=lasagne.nonlinearities.rectify,
    conv2d1_W=lasagne.init.GlorotUniform(),
    # layer maxpool1
    maxpool1_pool_size=(2, 2),
    # layer conv2d2
    conv2d2_num_filters=32,
    conv2d2_filter_size=(5, 5),
    conv2d2_nonlinearity=lasagne.nonlinearities.rectify,
    # layer maxpool2
    maxpool2_pool_size=(2, 2),
    # dropout1
    dropout1_p=0.5,
    # dense
    dense_num_units=256,
    dense_nonlinearity=lasagne.nonlinearities.rectify,
    # dropout2
    dropout2_p=0.5,
    # output
    output_nonlinearity=lasagne.nonlinearities.softmax,
    output_num_units=10,
    # optimization method params
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,
    max_epochs=10,
    verbose=1,
)
# Train the network
nn = net1.fit(X_train, y_train)

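As you can see, in the layers parameter we define a list of tuples with each layer's name and type, and then we define the parameters for those layers. Our architecture here uses two convolutional layers, two pooling layers, one fully connected layer (dense layer), and an output layer. There are also dropout layers between some of them; a dropout layer is a regularizer that randomly sets input values to zero to avoid overfitting (see the figure below).

Effect of the dropout layer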
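To make the idea concrete, here is a minimal NumPy sketch of dropout. It is not Lasagne's implementation: the function name and signature are made up for illustration, and it assumes the common "inverted" variant, in which the surviving inputs are rescaled by 1/(1-p) so that the expected activation stays the same.

```python
import numpy as np

def dropout(x, p=0.5, deterministic=False, rng=None):
    """Zero each input with probability p and rescale the survivors."""
    if deterministic or p == 0:
        # At prediction time the layer passes its input through unchanged
        return x
    rng = rng if rng is not None else np.random.default_rng()
    keep = 1.0 - p
    mask = rng.random(x.shape) < keep  # True where the input is kept
    return x * mask / keep

x = np.ones((2, 4))
train_out = dropout(x, p=0.5, rng=np.random.default_rng(0))
test_out = dropout(x, p=0.5, deterministic=True)
print(train_out)  # every entry is either 0.0 or 2.0
print(test_out)   # identical to x
```

With p=0.5, as used for dropout1 and dropout2 above, roughly half of the inputs are zeroed on each training pass.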

After calling the fit method, nolearn prints the status of the learning process. On my machine, with a low-end GPU, the result was:
# Neural Network with 160362 learnable parameters

## Layer information

  #  name      size
---  --------  --------
  0  input     1x28x28
  1  conv2d1   32x24x24
  2  maxpool1  32x12x12
  3  conv2d2   32x8x8
  4  maxpool2  32x4x4
  5  dropout1  32x4x4
  6  dense     256
  7  dropout2  256
  8  output    10

  epoch    train loss    valid loss    train/val    valid acc  dur
-------  ------------  ------------  -----------  -----------  ------
      1       0.85204       0.16707      5.09977      0.95174  33.71s
      2       0.27571       0.10732      2.56896      0.96825  33.34s
      3       0.20262       0.08567      2.36524      0.97488  33.51s
      4       0.16551       0.07695      2.15081      0.97705  33.50s
      5       0.14173       0.06803      2.08322      0.98061  34.38s
      6       0.12519       0.06067      2.06352      0.98239  34.02s
      7       0.11077       0.05532      2.00254      0.98427  33.78s
      8       0.10497       0.05771      1.81898      0.98248  34.17s
      9       0.09881       0.05159      1.91509      0.98407  33.80s
     10       0.09264       0.04958      1.86864      0.98526  33.40s

As you can see, the validation accuracy after the last epoch reaches 0.98526, which is quite good for only 10 epochs of training.
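The layer sizes and the parameter count in the log can be checked by hand. The sketch below is plain Python arithmetic, not a nolearn call; the helper names are made up, and it assumes unpadded convolutions with stride 1 and non-overlapping pooling, which is consistent with the sizes printed in the log.

```python
def conv_out(size, filter_size):
    # Unpadded convolution with stride 1
    return size - filter_size + 1

def pool_out(size, pool_size):
    # Non-overlapping max pooling
    return size // pool_size

channels, size = 1, 28  # input is 1x28x28
total_params = 0
shapes = []

# (num_filters, filter_size, pool_size) for the two conv/pool pairs
for num_filters, filter_size, pool_size in [(32, 5, 2), (32, 5, 2)]:
    # Weights plus one bias per filter
    total_params += num_filters * channels * filter_size ** 2 + num_filters
    channels, size = num_filters, conv_out(size, filter_size)
    shapes.append((channels, size, size))
    size = pool_out(size, pool_size)
    shapes.append((channels, size, size))

flat = channels * size * size        # 32 * 4 * 4 = 512 inputs to the dense layer
total_params += flat * 256 + 256     # dense layer
total_params += 256 * 10 + 10        # output layer

print(shapes)        # [(32, 24, 24), (32, 12, 12), (32, 8, 8), (32, 4, 4)]
print(total_params)  # 160362
```

Most of the parameters (131328 of them) sit in the dense layer; the two convolutional layers together contribute only 26464.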

Prediction and confusion matrix

Now we can use the model to predict the entire test set:
preds = net1.predict(X_test)

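We can also plot a confusion matrix to check the classification performance of the network:
# Named conf_mat so it does not shadow the matplotlib.cm module imported as cm
conf_mat = confusion_matrix(y_test, preds)
plt.matshow(conf_mat)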
plt.title('Confusion matrix')
plt.colorbar()
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.show()

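The code above plots the following confusion matrix:

Confusion matrix

As you can see, the counts are concentrated on the diagonal, which indicates that our classifier performs well.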
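To see why a heavy diagonal means good performance, here is a small NumPy sketch that builds a confusion matrix by hand. The labels are toy values rather than MNIST predictions, and the helper function is made up for illustration; it follows the same convention as the plot above, with true labels on the rows and predicted labels on the columns.

```python
import numpy as np

def confusion(y_true, y_pred, num_classes):
    """Count how often true class t (row) was predicted as class p (column)."""
    m = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m

# Toy labels: one sample of class 2 is misclassified as class 1
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])

m = confusion(y_true, y_pred, 3)
# Correct predictions lie on the diagonal, so accuracy is trace / total
accuracy = np.trace(m) / m.sum()
print(m)
print(accuracy)
```

Every off-diagonal entry is a specific kind of mistake, so the matrix shows not only how often the classifier is wrong but also which digits it confuses.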

Visualizing the filters

We can also visualize the 32 filters of the first convolutional layer:
visualize.plot_conv_weights(net1.layers_['conv2d1'])

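The code above plots the following filters:

The 5x5x32 filters of the first layer

As you can see, nolearn's plot_conv_weights function plots all the filters of the layer we specify.

Theano layer functions and feature extraction

Now we can create Theano-compiled functions that feed input data forward through the architecture, up to whichever layer you are interested in. Here I build one function for the output layer and one for the dense layer just before it: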
dense_layer = layers.get_output(net1.layers_['dense'], deterministic=True)
output_layer = layers.get_output(net1.layers_['output'], deterministic=True)
input_var = net1.layers_['input'].input_var
f_output = theano.function([input_var], output_layer)
f_dense = theano.function([input_var], dense_layer)

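As you can see, we now have two Theano functions, f_output and f_dense (for the output layer and the dense layer). Note that to get these layer outputs we pass an extra parameter called "deterministic", which prevents the dropout layers from affecting the feed-forward pass.

Now we can convert an example instance to the input format and feed it to the Theano function for the output layer: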
instance = X_test[0][None, :, :]
%timeit -n 500 f_output(instance)
500 loops, best of 3: 858 μs per loop

As you can see, a call to f_output takes about 858 μs (the best of three runs). We can also plot the output layer activations for this instance:
pred = f_output(instance)
N = pred.shape[1]
plt.bar(range(N), pred.ravel())

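The code above produces the following plot:

Output layer activations

As you can see, the digit was recognized as a 7. Being able to create Theano functions for any layer of the network is very useful, because you can create a function (as we did above) that returns the activations of the dense layer (the one just before the output layer) and use those activations as features, so that the neural network serves as a feature extractor rather than a classifier. Now let's plot the 256 activation units of the dense layer: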
pred = f_dense(instance)
N = pred.shape[1]
plt.bar(range(N), pred.ravel())

The code above produces the following plot:

Dense layer activations
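To use the dense layer as a feature extractor for another model, the activations have to be computed for a whole dataset, not just one instance. A minimal sketch of how that could look is given below. The helper extract_features and the stand-in fake_f_dense are hypothetical names introduced here so the example runs without Theano; with the real network you would pass f_dense instead.

```python
import numpy as np

def extract_features(feature_fn, X, batch_size=500):
    """Run feature_fn over X in batches and stack the results row-wise."""
    chunks = [feature_fn(X[i:i + batch_size])
              for i in range(0, len(X), batch_size)]
    return np.concatenate(chunks, axis=0)

# Stand-in for f_dense (assumption, not the trained network): a fixed random
# projection followed by a ReLU, mapping (N, 1, 28, 28) to (N, 256)
rng = np.random.default_rng(0)
W = rng.standard_normal((28 * 28, 256)).astype(np.float32)

def fake_f_dense(batch):
    return np.maximum(batch.reshape(len(batch), -1) @ W, 0.0)

# Random images in place of MNIST, only to exercise the batching logic
X = rng.random((1200, 1, 28, 28), dtype=np.float32)
features = extract_features(fake_f_dense, X)
print(features.shape)  # (1200, 256)
```

The resulting matrix has one 256-dimensional row per image, which is the input format that scikit-learn estimators such as an SVM or logistic regression expect. Processing in batches keeps memory use bounded when the dataset is large.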