Handwritten Digit Recognition with Keras

1. Load and reshape the MNIST dataset

The MNIST dataset is a classic dataset in machine learning, and it ships with Keras. It consists of 60,000 training samples and 10,000 test samples, each a 28 × 28 pixel grayscale image of a handwritten digit. For example, the first 20 samples of the training set look like this:

(figure: the first 20 handwritten digit images from the training set)

In [1]:
from keras.datasets import mnist
from keras.utils import to_categorical

(train_images,train_labels),(test_images,test_labels) = mnist.load_data()

## reshape to (samples, height, width, channels) and scale pixel values to [0, 1]
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

## one-hot encode the labels
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
Using TensorFlow backend.
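
For reference, the figure of the first 20 training digits shown above can be reproduced with a short plotting sketch; this assumes matplotlib is available, which the original notebook does not state.

import matplotlib.pyplot as plt

## plot the first 20 training digits in a 2 x 10 grid
## note: train_images has already been reshaped to (60000, 28, 28, 1) and scaled to [0, 1],
## so each image is squeezed back to 28 x 28 for display
fig, axes = plt.subplots(2, 10, figsize=(10, 2))
for i, ax in enumerate(axes.flat):
    ax.imshow(train_images[i].squeeze(), cmap='gray')
    ax.set_title(str(train_labels[i].argmax()), fontsize=8)  # labels are already one-hot encoded
    ax.axis('off')
plt.show()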

2. Build the network with Keras

In [2]:
from keras import layers
from keras import models

## Construct the CNN
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))  ## only the first layer needs an explicit input_shape
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
In [3]:
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
=================================================================
Total params: 55,744
Trainable params: 55,744
Non-trainable params: 0
_________________________________________________________________
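
The parameter counts above can be verified by hand: a Conv2D layer with a 3 × 3 kernel, c_in input channels and c_out filters has 3 · 3 · c_in · c_out weights plus c_out biases. A quick sanity check (plain arithmetic, not part of the original notebook):

## Conv2D params = kernel_h * kernel_w * in_channels * filters + filters (biases)
print(3 * 3 * 1 * 32 + 32)    # conv2d_1: 320
print(3 * 3 * 32 * 64 + 64)   # conv2d_2: 18496
print(3 * 3 * 64 * 64 + 64)   # conv2d_3: 36928
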
In [4]:
## 3D -> 1D
## the network must ultimately predict a digit from 0 to 9, so the 3D feature maps are flattened first
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))  ## 10 output neurons, one per digit 0-9, with softmax activation
In [5]:
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                36928     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                650       
=================================================================
Total params: 93,322
Trainable params: 93,322
Non-trainable params: 0
_________________________________________________________________
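
Similarly, Flatten turns the final (3, 3, 64) feature maps into a 576-dimensional vector, and a Dense layer with n inputs and m units has n · m weights plus m biases, which matches the new rows of the summary:

## Dense params = inputs * units + units (biases)
print(3 * 3 * 64)        # flatten_1: 576
print(576 * 64 + 64)     # dense_1: 36928
print(64 * 10 + 10)      # dense_2: 650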

3. Train the CNN on MNIST

In [6]:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',  ## multi-class classification, so categorical_crossentropy is the loss function
              metrics=['accuracy'])
model.fit(train_images,train_labels,epochs=5,batch_size=64)
Epoch 1/5
60000/60000 [==============================] - 77s 1ms/step - loss: 0.1632 - accuracy: 0.9497
Epoch 2/5
60000/60000 [==============================] - 84s 1ms/step - loss: 0.0444 - accuracy: 0.9864
Epoch 3/5
60000/60000 [==============================] - 94s 2ms/step - loss: 0.0304 - accuracy: 0.9905
Epoch 4/5
60000/60000 [==============================] - 106s 2ms/step - loss: 0.0228 - accuracy: 0.9929
Epoch 5/5
60000/60000 [==============================] - 106s 2ms/step - loss: 0.0185 - accuracy: 0.9948
Out[6]:
<keras.callbacks.callbacks.History at 0x102b64950>
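
A note on the loss choice: categorical_crossentropy matches the one-hot labels produced by to_categorical above. A hedged alternative, shown for reference only and not used in this notebook, is to keep the integer labels returned by mnist.load_data() and compile with sparse_categorical_crossentropy instead; the two describe the same multi-class cross-entropy.

## For reference only (assumption, not executed here): with integer labels 0-9
## the sparse variant of the loss avoids the to_categorical step.
# model.compile(optimizer='rmsprop',
#               loss='sparse_categorical_crossentropy',
#               metrics=['accuracy'])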

Heavy convolution workloads like this are slow on a CPU; on a GPU each epoch runs roughly 10× faster.

In [7]:
test_loss,test_acc = model.evaluate(test_images,test_labels)
10000/10000 [==============================] - 4s 353us/step
In [8]:
test_acc
Out[8]:
0.9912999868392944
In [10]:
## inspect the predictions: each image maps to a 10-dimensional probability vector,
## and the predicted label is the index of the largest value
model.predict(test_images)
Out[10]:
array([[2.01697825e-10, 1.94360332e-08, 1.08256515e-07, ...,
        9.99998808e-01, 2.46186076e-08, 7.19086017e-07],
       [2.91093954e-11, 4.38856862e-09, 1.00000000e+00, ...,
        9.96775159e-14, 1.64842196e-12, 8.61172574e-17],
       [8.77452666e-10, 9.99979138e-01, 1.75130353e-07, ...,
        9.21337596e-06, 3.13089743e-07, 3.56427336e-07],
       ...,
       [9.74605532e-16, 4.67950623e-10, 6.61110125e-11, ...,
        2.51455532e-08, 8.75996520e-09, 5.67301262e-10],
       [8.78741968e-10, 2.27363215e-11, 1.72569001e-12, ...,
        4.00987056e-12, 9.71275003e-05, 1.68994624e-10],
       [2.51847676e-09, 2.14226437e-09, 9.04670799e-07, ...,
        1.57024919e-13, 6.63340813e-08, 5.51239067e-13]], dtype=float32)
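
As the comment above notes, the predicted digit for each image is the index of the largest probability. A short sketch (assuming numpy is available) to turn the probability vectors into labels and cross-check the test accuracy reported earlier:

import numpy as np

pred_labels = np.argmax(model.predict(test_images), axis=1)   # predicted digit per image
true_labels = np.argmax(test_labels, axis=1)                  # undo the one-hot encoding
print(pred_labels[:10])                                       # predictions for the first 10 test images
print(np.mean(pred_labels == true_labels))                    # should match test_acc above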