from keras.datasets import mnist # subroutines for fetching the MNIST dataset from keras.models import Model # basic class for specifying and training a neural network from keras.layers import Input, Dense, Flatten, Convolution2D, MaxPooling2D, Dropout from keras.utils import np_utils # utilities for one-hot encoding of ground truth values batch_size = 128 # in each iteration, we consider 128 training examples at once num_epochs = 12 # we iterate twelve times over the entire training set kernel_size = 3 # we will use 3x3 kernels throughout pool_size = 2 # we will use 2x2 pooling throughout conv_depth = 32 # use 32 kernels in both convolutional layers drop_prob_1 = 0.25 # dropout after pooling with probability 0.25 drop_prob_2 = 0.5 # dropout in the FC layer with probability 0.5 hidden_size = 128 # there will be 128 neurons in both hidden layers num_train = 60000 # there are 60000 training examples in MNIST num_test = 10000 # there are 10000 test examples in MNIST height, width, depth = 28, 28, 1 # MNIST images are 28x28 and greyscale num_classes = 10 # there are 10 classes (1 per digit) (X_train, y_train), (X_test, y_test) = mnist.load_data() # fetch MNIST data X_train = X_train.reshape(X_train.shape[0], depth, height, width) X_test = X_test.reshape(X_test.shape[0], depth, height, width) X_train = X_train.astype('float32') X_test = X_test.astype('float32') X_train /= 255 # Normalise data to [0, 1] range X_test /= 255 # Normalise data to [0, 1] range Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels inp = Input(shape=(depth, height, width)) # NB Keras expects channel dimension first # Conv [32] -> Conv [32] -> Pool (with dropout on the pooling layer) conv_1 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', activation='relu')(inp) conv_2 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', activation='relu')(conv_1) pool_1 = MaxPooling2D(pool_size=(pool_size, pool_size))(conv_2) drop_1 = Dropout(drop_prob_1)(pool_1) flat = Flatten()(drop_1) hidden = Dense(hidden_size, activation='relu')(flat) # Hidden ReLU layer drop = Dropout(drop_prob_2)(hidden) out = Dense(num_classes, activation='softmax')(drop) # Output softmax layer model = Model(input=inp, output=out) # To define a model, just specify its input and output layers model.compile(loss='categorical_crossentropy', # using the cross-entropy loss function optimizer='adam', # using the Adam optimiser metrics=['accuracy']) # reporting the accuracy model.fit(X_train, Y_train, # Train the model using the training set... batch_size=batch_size, nb_epoch=num_epochs, verbose=1, validation_split=0.1) # ...holding out 10% of the data for validation model.evaluate(X_test, Y_test, verbose=1) # Evaluate the trained model on the test set!
Train on 54000 samples, validate on 6000 samples Epoch 1/12 54000/54000 [==============================] - 4s - loss: 0.3010 - acc: 0.9073 - val_loss: 0.0612 - val_acc: 0.9825 Epoch 2/12 54000/54000 [==============================] - 4s - loss: 0.1010 - acc: 0.9698 - val_loss: 0.0400 - val_acc: 0.9893 Epoch 3/12 54000/54000 [==============================] - 4s - loss: 0.0753 - acc: 0.9775 - val_loss: 0.0376 - val_acc: 0.9903 Epoch 4/12 54000/54000 [==============================] - 4s - loss: 0.0629 - acc: 0.9809 - val_loss: 0.0321 - val_acc: 0.9913 Epoch 5/12 54000/54000 [==============================] - 4s - loss: 0.0520 - acc: 0.9837 - val_loss: 0.0346 - val_acc: 0.9902 Epoch 6/12 54000/54000 [==============================] - 4s - loss: 0.0466 - acc: 0.9850 - val_loss: 0.0361 - val_acc: 0.9912 Epoch 7/12 54000/54000 [==============================] - 4s - loss: 0.0405 - acc: 0.9871 - val_loss: 0.0330 - val_acc: 0.9917 Epoch 8/12 54000/54000 [==============================] - 4s - loss: 0.0386 - acc: 0.9879 - val_loss: 0.0326 - val_acc: 0.9908 Epoch 9/12 54000/54000 [==============================] - 4s - loss: 0.0349 - acc: 0.9894 - val_loss: 0.0369 - val_acc: 0.9908 Epoch 10/12 54000/54000 [==============================] - 4s - loss: 0.0315 - acc: 0.9901 - val_loss: 0.0277 - val_acc: 0.9923 Epoch 11/12 54000/54000 [==============================] - 4s - loss: 0.0287 - acc: 0.9906 - val_loss: 0.0346 - val_acc: 0.9922 Epoch 12/12 54000/54000 [==============================] - 4s - loss: 0.0273 - acc: 0.9909 - val_loss: 0.0264 - val_acc: 0.9930 9888/10000 [============================>.] - ETA: 0s [0.026324689089493085, 0.99119999999999997]
W_regularizer
parameter to each layer where we want to apply regularization. from keras.regularizers import l2 # L2-regularisation # ... l2_lambda = 0.0001 # ... # This is how to add L2-regularisation to any Keras layer with weights (eg Convolution2D/Dense) conv_1 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', W_regularizer=l2(l2_lambda), activation='relu')(inp)
init
parameter, as described below. We will use a uniform Ge initialization ( he_uniform
) for all ReLU layers and a uniform Zawier initialization ( glorot_uniform
) for the output softmax layer (since in essence it is a generalization of the logistic function to multiple similar data). # Add He initialisation to a layer conv_1 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', init='he_uniform', W_regularizer=l2(l2_lambda), activation='relu')(inp) # Add Xavier initialisation to a layer out = Dense(num_classes, init='glorot_uniform', W_regularizer=l2(l2_lambda), activation='softmax')(drop)
batch_size
examples, and during testing we normalize the statistics obtained from the entire training set, since we cannot see the test data in advance. Namely, we calculate the expectation and variance for a specific batch (package) BatchNormalization
layer is responsible for it, to which we pass several parameters, the most important of which is axis
(along which data axis statistical characteristics will calculate). In particular, while working with convolutional layers, we'd better normalize along separate channels, therefore, choose axis=1
. from keras.layers.normalization import BatchNormalization # batch normalisation # ... inp_norm = BatchNormalization(axis=1)(inp) # apply BN to the input (NB need to rename here) # conv_1 = Convolution2D(...)(inp_norm) conv_1 = BatchNormalization(axis=1)(conv_1) # apply BN to the first conv layer
ImageDataGenerator
class. We initialize the class, telling it what types of transformation we want to apply to the images, and then we run the training data through the generator, calling the fit
method, and then the flow
method, getting a continuously expanding iterator for the batch types we replenish. There is even a special method model.fit_generator
that will teach our model using this iterator, which greatly simplifies the code. There is a small flaw: this is how we lose the validation_split
parameter, which means we will have to separate the validation subset of the data ourselves, but it only takes four lines of code.ImageDataGenerator
also provides us with the ability to perform random turns, scaling, deformation, and specular reflection. All these transformations are also worth trying, except perhaps mirror images, since in real life we are unlikely to meet handwritten numbers that have been deployed in this way. from keras.preprocessing.image import ImageDataGenerator # data augmentation # ... after model.compile(...) # Explicitly split the training and validation sets X_val = X_train[54000:] Y_val = Y_train[54000:] X_train = X_train[:54000] Y_train = Y_train[:54000] datagen = ImageDataGenerator( width_shift_range=0.1, # randomly shift images horizontally (fraction of total width) height_shift_range=0.1) # randomly shift images vertically (fraction of total height) datagen.fit(X_train) # fit the model on the batches generated by datagen.flow()---most parameters similar to model.fit model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size), samples_per_epoch=X_train.shape[0], nb_epoch=num_epochs, validation_data=(X_val, Y_val), verbose=1)
merge
. from keras.layers import merge # for merging predictions in an ensemble # ... ens_models = 3 # we will train three separate models on the data # ... inp_norm = BatchNormalization(axis=1)(inp) # Apply BN to the input (NB need to rename here) outs = [] # the list of ensemble outputs for i in range(ens_models): # conv_1 = Convolution2D(...)(inp_norm) # ... outs.append(Dense(num_classes, init='glorot_uniform', W_regularizer=l2(l2_lambda), activation='softmax')(drop)) # Output softmax layer out = merge(outs, mode='ave') # average the predictions to obtain the final output
callbacks
parameter passed to the fit
or fit_generator
. As usual, everything is very compact: our program is increased only by one line of code. from keras.callbacks import EarlyStopping # ... num_epochs = 50 # we iterate at most fifty times over the entire training set # ... # fit the model on the batches generated by datagen.flow()---most parameters similar to model.fit model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size), samples_per_epoch=X_train.shape[0], nb_epoch=num_epochs, validation_data=(X_val, Y_val), verbose=1, callbacks=[EarlyStopping(monitor='val_loss', patience=5)]) # adding early stopping
from keras.datasets import mnist # subroutines for fetching the MNIST dataset from keras.models import Model # basic class for specifying and training a neural network from keras.layers import Input, Dense, Flatten, Convolution2D, MaxPooling2D, Dropout, merge from keras.utils import np_utils # utilities for one-hot encoding of ground truth values from keras.regularizers import l2 # L2-regularisation from keras.layers.normalization import BatchNormalization # batch normalisation from keras.preprocessing.image import ImageDataGenerator # data augmentation from keras.callbacks import EarlyStopping # early stopping batch_size = 128 # in each iteration, we consider 128 training examples at once num_epochs = 50 # we iterate at most fifty times over the entire training set kernel_size = 3 # we will use 3x3 kernels throughout pool_size = 2 # we will use 2x2 pooling throughout conv_depth = 32 # use 32 kernels in both convolutional layers drop_prob_1 = 0.25 # dropout after pooling with probability 0.25 drop_prob_2 = 0.5 # dropout in the FC layer with probability 0.5 hidden_size = 128 # there will be 128 neurons in both hidden layers l2_lambda = 0.0001 # use 0.0001 as a L2-regularisation factor ens_models = 3 # we will train three separate models on the data num_train = 60000 # there are 60000 training examples in MNIST num_test = 10000 # there are 10000 test examples in MNIST height, width, depth = 28, 28, 1 # MNIST images are 28x28 and greyscale num_classes = 10 # there are 10 classes (1 per digit) (X_train, y_train), (X_test, y_test) = mnist.load_data() # fetch MNIST data X_train = X_train.reshape(X_train.shape[0], depth, height, width) X_test = X_test.reshape(X_test.shape[0], depth, height, width) X_train = X_train.astype('float32') X_test = X_test.astype('float32') Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels # Explicitly split the training and validation sets X_val = X_train[54000:] Y_val = Y_train[54000:] X_train = X_train[:54000] Y_train = Y_train[:54000] inp = Input(shape=(depth, height, width)) # NB Keras expects channel dimension first inp_norm = BatchNormalization(axis=1)(inp) # Apply BN to the input (NB need to rename here) outs = [] # the list of ensemble outputs for i in range(ens_models): # Conv [32] -> Conv [32] -> Pool (with dropout on the pooling layer), applying BN in between conv_1 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', init='he_uniform', W_regularizer=l2(l2_lambda), activation='relu')(inp_norm) conv_1 = BatchNormalization(axis=1)(conv_1) conv_2 = Convolution2D(conv_depth, kernel_size, kernel_size, border_mode='same', init='he_uniform', W_regularizer=l2(l2_lambda), activation='relu')(conv_1) conv_2 = BatchNormalization(axis=1)(conv_2) pool_1 = MaxPooling2D(pool_size=(pool_size, pool_size))(conv_2) drop_1 = Dropout(drop_prob_1)(pool_1) flat = Flatten()(drop_1) hidden = Dense(hidden_size, init='he_uniform', W_regularizer=l2(l2_lambda), activation='relu')(flat) # Hidden ReLU layer hidden = BatchNormalization(axis=1)(hidden) drop = Dropout(drop_prob_2)(hidden) outs.append(Dense(num_classes, init='glorot_uniform', W_regularizer=l2(l2_lambda), activation='softmax')(drop)) # Output softmax layer out = merge(outs, mode='ave') # average the predictions to obtain the final output model = Model(input=inp, output=out) # To define a model, just specify its input and output layers model.compile(loss='categorical_crossentropy', # using the cross-entropy loss function optimizer='adam', # using the Adam optimiser metrics=['accuracy']) # reporting the accuracy datagen = ImageDataGenerator( width_shift_range=0.1, # randomly shift images horizontally (fraction of total width) height_shift_range=0.1) # randomly shift images vertically (fraction of total height) datagen.fit(X_train) # fit the model on the batches generated by datagen.flow()---most parameters similar to model.fit model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size), samples_per_epoch=X_train.shape[0], nb_epoch=num_epochs, validation_data=(X_val, Y_val), verbose=1, callbacks=[EarlyStopping(monitor='val_loss', patience=5)]) # adding early stopping model.evaluate(X_test, Y_test, verbose=1) # Evaluate the trained model on the test set!
Epoch 1/50 54000/54000 [==============================] - 30s - loss: 0.3487 - acc: 0.9031 - val_loss: 0.0579 - val_acc: 0.9863 Epoch 2/50 54000/54000 [==============================] - 30s - loss: 0.1441 - acc: 0.9634 - val_loss: 0.0424 - val_acc: 0.9890 Epoch 3/50 54000/54000 [==============================] - 30s - loss: 0.1126 - acc: 0.9716 - val_loss: 0.0405 - val_acc: 0.9887 Epoch 4/50 54000/54000 [==============================] - 30s - loss: 0.0929 - acc: 0.9757 - val_loss: 0.0390 - val_acc: 0.9890 Epoch 5/50 54000/54000 [==============================] - 30s - loss: 0.0829 - acc: 0.9788 - val_loss: 0.0329 - val_acc: 0.9920 Epoch 6/50 54000/54000 [==============================] - 30s - loss: 0.0760 - acc: 0.9807 - val_loss: 0.0315 - val_acc: 0.9917 Epoch 7/50 54000/54000 [==============================] - 30s - loss: 0.0740 - acc: 0.9824 - val_loss: 0.0310 - val_acc: 0.9917 Epoch 8/50 54000/54000 [==============================] - 30s - loss: 0.0679 - acc: 0.9826 - val_loss: 0.0297 - val_acc: 0.9927 Epoch 9/50 54000/54000 [==============================] - 30s - loss: 0.0663 - acc: 0.9834 - val_loss: 0.0300 - val_acc: 0.9908 Epoch 10/50 54000/54000 [==============================] - 30s - loss: 0.0658 - acc: 0.9833 - val_loss: 0.0281 - val_acc: 0.9923 Epoch 11/50 54000/54000 [==============================] - 30s - loss: 0.0600 - acc: 0.9844 - val_loss: 0.0272 - val_acc: 0.9930 Epoch 12/50 54000/54000 [==============================] - 30s - loss: 0.0563 - acc: 0.9857 - val_loss: 0.0250 - val_acc: 0.9923 Epoch 13/50 54000/54000 [==============================] - 30s - loss: 0.0530 - acc: 0.9862 - val_loss: 0.0266 - val_acc: 0.9925 Epoch 14/50 54000/54000 [==============================] - 31s - loss: 0.0517 - acc: 0.9865 - val_loss: 0.0263 - val_acc: 0.9923 Epoch 15/50 54000/54000 [==============================] - 30s - loss: 0.0510 - acc: 0.9867 - val_loss: 0.0261 - val_acc: 0.9940 Epoch 16/50 54000/54000 [==============================] - 30s - loss: 0.0501 - acc: 0.9871 - val_loss: 0.0238 - val_acc: 0.9937 Epoch 17/50 54000/54000 [==============================] - 30s - loss: 0.0495 - acc: 0.9870 - val_loss: 0.0246 - val_acc: 0.9923 Epoch 18/50 54000/54000 [==============================] - 31s - loss: 0.0463 - acc: 0.9877 - val_loss: 0.0271 - val_acc: 0.9933 Epoch 19/50 54000/54000 [==============================] - 30s - loss: 0.0472 - acc: 0.9877 - val_loss: 0.0239 - val_acc: 0.9935 Epoch 20/50 54000/54000 [==============================] - 30s - loss: 0.0446 - acc: 0.9885 - val_loss: 0.0226 - val_acc: 0.9942 Epoch 21/50 54000/54000 [==============================] - 30s - loss: 0.0435 - acc: 0.9890 - val_loss: 0.0218 - val_acc: 0.9947 Epoch 22/50 54000/54000 [==============================] - 30s - loss: 0.0432 - acc: 0.9889 - val_loss: 0.0244 - val_acc: 0.9928 Epoch 23/50 54000/54000 [==============================] - 30s - loss: 0.0419 - acc: 0.9893 - val_loss: 0.0245 - val_acc: 0.9943 Epoch 24/50 54000/54000 [==============================] - 30s - loss: 0.0423 - acc: 0.9890 - val_loss: 0.0231 - val_acc: 0.9933 Epoch 25/50 54000/54000 [==============================] - 30s - loss: 0.0400 - acc: 0.9894 - val_loss: 0.0213 - val_acc: 0.9938 Epoch 26/50 54000/54000 [==============================] - 30s - loss: 0.0384 - acc: 0.9899 - val_loss: 0.0226 - val_acc: 0.9943 Epoch 27/50 54000/54000 [==============================] - 30s - loss: 0.0398 - acc: 0.9899 - val_loss: 0.0217 - val_acc: 0.9945 Epoch 28/50 54000/54000 [==============================] - 30s - loss: 0.0383 - acc: 0.9902 - val_loss: 0.0223 - val_acc: 0.9940 Epoch 29/50 54000/54000 [==============================] - 31s - loss: 0.0382 - acc: 0.9898 - val_loss: 0.0229 - val_acc: 0.9942 Epoch 30/50 54000/54000 [==============================] - 31s - loss: 0.0379 - acc: 0.9900 - val_loss: 0.0225 - val_acc: 0.9950 Epoch 31/50 54000/54000 [==============================] - 30s - loss: 0.0359 - acc: 0.9906 - val_loss: 0.0228 - val_acc: 0.9943 10000/10000 [==============================] - 2s [0.017431972888592554, 0.99470000000000003]
Oh, and come to work with us? :)wunderfund.io is a young foundation that deals with high-frequency algorithmic trading . High-frequency trading is a continuous competition of the best programmers and mathematicians of the whole world. By joining us, you will become part of this fascinating fight.
We offer interesting and challenging data analysis and low latency tasks for enthusiastic researchers and programmers. Flexible schedule and no bureaucracy, decisions are quickly made and implemented.
Join our team: wunderfund.io
Source: https://habr.com/ru/post/315476/
All Articles