
Capsule Neural Networks

In 2017, Geoffrey Hinton (one of the pioneers of the backpropagation approach) published a paper describing capsule neural networks and proposed a dynamic routing algorithm between capsules for training the proposed architecture.



Classical convolutional neural networks have drawbacks. The internal data representation of a convolutional neural network does not take into account spatial hierarchies between simple and complex objects. So if an image contains eyes, a nose and lips placed at random, for a convolutional neural network this is a clear sign of a face. Rotating an object also degrades recognition quality, whereas the human brain solves this problem easily.





For a convolutional neural network, these two images are similar [2]





Training a CNN to recognize objects from various angles requires thousands of examples.




Capsule networks reduce the error of recognizing an object from a different angle by 45%.



The Purpose of Capsules



Capsules encapsulate information about the state of a feature in vector form. A capsule encodes the probability that a feature is detected as the length of its output vector, and the state of the detected feature as the direction in which the vector points (the "instantiation parameters"). Therefore, when a detected feature moves across the image or its state changes, the probability stays the same (the length of the vector does not change), but the orientation changes.



Imagine that a capsule detects a face in an image and outputs a 3D vector of length 0.99. Then move the face across the image. The vector rotates in its space, representing the changing state, but its length remains fixed, because the capsule is still confident it has detected a face.
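As a minimal numpy illustration of this property (the vector values and the rotation below are invented for this example, not taken from the paper):

import numpy as np

# Hypothetical capsule output: direction encodes the face's pose,
# length encodes the detection probability (0.99).
v = np.array([0.52, 0.64, 0.55])
v = 0.99 * v / np.linalg.norm(v)

# Moving the face changes its pose; model this as a rotation of the vector.
theta = np.pi / 6
rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])
v_moved = rotation @ v

# Both lengths are ~0.99: the detection probability is unchanged,
# only the orientation of the vector differs.
print(np.linalg.norm(v), np.linalg.norm(v_moved))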







Differences between capsules and neurons. [2]



An artificial neuron can be described in three steps:



1. scalar weighting of input scalars

2. sum of weighted input scalars

3. nonlinear scalar transformation.



The capsule performs vector forms of the above three steps, preceded by a new stage, an affine transformation of the input (a sketch of both computations follows the list):



1. matrix multiplication of input vectors

2. scalar weighting of input vectors

3. sum of weighted input vectors

4. vector nonlinearity.
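A minimal numpy sketch of both computations (the array shapes, names, and random initialization below are assumptions for illustration, not the authors' code):

import numpy as np

def neuron(x, w, b):
    # Classical neuron: weight the input scalars, sum them, apply a scalar nonlinearity.
    return np.tanh(np.dot(w, x) + b)

def capsule(u, W, c):
    # Capsule forward pass for one higher-level capsule.
    # u: (n, d_in)        input vectors from n lower-level capsules
    # W: (n, d_out, d_in) affine transformation matrices
    # c: (n,)             scalar coupling coefficients from routing
    u_hat = np.einsum('noi,ni->no', W, u)       # 1. matrix multiplication of input vectors
    s = (c[:, None] * u_hat).sum(axis=0)        # 2.-3. scalar weighting and summation
    norm = np.linalg.norm(s)                    # 4. vector nonlinearity ("squash", see below)
    return (norm**2 / (1 + norm**2)) * s / (norm + 1e-9)

u = np.random.randn(32, 8)                      # 32 lower-level capsules, 8-D each
W = 0.1 * np.random.randn(32, 16, 8)
c = np.full(32, 1.0 / 32)                       # uniform coupling before routing
print(capsule(u, W, c).shape)                   # (16,): an output vector of length < 1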



Another innovation introduced in CapsNet is a new nonlinear activation function ("squashing") that takes a vector and limits its length to at most 1 without changing its direction:







v_j = (||s_j||^2 / (1 + ||s_j||^2)) * (s_j / ||s_j||)

The right-hand factor of the equation rescales the input vector s_j to unit length, and the left-hand factor performs additional scaling so that the output length stays below 1.
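A direct transcription of the squashing function into numpy (the example input is arbitrary):

import numpy as np

def squash(s, eps=1e-9):
    # v_j = (||s_j||^2 / (1 + ||s_j||^2)) * (s_j / ||s_j||)
    norm = np.linalg.norm(s)
    unit = s / (norm + eps)                  # rescale to unit length
    scale = norm**2 / (1 + norm**2)          # additional scaling to keep length < 1
    return scale * unit

s = np.array([3.0, 4.0])                     # length 5
v = squash(s)
print(np.linalg.norm(v))                     # ~0.96: length squashed below 1
print(v / np.linalg.norm(v) - s / 5.0)       # ~[0, 0]: direction preserved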



The capsule design is based on the artificial neuron but extends it to vector form to provide more powerful representational capabilities. Weight matrices are also introduced to encode hierarchical relationships between features of different layers. The result is equivariance of neural activity with respect to changes in the input, together with invariance of the feature detection probabilities.



Dynamic routing between capsules









Dynamic routing algorithm [1].



The first line says that the procedure takes the capsules of the lower level l and their outputs u_hat, as well as the number of routing iterations r. The last line says that the algorithm outputs v_j, the output vector of a higher-level capsule.



The second line introduces a new coefficient b_ij, which we have not seen before. It is a temporary value that is updated iteratively; after the procedure completes, its value is used to compute c_ij. At the start of training, b_ij is initialized to zero.



Line 3 states that steps 4-7 will be repeated r times.

The step in line 4 computes the vector c_i, the softmax of b_i: all the routing weights for lower-level capsule i. The softmax ensures the weights are non-negative and sum to 1.



After the weights c_ij are computed for all lower-level capsules, we move to line 5, which looks at the higher-level capsules. This step computes a linear combination of the input vectors, weighted by the routing coefficients c_ij determined in the previous step.



Then, in line 6, the vectors from the previous step are passed through the squash nonlinearity, which preserves the direction of each vector but limits its length to at most 1. This step produces the output vector v_j for every higher-level capsule.

The basic idea, in line 7, is that the agreement between input and output is measured as the scalar product of the capsule's input u_hat and its output v_j, and the routing coefficient is updated accordingly. In practice, three routing iterations work best.
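Putting lines 2-7 together, here is a compact numpy sketch of the whole procedure (the array shapes below are assumptions chosen for illustration, not the authors' reference implementation):

import numpy as np

def squash(s, axis=-1, eps=1e-9):
    # Vector nonlinearity: keeps the direction, limits the length to below 1.
    norm = np.linalg.norm(s, axis=axis, keepdims=True)
    return (norm**2 / (1 + norm**2)) * s / (norm + eps)

def dynamic_routing(u_hat, r=3):
    # u_hat: (n_lower, n_upper, d) vectors predicted by lower-level capsules
    n_lower, n_upper, d = u_hat.shape
    b = np.zeros((n_lower, n_upper))               # line 2: logits b_ij initialized to zero
    for _ in range(r):                             # line 3: repeat r times
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)       # line 4: c_i = softmax(b_i)
        s = (c[:, :, None] * u_hat).sum(axis=0)    # line 5: weighted sum of predictions
        v = squash(s)                              # line 6: v_j = squash(s_j)
        b += (u_hat * v[None, :, :]).sum(axis=-1)  # line 7: agreement u_hat . v_j
    return v

u_hat = 0.05 * np.random.randn(1152, 10, 16)       # e.g. primary capsules -> 10 output capsules
v = dynamic_routing(u_hat, r=3)
print(v.shape, np.linalg.norm(v, axis=-1))         # (10, 16); all lengths below 1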



Conclusion



Capsule neural networks are a promising architecture that improves recognition of objects under changing viewpoints and captures the hierarchical structure of images. Capsule networks are trained using dynamic routing between capsules. They reduce the error of recognizing an object from a different angle by 45% compared to CNNs.



References

[1] Geoffrey Hinton, Sara Sabour, Nicholas Frosst. Matrix Capsules with EM Routing. 2017.

[2] Max Pechyonkin. Understanding Hinton's Capsule Networks.

Source: https://habr.com/ru/post/417223/


