Robust imaging by random transform-invariant convolutional neural networks

Today’s paper review gets back to convolutional neural networks (CNNs) with a nice paper from Chinese researchers, presented at this year's Association for Computing Machinery (ACM) conference in the Netherlands. It proposes a simple enhancement to CNNs: a random transformation applied to feature maps during training, designed to make the learned representations invariant to spatial transformations of images.

CNNs still show a poor ability to remain invariant to spatial transformations of images. The researchers claim that randomly transforming (rotating, scaling, and translating) the feature maps of CNNs during the training stage prevents the model from developing complex dependencies on specific transformation levels of the training images, thereby removing the need for any extra training supervision or any modification of the optimization process or of the training images:

Transform-Invariant Convolutional Neural Networks for Image Classification and Search

Abstract:

Convolutional neural networks (CNNs) have achieved state-of-the-art results on many visual recognition tasks. However, current CNN models still exhibit a poor ability to be invariant to spatial transformations of images. Intuitively, with sufficient layers and parameters, hierarchical combinations of convolution (matrix multiplication and non-linear activation) and pooling operations should be able to learn a robust mapping from transformed input images to transform-invariant representations. In this paper, we propose randomly transforming (rotation, scale, and translation) feature maps of CNNs during the training stage. This prevents complex dependencies of specific rotation, scale, and translation levels of training images in CNN models. Rather, each convolutional kernel learns to detect a feature that is generally helpful for producing the transform-invariant answer given the combinatorially large variety of transform levels of its input feature maps. In this way, we do not require any extra training supervision or modification to the optimization process and training images. We show that random transformation provides significant improvements of CNNs on many benchmark tasks, including small-scale image recognition, large-scale image recognition, and image retrieval.

Problem formulation

The introductory paragraphs of the paper offer a clear picture of the problem the researchers set out to address. Current CNNs are unable to generalize their ability to locally detect invariant salient features, obtained through the combination of local receptive fields, pooling, and shared weights, into a global transform-invariant representation of an image. Only a very deep hierarchy of pooling and convolutions displays global invariance, and the intermediate feature maps of CNNs are not invariant to large transformations of the input data:

The local transform-invariant property of CNNs lies in the combination of local receptive fields with shared weights and pooling. Because distortions or shifts of the input can cause the positions of salient features to vary, local receptive fields with shared weights are able to detect invariant elementary features despite changes in the positions of salient features [24]. Moreover, average-pooling or max-pooling reduces the resolution of the feature maps in each layer, which reduces the sensitivity of the output to small local shifts and distortions. However, due to the typically small local spatial support for pooling (e.g., 2×2 pixels) and convolution (e.g., 9×9 kernel size), large global invariance is only possible for a very deep hierarchy of pooling and convolutions, and the intermediate feature maps in CNNs are not invariant to large transformations of the input data [25]. This limitation of CNNs results from the poor capacity of pooling and the convolution mechanism in learning global transform-invariant representations (Fig. 1).
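
To make this concrete, here is a minimal NumPy sketch (my own illustration, not from the paper): a non-overlapping 2×2 max-pool absorbs a one-pixel shift of a salient activation, but a larger translation still changes the pooled output, so the invariance bought by small pooling windows is only local.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max-pooling of a 2D feature map."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature = np.zeros((8, 8))
feature[2, 2] = 1.0                           # a single salient activation

small_shift = np.roll(feature, 1, axis=1)     # shift right by 1 pixel
large_shift = np.roll(feature, 4, axis=1)     # shift right by 4 pixels

print(np.array_equal(max_pool_2x2(feature), max_pool_2x2(small_shift)))  # True: pooled output unchanged
print(np.array_equal(max_pool_2x2(feature), max_pool_2x2(large_shift)))  # False: pooled output changes
```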

The proposed model is a random transformation module that can be inserted as a layer in a CNN. The method is inspired by the success of dropout in neural networks, where each hidden unit is randomly omitted with a certain probability on each presentation of each training case. In this way, it is claimed, each neuron learns to detect independent features with little dependency on the variety of internal contexts in the same layer:

Therefore, a hidden unit cannot rely on other hidden units being present. In this way, each neuron learns to detect an independent feature that is generally robust for producing the correct answer with little dependency on the variety of internal contexts in the same layer. Similarly, randomly transforming (rotation, scale, and translation) feature maps of CNNs during the training stage prevents complex dependencies of specific rotation, scale, and translation levels of training images in CNN models. Rather, each convolutional kernel learns to detect a feature that is generally helpful for producing the transform-invariant answer given the combinatorially large variety of transform levels of its input feature maps.
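
The following PyTorch sketch is my own reconstruction of that idea, not the authors' released code: a parameter-free layer that, only during training, applies a random rotation, scale factor, and translation to its input feature maps through a sampled affine grid. The transformation ranges are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomTransform2d(nn.Module):
    """Randomly rotate, scale, and translate feature maps during training.

    The ranges below are illustrative assumptions, not values from the paper.
    Like dropout, the layer is an identity at inference time.
    """
    def __init__(self, max_angle=30.0, scale_range=(0.8, 1.2), max_shift=0.1):
        super().__init__()
        self.max_angle = max_angle      # degrees
        self.scale_range = scale_range  # multiplicative scale factor
        self.max_shift = max_shift      # fraction of the feature-map size

    def forward(self, x):               # x: (N, C, H, W)
        if not self.training:
            return x                    # identity at test time
        n, device = x.size(0), x.device
        # Sample one random rotation, scale, and translation per example.
        angle = (torch.rand(n, device=device) * 2 - 1) * math.radians(self.max_angle)
        scale = torch.empty(n, device=device).uniform_(*self.scale_range)
        shift = (torch.rand(n, 2, device=device) * 2 - 1) * self.max_shift
        cos, sin = torch.cos(angle) * scale, torch.sin(angle) * scale
        # Build the 2x3 affine matrices and resample the whole feature map.
        theta = torch.zeros(n, 2, 3, device=device, dtype=x.dtype)
        theta[:, 0, 0], theta[:, 0, 1], theta[:, 0, 2] = cos, -sin, shift[:, 0]
        theta[:, 1, 0], theta[:, 1, 1], theta[:, 1, 2] = sin, cos, shift[:, 1]
        grid = F.affine_grid(theta, list(x.size()), align_corners=False)
        # grid_sample is differentiable, so standard back-propagation still works.
        return F.grid_sample(x, grid, align_corners=False)
```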

Figure 1: Limitation of current CNN models. The length of the horizontal bars is proportional to the probability assigned to the labels by the model, and pink indicates the ground truth. A transform of the input image causes the CNN to produce an incorrect prediction (a). Additionally, the representations (256×14×14 conv5 feature maps of AlexNet [21]) are quite different, while the representation and prediction of the transform-invariant CNN (same architecture as [21]) are more consistent.
(…)

In contrast to pooling layers, in which receptive fields are fixed and local, the random transformation is performed on the entire feature map (non-locally) and can include any transformation, including scaling, rotation, and translation. This guides the CNN to learn global transformation-invariant representations from raw input images. Notably, CNNs with random transformation layers can be trained with standard back-propagation, allowing for end-to-end training of the models in which they are injected. In addition, we do not require any extra training supervision or modification of the optimization process or any transform of training images.
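
Since `affine_grid` and `grid_sample` are differentiable, the layer sketched above can be injected into an ordinary network and trained end-to-end with a standard optimizer. A hypothetical example (the architecture and hyperparameters are mine, not the paper's; it reuses the imports and the `RandomTransform2d` class from the previous sketch):

```python
# Hypothetical small CNN with the random-transform layer injected after a
# convolution; no extra parameters, loss changes, or input transforms needed.
net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    RandomTransform2d(),                # randomly transform the conv feature maps
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10),
)

optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))

net.train()                             # random transforms active during training
loss = F.cross_entropy(net(images), labels)
loss.backward()                         # gradients flow through grid_sample
optimizer.step()

net.eval()                              # transforms disabled; trained weights reused as-is
```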

The authors believe that by pushing CNNs to map transform-variant inputs to transform-invariant representations they can build more robust models and improve, in a simple way, the performance of CNNs on vision tasks:

In conclusion, all the aforementioned related works improve the transform invariance of deep learning models by adding extra feature extraction modules, more learnable parameters or extra transformations on input images, which makes trained CNN models problem dependent and not easy to be generalized to other datasets. In contrast, in this paper, we propose a very simple random transform operation on feature maps during the training of CNN models. Intrinsic transform invariance of the current CNN model is obtained by pushing the model to learn more robust parameters from raw input images only. No extra feature extraction modules or more learnable parameters are required. Therefore, it is very easy to apply the trained transform-invariant CNN model to any vision task because we only need to replace current CNN parameters by this trained model.

Figure 2: Detailed comparison of the structure of (a) a convolution layer and (b) the proposed transform-invariant convolution layer. In (b), after convolving the inputs with the kernel, the output feature maps are transformed by a random rotation angle, scale factor, and translation proportion. Then, the randomly transformed feature maps are fed into the next layer.
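
Read alongside Figure 2(b), the proposed layer amounts to an ordinary convolution whose output feature maps are randomly transformed before reaching the next layer. A hypothetical wrapper along those lines, again reusing the `RandomTransform2d` sketch from above (the class name is mine, not the paper's):

```python
class TransformInvariantConv2d(nn.Module):
    """Convolution followed by a random transform of its output feature maps."""
    def __init__(self, in_channels, out_channels, kernel_size, **conv_kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, **conv_kwargs)
        self.random_transform = RandomTransform2d()

    def forward(self, x):
        # Convolve, then randomly rotate/scale/translate the resulting maps
        # before they are fed to the next layer (identity at inference time).
        return self.random_transform(self.conv(x))
```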

Conclusion

I strongly encourage readers to read the whole paper for further details, including the full mathematical formulation of the model and the experimental setup used by the authors. For the moment, let us draw out the main conclusions and the possible hints as to where this research might lead on the way to better automated vision systems:

In this paper, we introduce a very simple and effective approach to improve the transform invariance of CNN models. By randomly transforming the feature maps of CNN layers during training, the dependency of the specific transform level of the input is reduced. Our architecture is different from that of previous approaches because we improve the invariance of deep learning models without adding any extra feature extraction modules, any learnable parameters or any transformations on the training dataset. Therefore, the transform invariances of current CNN models are very easy to be improved by just replacing their corresponding weights with our trained model. Experiments show that our model outperforms CNN models in both image recognition and image retrieval tasks.

Note: the inserted figures are extracted directly from the paper


2 thoughts on “Robust imaging by random transform-invariant convolutional neural networks”

    1. I think that is not completely correct… This paper should be further scrutinized as to the robustness of its conclusions; what is claimed might not be as easily verified empirically as the authors suggest:

      “In this paper, we introduce a very simple and effective approach to improve the transform invariance of CNN models. By randomly transforming the feature maps of CNN layers during training, the dependency of the specific transform level of the input is reduced. Our architecture is different from that of previous approaches because we improve the invariance of deep learning models without adding any extra feature extraction modules, any learnable parameters or any transformations on the training dataset. Therefore, the transform invariances of current CNN models are very easy to be improved by just replacing their corresponding weights with our trained model. Experiments show that our model …”

