Abstract:
Deep convolutional neural networks are considered state-of-the-art solutions for image classification due to their high classification performance. Their main drawback is the amount of computing power required to process a single input. To address this, this thesis proposes a conditional computation method that learns to process an input using only a subset of the network's computation units. Learning to execute only a part of a deep neural network by routing individual samples has several advantages. First, it lowers the computational burden. Furthermore, if images with similar semantic features are routed to the same path, that part of the network learns to discriminate finer differences among this subset of classes, resulting in improved classification accuracy with fewer parameters and less computation. Inspecting which units the network activates for a single sample can also help interpret its predictions. Several recent works have exploited this idea using tree-shaped networks or by selecting a particular child of a node and skipping parts of the network. In this thesis, we follow a trellis-based approach for generating sample-specific execution paths in a deep neural network. We also design a routing mechanism that uses unsupervised, differentiable information gain-based cost functions to determine which subset of units in a layer block is executed for a given sample. We call our method Conditional Unsupervised Information Gain Trellis (CUTE). We test the clustering performance of our unsupervised information gain-based objective function under different scenarios. Finally, we test the classification performance of our trellis-shaped CUTE network on the Fashion-MNIST dataset. We show that our conditional execution mechanism achieves comparable or better performance than unconditional baselines while using only a fraction of the computational resources.