Where do features come from?

Geoffrey Hinton

doi:10.1111/cogs.12049

Cogn Sci. 2014 Aug;38(6):1078-101. doi: 10.1111/cogs.12049. Epub 2013 Jun 25.

Author

Geoffrey Hinton¹

Affiliation

¹ Department of Computer Science, University of Toronto.

PMID: 23800216
DOI: 10.1111/cogs.12049

Abstract

It is possible to learn multiple layers of non-linear features by backpropagating error derivatives through a feedforward neural network. This is a very effective learning procedure when there is a huge amount of labeled training data, but for many learning tasks very few labeled examples are available. In an effort to overcome the need for labeled data, several different generative models were developed that learned interesting features by modeling the higher-order statistical structure of a set of input vectors. One of these generative models, the restricted Boltzmann machine (RBM), has no connections between its hidden units, and this makes perceptual inference and learning much simpler. More significantly, after a layer of hidden features has been learned, the activities of these features can be used as training data for another RBM. By applying this idea recursively, it is possible to learn a deep hierarchy of progressively more complicated features without requiring any labeled data. This deep hierarchy can then be treated as a feedforward neural network that can be discriminatively fine-tuned using backpropagation. Using a stack of RBMs to initialize the weights of a feedforward neural network allows backpropagation to work effectively in much deeper networks, and it leads to much better generalization. A stack of RBMs can also be used to initialize a deep Boltzmann machine that has many hidden layers. Combining this initialization method with a new method for fine-tuning the weights finally leads to the first efficient way of training Boltzmann machines with many hidden layers and millions of weights.
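
The greedy layer-wise procedure described in the abstract can be illustrated with a minimal sketch. It is not the paper's code: it assumes binary units, one step of contrastive divergence (CD-1) as the learning rule, full-batch updates, and NumPy. The names `RBM`, `cd1_step`, `train_stack`, and `forward`, as well as the learning rate, epoch count, and layer sizes, are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Binary RBM sketch: visible-hidden weights only, no hidden-hidden links."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def hidden_probs(self, v):
        # With no hidden-hidden connections, the hidden units are
        # conditionally independent given v, so inference is one matrix product.
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        """One CD-1 update on a batch (assumed rule; lr is illustrative)."""
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)  # one-step reconstruction
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        # Data-driven statistics minus reconstruction-driven statistics.
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))  # reconstruction error


def train_stack(data, layer_sizes, epochs=50, seed=0):
    """Greedily train one RBM per layer, without using any labels."""
    rng = np.random.default_rng(seed)
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(x)
        rbms.append(rbm)
        # The learned feature activities become training data for the next RBM.
        x = rbm.hidden_probs(x)
    return rbms


def forward(rbms, v):
    """Treat the trained stack as a deterministic feedforward network."""
    for rbm in rbms:
        v = rbm.hidden_probs(v)
    return v
```

Calling `train_stack(data, [8, 4])` on a binary array with 16 columns would give two RBMs whose weights could initialize a 16-8-4 feedforward network. The discriminative fine-tuning with backpropagation and the deep Boltzmann machine variant mentioned in the abstract are not shown here.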

Keywords: Backpropagation; Boltzmann machines; Contrastive divergence; Deep learning; Distributed representations; Learning features; Learning graphical models; Variational learning.

MeSH terms

Artificial Intelligence
Computer Simulation
Humans
Learning*
Models, Neurological*
Neural Networks, Computer*