Compared to supervised learning, we have data but no class labels.
Frequently used as pre-processing for later tasks, e.g. applying dimensionality reduction before visualizing the data or learning from it.
Preprocessing for supervised learning
A data representation that induces sparsity in node responses across features. A representation that uses only as many dimensions as there are important (intrinsic) dimensions in the data would be "complete"; this method uses more than that, i.e. an overcomplete representation. Training then enforces sparsity on the internal node activations over time. The resulting model itself may not look sparse, but the activations it produces for any given input are.
Similar to an autoencoder, but with a “sparsifying logistic” as a nonlinear transfer function on the internal layer.
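A minimal Python sketch of that idea, not the paper's code: an overcomplete linear encoder whose raw outputs pass through a sparsifying nonlinearity that divides each unit's exponentiated activation by a running average of its own recent exponentiated activations, so a unit can respond strongly only if it has been mostly quiet on recent samples. The parameter names, constants, and the exact form of the running-average update are assumptions.

```python
import numpy as np

# Sketch of a "sparsifying logistic" on an overcomplete encoder (assumed form).
rng = np.random.default_rng(0)

n_input, n_code = 64, 256          # overcomplete: more code units than input dims
W_enc = rng.normal(scale=0.1, size=(n_code, n_input))

eta, beta = 0.02, 1.0              # assumed smoothing / sharpness constants
zeta = np.ones(n_code)             # per-unit running normalizer

def sparsifying_logistic(z):
    """Squash the raw code into (0, 1), normalized per unit by its recent history."""
    global zeta
    expz = np.exp(beta * z)
    zeta = eta * expz + (1.0 - eta) * zeta   # running average of exp(beta * z)
    return eta * expz / zeta                 # near 1 only for rarely-active units

for step in range(100):
    x = rng.normal(size=n_input)             # stand-in for a training sample
    z_sparse = sparsifying_logistic(W_enc @ x)
```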
The difference between this and a standard neural network is that, with the sparsified internal layer, you can perform a "deterministic EM": alternately optimizing the weights and the internal node values.
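That alternation can be sketched as a coordinate-descent loop under a simple reconstruction-plus-sparsity energy (my own simplified stand-in, not the paper's exact energy): hold the weights fixed and descend on the code (the "E-step"), then hold the code fixed and take a gradient step on the encoder and decoder weights (the "M-step"). The energy, l1 penalty, and step sizes below are illustrative assumptions.

```python
import numpy as np

# Alternating ("deterministic EM"-style) optimization under an assumed energy:
#   E(x, z) = ||x - W_dec z||^2 + ||z - W_enc x||^2 + lam * ||z||_1
rng = np.random.default_rng(0)
n_input, n_code = 64, 256
W_enc = rng.normal(scale=0.1, size=(n_code, n_input))
W_dec = rng.normal(scale=0.1, size=(n_input, n_code))
lam, lr_z, lr_w = 0.1, 0.05, 0.005

def train_step(x):
    global W_enc, W_dec
    z = W_enc @ x                               # initialize the code from the encoder
    for _ in range(20):                         # "E-step": descend on the code, weights fixed
        grad_z = (-2 * W_dec.T @ (x - W_dec @ z)
                  + 2 * (z - W_enc @ x)
                  + lam * np.sign(z))
        z -= lr_z * grad_z
    # "M-step": one gradient step on the weights with the code held fixed
    W_dec += lr_w * 2 * np.outer(x - W_dec @ z, z)
    W_enc += lr_w * 2 * np.outer(z - W_enc @ x, x)
    return z

for _ in range(200):
    train_step(rng.normal(size=n_input))        # stand-in for training samples
```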
The resulting weights were successfully used to initialize a deep belief network, which then performed well on MNIST.
See http://books.nips.cc/papers/files/nips19/NIPS2006_0804.pdf