Sequences Of Near-optimal Feedforward Neural Networks
Lakshmi Narasimha, Pramod
In order to facilitate complexity optimization in feedforward networks, several integrated growing and pruning algorithms are developed. First, a growing scheme is reviewed which iteratively adds new hidden units to fully trained networks. Then, a non-heuristic one-pass pruning technique is reviewed, which utilizes orthogonal least squares. Based upon pruning, a one-pass approach is developed for producing the validation error versus network size curve. Then, a combined approach is devised in which grown networks are pruned. As a result, the hidden units are ordered according to their usefulness, and the less useful units are eliminated. In several examples, it is shown that networks designed using the integrated growing and pruning method have lower training and validation errors. The combined method also exhibits reduced sensitivity to the choice of initial weights and produces a nearly monotonic error versus network size curve.

Starting from the strict interpolation equations for multivariate polynomials, an upper bound is developed on the number of patterns that can be memorized by a nonlinear feedforward network. A straightforward proof by contradiction is presented for the upper bound, and it is shown that the hidden activations need not be analytic. Networks trained by conjugate gradient are used to demonstrate the tightness of the bound for random patterns. The theoretical results agree closely with simulations on two-class problems solved by support vector machines.

Finally, large classifiers such as support vector machines (SVMs) are modeled by smaller networks in order to decrease the computational cost. The key idea is to generate additional training patterns using a trained SVM and to use these patterns, along with the original training patterns, to train a much smaller neural network. The results verify the validity of the technique and of the method used to generate the additional patterns.
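The pattern-generation idea can be sketched as follows. This is a minimal illustration, not the thesis's implementation: a fixed nonlinear decision rule stands in for the trained SVM teacher, and the "much smaller network" is a one-hidden-layer tanh network trained by plain gradient descent (all names and hyperparameters here are assumptions for the sketch).

```python
import numpy as np

rng = np.random.default_rng(0)

# Original training patterns: 40 points in [-1, 1]^2.
X = rng.uniform(-1.0, 1.0, size=(40, 2))

# Stand-in "teacher": a fixed nonlinear decision rule playing the role
# of a trained SVM (the thesis uses an actual trained SVM here).
def teacher(X):
    return (X[:, 0] ** 2 + X[:, 1] > 0.2).astype(float)

y = teacher(X)

# Generate additional patterns near the original data, label them with
# the teacher, and pool them with the original patterns.
X_extra = X + rng.normal(scale=0.1, size=X.shape)
X_all = np.vstack([X, X_extra])
y_all = np.concatenate([y, teacher(X_extra)])

def train_student(X, y, hidden=4, lr=0.5, epochs=3000, seed=1):
    """Train a small one-hidden-layer tanh network by gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    n = len(X)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                    # hidden activations
        p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))    # output probability
        g = (p - y) / n                             # cross-entropy gradient at the output
        gW2, gb2 = H.T @ g, g.sum()
        gH = np.outer(g, W2) * (1.0 - H ** 2)       # backprop through tanh
        W2 -= lr * gW2
        b2 -= lr * gb2
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return lambda Z: (1.0 / (1.0 + np.exp(-(np.tanh(Z @ W1 + b1) @ W2 + b2))) > 0.5).astype(float)

# The student sees twice as many patterns as the original training set.
student = train_student(X_all, y_all)
acc = (student(X_all) == y_all).mean()
```

The augmented set lets the small network see the teacher's decision boundary sampled more densely than the original data alone would allow.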
This idea is also generalized: it is proved that any learning machine can be used to generate additional patterns and, in turn, to train any other learning machine so as to improve its performance.
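The generality of the claim can be illustrated by swapping in two arbitrary learners. In this hedged sketch (both machines are hypothetical stand-ins chosen for brevity, not the thesis's models), a 1-nearest-neighbour rule acts as the teacher and a nearest-centroid classifier as the student:

```python
import numpy as np

rng = np.random.default_rng(0)

# Teacher: 1-nearest-neighbour prediction from stored patterns.
def knn_predict(X_train, y_train, X):
    d = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return y_train[d.argmin(axis=1)]

# Student: nearest-centroid classifier (fit = class means).
def centroid_fit(X, y):
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def centroid_predict(model, X):
    c0, c1 = model
    return (((X - c1) ** 2).sum(-1) < ((X - c0) ** 2).sum(-1)).astype(int)

# Original patterns with a linear class boundary.
X = rng.normal(size=(30, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Any teacher labels extra patterns; any student trains on the union.
X_extra = X + rng.normal(scale=0.3, size=X.shape)
y_extra = knn_predict(X, y, X_extra)

model = centroid_fit(np.vstack([X, X_extra]), np.concatenate([y, y_extra]))
preds = centroid_predict(model, X)
```

Only the fit/predict interfaces matter: the teacher and student can be replaced by any pair of learning machines without changing the augmentation step.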