Learning One-hidden-layer ReLU Networks via Gradient Descent
We study the problem of learning one-hidden-layer neural networks with the
Rectified Linear Unit (ReLU) activation function, where the inputs are sampled
from the standard Gaussian distribution and the outputs are generated by a noisy
teacher network. We analyze the performance of gradient descent for training
such networks via empirical risk minimization, and provide
algorithm-dependent guarantees. In particular, we prove that tensor
initialization followed by gradient descent can converge to the ground-truth
parameters at a linear rate up to a statistical error. To the best of our
knowledge, this is the first work to characterize recovery guarantees for
practical learning of one-hidden-layer ReLU networks with multiple neurons.
Numerical experiments verify our theoretical findings.
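To make the setting concrete, below is a minimal sketch (not the paper's algorithm) of the data model and training procedure described above: inputs drawn from a standard Gaussian, labels produced by a noisy one-hidden-layer ReLU teacher, and gradient descent on the empirical squared loss. The dimensions, learning rate, and the random initialization standing in for the paper's tensor initialization are all illustrative assumptions.

```python
import numpy as np

# Sketch of the learning setup: x ~ N(0, I_d), y = sum_j v_j * ReLU(w_j^T x) + noise,
# trained with plain gradient descent on the empirical squared loss.
# The tensor initialization step from the paper is replaced by a small
# random initialization purely for illustration.

rng = np.random.default_rng(0)
d, k, n = 10, 4, 5000            # input dim, hidden neurons, samples (assumed values)
lr, steps, noise_std = 0.05, 500, 0.01

# Ground-truth (teacher) parameters; second-layer weights fixed to ones for simplicity.
W_star = rng.normal(size=(k, d))
v_star = np.ones(k)

# Training data: standard Gaussian inputs, noisy teacher outputs.
X = rng.normal(size=(n, d))
y = np.maximum(X @ W_star.T, 0.0) @ v_star + noise_std * rng.normal(size=n)

# Student network with the same architecture.
W = 0.1 * rng.normal(size=(k, d))

for t in range(steps):
    pre = X @ W.T                          # (n, k) pre-activations
    pred = np.maximum(pre, 0.0) @ v_star   # (n,) predictions
    resid = pred - y
    # Gradient of (1/2n) * sum_i (pred_i - y_i)^2 with respect to W.
    grad = ((resid[:, None] * (pre > 0)) * v_star).T @ X / n
    W -= lr * grad

final_risk = 0.5 * np.mean((np.maximum(X @ W.T, 0.0) @ v_star - y) ** 2)
print("final empirical risk:", final_risk)
```

With a sufficiently good initialization, the empirical risk in this sketch decreases geometrically until it plateaus at a level set by the label noise, mirroring the linear convergence up to statistical error stated in the abstract.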