Have you ever wondered why we randomly initialize the weights of a neural network?

After you read this post you will know why!

When the weights of the neurons in a neural network’s layer are all initialized to the same value, every neuron in the layer produces the same output during the forward pass.
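
Here is a minimal sketch of this, assuming a made-up 3-feature input and a 3-neuron layer with a tanh activation (these choices are purely illustrative, not from the post):

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])   # one input example with 3 features (made up)

W = np.full((3, 3), 0.1)         # every weight initialized to the same value
b = np.zeros(3)                  # biases initialized to zero

h = np.tanh(W @ x + b)           # forward pass through the layer
print(h)                         # -> all three neurons output the same value
```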

Furthermore, during backpropagation the gradients of the loss w.r.t. the weights of the layer are also identical. So when training with gradient descent, the weights of all neurons are updated in exactly the same way.
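
A sketch of that claim, reusing the toy layer from above and adding a constant-initialized output weight vector and a squared-error loss (again, purely illustrative assumptions):

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])   # same toy input
t = 1.0                          # target value (made up)

W = np.full((3, 3), 0.1)         # hidden weights, all the same value
v = np.full(3, 0.1)              # output weights, all the same value

h = np.tanh(W @ x)               # hidden activations (all equal)
y = v @ h                        # scalar prediction
loss = 0.5 * (y - t) ** 2        # squared-error loss

# Backpropagation written out by hand for this tiny network
dy = y - t                       # dL/dy
da = dy * v * (1 - h ** 2)       # dL/d(pre-activation), identical entries
dW = np.outer(da, x)             # dL/dW: every row is the same
print(dW)                        # identical rows -> identical weight updates
```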

Lastly, when gradient descent converges, the weight matrix of the layer still contains the same values for every neuron, so all the neurons have learned exactly the same feature.
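
Continuing the same toy example, a few hand-rolled gradient-descent steps show the rows of the weight matrix staying identical all the way through training:

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])
t = 1.0
lr = 0.1                         # learning rate (made up)

W = np.full((3, 3), 0.1)
v = np.full(3, 0.1)

for step in range(100):          # a few gradient-descent steps
    h = np.tanh(W @ x)
    y = v @ h
    dy = y - t
    da = dy * v * (1 - h ** 2)
    W -= lr * np.outer(da, x)    # every row receives the same update
    v -= lr * dy * h             # every entry receives the same update

print(W)                         # rows are still identical after training
```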

To break this symmetry and allow the neurons of the layer (and, in general, of every layer) to learn different features, we randomly initialize the weights!
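
A sketch of the fix under the same toy setup, drawing the initial weights from a small zero-mean normal distribution (one common choice; schemes such as Glorot/Xavier or He initialization pick the scale more carefully):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])

W = rng.normal(0.0, 0.1, size=(3, 3))   # small random initial weights
h = np.tanh(W @ x)

print(h)                         # each neuron now produces a different output
```

With different starting points, each neuron receives a different gradient, follows its own trajectory during training, and ends up learning its own feature.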