Some personal notes for all AI practitioners!
In Linear Regression, the MSE loss is always a bowl-shaped convex function, so gradient descent can always find the global minimum.
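As a quick illustration (a minimal sketch with made-up toy data; the learning rate and step count are arbitrary choices, not recommendations), plain gradient descent on the MSE of a one-feature linear model reaches essentially the same minimum from very different starting points:

import numpy as np

# Toy data: y = 3x + 2 plus a bit of noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 3 * X + 2 + 0.1 * rng.normal(size=100)

def mse_gradient_descent(w_init, b_init, lr=0.1, steps=500):
    """Minimize the MSE of y_hat = w*x + b with plain gradient descent."""
    w, b = w_init, b_init
    for _ in range(steps):
        error = (w * X + b) - y
        grad_w = 2 * np.mean(error * X)   # dMSE/dw
        grad_b = 2 * np.mean(error)       # dMSE/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Very different initializations end up at (roughly) the same global minimum.
print(mse_gradient_descent(0.0, 0.0))
print(mse_gradient_descent(-50.0, 50.0))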
In Logistic Regression, if we use MSE the loss is not convex, because the hypothesis is non-linear (it passes the linear score through a sigmoid), so it is harder for gradient descent to find the global minimum. However, if we use the cross-entropy loss, the loss is convex and gradient descent can easily converge to the global minimum!
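Here is a minimal sketch (toy binary labels, arbitrary learning rate) of gradient descent on the cross-entropy loss for logistic regression; conveniently, its gradient has the same simple "prediction minus target" form as in linear regression:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=200)
y = (X + 0.5 * rng.normal(size=200) > 0).astype(float)  # toy binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression_gd(lr=0.5, steps=1000):
    """Minimize the (convex) cross-entropy loss with gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = sigmoid(w * X + b)            # predicted probabilities
        grad_w = np.mean((p - y) * X)     # d(cross-entropy)/dw
        grad_b = np.mean(p - y)           # d(cross-entropy)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = logistic_regression_gd()
print(w, b, np.mean((sigmoid(w * X + b) > 0.5) == y))  # weights and training accuracy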
Support Vector Machines also have a convex loss function (the hinge loss).
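A minimal sketch of that idea (toy separable data; the regularization strength and step size are arbitrary assumptions): sub-gradient descent on the regularized hinge loss of a linear SVM.

import numpy as np

rng = np.random.default_rng(2)
X = np.r_[rng.normal(loc=-2, size=50), rng.normal(loc=2, size=50)]
y = np.r_[-np.ones(50), np.ones(50)]      # labels in {-1, +1}

def linear_svm_subgradient(lr=0.01, lam=0.01, steps=1000):
    """Minimize lam/2*w**2 + mean(max(0, 1 - y*(w*x + b))) by sub-gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        margins = y * (w * X + b)
        active = margins < 1               # samples inside the margin contribute a sub-gradient
        grad_w = lam * w - np.mean(active * y * X)
        grad_b = -np.mean(active * y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

print(linear_svm_subgradient())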
Whenever possible, we should use a convex loss function so that gradient descent can converge to the global minimum (no local optima to get trapped in).
Neural Networks are very complex non-linear mathematical functions and their loss surfaces are usually non-convex, so it is possible to get stuck in a local minimum. In practice, however, most optimization difficulties in Neural Networks come from long plateaus and saddle points rather than local minima. To deal with these, advanced gradient descent variants were invented (e.g. Momentum, RMSprop, Adam); a quick sketch of their update rules follows below.
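The sketch below shows one common formulation of each update rule for a single parameter given its gradient (the hyperparameter defaults are typical values, assumed here rather than taken from the note, and Momentum is written in the heavy-ball form):

import numpy as np

def momentum_step(param, grad, state, lr=0.01, beta=0.9):
    """Momentum: accumulate an exponentially decaying sum of past gradients."""
    state["v"] = beta * state.get("v", 0.0) + grad
    return param - lr * state["v"]

def rmsprop_step(param, grad, state, lr=0.001, beta=0.9, eps=1e-8):
    """RMSprop: scale the step by a running average of squared gradients."""
    state["s"] = beta * state.get("s", 0.0) + (1 - beta) * grad ** 2
    return param - lr * grad / (np.sqrt(state["s"]) + eps)

def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: combine Momentum with RMSprop-style scaling, plus bias correction."""
    state["t"] = state.get("t", 0) + 1
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * grad
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy usage: minimize f(x) = x**2 with Adam (gradient is 2x).
x, state = 5.0, {}
for _ in range(2000):
    x = adam_step(x, 2 * x, state, lr=0.05)
print(x)  # close to 0, the global minimum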
Happy optimizations!
