Classification vs regression in overparameterized regimes: Does the loss function matter?
We characterize, through matching upper and lower bounds, the generalization error under 0-1 classification loss of solutions that minimize the L2 norm of the feature weights in the overparameterized regime, including the (feature-space) margin-maximizing support vector machine (SVM). We uncover empirical and theoretical evidence for a discrepancy between the performance of classification and regression. In particular, we show that there exists a regime of moderate overparameterization in which the mean squared error (in regression) does not vanish but instead approaches the null risk, while the classification error decays to 0 as the number of samples increases. We also discuss ramifications for the susceptibility of such solutions to adversarial perturbations.
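
To make the two error measures concrete, the following is a minimal sketch of a minimum-L2-norm interpolating solution fit to binary labels, evaluated once under squared loss and once under 0-1 loss. Everything specific in it is an illustrative assumption rather than the setup analyzed here: the Gaussian features, the labels given by the sign of the first feature, the hypothetical `spike` scaling of that feature, and the names `min_l2_norm_interpolator` and `simulate`. It is not claimed to reproduce the regime described above; it only shows how the two quantities are computed for the same solution.

```python
import numpy as np


def min_l2_norm_interpolator(X, y):
    """Return the minimum-L2-norm w with X @ w = y (assumes d > n, full row rank)."""
    # The Moore-Penrose pseudoinverse gives the least-norm solution of an
    # underdetermined consistent linear system.
    return np.linalg.pinv(X) @ y


def simulate(n=50, d=2000, n_test=2000, spike=10.0, seed=0):
    """Fit on n samples with d >> n features; report squared-loss and 0-1 test errors.

    Assumptions (illustrative only): Gaussian features, labels equal to the
    sign of the first feature, and that feature scaled by `spike`.
    """
    rng = np.random.default_rng(seed)
    scales = np.ones(d)
    scales[0] = spike  # hypothetical high-variance signal direction

    X = rng.standard_normal((n, d)) * scales
    y = np.sign(X[:, 0])  # binary labels in {-1, +1}
    w_hat = min_l2_norm_interpolator(X, y)

    X_test = rng.standard_normal((n_test, d)) * scales
    y_test = np.sign(X_test[:, 0])
    pred = X_test @ w_hat

    return {
        # Largest training residual; near zero when the labels are interpolated.
        "train_residual": float(np.max(np.abs(X @ w_hat - y))),
        # Regression view: squared error of the real-valued prediction.
        "test_mse": float(np.mean((pred - y_test) ** 2)),
        # Squared error of always predicting zero on the same test labels.
        "null_risk": float(np.mean(y_test ** 2)),
        # Classification view: only the sign of the prediction matters.
        "test_01_error": float(np.mean(np.sign(pred) != y_test)),
    }


if __name__ == "__main__":
    print(simulate())
```

Comparing `test_mse` against `null_risk` and `test_01_error` against 0 for growing `n` (with `d` scaled accordingly) is one way to explore the classification-versus-regression comparison numerically, under the assumptions stated above.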