Tags: adam (1), derivation (3), generalization (2), interpretability (1), lasso (4), least_squares (1), optimization (1), physics (1), regression (2), regularization (4), ridge (3), sparsity (2), statistics (1), theory (3), toy-models (1), transformers (2)